Reading the data 

...

Variables can be selected in several ways, including by column number or by column name. It is best practice to use the column name, because the column number may vary between datasets.

select.1 <- dataframe[, x]    # assign column number x of dataframe to select.1
select.2 <- dataframe[, "x"]  # assign the column named x in dataframe to select.2
select.3 <- dataframe$x       # assign column x of dataframe to select.3 using $ notation
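
The three forms above can be sketched on a small made-up data frame. The name toy and its columns id and x are hypothetical, chosen only for illustration; they are not part of sim.alspac.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(id = 1:4, x = c(2.5, 4.0, 5.0, 7.5))

by.number <- toy[, 2]     # second column, selected by position
by.name   <- toy[, "x"]   # same column, selected by name
by.dollar <- toy$x        # same column, selected with $ notation

# All three selections return the same numeric vector
identical(by.number, by.name)
identical(by.name, by.dollar)
```

If the columns of toy were reordered, toy[, 2] would silently return a different variable, whereas the two name-based forms would not.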

It is also possible to use comparison operators to keep only the rows whose values fall within a given range. See the help file for the subset function (?subset) for further explanation.

subset.4 <- subset(dataframe, x < 5)   # rows of the whole dataframe where x is less than 5
subset.5 <- subset(dataframe, x == 5)  # rows of the whole dataframe where x equals 5
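
Conditions can also be combined. The following is a minimal sketch on a made-up data frame (toy, id and x are hypothetical names), using & to keep the rows between two bounds and != to exclude a value.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(id = 1:6, x = c(1, 3, 4, 5, 6, 8))

# Rows where x lies between 3 and 5, inclusive
in.range <- subset(toy, x >= 3 & x <= 5)

# Rows where x is anything other than 5
not.five <- subset(toy, x != 5)

nrow(in.range)
nrow(not.five)
```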
  • Create a subset of sim.alspac for males called subset.male and one for females called subset.female.
  • How many participants are female and how many are male? HINT: Use dim to check the dimensions of subset.male and subset.female.

Changing class

  • The data dictionary tells us that the variable male is categorical. Check the class of sim.alspac$male using the class function.
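
As a sketch of what checking and changing a class can look like, the example below uses a made-up data frame with a 0/1 coded column. The coding and the name toy are assumptions for illustration and may not match sim.alspac. as.factor is one common way to turn a numeric column into a categorical one.

```r
# Hypothetical toy data; the 0/1 coding is an assumption
toy <- data.frame(male = c(1, 0, 0, 1, 1))

class(toy$male)                  # class before conversion

toy$male <- as.factor(toy$male)  # convert the column to a factor

class(toy$male)                  # class after conversion
levels(toy$male)                 # the categories R now recognises
```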

...

Tip: dataframe and column notation

...




Descriptive / summary stats in R

  • contingency table: counts for each combination of categorical variables (e.g. gender, age, BMI); ftable gives a flat layout for tables of 3+ variables

table( )

ftable( )
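
A minimal sketch of both functions, using made-up categorical data (the data frame toy and its category labels are hypothetical):

```r
# Hypothetical categorical data, for illustration only
toy <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  age    = c("young", "young", "old", "old", "young", "old"),
  bmi    = c("normal", "high", "normal", "normal", "high", "high")
)

# Two-way contingency table: counts of gender by age group
two.way <- table(toy$gender, toy$age)

# Three-way table, then flattened into a single printable grid
three.way <- table(toy$gender, toy$age, toy$bmi)
flat <- ftable(three.way)

two.way
flat
```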


  • summary stats: min, max, mean, median and quartiles

summary()
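
For example, on a made-up numeric vector (bmi is a hypothetical name), summary reports all of these at once, and the individual base R functions give the same values one at a time:

```r
# Hypothetical values, for illustration only
bmi <- c(18.5, 21.0, 22.4, 24.9, 27.3, 31.0)

s <- summary(bmi)   # min, 1st quartile, median, mean, 3rd quartile, max
s

# The same statistics computed individually
min(bmi)
max(bmi)
mean(bmi)
quantile(bmi, probs = c(0.25, 0.5, 0.75))
```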


  • histogram: shows the shape of a variable's distribution

hist()
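
A short sketch using simulated data (the vector heights and its parameters are made up for illustration). Calling hist with plot = FALSE returns the bin breaks and counts without drawing anything, which can be handy for checking what the plot shows.

```r
# Simulated data, for illustration only
set.seed(1)
heights <- rnorm(200, mean = 170, sd = 10)

# Draw the histogram with a title and an axis label
hist(heights, main = "Simulated heights", xlab = "Height (cm)")

# Retrieve the underlying bins and counts without plotting
h <- hist(heights, plot = FALSE)
h$breaks
h$counts
```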


  • box and whisker plot: graphically summarizes the median, the 25th and 75th percentiles and the range; by default, points far beyond the box are drawn individually

boxplot()
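
As a sketch, the example below draws a box plot of a made-up vector (values is a hypothetical name) containing one unusually large observation. boxplot also returns the numbers it used, so the plot can be checked against them.

```r
# Hypothetical values with one extreme observation
values <- c(2, 4, 5, 6, 7, 8, 9, 30)

# Draw the plot and keep the statistics it is based on
b <- boxplot(values, main = "Example box plot")

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # observations drawn as individual points beyond the whiskers
```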


Rounding numbers

signif(x, digits = 6)
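
Note that signif counts significant digits, not decimal places. The sketch below contrasts it with round, another base R function, on a made-up value:

```r
x <- 123.456789   # hypothetical value, for illustration only

six.sig <- signif(x, digits = 6)  # keep 6 significant digits
two.sig <- signif(x, digits = 2)  # keep 2 significant digits
two.dec <- round(x, digits = 2)   # keep 2 decimal places

six.sig
two.sig
two.dec
```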

...