Reading the data
...
Variables can be selected in several ways, including by column number or by column name. It is best practice to use the column name, because the column number may vary between datasets.
    select.1 <- dataframe[, x]    # assign to select.1 column number x of dataframe
    select.2 <- dataframe[, "x"]  # assign to select.2 the column named x in dataframe
    select.3 <- dataframe$x       # assign to select.3 the dataframe column x
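To illustrate why the column name is the safer choice, here is a minimal sketch using two made-up data frames (`df.a` and `df.b` are hypothetical, not part of the course data) that hold the same variables in a different column order.

```r
# Two hypothetical data frames with the same variables in a different order
df.a <- data.frame(id = 1:3, x = c(10, 20, 30))
df.b <- data.frame(x = c(10, 20, 30), id = 1:3)

# Selecting by column number picks up different variables
by.number.a <- df.a[, 2]   # this is x
by.number.b <- df.b[, 2]   # this is id

# Selecting by column name picks up x in both cases
by.name.a <- df.a[, "x"]
by.name.b <- df.b$x

identical(by.number.a, by.number.b)  # FALSE
identical(by.name.a, by.name.b)      # TRUE
```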
It is also possible to use comparison operators to subset rows by a value or a range of values. See the help file for the subset function (?subset) for further explanation.
    subset.4 <- subset(dataframe, x < 5)   # subset of the whole dataframe where x < 5
    subset.5 <- subset(dataframe, x == 5)  # subset of the whole dataframe where x equals 5
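Conditions can be combined with `&` (and) and `|` (or) to pick out a range. The sketch below assumes a small made-up `dataframe` with columns `id` and `x`; the object names are illustrative only.

```r
# Hypothetical data frame for illustration
dataframe <- data.frame(id = 1:6, x = c(1, 3, 5, 7, 9, 11))

# Rows where x lies between 3 and 9 inclusive
subset.range <- subset(dataframe, x >= 3 & x <= 9)

# Rows where x is below 3 or above 9
subset.tails <- subset(dataframe, x < 3 | x > 9)

# The select argument keeps only the named columns
subset.x.only <- subset(dataframe, x >= 3 & x <= 9, select = x)

nrow(subset.range)  # 4
nrow(subset.tails)  # 2
```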
- Create a subset of sim.alspac for males called subset.male and for females called subset.female
- How many participants are female and how many are male? HINT: Use dim to check the dimensions of subset.male and subset.female.
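As a rough illustration of how `dim` reports the size of a subset, the sketch below uses a made-up data frame `toy` rather than sim.alspac. The coding of `male` (1 for males, 0 for females) is an assumption made for this example; check the data dictionary for the real coding.

```r
# Hypothetical data frame; male coded 1 = male, 0 = female (an assumption)
toy <- data.frame(id = 1:5, male = c(1, 0, 0, 1, 0))

toy.male <- subset(toy, male == 1)
toy.female <- subset(toy, male == 0)

dim(toy.male)    # rows, then columns: 2 2
dim(toy.female)  # rows, then columns: 3 2

# With no missing values in male, the two subsets account for every row
nrow(toy.male) + nrow(toy.female) == nrow(toy)  # TRUE
```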
Changing class
- The data dictionary tells us that the variable male is categorical. Check the class of sim.alspac$male using the class function.
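One possible way to check and change the class of a variable is sketched below on a made-up vector. The 0/1 coding and the labels "female" and "male" are assumptions for illustration and may not match sim.alspac.

```r
# Hypothetical numeric vector coded 0/1 (an assumption)
male <- c(1, 0, 0, 1, 0)
class(male)  # "numeric"

# Convert to a factor so R treats the variable as categorical
male.factor <- factor(male, levels = c(0, 1), labels = c("female", "male"))
class(male.factor)   # "factor"
levels(male.factor)  # "female" "male"

# Counts per category
male.counts <- table(male.factor)
```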
...
Tip: dataframe and column notation
...
Descriptive / summary stats in R
contingency table (a summary table of counts across categorical variables, e.g. gender, age, BMI; ftable( ) gives a flat layout for 3+ variables)
table( )
ftable( )
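A minimal sketch of both functions on a made-up data frame follows. The variables `gender`, `age.group` and `bmi.group` are hypothetical and already categorical; continuous age or BMI would need to be grouped first.

```r
# Hypothetical categorical data
toy <- data.frame(
  gender    = c("F", "M", "F", "M", "F", "M"),
  age.group = c("young", "young", "old", "old", "young", "old"),
  bmi.group = c("normal", "high", "normal", "normal", "high", "high")
)

# Two-way contingency table of counts
counts.2way <- table(gender = toy$gender, age = toy$age.group)

# Three-way table, then flattened into a single printable layout
counts.3way <- table(gender = toy$gender, age = toy$age.group, bmi = toy$bmi.group)
flat <- ftable(counts.3way)

counts.2way
flat
```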
summary stats: min, max, mean, median and quartiles
summary()
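As an example, `summary()` can be applied to a single numeric vector. The `bmi` values below are made up, and one missing value is included to show how it is reported.

```r
# Hypothetical values with one missing observation
bmi <- c(18.5, 21.0, 22.4, 24.9, 27.3, 30.1, NA)

# Min, quartiles, median, mean, max and the number of NAs
bmi.summary <- summary(bmi)
bmi.summary

# The individual functions need na.rm = TRUE when values are missing
bmi.median <- median(bmi, na.rm = TRUE)
bmi.quartiles <- quantile(bmi, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
```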
histogram - to identify types of distributions of a variable
hist()
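box and whisker plot summarizes graphically the median, the 25th and 75th percentiles, and the spread of the data (by default the whiskers extend to the most extreme points within 1.5 times the interquartile range, and points beyond that are drawn as outliers)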
boxplot()
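Both functions also return the numbers behind the plot, which can help when reading the figure. The sketch below uses made-up `values` and passes `plot = FALSE` so that nothing is drawn; leave that argument out to see the plots.

```r
# Hypothetical values with one unusually large observation
values <- c(18, 20, 21, 22, 23, 24, 25, 26, 28, 45)

# Histogram bins and counts, computed without drawing
h <- hist(values, plot = FALSE)
h$breaks
h$counts

# Boxplot statistics, computed without drawing
b <- boxplot(values, plot = FALSE)
b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # points beyond the whiskers: 45
```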
Rounding numbers (to a given number of significant digits)
signif(x, digits = 6)
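Note that `signif` counts significant digits, whereas `round` counts decimal places. A short comparison on an arbitrary number:

```r
x <- 123.456789

sig6 <- signif(x, digits = 6)  # 123.457 (six significant digits)
sig2 <- signif(x, digits = 2)  # 120 (two significant digits)
dec2 <- round(x, digits = 2)   # 123.46 (two decimal places)
```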
...