Reading the data

Download the D2K simulated ALSPAC dataset.

Read the data dictionary for this simulated dataset to familiarise yourself with the variables.

Information: the data dictionary

The names of all other variables end in either .7 or .11, depending on whether they were measured at the age 7 clinic or the age 11 clinic.

male codes sex: 1=male, 0=female

age.yrs is the age (in decimal years) on the day of the clinic at age 7 or 11

ht is height in cm

ht.sit is sitting height in cm

ws is waist circumference in cm

hp is hip circumference in cm

wt is weight in kg

sbp is systolic blood pressure (the top of the blood pressure fluctuation, reached as the heart contracts), measured, as is conventional, in millimetres of mercury (mmHg)

dbp is diastolic blood pressure (the bottom of the blood pressure fluctuation, reached as the heart relaxes between beats), measured, as is conventional, in millimetres of mercury (mmHg)

pulse is pulse rate measured in beats per minute

BMI is body mass index, derived as wt/(ht/100)². The height variable is divided by 100 to express it in metres rather than centimetres.
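As a quick check of the formula, here is a minimal sketch that computes BMI for two made-up children. The column names ht.7 and wt.7 follow the dictionary's naming convention, but the values, the data frame toy.alspac and the new column bmi.calc.7 are hypothetical.

```r
# Hypothetical stand-in data: heights in cm, weights in kg
toy.alspac <- data.frame(ht.7 = c(125, 130), wt.7 = c(25, 33.8))

# BMI = weight (kg) / height (m)^2; dividing ht by 100 converts cm to m
toy.alspac$bmi.calc.7 <- toy.alspac$wt.7 / (toy.alspac$ht.7 / 100)^2
toy.alspac$bmi.calc.7   # 16 20
```

Note the brackets around ht.7 / 100: without them, R would square only the 100.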

Start a new script and save it as a .R file in an appropriate location.
Add some header information as comments: what is the script for? Who wrote it? Which data set does it use? etc.
Set the working directory using setwd, then read the dataset into R with the read.csv function and assign it to the variable sim.alspac.
Look up the colnames function in the help file (?colnames) and apply it to sim.alspac to list all the column headings in the data.
Look up the dim function in the help file and apply it to sim.alspac to get the dimensions of the dataset. The number of columns is the number of variables; the number of rows is the number of participants.
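The steps above might look something like the sketch below. The real file name and folder of the D2K download aren't given here, so the sketch writes a tiny hypothetical stand-in CSV (sim_alspac_example.csv, three made-up columns) to a temporary folder and reads that instead; replace the folder and file name with wherever you saved the actual dataset.

```r
# Script:  ALSPAC practical: reading and exploring the data (edit to suit)
# Author:  <your name>
# Data:    D2K simulated ALSPAC dataset
# Purpose: read the data and check its structure

# --- Hypothetical stand-in file so this sketch runs anywhere ---
data.dir <- tempdir()
toy <- data.frame(male = c(1, 0, 1, 0),
                  ht.7 = c(125.2, 121.8, 127.0, 123.4),
                  wt.7 = c(25.1, 23.0, 26.4, 24.2))
write.csv(toy, file.path(data.dir, "sim_alspac_example.csv"), row.names = FALSE)

# Set the working directory, then read the CSV into a data frame
setwd(data.dir)                                  # use your own folder here
sim.alspac <- read.csv("sim_alspac_example.csv") # use the real file name here

colnames(sim.alspac)  # column headings: "male" "ht.7" "wt.7"
dim(sim.alspac)       # 4 3: 4 rows (participants), 3 columns (variables)
```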

Selecting and subsetting

Variables can be selected in a number of ways, including by column number or by column name. Selecting by name is best practice, because a variable's column number may differ between datasets (or change when columns are added or removed).

subset.1 <- dataframe[, x]    # assign column number x of dataframe to subset.1
subset.2 <- dataframe[, "x"]  # assign the column named x to subset.2
subset.3 <- dataframe$x       # assign the column named x to subset.3, using $ notation
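To see that the three forms select the same thing, here is a small sketch on a hypothetical two-column data frame (toy.alspac and its values are made up; with your own data, use sim.alspac and a real column name).

```r
# Hypothetical stand-in data
toy.alspac <- data.frame(male = c(1L, 0L, 1L), ht.7 = c(125.2, 121.8, 127.0))

by.number <- toy.alspac[, 2]       # by position: breaks if the columns move
by.name   <- toy.alspac[, "ht.7"]  # by name: robust to column order
by.dollar <- toy.alspac$ht.7       # $ shorthand for selection by name

identical(by.number, by.name)      # TRUE
identical(by.name, by.dollar)      # TRUE
```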

You can also use comparison operators (<, >, <=, >=, ==) to select the rows whose values fall within a given range. See the help file for the subset function (?subset) for further explanation.

subset.4 <- subset(dataframe, x < 5)   # rows of dataframe where x is less than 5
subset.5 <- subset(dataframe, x == 5)  # rows where x equals 5 (note the double ==)
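To select between two values, combine conditions with &. A minimal sketch on hypothetical data (toy.alspac, the heights and the 120 to 125 cm range are all made up):

```r
# Hypothetical stand-in data
toy.alspac <- data.frame(male = c(1L, 0L, 1L, 0L),
                         ht.7 = c(118.5, 121.8, 127.0, 123.4))

# Rows with height from 120 cm up to (but not including) 125 cm
mid.height <- subset(toy.alspac, ht.7 >= 120 & ht.7 < 125)

# The same selection using square-bracket indexing
mid.height.2 <- toy.alspac[toy.alspac$ht.7 >= 120 & toy.alspac$ht.7 < 125, ]

nrow(mid.height)   # 2
```

One practical difference: subset drops rows where the condition is NA (a missing value), whereas square-bracket indexing returns them as rows full of NAs.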

Create a subset of sim.alspac for males called subset.male, and one for females called subset.female.
How many participants are female and how many are male? HINT: Use dim to check the dimensions of subset.male and subset.female.
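One way to approach this, sketched on a hypothetical five-row stand-in (toy.alspac and its values are made up; apply the same code to sim.alspac):

```r
# Hypothetical stand-in data: male is coded 1 = male, 0 = female
toy.alspac <- data.frame(male = c(1L, 0L, 1L, 0L, 1L),
                         ht.7 = c(125.2, 121.8, 127.0, 123.4, 124.1))

subset.male   <- subset(toy.alspac, male == 1)
subset.female <- subset(toy.alspac, male == 0)

dim(subset.male)        # 3 2: 3 participants, 2 variables
dim(subset.female)      # 2 2: 2 participants, 2 variables
table(toy.alspac$male)  # cross-check: counts for each code in one step
```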

Changing class

The data dictionary tells us that the variable "male" is categorical. Check the class of sim.alspac$male using the class function.
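Because male is stored as 0/1 numbers, read.csv will typically give it class "integer", so R treats it as a quantity rather than a category. A minimal sketch of checking and changing the class, on hypothetical data (toy.alspac and the extra column sex.label are made up for illustration):

```r
# Hypothetical stand-in: read.csv usually reads a 0/1 column as integer
toy.alspac <- data.frame(male = c(1L, 0L, 1L, 0L))

class.before <- class(toy.alspac$male)
class.before                     # "integer"

# Convert to a factor, R's class for categorical variables
toy.alspac$male <- as.factor(toy.alspac$male)
class(toy.alspac$male)           # "factor"
levels(toy.alspac$male)          # "0" "1"

# Optional: a labelled version, which makes tables and plots easier to read
toy.alspac$sex.label <- factor(toy.alspac$male, levels = c("0", "1"),
                               labels = c("female", "male"))
table(toy.alspac$sex.label)
```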

Tip: dataframe and column notation

Descriptive / summary stats in R

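contingency table: counts of participants cross-classified by two or more categorical variables (e.g. sex by age group by BMI group). Continuous variables such as age and BMI must first be grouped, for example with cut(). table() builds the table; ftable() prints tables of three or more variables in a flat, readable layout.

table()

ftable()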
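A sketch of a three-way table of sex, age group and BMI group, assuming hypothetical data and cut points (toy.alspac, the values, and the age and BMI groupings are all illustrative, not clinical cut-offs):

```r
# Hypothetical stand-in data
toy.alspac <- data.frame(
  male      = c(1, 0, 1, 0, 1, 0),
  age.yrs.7 = c(7.4, 7.6, 7.5, 7.8, 7.3, 7.7),
  bmi.7     = c(15.2, 17.9, 16.4, 19.1, 14.8, 16.0)
)

# Group the continuous variables; right = FALSE gives intervals like [7, 7.5)
age.group <- cut(toy.alspac$age.yrs.7, breaks = c(7, 7.5, 8), right = FALSE)
bmi.group <- cut(toy.alspac$bmi.7, breaks = c(0, 16, 18, Inf),
                 labels = c("<16", "16-18", "18+"), right = FALSE)

# Three-way contingency table of counts
tab <- table(sex = toy.alspac$male, age = age.group, bmi = bmi.group)

# Flat layout: sex and age in the rows, BMI group in the columns
ftable(tab)
```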

summary statistics: minimum, lower quartile, median, mean, upper quartile and maximum

summary()
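A short sketch on a hypothetical vector of BMI values (made up for illustration). summary() also works on a whole data frame, giving these statistics for each column.

```r
bmi <- c(15.2, 17.9, 16.4, 19.1, 14.8, 16.0)  # hypothetical BMI values

s <- summary(bmi)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s

# The individual statistics are also available as separate functions
mean(bmi)
min(bmi)
max(bmi)
quantile(bmi, probs = c(0.25, 0.5, 0.75))
```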

histogram: shows the shape of a variable's distribution (e.g. symmetric, skewed or bimodal)

hist()
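A minimal sketch with hypothetical BMI values. By default hist() chooses the bins itself (using Sturges' rule); the object it returns records the bin edges and counts.

```r
bmi <- c(15.2, 17.9, 16.4, 19.1, 14.8, 16.0, 15.5, 16.8, 21.3, 15.9)  # hypothetical

h <- hist(bmi, main = "Hypothetical BMI at age 7",
          xlab = "BMI (kg/m^2)", col = "grey")

h$breaks   # bin edges chosen by R
h$counts   # number of values falling in each bin
```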

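box and whisker plot: the box marks the lower quartile, median and upper quartile; by default the whiskers reach the most extreme values lying within 1.5 times the interquartile range of the box, and points beyond them are drawn individually as possible outliers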

boxplot()
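A sketch comparing BMI between the sexes with a formula, assuming hypothetical data (toy.alspac and its values are made up). The boxes appear in the order of the values of male (0 then 1), so the names argument labels them female, male.

```r
# Hypothetical stand-in data
toy.alspac <- data.frame(
  male  = c(1, 0, 1, 0, 1, 0, 1, 0),
  bmi.7 = c(15.2, 17.9, 16.4, 19.1, 14.8, 16.0, 15.5, 16.8)
)

b <- boxplot(bmi.7 ~ male, data = toy.alspac,
             names = c("female", "male"), ylab = "BMI (kg/m^2)")

# One column per box; rows are lower whisker, lower hinge, median,
# upper hinge and upper whisker
b$stats
```

The hinges are Tukey's hinges, so they can differ slightly from the quartiles reported by quantile() or summary().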

Rounding numbers

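signif(x, digits = 6)

# rounds x to 6 significant figures; set how many with digits =

or use

format(round(x, 2), nsmall = 2)

# rounds to two decimal places and keeps trailing zeros (e.g. 12.30); note that format returns a character string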
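A quick comparison of the three functions on arbitrary example numbers, showing why format is needed when you want trailing zeros:

```r
x <- 1234.5678
signif(x, digits = 3)            # 1230: three significant figures
round(x, 2)                      # 1234.57: two decimal places

y <- 12.3
round(y, 2)                      # prints 12.3: round() does not add trailing zeros
format(round(y, 2), nsmall = 2)  # "12.30": a character string with two decimals
```

Because format returns text, keep the numeric value for calculations and use the formatted string only for display (tables, plot labels).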

Adding text to graphs

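# RegM11 is a fitted regression model (e.g. from lm()); 70, 12 are the x and y plot coordinates of the label
text(70, 12, labels = paste("y =", round(RegM11$coefficients[2], 2), "x +", round(RegM11$coefficients[1], 2)), col = "orange")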
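A self-contained sketch of labelling a fitted regression line, assuming made-up height and weight values and a hypothetical model name fit. It swaps paste for sprintf: the %+.2f format prints the intercept's sign explicitly, so a negative intercept does not appear as "+ -47.5".

```r
ht <- c(140, 145, 150, 155, 160)     # hypothetical heights (cm)
wt <- c(35, 38.5, 40.5, 44, 47)      # hypothetical weights (kg)

fit <- lm(wt ~ ht)                   # simple linear regression of wt on ht
plot(ht, wt, xlab = "Height (cm)", ylab = "Weight (kg)")
abline(fit, col = "orange")          # add the fitted line

# coef(fit)[1] is the intercept, coef(fit)[2] is the slope
eq <- sprintf("y = %.2f x %+.2f", coef(fit)[2], coef(fit)[1])
text(145, 46, labels = eq, col = "orange")  # placed at x = 145, y = 46
eq                                   # "y = 0.59 x -47.50"
```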

R Practical: Analysing simulated ALSPAC data
