...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Reading the data
...
...
We have simulated some data from the ALSPAC study (see Avraam, Wilson and Burton, 2018, for data synthesis details). The simulated dataset is located in your work folder.
The data dictionary for this simulated dataset
...
is in the panel below.
Info | ||
---|---|---|
| ||
- The names of all other variables end in either .7 or .11, depending on whether they were measured at the age 7 clinic or the age 11 clinic.
- male codes sex: 1 = male, 0 = female.
- age.yrs is the age (in decimal years) on the day of the clinic at age 7 or 11.
- ht is height in cm.
- ht.sit is sitting height in cm.
- ws is waist circumference in cm.
- hp is hip circumference in cm.
- wt is weight in kg.
- sbp is systolic blood pressure (the top of the blood pressure fluctuation), measured (as is conventional) in mm of Hg (mercury).
- dbp is diastolic blood pressure (the bottom of the blood pressure fluctuation), measured (as is conventional) in mm of Hg (mercury).
- pulse is pulse rate, measured in beats per minute.
- BMI is body mass index, derived as wt/(ht/100)^2. The height variable is divided by 100 to express it in metres rather than centimetres. |
- Set the working directory using setwd(), then start a new script and save it as a .R file in an appropriate location.
- Comment in some header information: what is the script for? Who is it written by? What dataset is being used? etc.
- Read the dataset into R and assign it to the variable sim.alspac using the read.csv() function.
- Look up the colnames() function in the help file and apply it to sim.alspac to list all the column headings in the data.
- Look up the dim() function in the help file and apply it to sim.alspac to get the dimensions of the dataset. The number of columns is the number of variables and the number of rows is the number of participants.
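The steps above can be sketched on a small stand-in file. This is only an illustration: the file name of the simulated ALSPAC dataset is not given here, so the sketch writes a toy CSV to a temporary location and reads it back. The object names demo, csv.path and sim.demo, and all values, are hypothetical.

```r
# Script: reading-data sketch (toy data, NOT the ALSPAC dataset)
# Author: <your name>

# Build a tiny stand-in data frame and save it as a CSV file
demo <- data.frame(male = c(1, 0, 1),
                   ht.7 = c(121.5, 119.0, 124.2),
                   wt.7 = c(23.1, 22.4, 25.0))
csv.path <- tempfile(fileext = ".csv")   # hypothetical file location
write.csv(demo, csv.path, row.names = FALSE)

# Read the file into R, then inspect its column names and dimensions
sim.demo <- read.csv(csv.path)
colnames(sim.demo)   # column headings (one per variable)
dim(sim.demo)        # number of rows (participants), number of columns (variables)
```

In the practical itself you would pass the path of the dataset in your work folder to read.csv() instead of csv.path.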
...
Selecting and
...
Descriptive / summary stats in R
Contingency table (a summary table of counts across categorical variables; ftable() flattens tables of 3+ variables), e.g. gender, age, BMI
table()
ftable()
Summary stats: mean, min, max, median and quartiles
summary()
Histogram: to identify the type of distribution of a variable
hist()
Box and whisker plot: summarises graphically the median, the 25th and 75th percentiles and the spread of the remaining data
boxplot()
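As a rough illustration of the table and summary functions listed above, the sketch below applies them to a toy data frame. The data frame demo and its columns (male, clinic, overweight, bmi) are made up for this example and are not part of the ALSPAC data.

```r
# Toy data with hypothetical values
demo <- data.frame(
  male       = c(1, 0, 1, 0, 1, 0),
  clinic     = c(7, 7, 11, 11, 7, 11),
  overweight = c("no", "no", "yes", "no", "yes", "no"),
  bmi        = c(15.2, 16.1, 19.8, 17.0, 21.3, 16.4)
)

# One-way table of counts by sex
sex.counts <- table(demo$male)
sex.counts

# Three-way table, flattened into a single readable layout
ftable(table(demo$male, demo$clinic, demo$overweight))

# Minimum, quartiles, median, mean and maximum of a numeric variable
summary(demo$bmi)
```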
Rounding numbers
signif(x, digits = 6)
# set how many significant figures using digits =
or use
format(round(x, 2), nsmall = 2)
# for two d.p
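A short sketch of the difference between the two approaches: signif() returns a number rounded to a given count of significant figures, whereas format() returns a character string padded to a fixed number of decimal places. The value of x is arbitrary.

```r
x <- 3.14159265

sig3 <- signif(x, digits = 3)               # numeric: 3.14
dp2  <- format(round(x, 2), nsmall = 2)     # character: "3.14"
pad  <- format(round(2.5, 2), nsmall = 2)   # character: "2.50" (trailing zero kept)

sig3
dp2
pad
```

Because format() returns text, it is best used only at the point of printing or labelling, not in further calculations.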
Adding text to graphs
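text(70, 12, labels = paste("y =", signif(RegM11$coefficients[2], 3), "x +", signif(RegM11$coefficients[1], 3)), col = "orange") # RegM11 is a fitted lm object: coefficients[2] is the slope, coefficients[1] the intercept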
...
subsetting
Selecting variables can be done in a number of ways, including selection by column number or column name. It is best practice to use the column name, as the column number may vary between datasets.
Code Block | ||
---|---|---|
| ||
select.1 <- dataframe[, x]    # assign column number x of dataframe to select.1 (x is a number, e.g. 3)
select.2 <- dataframe[, "x"]  # assign the column named x in dataframe to select.2
select.3 <- dataframe$x       # assign the column named x in dataframe to select.3, using $ |
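To make the three forms concrete, here is a minimal sketch on a toy data frame (the data frame demo and its values are hypothetical). All three selections return the same column.

```r
# Toy data frame with hypothetical values
demo <- data.frame(male = c(1, 0, 1),
                   ht.7 = c(121.5, 119.0, 124.2))

by.number <- demo[, 2]        # second column, selected by position
by.name   <- demo[, "ht.7"]   # same column, selected by name
by.dollar <- demo$ht.7        # same column, selected with $

identical(by.number, by.name)   # TRUE
identical(by.name, by.dollar)   # TRUE
```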
It is also possible to use comparison operators to subset rows by a value or a range of values. See the help file for the subset() function for further explanation.
Code Block | ||
---|---|---|
| ||
subset.4 <- subset(dataframe, x < 5)   # subset of the whole dataframe where x is less than 5
subset.5 <- subset(dataframe, x == 5)  # subset of the whole dataframe where x equals 5 |
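A minimal sketch of subsetting by value and by range, using a toy data frame with made-up values. Conditions can be combined with & (and) or | (or) to select a range.

```r
# Toy data frame with hypothetical values
demo <- data.frame(id = 1:6, x = c(2, 5, 7, 5, 9, 4))

below.5  <- subset(demo, x < 5)             # rows where x is less than 5
equal.5  <- subset(demo, x == 5)            # rows where x equals 5
in.range <- subset(demo, x >= 4 & x <= 7)   # rows where x lies between 4 and 7 inclusive

nrow(below.5)    # 2
nrow(equal.5)    # 2
nrow(in.range)   # 4
```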
- Create a subset of sim.alspac for males called subset.male and for females called subset.female.
- How many participants are female and how many are male? HINT: Use dim() to check the dimensions of subset.male and subset.female.
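The same pattern can be tried on a toy data frame before applying it to sim.alspac. In this sketch the data frame demo and its values are hypothetical; only the coding of male (1 = male, 0 = female) follows the data dictionary.

```r
# Toy data frame with hypothetical values
demo <- data.frame(male = c(1, 0, 1, 1, 0),
                   ht.7 = c(121.5, 119.0, 124.2, 122.8, 118.3))

demo.male   <- subset(demo, male == 1)
demo.female <- subset(demo, male == 0)

dim(demo.male)     # rows (participants), then columns (variables)
dim(demo.female)
table(demo$male)   # cross-check: counts of each sex in the full data
```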
Exploring the data
- Get summary statistics by using the summary() function on subset.male and subset.female.
- Use the boxplot() function to plot BMI at age 7 against gender. HINT: You will only need to use the arguments formula= and data=.
- Output your boxplot as a .png file using the png() function.
- Use the hist() function to plot histograms of BMI at age 7 for females and males. HINT: You can layer graphs over one another by using the argument add=TRUE in the second histogram. The line colour of the histogram can be set using the border argument, e.g. border="red".
- Make the plot more readable by using the legend() function to add an appropriate key.
- Output your histogram as a .png file using the png() function.
- Use the plot() function to create a scatter plot of height and weight at age 7 for males.
- Use the lm() function to generate a linear model called lm1 for the two variables. HINT: R uses formula notation in the formula argument, e.g. formula=y~x.
- Use the summary() function on lm1 to get the coefficients.
- You can add your regression line to the scatterplot by running the abline() function on lm1 after your plot() function.
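The plotting and regression steps above can be sketched as follows. This is an illustration on randomly generated data, not the ALSPAC dataset: the column names bmi.7, ht.7 and wt.7, the output file names and the temporary output folder are all assumptions made for the example.

```r
# Generate toy data (hypothetical values and column names)
set.seed(1)
demo <- data.frame(male = rep(c(0, 1), each = 50),
                   ht.7 = rnorm(100, mean = 125, sd = 5))
demo$wt.7  <- 0.4 * demo$ht.7 - 25 + rnorm(100, sd = 2)
demo$bmi.7 <- demo$wt.7 / (demo$ht.7 / 100)^2

out.dir <- tempdir()   # assumed output folder; use your own location

# Boxplot of BMI by sex, written to a .png file
box.file <- file.path(out.dir, "boxplot_bmi7.png")
png(box.file)
boxplot(bmi.7 ~ male, data = demo, xlab = "male (1 = male, 0 = female)", ylab = "BMI")
dev.off()   # close the device so the file is written

# Overlaid histograms with a legend, written to a .png file
hist.file <- file.path(out.dir, "hist_bmi7.png")
png(hist.file)
hist(demo$bmi.7[demo$male == 0], border = "red", main = "BMI at age 7", xlab = "BMI")
hist(demo$bmi.7[demo$male == 1], border = "blue", add = TRUE)
legend("topright", legend = c("female", "male"), col = c("red", "blue"), lty = 1)
dev.off()

# Linear model of weight on height for males, with the fitted line on a scatter plot
males <- subset(demo, male == 1)
lm1 <- lm(formula = wt.7 ~ ht.7, data = males)
summary(lm1)   # coefficient table: intercept and slope

scatter.file <- file.path(out.dir, "scatter_ht_wt.png")
png(scatter.file)
plot(males$ht.7, males$wt.7, xlab = "Height (cm)", ylab = "Weight (kg)")
abline(lm1)    # adds the regression line to the current plot
dev.off()
```

Each png() call opens a file device, so every plotting command up to the matching dev.off() is drawn into that file rather than on screen.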
Modelling
- Apply a generalised linear model (glm) using the glm() function to investigate the relationships between the variables.
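A minimal sketch of a glm() call on randomly generated data. The choice of outcome (weight), predictors (height and sex) and the gaussian family is an assumption made for illustration; the practical leaves the choice of variables and family to you.

```r
# Generate toy data (hypothetical values and column names)
set.seed(42)
demo <- data.frame(male = rep(c(0, 1), each = 50),
                   ht.7 = rnorm(100, mean = 125, sd = 5))
demo$wt.7 <- 0.4 * demo$ht.7 - 25 + 0.5 * demo$male + rnorm(100, sd = 2)

# Gaussian family with the default identity link: equivalent to ordinary linear regression
glm1 <- glm(wt.7 ~ ht.7 + male, family = gaussian, data = demo)

summary(glm1)   # coefficient estimates, standard errors and p-values
coef(glm1)      # intercept, height effect and sex effect
```

For a binary outcome you would instead use family = binomial, which fits a logistic regression.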
Info | ||
---|---|---|
| ||
Your R script should be similar to the example R answer script. Try uploading your own dataset and repeating the practical. |