Session 3: DataSHIELD Practical

This extended practical session covers the implementation of some basic DataSHIELD functions and the interpretation of their results. It can be completed independently once you have worked through the DataSHIELD user tutorial.

Log in to the DASIM data tables

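# If they haven't been loaded yet, load all the DataSHIELD client libraries.

# Build the login data frame for the "DASIM" table, which is hosted on three cloud-based Opal servers
server <- c("study1", "study2", "study3")
url <- c("http://XXXXXX:8080")  # placeholder address; give one URL per server if they differ
# The table must be set explicitly: otherwise `table` resolves to base R's table() function
# and data.frame() fails. "DASIM" is a placeholder for the project.table name on each Opal.
table <- c("DASIM")
logindata <- data.frame(server, url, user = "administrator", password = "datashield_test&", table)

# Log in and assign the whole dataset on each server (to the default symbol "D")
opals <- datashield.login(logins = logindata, assign = TRUE)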


Use functions provided in DataSHIELD packages to solve the following problems:

Subsets and Statistics

  • Calculate the mean and variance of the continuous BMI variable for obese males.

Before subsetting, check the levels of the variables PM_BMI_CATEGORICAL and GENDER using ds.levels(). BMI is categorised into three levels (1 = normal, 2 = overweight, 3 = obese) and gender into two levels (0 = male, 1 = female).
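One possible approach is sketched below as a hypothetical helper, obese_male_bmi_stats(), to be called with the `opals` connection created at login. It assumes the data are assigned to the default symbol "D", that the continuous BMI variable is named PM_BMI_CONTINUOUS, and that your dsBaseClient release provides ds.dataFrameSubset() (older releases offer ds.subset() and ds.subsetByClass() instead, with different arguments), so check the help pages of your installed version first.

```r
# Hypothetical helper: pooled mean and variance of continuous BMI for obese males.
# Assumptions (not from the practical): data assigned to the default symbol "D";
# continuous BMI is named PM_BMI_CONTINUOUS; ds.dataFrameSubset() is available.
# The ds.* functions are looked up when the helper is called, so the DataSHIELD
# client libraries must be attached by then.
obese_male_bmi_stats <- function(datasources) {
  # Inspect the coding of the grouping variables
  print(ds.levels(x = "D$PM_BMI_CATEGORICAL", datasources = datasources))
  print(ds.levels(x = "D$GENDER", datasources = datasources))

  # Server-side subset 1: males (GENDER == 0)
  ds.dataFrameSubset(df.name = "D", V1.name = "D$GENDER", V2.name = "0",
                     Boolean.operator = "==", newobj = "D.male",
                     datasources = datasources)
  # Server-side subset 2: obese males (PM_BMI_CATEGORICAL == 3)
  ds.dataFrameSubset(df.name = "D.male", V1.name = "D.male$PM_BMI_CATEGORICAL",
                     V2.name = "3", Boolean.operator = "==",
                     newobj = "D.male.obese", datasources = datasources)

  # Pooled ("combine") summaries across all studies
  bmi <- "D.male.obese$PM_BMI_CONTINUOUS"
  list(
    mean = ds.mean(x = bmi, type = "combine", datasources = datasources),
    var  = ds.var(x = bmi, type = "combine", datasources = datasources)
  )
}
```

Call it as `obese_male_bmi_stats(opals)`. Subsetting males first and then obese males keeps each subsetting call to a single comparison, and the subsets stay on the servers.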

Assign and Plots

  • Using the pooled data, find the quantiles and mean, and plot a histogram, for the exponential and for the logarithm of the LAB_HDL measurement.
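A sketch of the assign-then-summarise pattern, written as a hypothetical helper, hdl_transform_summary(). It assumes ds.exp() and ds.log() create new server-side objects named by `newobj` (the names LAB_HDL.exp and LAB_HDL.log are illustrative), that ds.log() uses the natural logarithm by default, and that the data sit under the symbol "D".

```r
# Hypothetical helper: transform LAB_HDL on each server, then summarise and plot
# the pooled data. The object names LAB_HDL.exp and LAB_HDL.log are illustrative.
hdl_transform_summary <- function(datasources) {
  # Assign the transformed vectors server-side; no individual-level data return
  ds.exp(x = "D$LAB_HDL", newobj = "LAB_HDL.exp", datasources = datasources)
  ds.log(x = "D$LAB_HDL", newobj = "LAB_HDL.log", datasources = datasources)

  for (v in c("LAB_HDL.exp", "LAB_HDL.log")) {
    # Pooled quantiles and mean of the transformed variable
    print(ds.quantileMean(x = v, type = "combine", datasources = datasources))
    # Pooled histogram (one plot per variable)
    ds.histogram(x = v, type = "combine", datasources = datasources)
  }
  invisible(NULL)
}
```

Call it as `hdl_transform_summary(opals)`; switching `type` to "split" would instead show each study separately.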

2-dimensional contingency tables

  • What percentage of females (pooled data) are diabetic?
  • In each study separately, what percentage of males have had a stroke (DIS_CVA)?
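Both questions can be approached with two-way tables. The sketch below is a hypothetical helper that calls ds.table2D() with type = "combine" for the pooled table and type = "split" for per-study tables. The diabetes variable name DIS_DIAB is an assumption (the practical does not name it), and newer dsBaseClient releases may provide ds.table() in place of ds.table2D(), so check which one your installation offers.

```r
# Hypothetical helper: 2-D contingency tables for the two questions.
# Assumption: the diabetes indicator is named DIS_DIAB (not stated in the practical).
disease_by_gender_tables <- function(datasources) {
  list(
    # Diabetes (rows) by gender (columns), pooled over all studies
    diabetes_pooled = ds.table2D(x = "D$DIS_DIAB", y = "D$GENDER",
                                 type = "combine", datasources = datasources),
    # Stroke (rows) by gender (columns), reported study by study
    stroke_by_study = ds.table2D(x = "D$DIS_CVA", y = "D$GENDER",
                                 type = "split", datasources = datasources)
  )
}
```

With gender in the columns, the answers are within-column percentages: the female column (GENDER = 1) of the pooled diabetes table, and the male column (GENDER = 0) of each study's stroke table. The layout of the returned object differs between releases, so inspect it with `str(disease_by_gender_tables(opals))`.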

Generalized Linear Models

  • Fit a generalised linear model (GLM) that predicts the glucose level from gender. What is the predicted average glucose level for males? And for females?
  • Fit a GLM that predicts the glucose level from gender and continuous BMI. By how much does the glucose level increase for a one-unit increase in BMI? What is the predicted glucose level for a female with BMI = 22?
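A sketch of both models, plus a plain-R helper that turns estimated coefficients into predictions. It uses ds.glm() with family = "gaussian" (a linear model with identity link). The names LAB_GLUC_ADJUSTED (glucose) and PM_BMI_CONTINUOUS (BMI) are assumptions, as are the helper names glucose_models() and predict_glucose(). GENDER is assumed to enter the model coded 0 = male, 1 = female, so its coefficient is the female-minus-male difference.

```r
# Hypothetical helper: the two Gaussian GLMs from the questions.
# Assumed variable names: LAB_GLUC_ADJUSTED (glucose), PM_BMI_CONTINUOUS (BMI).
glucose_models <- function(datasources) {
  list(
    gender_only = ds.glm(formula = "D$LAB_GLUC_ADJUSTED~D$GENDER",
                         family = "gaussian", datasources = datasources),
    gender_bmi  = ds.glm(formula = "D$LAB_GLUC_ADJUSTED~D$GENDER+D$PM_BMI_CONTINUOUS",
                         family = "gaussian", datasources = datasources)
  )
}

# Linear predictor of a Gaussian GLM (identity link):
# glucose = b0 + b_female * female + b_bmi * bmi, with female = 0 (male) or 1 (female)
predict_glucose <- function(b0, b_female, female, b_bmi = 0, bmi = 0) {
  b0 + b_female * female + b_bmi * bmi
}
```

After `fits <- glucose_models(opals)`, read the estimates from each model's coefficient table (assumed to be the `coefficients` element of the returned list). In the gender-only model the male prediction is just the intercept, `predict_glucose(b0, b_female, female = 0)`, and the female prediction adds the gender coefficient. In the second model the BMI coefficient is the change in glucose per one-unit increase in BMI, and the female with BMI = 22 is `predict_glucose(b0, b_female, 1, b_bmi, 22)`.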


When you have completed the questions, check your answers and your script.