DataSHIELD Training Part 1: Introduction and logging in


Introduction

This tutorial introduces users to DataSHIELD commands and syntax. Throughout this document we refer to R, but all commands run in the same way in RStudio. The tutorial contains a limited number of examples; further examples are available in the manual page of each DataSHIELD function, which can be accessed via the R help function.


The DataSHIELD approach: aggregate and assign functions



Start your training Virtual Machines

Please follow the instructions to start the Opal VMs.

Recall from the installation instructions that loading the Opal web interface is a simple way to check whether the VMs have started.

Start R/RStudio

Load Packages

  • The following R packages are required for analysis:
      • DSI, to log in and log out.
      • DSOpal, used by DSI to access the Opal server.
      • dsBaseClient, containing all DataSHIELD functions referred to in this tutorial.

  • To load the R packages, call the library function once for each package at the R command line.
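
A minimal sketch of loading the three packages named above, assuming they are already installed in the client R session:

```r
# Load the client-side packages used in this tutorial.
library(DSI)          # log in to and log out of the servers
library(DSOpal)       # used by DSI to access the Opal server
library(dsBaseClient) # client-side DataSHIELD functions
```

If any call fails with "there is no package called ...", that package still needs to be installed before continuing.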

Build your login dataframe

  • Build the login dataframe. It holds one row per server, giving the server name, its URL, the login credentials, and the table to assign.
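
One way to build such a dataframe is with the login builder provided by DSI. The sketch below is illustrative only: the server names (study1, study2), URLs, credentials, and table names are placeholders chosen for this example, not the actual settings of the training servers, and should be replaced with the values from the installation instructions.

```r
library(DSI)
library(DSOpal)

# Create an empty login builder and add one entry per server.
# Every value passed to append() below is a placeholder (assumption).
builder <- DSI::newDSLoginBuilder()
builder$append(server   = "study1",
               url      = "https://opal-server-1.example.org",
               user     = "your_username",
               password = "your_password",
               table    = "PROJECT.TABLE1",
               driver   = "OpalDriver")
builder$append(server   = "study2",
               url      = "https://opal-server-2.example.org",
               user     = "your_username",
               password = "your_password",
               table    = "PROJECT.TABLE2",
               driver   = "OpalDriver")

# Turn the collected entries into the login dataframe.
logindata <- builder$build()
```

Building the dataframe does not contact the servers; the connection is only made at login.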

Log onto the remote Opal training servers

  • Call the DSI login function and assign its result to connections to log in to the desired Opal servers. In the DataSHIELD test environment, logindata is the login dataframe for the Opal training servers.
  • The login output indicates that each of the two Opal training servers, dstutorial-100 and dstutorial-101, contains the same 11 variables, listed in capital letters under Variables assigned.
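
A minimal sketch of the login step, assuming logindata has already been built and the training servers are running. The argument values shown are one reasonable choice rather than the only one:

```r
library(DSI)
library(DSOpal)

# Log in to every server listed in logindata.
# assign = TRUE loads the table named in logindata into each server-side
# R session; symbol sets the name of that data frame ("D" is the DSI default).
connections <- DSI::datashield.login(logins = logindata,
                                     assign = TRUE,
                                     symbol = "D")
```

The returned connections object is what later DataSHIELD calls, including logout, operate on.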

  • To log out, call datashield.logout on connections.
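
A minimal sketch of logging out, assuming connections is the object returned by an earlier call to datashield.login:

```r
library(DSI)

# Close the connections to all servers and end the server-side R sessions.
DSI::datashield.logout(connections)
```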

Assign individual variables on login

Users can specify individual variables to assign to the server-side R session. It is best practice to first create a list of the Opal variables you want to analyse.

  • Create a new variable, myvar, that lists the Opal variables required for analysis: LAB_HDL and GENDER.
  • Pass myvar to the variables argument of the function datashield.login so that only the listed variables are assigned.
  • Use the symbol argument of datashield.login to change the name of the server-side data frame from the default D to mytable.
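
The three steps above can be sketched as follows, assuming logindata has already been built and no session is currently open (log out first if one is):

```r
library(DSI)
library(DSOpal)

# List only the Opal variables needed for the analysis.
myvar <- list("LAB_HDL", "GENDER")

# Log in, assigning just those variables to a server-side data frame
# named "mytable" instead of the default "D".
connections <- DSI::datashield.login(logins = logindata,
                                     assign = TRUE,
                                     variables = myvar,
                                     symbol = "mytable")
```

Later commands then refer to the server-side data as mytable, for example mytable$LAB_HDL.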

Conclusion

The other parts in this DataSHIELD tutorial series are: