Beginners Tutorial (DataSHIELD v6.1)

Welcome to the DataSHIELD beginners tutorial vignette!

Please use the six-part vignette below to try DataSHIELD for the first time. You will need:

  • an up-to-date version of R and RStudio installed on your local machine

  • the rights to install packages in your R session

  • access to an internet connection
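As a quick way to check the package requirement above, the sketch below lists which client-side packages are not yet installed. The package names DSI, DSOpal and dsBaseClient are an assumption based on the login code and the ds.colnames() function mentioned further down this page, and the helper missing_packages() is hypothetical, not part of DataSHIELD.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Assumed client-side packages for this tutorial.
needed <- c("DSI", "DSOpal", "dsBaseClient")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("All tutorial packages are installed.")
}
```

Anything reported as missing can then be installed with install.packages(), following the installation instructions for your DataSHIELD release.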

You will be running a local DataSHIELD analysis session that connects to the data stored remotely on the “Opal demo” server in France.

Training Vignette:

The DataSHIELD training environment has six approachable parts, each designed to last 10-15 minutes. Each part begins with a reminder of how to log in (look for the drop-down button), in case you wish to resume where you left off earlier.

 

Link to R Script containing all code, to download and follow yourself:

DEPRECATED, wait for update

Legacy Training: using a local Virtual Machine running a DataSHIELD server:

Choose the appropriate instructions for the Operating System you run on your computer:

Once the virtual machine is downloaded, installed and launched on your computer, you will need to run a slightly different login script from the one described in the vignette above. It is shown under “Login Dataframe” below.

Start your training Virtual Machines

Please follow the instructions to start the Opal VMs.

Recall from the installation instructions that trying to access the Opal web interface is a simple way to check whether the VMs have started.
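A minimal sketch of that check from within R, assuming the two VM addresses used in the login script below (http://192.168.56.100:8080 and http://192.168.56.101:8080). The helper opal_is_up() is hypothetical: it only reports whether an HTTP server answers at the given URL, and does not confirm that you can log in.

```r
# Hypothetical helper: TRUE if an HTTP server answers at `base_url`.
opal_is_up <- function(base_url, timeout = 5) {
  old <- options(timeout = timeout)   # limit how long we wait, in seconds
  on.exit(options(old), add = TRUE)   # restore the previous setting on exit
  tryCatch({
    # Opening the connection performs the request; failure raises an error.
    con <- suppressWarnings(url(base_url, open = "rb"))
    close(con)
    TRUE
  }, error = function(e) FALSE)
}

# Addresses assumed from the login script on this page.
vm_urls <- c(server1 = "http://192.168.56.100:8080/",
             server2 = "http://192.168.56.101:8080/")

# Only probe the VMs when the script is run by hand.
if (interactive()) {
  print(vapply(vm_urls, opal_is_up, logical(1)))
}
```

If either server reports FALSE, give the VMs a few more minutes and run the check again before attempting to log in.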

Login Dataframe

library(DSI)
library(DSOpal)  # provides the "OpalDriver" named below

# Describe each local Opal VM to log in to
builder <- DSI::newDSLoginBuilder()
builder$append(server = "server1", url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               driver = "OpalDriver")
builder$append(server = "server2", url = "http://192.168.56.101:8080/",
               user = "administrator", password = "datashield_test&",
               driver = "OpalDriver")
logindata <- builder$build()

# Open a connection to both servers
connections <- DSI::datashield.login(logins = logindata, assign = TRUE)

Beginner FAQs:

  • Where can I find out what data I am looking at? The column names derived from ds.colnames() aren’t very descriptive!

    • This is what the data dictionary is for! In the tutorial for DataSHIELD above, we connect to the CNSIM dataset. The dictionaries are stored on the Opal server, which for the tutorial can be accessed at http://192.168.56.100:8080. Log in with the username and password used in the tutorial, then navigate as follows:

      • from the homepage, select “Projects” (third from the left on the top bar);

      • in the table of projects, select the one you are interested in (e.g. CNSIM);

      • on the next page, select any of the studies in the table, e.g. CNSIM1 (they usually all have the same parameters);

      • on the next page, the data dictionary is shown, with a description of each variable.

  • Why can I not connect to the data after starting my VM?

    • A VM often takes about 2 minutes to start, and longer if you are starting two VMs at the same time. Start-up time also depends on your computer’s processor power and available RAM. If it is slow, please be patient and check back after 5 minutes; by then your VMs should have powered up successfully and be ready for use.

  • A ds._function_ has disappeared! DataSHIELD doesn’t recognise it exists! The help for it won’t load!

    • Try using devtools::check() to force RStudio to recognise it. To do this, you need to have the R package “devtools” installed. Instructions for installing it are on CRAN. Be aware that the installation can take around 15 minutes if you are under time pressure.

 

N.B. This material is kept up to date with the current DataSHIELD release (currently v6.1).

 

DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki