...
This tutorial introduces users to DataSHIELD commands and syntax. Throughout this document we refer to R, but all commands run in the same way in RStudio. The tutorial contains a limited number of examples; further examples are available in the manual page of each DataSHIELD function, which can be accessed via the help function.
The DataSHIELD approach: aggregate and assign functions
Tip | ||
---|---|---|
| ||
DataSHIELD commands call functions that range from carrying out prerequisite tasks, such as logging in to the data sources, to generating basic descriptive statistics, plots and tabulations. More advanced functions allow users to fit generalized linear models and generalized estimating equation models. R can list all functions available in DataSHIELD. This section explains the functions we will call during this tutorial. Although this knowledge is not required to run DataSHIELD analyses, it helps in understanding the output of the commands. For example, it explains why some commands call functions that return nothing to the user but instead store their output on the data provider's server for use by a second function. In DataSHIELD the person running an analysis (the client) uses client-side functions to issue commands (instructions). These commands initiate the execution (running) of server-side functions that run the analysis server-side (behind the firewall of the data provider). There are two types of server-side function: assign functions and aggregate functions. Assign functions create new objects and store them server-side, either because the objects are potentially disclosive or because they consist of individual-level data which, in DataSHIELD, is never seen by the analyst. These new objects can include:
Assign functions return no output to the client except error messages or useful status messages about the object stored server-side. Aggregate functions analyse the data server-side and return an output in the form of aggregate data (non-disclosive summary statistics) to the client. The help page for each function tells us what is returned and when not to expect an output on the client side. |
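The split between the two kinds of server-side function can be pictured with a toy model in plain R. This is not DataSHIELD code: the `server` environment, the made-up `LAB_HDL` values and the `toy_*` function names are all assumptions invented for illustration.

```r
# Toy model only: a local environment stands in for the data provider's server.
server <- new.env()
server$D <- data.frame(LAB_HDL = c(1.2, 1.5, 0.9, 1.8, 1.1))  # made-up values

# Assign-style function: stores a new object "server-side" and returns
# nothing to the caller.
toy_assign_log <- function(x, newobj) {
  assign(newobj, log(server$D[[x]]), envir = server)
  invisible(NULL)
}

# Aggregate-style function: returns only a summary statistic.
toy_aggregate_mean <- function(obj) {
  mean(get(obj, envir = server))
}

out <- toy_assign_log("LAB_HDL", "LAB_HDL_log")
is.null(out)                                        # TRUE: no data returned
summary_value <- toy_aggregate_mean("LAB_HDL_log")
summary_value                                       # one number, not the rows
```

In dsBaseClient, `ds.log` and `ds.mean` are, as far as we are aware, examples of the first and second kind respectively; the help page of each function states what it returns.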
...
Please follow the instructions to Start the Opal VMs.
Recall from the installation instructions that loading the Opal web interface is a simple way to check whether the VMs have started.
...
Start R/RStudio
Start R, RGui, or RStudio, whichever you will be using for this analysis training exercise.
Install Packages
The following relevant R packages are required for analysis:
- DSI, used to log in and log out.
- DSOpal, used by DSI to access the Opal server.
- dsBaseClient, which contains all DataSHIELD functions referred to in this tutorial.
Code Block |
---|
install.packages('DSI') |
install.packages('DSOpal', dependencies=TRUE) |
install.packages('dsBaseClient', repos=c(getOption('repos'), 'http://cran.datashield.org'), dependencies=TRUE) |
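After installing, it can be useful to confirm that all three packages are actually available before going further. The following is a minimal sketch using only base R; the object names `required`, `installed` and `missing_pkgs` are chosen here for illustration.

```r
# Packages named in this tutorial.
required <- c("DSI", "DSOpal", "dsBaseClient")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing.
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```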
Load Packages
To load the R packages, call the library function at the command line, as in the example below:
Code Block | ||||
---|---|---|---|---|
| ||||
#load libraries |
library(DSI) |
library(DSOpal) |
library(dsBaseClient) |
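With the packages available, R can list the client-side functions and open their manual pages. The sketch below assumes only that dsBaseClient may or may not be installed; `ds_functions` is a name chosen here, and `ds.mean` is used simply as an example of a function to look up.

```r
# Names of the functions exported by dsBaseClient (empty if it is not installed).
ds_functions <- if (requireNamespace("dsBaseClient", quietly = TRUE)) {
  sort(getNamespaceExports("dsBaseClient"))
} else {
  character(0)
}
head(ds_functions)

# Open the manual page for one function, if it is present.
if ("ds.mean" %in% ds_functions) {
  help("ds.mean", package = "dsBaseClient")
}
```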
...
Tip | ||
---|---|---|
| ||
The DataSHIELD cloud training environment does not use fixed IP addresses; the client and Opal training server addresses change each training session. As part of the user tutorial you learn how to build a DataSHIELD login dataframe. In a real-world instance of DataSHIELD this is populated with secure certificates, not text-based usernames and passwords. |
...
The login dataframe is an R object that stores all of the login information needed to access a DataSHIELD server. It can be saved (as an R script) for future logins, so that the information does not have to be gathered each time. It is built using functions from the DSI package and assigned to a local object, called "logindata" in the example below, which is then passed to the function for logging in to servers.
Code Block | ||||
---|---|---|---|---|
| ||||
# Build your login dataframe |
builder <- DSI::newDSLoginBuilder() |
builder$append(server = "server1", url = "https://opal-demo.obiba.org/", user = "dsuser", password = "P@ssw0rd", driver = "OpalDriver", options = 'list(ssl_verifyhost=0, ssl_verifypeer=0)') |
builder$append(server = "server2", url = "https://opal-demo.obiba.org/", user = "dsuser", password = "P@ssw0rd", driver = "OpalDriver", options = 'list(ssl_verifyhost=0, ssl_verifypeer=0)') |
logindata <- builder$build() |
Login Command
Call the DSI login function to log in to the desired Opal servers, and assign the result to a local object called "connections". In the DataSHIELD test environment, logindata is our login dataframe for the Opal training servers.
Code Block | ||||
---|---|---|---|---|
| ||||
connections <- DSI::datashield.login(logins = logindata, assign = TRUE) |
The output below indicates that each of the two Opal training servers, "server1" and "server2", contains the same 11 variables, listed in capital letters under Variables assigned:
Code Block | ||||
---|---|---|---|---|
| ||||
Logging into the collaborating servers |
Logged in all servers [================================================================] 100% /24s |
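Once logged in, the session can be inspected and, when the work is finished, closed again. The following is a sketch that assumes the `connections` object from the login step exists; the guard simply skips the calls when it does not, so the snippet is safe to paste at any point.

```r
# Only run if a login has already produced a "connections" object.
if (exists("connections") && requireNamespace("DSI", quietly = TRUE)) {
  print(names(connections))                    # one entry per server
  print(DSI::datashield.symbols(connections))  # objects held server-side
  DSI::datashield.logout(connections)          # close the server sessions
}
```

Logging out at the end of a session releases the R sessions held on the servers; a fresh login is needed before issuing further commands.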
...
Assign individual variables on login
Users can specify individual variables to assign to the server-side R session. It is best practice to first create a list of the Opal variables you want to analyse.
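A minimal sketch of this pattern is given here. The variable names `LAB_TSC` and `LAB_HDL` and the object name `myvar` are placeholders, not taken from this page; the sketch also assumes that the login dataframe names a table holding those variables, which the builder example above does not show.

```r
# Hypothetical variable names: replace with variables held in your Opal table.
myvar <- list("LAB_TSC", "LAB_HDL")

# Log in and assign only the listed variables to the server-side R session.
# The guard skips the call if the login dataframe has not been built yet.
if (exists("logindata") && requireNamespace("DSI", quietly = TRUE)) {
  connections <- DSI::datashield.login(logins = logindata,
                                       assign = TRUE,
                                       variables = myvar)
}
```

Assigning only the variables needed keeps the server-side session small, which tends to matter more as the underlying tables grow.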
...
The other parts in this DataSHIELD tutorial series are:
6: Modelling
Tip |
---|
Also remember you can:
|
...