...
Tip |
---|
title | DataSHIELD cloud IP addresses |
---|
|
The DataSHIELD cloud training environment does not use fixed IP addresses; the client and Opal training server addresses change each training session. As part of the user tutorial you learn how to build a DataSHIELD login dataframe. In a real-world instance of DataSHIELD this is populated with secure certificates, not text-based usernames and passwords. |
Login Dataframe
...
The login dataframe is an R object that stores all the login information needed to access a DataSHIELD server, so that it can be reused for future logins without gathering the information each time. It is created with a dedicated builder function in the DSI package and assigned to a local object, called "logindata" in the example below, which is then passed to the function for logging in to servers.
Code Block |
---|
language | xml |
---|
title | Build your login dataframe |
---|
|
library(DSOpal)  # provides the "OpalDriver" named below
# Start an empty login builder, then append one entry per server
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1", url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2", url = "http://192.168.56.101:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM2", driver = "OpalDriver")
# Turn the appended entries into the login dataframe
logindata <- builder$build() |
Log in Command
Call the DSI login function to log in to the desired Opal servers, and assign its result to a local object called "connections". In the DataSHIELD test environment, logindata is our login dataframe for the Opal training servers.
Code Block |
---|
|
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
|
The output below indicates that each of the two Opal training servers, dstutorial-100 and dstutorial-101, contains the same 11 variables, listed in capital letters under "Variables assigned":
Code Block |
---|
|
Logging into the collaborating servers
Logged in all servers [================================================================] 100% /14s
No variables have been specified.
All the variables in the table
(the whole dataset) will be assigned to R!
Assigning table data...
Assigned all tables [==================================================================] 100% /13s
Variables assigned:
study1 -- LAB_TSC, LAB_TRIG, LAB_HDL, LAB_GLUC_ADJUSTED, PM_BMI_CONTINUOUS, DIS_CVA, MEDI_LPD, DIS_DIAB, DIS_AMI, GENDER, PM_BMI_CATEGORICAL
study2 -- LAB_TSC, LAB_TRIG, LAB_HDL, LAB_GLUC_ADJUSTED, PM_BMI_CONTINUOUS, DIS_CVA, MEDI_LPD, DIS_DIAB, DIS_AMI, GENDER, PM_BMI_CATEGORICAL |
Note |
---|
In Horizontal DataSHIELD pooled analysis, the data are harmonized and the variables given the same names across the studies, as agreed by all data providers. |
Tip |
---|
title | How datashield.login works |
---|
|
The datashield.login function from the DSI package allows users to log in and assign data to analyse from the Opal server in a server-side R session created behind the firewall of the data provider. All commands sent after login are processed within the server-side R instance, which only allows a specific set of commands to run (see the details of a typical horizontal DataSHIELD process). The server-side R session is wiped after logging out. If we do not specify individual variables to assign to the server-side R session, all variables held in the Opal servers are assigned. Assigned data are kept in a data frame named D by default. Each column of that data frame represents one variable and the rows are the individual records. |
Assign tables command
Finally, after successfully making a connection with the server, you must specify which studies, stored in tables, you wish to connect to. This is done with another of the DSI package functions, "datashield.assign.table".
Code Block |
---|
|
DSI::datashield.assign.table(conns = connections, symbol = "DST", table = c("CNSIM.CNSIM1","CNSIM.CNSIM2")) |
- The "conns" argument takes the DSConnection-class object returned by the login step (here "connections"), which statistical commands also use to refer to particular studies.
- The "symbol" argument sets the name by which to refer to the dataframes in each study. Here we have opted for "DST", an initialism for "DataSHIELD Table".
Info |
---|
- You may have seen "D" (for "Dataframe") used historically, but we are now phasing this out as it sometimes causes problems with other functions beginning with "D".
|
- The "table" argument specifies the names of the tables you wish to connect to, as they appear on the server. The structure "AAAA.BBBB", "AAAA.CCCC" means that project AAAA contains tables BBBB and CCCC; we connect to both by listing them in an R vector.
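As a small illustration of the "project.table" naming described in the list above, the sketch below builds the table vector in base R. The helper name qualified_tables is hypothetical and not part of DSI; the project and table names are the CNSIM ones used in this tutorial.

```r
# Hypothetical helper (not a DSI function): join one project name with
# several table names to get fully qualified "project.table" strings.
qualified_tables <- function(project, tables) {
  paste(project, tables, sep = ".")
}

tables <- qualified_tables("CNSIM", c("CNSIM1", "CNSIM2"))
print(tables)
# [1] "CNSIM.CNSIM1" "CNSIM.CNSIM2"
```

The resulting vector could then be passed as the "table" argument of DSI::datashield.assign.table in place of the hand-written c(...).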
An example of the printout after the table assignment has finished:
Code Block |
---|
|
Assigned all table (DST <- ...) [======================================================] 100% /25s |
Tip |
---|
title | How datashield.login works |
---|
|
At this point, you are logged in and ready to proceed! However, let's quickly review some other tips and tricks about using the login dataframe. |
Command to log out:
You should get into the habit of putting this command at the end of your scripts, and running it after you are finished. It is particularly important to do so when connecting to shared DataSHIELD servers, to save resources for the analyses of others.
Code Block |
---|
|
DSI::datashield.logout(connections) |
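One way to make the logout habit automatic is to wrap an analysis in a function that registers the logout with on.exit(), so that it runs even if the analysis stops with an error. This is a minimal sketch, not a DSI feature: the wrapper name run_with_logout and its login and logout arguments are assumptions introduced for illustration, and it expects a login dataframe like logindata above.

```r
# Hypothetical wrapper (not part of DSI): log in, run an analysis function,
# and always log out afterwards.
# `login` and `logout` default to the DSI functions used in this tutorial;
# they are arguments only so that the wrapper can be tried with stand-ins.
run_with_logout <- function(logindata, analysis,
                            login = DSI::datashield.login,
                            logout = DSI::datashield.logout) {
  connections <- login(logins = logindata, assign = TRUE, symbol = "D")
  # on.exit() runs when the function exits, normally or through an error
  on.exit(logout(connections), add = TRUE)
  analysis(connections)
}

# Example call (needs reachable DataSHIELD servers, so it is left commented):
# symbols <- run_with_logout(logindata, function(conns) DSI::datashield.symbols(conns))
```

The trade-off is that the connections only exist inside the wrapper, which suits scripted runs better than interactive exploration.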
In a later tutorial in this series, you will find the option of saving your workspace before logging out, to be able to log in another day and have all your variables intact and ready to go without having to run everything again!
Anchor |
---|
| assign_variables |
---|
| assign_variables |
---|
|
...
The other parts in this DataSHIELD tutorial series are:
...