After starting the training Opal VMs, leave them running for a couple of minutes before trying to log in.
When the training Opal servers are ready, the login window appears very rapidly. Once you have entered the user name and password, the Opal web interface appears almost immediately.
DataSHIELD Training Environment | Description | IP Address |
---|---|---|
VMs | DataSHIELD training Opal on your local machine | |
Cloud | DataSHIELD training Opal in the cloud | Ask your trainer. |
Training Environment | Default Username | Default Password |
---|---|---|
VMs | administrator | datashield_test& |
Cloud | administrator | datashield_test& |
Simulated datasets are provided in the training Opal servers. However, users may wish to conduct analysis on their own simulated data. The instructions on this page will allow you to upload your own simulated data to your training Opal servers. You can split one simulated dataset into two, uploading half of the dataset to one Opal server and the rest to the second server.
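The split described above can be sketched in base R. This is only an illustration: the example data, the `ID` column, and the output file names are hypothetical placeholders, and a random split is one reasonable choice rather than a requirement.

```r
# Hypothetical example data standing in for your own simulated dataset.
set.seed(42)
sim <- data.frame(
  ID = 1:10,
  MALE = rbinom(10, 1, 0.5),
  AGE_YEARS = round(runif(10, 7, 8), 2)
)

# Randomly assign half of the rows to the first server so that neither
# half is ordered by ID or by any other variable.
first_half <- sample(nrow(sim), size = floor(nrow(sim) / 2))
server1 <- sim[first_half, ]
server2 <- sim[-first_half, ]

# Write one file per Opal server (file names are placeholders).
out_dir <- tempdir()
write.csv(server1, file.path(out_dir, "sim_server1.csv"), row.names = FALSE)
write.csv(server2, file.path(out_dir, "sim_server2.csv"), row.names = FALSE)
```

Each file can then be uploaded to a different training Opal server by following the steps on this page.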
Opal servers require that a formal data dictionary is specified and uploaded so that a data set can be properly imported.
Opal accepts various formats, but in this tutorial a .csv file is used for the data file and Microsoft Excel (.xls or .xlsx) for the data dictionary file.
Variable name | Description | Categorical |
---|---|---|
MALE | codes sex | 1 = male 0 = female |
AGE_YEARS | age in decimal years on the day of the clinic | |
HEIGHT | height age 7 (cm) | |
HEIGHT_SIT | sitting height age 7 (cm) | |
WAIST | waist circumference age 7 (cm) | |
HIP | hip circumference age 7 (cm) | |
WEIGHT | weight age 7 (kg) | |
SBP | systolic blood pressure age 7 (top of the blood pressure fluctuation) (mm of Hg) | |
DPB | diastolic blood pressure age 7 (bottom of the blood pressure fluctuation) (mm of Hg) | |
PULSE | pulse rate age 7 (beats per minute) | |
BMI | Body Mass Index, derived as wt/(ht/100)^2. The height variable is divided by 100 to express it in metres rather than centimetres | |
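As a worked illustration of the BMI formula in the table, the following base R lines apply it to two made-up measurements. The values and variable names are hypothetical and are not taken from alspacsim.csv.

```r
# Hypothetical measurements: weight in kg and height in cm.
weight_kg <- c(22.5, 25.0)
height_cm <- c(120, 125)

# Divide height by 100 to convert centimetres to metres before squaring.
bmi <- weight_kg / (height_cm / 100)^2
bmi
```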
The data dictionary file must be formatted as an Excel spreadsheet (.xls or .xlsx) with two tabs: Variables and Categories.
The image below shows the variables tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users.
The table below summarises the column names in the variables tab, including examples from the test data built into the training environment in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. |
name | the variable name | | Column B (e.g. LAB_TSC) | Mandatory field. |
valueType | the value type of the variable | text | Column C (e.g. decimal, integer) | |
entityType | Opal can store data on different entities | Participant | Column D (e.g. Participant) | Examples: Participant (each row corresponds to a different participant), Instrument, Area, Drug |
referencedEntityType | if the variable values are entity identifiers, this is the type of the entities that are referenced | | Column E | Can be left blank |
mimeType | the mime type of the variable to help applications to display documents | | Column F | Examples: image/jpeg, application/excel. Can be left blank |
unit | the unit in which variables are expressed | | Column G | Examples: cm, kg, ml etc. Can be left blank |
repeatable | repeatable measurements | 0 | Column H (0) | 1 if repeatable (e.g. three measures of blood pressure), 0 if not |
occurrenceGroup | name of a repeatable variable group | | Column I | Example: [ |
label:en | label of the variable | | Column J | Can be localized by language (e.g. label:en in English, label:fr for French) |
alias | alternative name for the variable, usually used for defining a shorter name for the variable | | Column K | |
Edit the Variables tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
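If you prefer to draft the rows of the Variables tab programmatically before pasting them into the Excel template, a minimal base R sketch might look like the following. The example data, the `ID` column, the table name `ALSPACSIM`, and the helper `to_value_type` are all hypothetical; only the column names and the value types text, decimal and integer come from the table above.

```r
# Hypothetical data standing in for alspacsim.csv; the first column is
# assumed to hold the participant identifier and is not listed as a variable.
dat <- data.frame(
  ID = 1:3,
  MALE = c(1L, 0L, 1L),
  HEIGHT = c(121.5, 118.2, 125.0)
)

# Map R column classes onto the valueType names used in the table above.
to_value_type <- function(x) {
  if (is.integer(x)) {
    "integer"
  } else if (is.numeric(x)) {
    "decimal"
  } else {
    "text"
  }
}

vars <- setdiff(names(dat), "ID")
variables_tab <- data.frame(
  table = "ALSPACSIM",  # placeholder table name
  name = vars,
  valueType = unname(vapply(dat[vars], to_value_type, character(1))),
  entityType = "Participant",
  repeatable = 0,
  stringsAsFactors = FALSE
)
variables_tab
```

Base R cannot write .xls or .xlsx files by itself, so the resulting rows would still need to be copied into the spreadsheet or written out with an add-on package of your choice.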
The image below shows the categories tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users. Each category for each variable is represented by a single row in the spreadsheet. For example, in the dictionary file below, 3 rows (rows 12-14 inclusive) are for PM_BMI_CATEGORICAL as it has 3 categories.
The table below summarises the column names in the categories tab, including examples from the simulated datasets built into the DataSHIELD training environments in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. |
variable | the variable name (mandatory field) | | Column B (e.g. DIS_CVA) | Mandatory field. One row per category for each variable. |
name | the variable category | integer | Column C (e.g. 1) | Mandatory field. One row per category for each variable. |
code | can be left blank | | Column D | Can be left blank |
missing | Some categories are interpreted as missing answers (e.g. 'Don't know', 'Prefer not to answer'). | 0 | Column E | Use 1 for missing and 0 for not missing (normal answer). |
label:en | label of the variable category | | Column F | Human readable text description of the category. Can be localized by language (e.g. label:en in English, label:fr for French) |
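The "one row per category for each variable" layout of the Categories tab can be sketched in base R as follows. The table name `ALSPACSIM` and the `categories` list are hypothetical; the coding of MALE (1 = male, 0 = female) is taken from the variable list earlier on this page.

```r
# Hypothetical category definitions: one entry per categorical variable,
# mapping each category code to its human-readable label.
categories <- list(
  MALE = c("0" = "female", "1" = "male")
)

# Expand to one row per category per variable, as the Categories tab requires.
categories_tab <- do.call(rbind, lapply(names(categories), function(v) {
  data.frame(
    table = "ALSPACSIM",  # placeholder table name
    variable = v,
    name = names(categories[[v]]),
    missing = 0,          # 0 = normal answer, 1 = missing
    `label:en` = unname(categories[[v]]),
    check.names = FALSE,  # keep the colon in the label:en column name
    stringsAsFactors = FALSE
  )
}))
categories_tab
```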
Edit the Categories tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
The data must be saved as a .csv (comma delimited) file.
Missing values cannot be represented as white space or NA.
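Before uploading, it may help to check a data file for cells that are blank or contain the literal text NA. The following base R sketch counts such cells per column; the file contents are made up so that the example is self-contained, and `is_bad` is a hypothetical helper.

```r
# Write a small hypothetical file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,MALE,HEIGHT", "1,1,121.5", "2,,118.2", "3,0,NA"), csv_path)

# Read every column as text and turn off NA conversion so that blank
# cells and literal "NA" strings can be counted exactly as written.
raw <- read.csv(csv_path, colClasses = "character", na.strings = character(0))

# Count offending cells per column.
is_bad <- function(x) trimws(x) == "" | x == "NA"
bad_counts <- vapply(raw, function(col) sum(is_bad(col)), integer(1))
bad_counts
```

Any column with a non-zero count contains values that would need to be recoded before import.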
Data are held in Opal in what is called a table.
Opal holds all relevant data tables in a project.
The first step to uploading your data to Opal is to indicate within which project you want to site the new data table. This may either be an existing project or a new one:
1. Click the Project tab in the top left (after clicking, it appears in green on the dark blue horizontal bar).
2. Click Add Project.
3. Fill in the details of your project.
In DataSHIELD, in order to refer uniquely to a table held in Opal, you must specify both the Opal project and the table name (for example CNSIM.CNSIM1).
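The table references used in the DataSHIELD login details further down this page follow this pattern. A short base R sketch of how they can be assembled, with `project` and `tables` as illustrative variable names:

```r
# Project and table names as used in the login script on this page.
project <- "CNSIM"
tables <- c("CNSIM1", "CNSIM2")

# Join project and table with a dot to get the reference for each server.
table_refs <- paste(project, tables, sep = ".")
table_refs
```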
To make data available in Opal, you need to upload the data dictionary (.xls) and the data (.csv) files you have created:
1. To upload data files from your local computer, click Dashboard in the top menu bar (the word changes to green).
2. Select Manage Files from the left-hand menu.
3. Open your Opal Home folder. This is where your dictionary and data files will be saved.

If you want to save the data in a project-specific directory (which is often recommended), navigate to that directory instead. You can also create a new folder by navigating to wherever you want the new folder to be (e.g. you may want to navigate to the Project folder and then create a new subfolder within that Project folder) and creating it there.

If, when you have finalised where you want to keep the data, you find that the .xls and/or .csv file already exists in that location, you need to decide whether the pre-existing files are current or whether you need to overwrite them with the up-to-date version(s). Rather than making a default assumption about this, Opal explicitly asks you what decision you would like to take.
To upload the data dictionary (.xls file) and the data file (.csv file):
1. Click the Upload button in the top tab.
2. Click Choose file.
3. Select the file and click open (you can simply double-click the file in Windows).
4. Click upload at the bottom of the window.

Your data dictionary (.xls) and data (.csv) files should now be uploaded into Opal. However, at present they simply exist as stored files; the data cannot be used in Opal until you have converted them into an Opal data table.
1. Click the Projects tab in the top menu (it will turn green) and click on your project name (CNSIM in the example below).
2. Click the +Add Table button that sits above the list of tables in the project you have specified.
3. Select Add/update tables from dictionary ... from the drop-down menu.
4. Click Browse.
5. Use HOMES and SYSTEM on your left menu to navigate to the folder that holds the .xls data dictionary file.
6. Select the file and click the Select button towards the bottom right.
7. Click the Next button.
8. Click the Finish button at the bottom right.

The table name must match the first column of the two tabs in the .xls data dictionary file.

The table now exists but holds no data. To import the data (.csv) file:

1. Click the Projects tab in the top menu and click on your project name (CNSIM in this particular example).
2. Open the new table. It shows 0 Entities (indicating it is empty).
3. Click the Import button in the tabs above the table. This opens a window to define the file format.
4. The format should default to CSV. If it does not, choose CSV from the drop-down menu.
5. Click the Next button to open the Import Data window.
6. Click the Browse button. Use the left menu to navigate to the folder that holds the required file.
7. Select the file and click the Select button at the bottom right.
8. Enter the table name. As soon as you start typing the table name, Opal lists all of the valid Opal tables it currently holds, so you only have to click on the correct one rather than typing the whole name.
9. Check that the entity type matches the one given in the Variables tab in the data dictionary .xls file. The default is Participant. You can use Entity Type: Participant in the data dictionary column D even if, as in some survival models, each row in the data set corresponds to something other than a single participant.
10. Click the Next> button to open the Configure Data Import window, which you can ignore.
11. Click the Next> button to open the "Review and select the data dictionaries you wish to import" window.
12. Click the Next> button to open the "Review the data that will be imported" window. This shows you the first few rows and columns of the data in the .csv file you selected to read in. You can navigate the review data table by clicking the green < and > buttons on the header line. If you want to go down or up the file looking for rows below or above what you can see, use the white on grey DVD-like buttons above the table. If you hover the cursor over each button an explanation will appear.
13. Click the Next> button to bring up the data table location window. If you started this process by creating the table in the folder you wanted it in, you can simply leave it where it is. To move it to another folder, click the grey Browse button and navigate to the folder you want to use.
14. Click the Finish button at the bottom.

The data (.csv) file now populates the Opal data table, which may take several minutes. If the import is successful, when you navigate to where the table has been saved you will find that Entities is no longer 0, but equal to the number of rows of data that have been imported.

Common reasons for import failure include specifying a table name that is different to that held in the first column of both tabs of the .xls dictionary file.
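The table name mismatch mentioned above can be checked before importing. The following base R sketch assumes the two tabs of the dictionary file have already been loaded as data frames; the example rows and the helper `mismatched_tabs` are hypothetical.

```r
# Hypothetical stand-ins for the two tabs of the dictionary file, held as
# data frames; only the first column (table) matters for this check.
variables_tab <- data.frame(
  table = c("CNSIM1", "CNSIM1"),
  name = c("LAB_TSC", "DIS_CVA")
)
categories_tab <- data.frame(
  table = c("CNSIM1", "CNSIM"),  # the second row is deliberately wrong
  variable = c("DIS_CVA", "DIS_CVA"),
  name = c("0", "1")
)

# Return the names of the tabs whose table column disagrees with the
# table name entered in Opal.
mismatched_tabs <- function(table_name, tabs) {
  ok <- vapply(tabs, function(tab) all(tab$table == table_name), logical(1))
  names(tabs)[!ok]
}

flagged <- mismatched_tabs(
  "CNSIM1",
  list(Variables = variables_tab, Categories = categories_tab)
)
flagged
```

In this made-up example only the Categories tab is flagged, because one of its rows names a different table.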
Your data have now been successfully uploaded into an Opal server. You will need to repeat the process for each Opal server you wish to use. To start using the DataSHIELD training environment, visit our Tutorial for DataSHIELD users using your own data. The tutorial teaches you the basics of DataSHIELD.
Assistance with DataSHIELD can be found in the support information at the end of this page.
It is simple to delete a file once it has been uploaded to Opal. You can practise by selecting the alspacsim.csv file (or your own data file) you have just uploaded.

1. Click the Rubbish Bin icon in the top tab (the farthest right icon above the list of file names in the folder).

To export a data table:

1. Click the Projects tab in the top menu (it will turn green) and click on your project name (CNSIM in the example below).
2. Click the Export button (top right).
3. Go to User > export. The archived file will be in this folder, where you can download it for secure storage outside of the system should you require it.

You can choose to export as a .csv file or as a compressed Opal archive file. The .csv is just the data file. The Opal archive file is a .zip file containing the data file (.csv) and the data dictionary for that file, already formatted to be imported into Opal.

You can also use the Opal API to upload and manage data. See: Importing data into Opal with the API
You can manage and update DataSHIELD packages and functions through the Opal Management Interface.
1. Click Administration in the top right of the horizontal menu bar.
2. Click DataSHIELD, located in the Data Analysis column on the right.

The following DataSHIELD R packages are installed by default in the DataSHIELD training environments.
To install any of the existing DataSHIELD packages you will first need to remove one or more DataSHIELD packages by clicking the remove button adjacent to each package.
1. Click Add Package and select install a specific DataSHIELD package.
2. Type the name of the package. In our example we will use dsBetaTest to install new DataSHIELD functions currently in beta test. Click Install.
3. Click Publish methods for the package and click yes to confirm.

Functions in dsBetaTest have not been fully audited for non-disclosure. They are functions that have been newly developed but not fully tested. Following testing, functions in dsBetaTest will be released in one of the standard DataSHIELD packages.

To install a package from a Git reference:

1. Remove dsBase by clicking the remove button adjacent to the package. Confirm the removal by selecting yes.
2. Search for dsBase and click on the repository named dsBase.
3. Click Add Package and select install a specific DataSHIELD package.
4. Type the name of the package, e.g. dsBase. Click + Advanced Options and type the Git reference into the box. Click Install.

By installing packages from a Git reference it is possible to roll back to a previous version of a DataSHIELD package. If you are a DataSHIELD developer, it is possible to install development branches in this way.
To install all the DataSHIELD packages, click on the button |
1. Click the remove button adjacent to each package. Confirm the removal by selecting yes.
2. Click Add Package and select install a specific DataSHIELD package. Type the name of the package and click Install.
3. Click Publish methods for each package.

DataSHIELD privacy levels are set in Opal and correspond to the minimum cell count for calculations. By default the DataSHIELD privacy level is set to 5, so no results are returned if data from fewer than 5 participants have been used for the calculation, as the result may potentially be disclosive. The DataSHIELD privacy level is applied to all tables held on the Opal.
The privacy level is shown under Options. By default it is set to 5.

```r
################################################################################
# 1. Build your login data frame.
#    Replace the placeholder server names and IP addresses with your own.
################################################################################
server <- c("name-of-server", "name-of-server")
url <- c("http://XXX.XXX.X.XXX:8080", "http://XXX.XXX.X.XXX:8080")
user <- "administrator"
password <- "datashield_test&"
table <- c("CNSIM.CNSIM1", "CNSIM.CNSIM2")
my_logindata <- data.frame(server, url, user, password, table)

################################################################################
# 2. Load the DataSHIELD client libraries.
################################################################################
library(opal)
library(dsBaseClient)
library(dsStatsClient)
library(dsGraphicsClient)
library(dsModellingClient)

################################################################################
# 3. Log in to DataSHIELD.
################################################################################
opals <- datashield.login(logins = my_logindata, assign = TRUE)

# Cross-tabulate two variables, then log out.
ds.table2D("D$GENDER", "D$DIS_CVA")
datashield.logout(opals)
```

To repeat the test, log in again using the same login script, then use the datashield.logout function to disconnect from the Opal server.

The DataSHIELD privacy level is applied to all tables held on the Opal. Should a study belong to multiple consortia requiring different privacy levels, it is recommended the data tables be held in a separate Opal instance.
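The idea of a minimum cell count can be illustrated with a few lines of base R. This is a local sketch of the concept only, not the code DataSHIELD or Opal runs; the function `safe_table` is hypothetical, and treating empty cells as acceptable is an assumption made for this example.

```r
# Illustration only: return the cross-tabulation when every non-empty cell
# meets the minimum count, and NULL (no result) otherwise.
safe_table <- function(x, y, privacy_level = 5) {
  counts <- table(x, y)
  if (any(counts > 0 & counts < privacy_level)) {
    return(NULL)
  }
  counts
}

# Made-up data in which each of the four cells holds exactly 5 participants.
gender <- rep(c(0, 1), each = 10)
disease <- rep(c(0, 1), times = 10)

allowed <- safe_table(gender, disease)                      # level 5: returned
blocked <- safe_table(gender, disease, privacy_level = 6)   # level 6: NULL
```

With the default level of 5 the table is returned; raising the level to 6 suppresses it, because every cell then falls below the threshold.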
DataSHIELD news and support are available from the DataSHIELD community in the DataSHIELD forum. Tailored support and training in DataSHIELD is provided on a fee basis; please email us with your enquiry.
Opal is supported by the software creators at Obiba. Opal support is available on the Obiba-users mailing list, where support questions can be posted for free. Opal general enquiries can be sent to info@obiba.org.