Opal management
Logging onto the Opal management interface
- If using the cloud training environment your trainer will have switched the training Opal servers on.
- If using local VMs Start the Opal servers
Starting the Opal training VMs
Having opened the training Opal VMs, it is necessary to leave them running for a couple of minutes before trying to login. If you do not wait long enough you will find
- you may not be given the login window or
- after entering username and password the window just hangs
When the training Opal servers are ready the login window appears very rapidly. Once you have entered the user name and password the Opal web interface appears almost immediately.
- Your opal servers will sit at an IP address on port :8080 (http) or :8443 (https)
DataSHIELD Training Environment | Description | IP Address |
---|---|---|
VMs | DataSHIELD training Opal on your local machine | |
Cloud | DataSHIELD training Opal in the cloud | Ask your trainer. |
- Navigate to the address in your web browser
- Enter the username and password - default training details given below
Training Environment | Default Username | Default Password |
---|---|---|
VMs | administrator | datashield_test& |
Cloud | administrator | datashield_test& |
Managing data in Opal
Simulated datasets are provided in the training Opal servers. However, users may wish to conduct analysis on their own simulated data. The instructions on this page will allow you to upload your own simulated data to your training Opal servers. You can split one simulated dataset into two - and upload half the dataset onto one Opal server and the rest on the second server.
Preparing your data for Opal
Opal servers require that a formal data dictionary is specified and uploaded in order that a data set can be properly imported.
Opal accepts various formats but in this tutorial a .csv
is used for the data file and Microsoft Excel
for the data dictionary file (.xls or .xlsx).
The data dictionary file
- Download the sample dataset alspacsim.csv or use your own data table in .csv format. This is a simulated dataset from the ALSPAC clinic at participant age 7.
Variable name | Description | Categorical |
---|---|---|
MALE | codes sex | 1 = male 0 = female |
AGE_YEARS | age in decimal years on the day of the clinic | |
HEIGHT | height age 7 (cm) | |
HEIGHT_SIT | sitting height age 7 (cm) | |
WAIST | waist circumference age 7 (cm) | |
HIP | hip circumference age 7 (cm) | |
WEIGHT | weight age 7 (Kg) | |
SBP | systolic blood pressure age 7 (top of the blood pressure fluctuation) (mm of Hg) | |
DPB | diastolic blood pressure age 7 (bottom of the blood pressure fluctuation) (mm of Hg) | |
PULSE | pulse rate age 7 (beats per minute) | |
BMI | Body Mass Index derived as wt/(ht/100)2 The height variable is divided by 100 to express it in metres rather than centimeters |
- Download the Opal-dictionary-template.xls template. You will need to follow the instructions below to compile a data dictionary for the new dataset in order to upload it to Opal.
The data dictionary file requires formatting as an Excel spreadsheet (.xls or .xlsx) with two tabs
Variables
andCategories
.
The variables tab
The image below shows the variables tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users.
The table below summarises the column names in the variables tab, including examples from the test data built into the training environment in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. It is critical that the table name appears in every row |
name | the variable name | Column B (e.g. LAB_TSC) | Mandatory field. | |
valueType | the value type of the variable | text | Column C (e.g. decimal, integer) | |
entityType | Opal can store data on different entities | Participant | Column D (e.g. Participant) | Examples: Participant (each row corresponds to a different participant), Instrument, Area, Drug |
referencedEntityType | if the variable values are entity identifiers, this is the type of the entities that are referenced | Column E | Can be left blank | |
mimeType | the mime type of the variable to help applications to display documents | Column F | Examples: image/jpeg, application/excel. Can be left blank | |
unit | the unit in which variables are expressed | Column G (e.g. Participant) | Examples: cm, kg, ml etc. Can be left blank | |
repeatable | repeatable measurements | 0 | Column H (0) | 1 if repeatable, 0 if not (e.g. Three measures of blood pressure) |
occurrenceGroup | name of a repeatable variable group | Column I | Example: [ | |
label:en | label of the variable. | Column J | Can be localized by language e.g. label:en in english, label:fr for french) | |
alias | Alternative name for the variable, usually used for defining a shorter name for the variable | Column K |
Edit the
variable
tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
The categories tab
The image below shows the categories tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users. Each category for each variable is represented by a single row in the spreadsheet. For example, in the dictionary file below, 3 rows (rows 12-14 inclusive) are for PM_BMI_CATEGORICAL as it has 3 categories.
The table below summarises the column names in the categories tab, including examples from the simulated datasets built into the DataSHIELD training environments in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. It is critical that the table name appears in every row |
variable | the variable name (mandatory field) | Column B (e.g. DIS_CVA) | mandatory field. One row per category for each variable. | |
name | the variable category | integer | Column C (e.g. 1) | mandatory field. One row per category for each variable |
code | can be left blank | Column D | Can be left blank | |
missing | Some categories are interpreted as missing answers (e.g. 'Don't know', 'Prefer not to answer'). | 0 | Column E | Use 1 for missing and 0 for not missing (normal answer). |
label:en | label of the variable category | Column F | Human readable text description of the category. Can be localized by language e.g. label:en in english, label:fr for french) |
- Edit the
categories
tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
The data file
- In the training Opal servers, the data file is a
.csv
(comma delimited) file - Missing values are represented in the data file as two consecutive commas
,,
Missing values
NA
- There is no column name for the first column. The first column contains the row number for the data. "Participant" or Opal ID number setting in Opal is using the row number.
- See below for an example of what the .csv file looks like when opened in Excel.
Uploading your data and data dictionary files to Opal
Data structure in Opal
Data are held in Opal in what is called a table
.
- each row contains the data for one of the primary units of data collection (perhaps most commonly, a participant)
- each column represents a different variable.
Opal holds all relevant data tables in a project.
Creating a new Opal project
The first step to uploading your data to Opal is to indicate within which project you want to site the new data table. This may either be an existing project or a new one:
- To create a new project click on the
Project
tab in the top left (after clicking it appears in green on the dark blue horizontal bar) - click
Add Project
.
Fill in the details of your project:
- You must specify a name and this will be used to point to the data. For convenience do not use a very long name. The example below shows the name as CNSIM.
- The "title" can then be a longer explanatory label for the table
- The database currently defaults to mongodb but can also be MySQL
- You can give an additional description if you wish
Project and table names
In DataSHIELD in order to refer uniquely to a table held in Opal you must specify both the Opal project
and table
names. For example the table EMISS in the SURVSIM Project is referred to as SURVSIM.EMISS in your DataSHIELD login template. In the case of the training data, it happens to be that both the Project and the Table within it are called CNSIM - and so the table is referred to as CNSIM.CNSIM.
Upload data and data dictionary into Opal
To make data available in Opal, you need to upload the data dictionary (.xls) and the data (.csv) files you have created:
To upload data files from your local computer, click on
Dashboard
from the top menu bar (the word changes to green)- Click
Manage Files
from the left hand menu
- By default, you will see a list of files currently held in your
Opal Home folder
. This is where your dictionary and data files will be saved.
If you want to save the data in a project-specific directory (which is often recommended) then click Projects
from the left hand menu and choose the project you will be working in.
You can also create a new folder by navigating to wherever you want the new folder to be (e.g. you may want to navigate to the Project folder and then create a new subfolder within that Project folder). Once you are there click Add Folder
(button with blue background on top left) and specify the name of the new folder.
If, when you have finalised where you want to keep the data, you find that the .xls and/or .csv file already exist in that location, you need to decide whether the pre-existing files are current or whether you need to over-write them with the up-to-date version(s). Rather than making a default assumption about this, Opal explicitly asks you what decision you would like to take.
To upload the data dictionary (.xls file) and the data file (.csv file):
- Click the grey
Upload
button from the top tab
- This brings up a new window, click
Choose file
- Browse to the .xls file to upload from your your local machine
- Select that file and click
open
(you can simply double-click the file in Windows) - Clicking the dark blue
upload
at the bottom of the window - If that file already exists in that location then it will ask you whether to replace it or not (see above tip)
- Repeat for the .csv (data file)
Creating the Opal table: the data dictionary
Your data dictionary (.xls) and data (.csv) files should now be uploaded into Opal. However, at present they simply exist as stored files, the data cannot be used in Opal until you have converted them into an Opal data table
.
- Click on the
Projects
tab from the top menu (it will turn green) and click on your project name (CNSIM
in the example below) - Click on the large blue
+Add Table
button that sits above the list of tables in the project you have specified - Select
Add/update tables from dictionary ...
from the drop down menu
- Locate the data dictionary (.xls) file you have uploaded by clicking
Browse.
- Use
HOMES
andSYSTEM
on your left menu to navigate to the folder that holds the .xls data dictionary file - Click the small square box to the left of the file name (a tick appears) and then click the dark blue
Select
button towards the bottom right
- Click the blue
Next
button - Review the Opal table in the pop up window. If this is a new data table the information in the window should tell you the name of the Table you have asked to be created and the number of New Variables (corresponding to number of columns in your .csv data file).
- If the information is correct, click the small box to the left of the table name (CNSIM in the example below). A tick will appear.
- Click the dark blue
Finish
button from the bottom right
- This will take you back to the list of all available tables in the chosen project, and after a few seconds this will be refreshed to include the new table you have create
Creating the Opal table: the data
- Select your project again by clicking on the
Projects
tab from the top menu and click on your project name (CNSIM
in this particular example) - Your table states as holding 0
Entities
(indicating it is empty)
- Click on the small box at the left of the table name, and a tick will appear.
- Click the grey
Import
button from the tabs above the table. This opens a window to define file format.
- The data file is a .csv file. By default the window should state
CSV
. If it does not, choose CSV fromthe drop down menu. - Now click the blue
Next
button to open theImport Data
window.
- Specify data file location by selecting the grey
Browse
button. Use the left menu to navigate to the folder that holds the required file. - Click the small square box to the left of the file name (a tick appears) and then click the dark blue
Select
button on the bottom right - Destination Table: type the data table name. This needs to be the same name as your Opal data dictionary table, in this example CNSIM.
- Entity Type: This is the observational unit that contributes each row in the .csv data file and will be the same as in column D under the
Variables
tab in the data dictionary .xls file. The default isParticipant
.
Participant
in the data dictionary column D even if, as in some survival models, each row in the data set corresponds to something other than a single participant.- Click the blue
Next>
button to open up theConfigure Data Import
window which you can ignore. - Click the blue
Next>
button to open up theReview and select the data dictionaries you wish to import
window. - Select the box for the correct Table into which Opal will import the data. A tick will appear.
- Then click the blue
Next>
button to open up theReview the data that will be imported window
. This shows you the first few rows and columns of the data in the .csv file you selected to read in.
- When you have checked the data click the blue
Next>
button to bring up the data table location window. If you have started this process by creating the table in the folder you wanted it in, then you can simply clickleave it where it is
- If you want to move the table somewhere else, click
move it to another folder
then click the greyBrowse
button and navigate to the folder you want to use. - Click the dark blue
Finish
button from the bottom.
The data (.csv) file now populates the Opal data table which may take several minutes. If it is successful, when you navigate to the table has been saved, you will find Entities
is no longer 0, but equal to the number of rows of data that have been imported.
If the Entities
count remains 0, select Dashboard
from the top menu and click on the fourth icon (Tasks icon
) from the left menu (three green/white horizontal bars). Find the relevant task (it will be a task of Type
import). If it has failed the status
button will be red and if you click on Log
under Actions
you may find useful information as to the failure.
Common reasons for import failure include specifying a table name that is different to that held in the first column of both tabs of the .xls dictionary file.
Your data has now successfully been uploaded into an Opal server. You will need to repeat the process for each Opal server you wish to use.
To start using the DataSHIELD training environment sit our Tutorial for DataSHIELD users using your own data. The tutorial teaches you the basics of DataSHIELD including how to:
- login
- run commands to:
- generate descriptive statistics
- subset tables and vectors
- fit some regression models
Assistance with DataSHIELD can be found:
- searching on our DataSHIELD FAQ page
- posting in our DataSHIELD support forum
Delete tables from Opal
It is simple to delete a file once it has been uploaded to Opal, you can practice by selecting the alspacsim.csv file (or your own data file) you have just uploaded.
- Click on the small square box to the left of the file name and a tick will appear
- Then click on the grey
Rubbish Bin
icon from the top tab (the farthest right icon above the list of file names in the folder) - You will be asked to confirm the file deletion
- Click the left hand dark grey Yes button to confirm
- Other objects such as Projects can be deleted in an equivalent way
Archive and export tables from Opal
- Click on the
Projects
tab from the top menu (it will turn green) and click on your project name (CNSIM
in the example below) - Tick the data table you wish to archive e.g. CNSIM3 in the example below and click the
Export
button (top right)
You can choose to export as a .csv file or as a compressed Opal archive file.
Opal archive files
The .csv is just the data file. The Opal archive file is a .zip file containing the data file (.csv) and the data dictionary for that file all ready formatted to be imported into Opal.
- (optional) choose the location on the server to archive the file to.
- Press submit.
- If the archive is stored in the default location, click on the folder icon on the far left menu.
- Click on
User
>export
and the archived file will be in this folder where you can download it for secure storage outside of the system should you require it.
Opal API
You can also use the Opal API to upload and manage data. See: Importing data into Opal with the API
DataSHIELD Administration
You can manage and update DataSHIELD packages and functions though the Opal Management Interface.
Install DataSHIELD packages
- Click
Administration
in the top right of the horizontal menu bar
- Click
DataSHIELD
located in the Data Analysis column on the right.
DataSHIELD packages in the training environment
The following DataSHIELD R packages are installed by default in the DataSHIELD training environments.
Environment | |
---|---|
Local VMs | dsBase dsGraphics dsModelling dsStats dsBetaTest |
Cloud | dsBase dsGraphics dsModelling dsStats |
To install any of the existing DataSHIELD packages you will first need to remove one or more DataSHIELD packages by clicking the remove
button adjacent to each package. Confirm the removal by selecting yes
- To install one new DataSHIELD package, click on the button
Add Package
and selectinstall a specific DataSHIELD package
. Type the name of the package. In our example will usedsBetaTest
to install new DataSHIELD functions currently in beta test. ClickInstall
.
dsBetaTest
Functions in dsBetaTest have not been fully audited for non-disclosure. They are functions that have been newly developed but not fully tested. Following testing, functions in dsBetaTest will be released in one of the standard DataSHIELD packages.
- Newly installed packages will now appear in the Packages list.
- To make the functions available for all users, click
Publish methods
for the package and click yes to confirm. - It is also possible to use a Git reference to install a DataSHIELD package. First remove one of DataSHIELD packages in our example we will remove
dsBase
by clicking theremove
button adjacent to each package. Confirm the removal by selectingyes
- dsBase has now been removed from the Packages list.
- On the DataSHIELD github repository search for
dsBase
and click on the repository nameddsBase
. - Click on Releases tab - this shows all the package releases.
- Copy the git commit reference number from the latest release e.g. 804bc5e
- To install the package from a git reference click on the button
Add Package
and selectinstall a specific DataSHIELD package
. Type the name of the package e.g dsBase. - Click the green link
+ Advanced Options
and type the Git reference into the box. ClickInstall
.
- dsBase from that commit (the latest release) will now be installed
Installing DataSHIELD packages from a Git reference
By installing packages from a Git reference it is possible to roll back to a previous version of a DataSHIELD. If you are a DataSHIELD developer, it is possible to install development branches in this way.
Install all DataSHIELD packages
To install all the DataSHIELD packages, click on the button Add Package
and select install all DataSHIELD packages
. Click Install
Update all DataSHIELD server side packages
- First remove existing DataSHIELD packages by clicking the
remove
button adjacent to each package. Confirm the removal by selectingyes
- Click on the button
Add Package
and selectinstall a specific DataSHIELD package
. Type the name of the package and clickInstall
. - Make the functions available by clicking on
Publish methods
for each package.
Setting DataSHIELD privacy levels
DataSHIELD privacy level
DataSHIELD privacy levels are set in Opal and correspond to the minimum cell count for calculations. By default the DataSHIELD privacy level is set to 5, returning no results if data from <5 participants has been used for the calculation as the result may potentially be disclosive.
DataSHIELD privacy level is applied to all tables held on the Opal.
- In the DataSHIELD Administration window the existing DataSHIELD privacy level can be viewed in the table under
Options
. By default it is set to 5.
- To test the current privacy level connect to the Opal server from R studio using a log-in script template as below. You will need to edit your server name and IP address in section 1.
################################################################################ # 1. build your login in data frame. ################################################################################ server <- c("name-of-server", "name-of-server") url <- c("http://XXX.XXX.X.XXX:8080", "http://XXX.XXX.X.XXX:8080") user <- "administrator" password <- "datashield_test&" table <- c("CNSIM.CNSIM1","CNSIM.CNSIM2") my_logindata <- data.frame(server,url,user,password,table) ################################################################################ # 2. Load the DataSHIELD Client Libraries ################################################################################ library(opal) library(dsBaseClient) library(dsStatsClient) library(dsGraphicsClient) library(dsModellingClient) ################################################################################ # 3. Login to DataSHIELD ################################################################################ opals <- datashield.login(logins=my_logindata,assign=TRUE)
- Create a contingency table of GENDER and DIS_CVA
ds.table2D("D$GENDER","D$DIS_CVA")
- The results will be a table of NAs as cells in the table are calculated with the data from <5 participants
- Return to the Opal Management Interface, change the privacy level click Edit on the row for privacy level and set it to 1.
- The DataSHIELD privacy level will now be set to 1 - data from 1 or less participants required to return the calculation.
- Return to R studio and logout from the current opal connection
datashield.logout(opals)
- Reconnect to the Opal server as before. Reconnected, you will now have your privacy level set to 1.
################################################################################ # 1. build your login in data frame. ################################################################################ server <- c("name-of-server", "name-of-server") url <- c("http://XXX.XXX.X.XXX:8080", "http://XXX.XXX.X.XXX:8080") user <- "administrator" password <- "datashield_test&" table <- c("CNSIM.CNSIM1","CNSIM.CNSIM2") my_logindata <- data.frame(server,url,user,password,table) ################################################################################ # 2. Load the DataSHIELD Client Libraries ################################################################################ library(opal) library(dsBaseClient) library(dsStatsClient) library(dsGraphicsClient) library(dsModellingClient) ################################################################################ # 3. Login to DataSHIELD ################################################################################ opals <- datashield.login(logins=my_logindata,assign=TRUE)
- Repeat the contingency table function, you will now return tables that are not NA.
- Use the
datashield.logout
function to disconnect from the Opal server. - Return to the Opal Management interface and reset the DataSHIELD privacy level to 5.
DataSHIELD privacy
The DataSHIELD privacy level is applied to all tables held on the Opal. Should a study belong to multiple consortia requiring different privacy levels, it is recommended the data tables be held in a separate Opal instance.
Where to get support
DataSHIELD news and support is available by the DataSHIELD community in the DataSHIELD forum. Tailored support and training in DataSHIELD is provided on a fee basis, please email us with your enquiry.
Opal is supported by the software creators at Obiba. Opal support is available on the Obiba-users mailing list, where support questions can be posted for free. Opal general enquiries can be sent to info@obiba.org.
DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki