Opal management

Logging onto the Opal management interface

  • If using the cloud training environment your trainer will have switched the training Opal servers on.
  • If using local VMs Start the Opal servers 

Starting the Opal training VMs

Having opened the training Opal VMs, it is necessary to leave them running for a couple of minutes before trying to login. If you do not wait long enough you will find

  • you may not be given the login window or 
  • after entering username and password the window just hangs

When the training Opal servers are ready the login window appears very rapidly. Once you have entered the user name and password the Opal web interface appears almost immediately. 

  • Your opal servers will sit at an IP address on port :8080 (http) or :8443 (https)
DataSHIELD Training EnvironmentDescriptionIP Address
VMsDataSHIELD training Opal on your local machine

http://192.168.56.100:8080

http://192.168.56.101:8080

CloudDataSHIELD training Opal in the cloudAsk your trainer.
  • Navigate to the address in your web browser 

  • Enter the username and password - default training details given below
Training EnvironmentDefault UsernameDefault Password
VMsadministratordatashield_test&
Cloudadministratordatashield_test&

Managing data in Opal

Simulated datasets are provided in the training Opal servers. However, users may wish to conduct analysis on their own simulated data. The instructions on this page will allow you to upload your own simulated data to your training Opal servers. You can split one simulated dataset into two - and upload half the dataset onto one Opal server and the rest on the second server.

Preparing your data for Opal

Opal servers require that a formal data dictionary is specified and uploaded in order that a data set can be properly imported.

Opal accepts various formats but in this tutorial a .csv is used for the data file and Microsoft Excel for the data dictionary file (.xls or .xlsx).  

The data dictionary file

  • Download the sample dataset alspacsim.csv or use your own data table in .csv format. This is a simulated dataset from the ALSPAC clinic at participant age 7.
Variable nameDescriptionCategorical
MALEcodes sex

1 = male

0 = female

AGE_YEARSage in decimal years on the day of the clinic
HEIGHTheight age 7 (cm)
HEIGHT_SITsitting height age 7 (cm)
WAISTwaist circumference age 7 (cm)
HIPhip circumference age 7 (cm)
WEIGHTweight age 7 (Kg)
SBPsystolic blood pressure age 7 (top of the blood pressure fluctuation) (mm of Hg)
DPBdiastolic blood pressure age 7 (bottom of the blood pressure fluctuation) (mm of Hg)
PULSEpulse rate age 7 (beats per minute)
BMIBody Mass Index derived as wt/(ht/100)2 The height variable is divided by 100
to express it in metres rather than centimeters

  • Download the Opal-dictionary-template.xls template. You will need to follow the instructions below to compile a data dictionary for the new dataset in order to upload it to Opal.
  • The data dictionary file requires formatting as an Excel spreadsheet (.xls or .xlsx) with two tabs Variables and Categories.

The variables tab

The image below shows the variables tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users.

The table below summarises the column names in the variables tab, including examples from the test data built into the training environment in the spreadsheet image above.

Column Names

Description

Default value

Example value in the test data

Notes

table

the table name the variable will be added to

Table

Column A (CNSIM)

This is the table name you refer to in your DataSHIELD login details.

It is critical that the table name appears in every row

name

the variable name


Column B (e.g. LAB_TSC)

Mandatory field.
Becomes the the column name in Opal for that variable

valueType

the value type of the variable

text

Column C (e.g. decimal, integer)

See further information on variable types and classes

entityType

Opal can store data on different entities

Participant

Column D (e.g. Participant)

Examples: Participant (each row corresponds to a different participant), Instrument, Area, Drug

referencedEntityType

if the variable values are entity identifiers, this is the type of the entities that are referenced


Column E

Can be left blank

mimeType

the mime type of the variable to help applications to display documents


Column F

Examples: image/jpeg, application/excel. Can be left blank

unit

the unit in which variables are expressed


Column G (e.g. Participant)

Examples: cm, kg, ml etc. Can be left blank

repeatable

repeatable measurements

0

Column H (0)

1 if repeatable, 0 if not (e.g. Three measures of blood pressure)

occurrenceGroup

name of a repeatable variable group


Column I

Example: [measure value, measure date] is a group of variables that can be repeated. Can be left blank

label:en

label of the variable.


Column J

Can be localized by language e.g. label:en in english, label:fr for french)

aliasAlternative name for the variable, usually used for defining a shorter name for the variable
Column K
  • Edit the variable tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data). 

The categories tab

The image below shows the categories tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users.  Each category for each variable is represented by a single row in the spreadsheet.  For example, in the dictionary file below, 3 rows (rows 12-14 inclusive) are for PM_BMI_CATEGORICAL as it has 3 categories.

The table below summarises the column names in the categories tab, including examples from the simulated datasets built into the DataSHIELD training environments in the spreadsheet image above.

Column Names

Description

Default value

Example value in the test data

Notes

table

the table name the variable will be added to

Table

Column A (CNSIM)

This is the table name you refer to in your DataSHIELD login details.

It is critical that the table name appears in every row

variable

the variable name (mandatory field)


Column B (e.g. DIS_CVA)

mandatory field. One row per category for each variable.

name

the variable category

integer

Column C (e.g. 1)

mandatory field. One row per category for each variable

code

can be left blank


Column D

Can be left blank

missing

Some categories are interpreted as missing answers (e.g. 'Don't know', 'Prefer not to answer').  

0

Column E

Use 1 for missing and 0 for not missing (normal answer).

label:en

label of the variable category


Column F

Human readable text description of the category. Can be localized by language e.g. label:en in english, label:fr for french)

  • Edit the categories tab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data). 

The data file

  • In the training Opal servers, the data file is a .csv (comma delimited) file
  • Missing values are represented in the data file as two consecutive commas ,,

Missing values

Missing values can not be represented as white space or NA
  • There is no column name for the first column.  The first column contains the row number for the data. "Participant" or Opal ID number setting in Opal is using the row number. 
  • See below for an example of what the .csv file looks like when opened in Excel.

Uploading your data and data dictionary files to Opal

Data structure in Opal

Data are held in Opal in what is called a table.

  • each row contains the data for one of the primary units of data collection (perhaps most commonly, a participant)
  • each column represents a different variable.

Opal holds all relevant data tables in a project.

Creating a new Opal project

The first step to uploading your data to Opal is to indicate within which project you want to site the new data table. This may either be an existing project or a new one:

  • To create a new project click on the Project tab in the top left (after clicking it appears in green on the dark blue horizontal bar) 
  •  click Add Project.

Fill in the details of your project:

  • You must specify a name and this will be used to point to the data. For convenience do not use a very long name. The example below shows the name as CNSIM.
  • The "title" can then be a longer explanatory label for the table
  • The database currently defaults to mongodb but can also be MySQL
  • You can give an additional description if you wish

Project and table names

In DataSHIELD in order to refer uniquely to a table held in Opal you must specify both the Opal project and table names. For example the table EMISS in the SURVSIM Project is referred to as SURVSIM.EMISS in your DataSHIELD login template. In the case of the training data, it happens to be that both the Project and the Table within it are called CNSIM - and so the table is referred to as CNSIM.CNSIM.

Upload data and data dictionary into Opal 

To make data available in Opal, you need to upload the data dictionary (.xls) and the data (.csv) files you have created:

  • To upload data files from your local computer,  click on Dashboard from the top menu bar (the word changes to green)

  • Click Manage Files from the left hand menu

  • By default, you will see a list of files currently held in your Opal Home folder. This is where your dictionary and data files will be saved.  

If you want to save the data in a project-specific directory (which is often recommended) then click Projects from the left hand menu and choose the project you will be working in.

You can also create a new folder by navigating to wherever you want the new folder to be (e.g. you may want to navigate to the Project folder and then create a new subfolder within that Project folder). Once you are there click Add Folder (button with blue background on top left) and specify the name of the new folder. 

If, when you have finalised where you want to keep the data, you find that the .xls and/or .csv file already exist in that location, you need to decide whether the pre-existing files are current or whether you need to over-write them with the up-to-date version(s). Rather than making a default assumption about this, Opal explicitly asks you what decision you would like to take.

To upload the data dictionary (.xls file) and the data file (.csv file):

  • Click the grey Upload button from the top tab

  • This brings up a new window, click Choose file

  • Browse to the .xls file to upload from your your local machine 
  • Select that file and click open (you can simply double-click the file in Windows)
  • Clicking the dark blue upload at the bottom of the window
  • If that file already exists in that location then it will ask you whether to replace it or not (see above tip)
  • Repeat for the .csv (data file)

Creating the Opal table: the data dictionary 

Your data dictionary (.xls) and data (.csv) files should now be uploaded into Opal. However, at present they simply exist as stored files, the data cannot be used in Opal until you have converted them into an Opal data table.

  • Click on the Projects tab from the top menu (it will turn green) and click on your project name (CNSIM in the example below)
  • Click on the large blue +Add Table button that sits  above the list of tables in the project you have specified
  • Select Add/update tables from dictionary ... from the drop down menu

  • Use HOMES and SYSTEM on your left menu to navigate to the folder that holds the .xls data dictionary file
  • Click the small square box to the left of the file name (a tick appears) and then click the dark blue Select button towards the bottom right

  • Click the blue Next button
  • Review the Opal table in the pop up window. If this is a new data table the information in the window should tell you the name of the Table you have asked to be created and the number of New Variables (corresponding to number of columns in your .csv data file).
The table name must match the first column of the two tabs in the .xls data dictionary file.
  • If the information is correct, click the small box to the left of the table name (CNSIM in the example below). A tick will appear. 
  • Click the dark blue Finish button from the bottom right

  • This will take you back to the list of all available tables in the chosen project, and after a few seconds this will be refreshed to include the new table you have create

Creating the Opal table: the data

  • Select your project again by clicking on the Projects tab from the top menu and click on your project name (CNSIM in this particular example)
  • Your table states as holding 0 Entities (indicating it is empty)

  • Click on the small box at the left of the table name, and a tick will appear.
  • Click the grey Import button from the tabs above the table.  This opens a window to define file format.

  • The data file is a .csv file. By default the window should state CSV. If it does not, choose CSV fromthe drop down menu.
  • Now click the blue Next button to open the Import Data window.

  • Specify data file location by selecting the grey Browse button. Use the left menu to navigate to the folder that holds the required file.
  • Click the small square box to the left of the file name (a tick appears) and then click the dark blue Select button on the bottom right
  • Destination Table: type the data table name.  This needs to be the same name as your Opal data dictionary table, in this example CNSIM. 
As soon as you start typing the table name, Opal will list you all of the valid Opal tables it currently holds, you only have to click on the correct one rather than typing the whole name.
  • Entity Type:  This is the observational unit that contributes each row in the .csv data file and will be the same as in column D under the Variables tab in the data dictionary .xls file.  The default is Participant
You can use Entity Type: Participant in the data dictionary column D even if, as in some survival models, each row in the data set corresponds to something other than a single participant.
  • Click the blue Next> button to open up the Configure Data Import window which you can ignore. 
  • Click the blue Next> button to open up the Review and select the data dictionaries you wish to import window. 
  • Select the box for the correct Table into which Opal will import the data.  A tick will appear.

  • Then click the blue Next> button to open up the Review the data that will be imported window .  This shows you the first few rows and columns of the data in the .csv file you selected to read in. 

You can navigate the review data table by clicking the green < and > buttons on the header line. If you want to go down or up the file looking for rows below or above what you can see, use the white on grey DVD-like buttons above the table. If you hover the cursor over each button an explanation will appear.
  • When you have checked the data click the blue Next> button to bring up the data table location window. If you have started this process by creating the table in the folder you wanted it in, then you can simply click leave it where it is 
  • If you want to move the table somewhere else, click move it to another folder then click the grey Browse button and navigate to the folder you want to use.
  • Click the dark blue Finish button from the bottom.

The data (.csv) file now populates the Opal data table which may take several minutes. If it is successful, when you navigate to the table has been saved, you will find Entities is no longer 0, but equal to the number of rows of data that have been imported. 

If the Entities count remains  0, select Dashboard from the top menu and click on the fourth icon (Tasks icon) from the left menu (three green/white horizontal bars). Find the relevant task (it will be a task of Type import). If it has failed the status button will be red and if you click on Log under Actions you may find useful information as to the failure.  

Common reasons for import failure include specifying a table name that is different to that held in the first column of both tabs of the .xls dictionary file.

Your data has now successfully been uploaded into an Opal server. You will need to repeat the process for each Opal server you wish to use.

To start using the DataSHIELD training environment sit our Tutorial for DataSHIELD users using your own data. The tutorial teaches you the basics of DataSHIELD including how to:

  • login
  • run commands to:
    • generate descriptive statistics
    • subset tables and vectors
    • fit some regression models

Assistance with DataSHIELD can be found:

Delete tables from Opal

It is simple to delete a file once it has been uploaded to Opal, you can practice by selecting the alspacsim.csv file (or your own data file) you have just uploaded. 

  • Click on the small square box to the left of the file name and a tick will appear
  • Then click on the grey Rubbish Bin icon from the top tab (the farthest right icon above the list of file names in the folder)
  • You will be asked to confirm the file deletion
  • Click the left hand dark grey Yes button to confirm 
  • Other objects such as Projects can be deleted in an equivalent way

Archive and export tables from Opal

  • Click on the Projects tab from the top menu (it will turn green) and click on your project name (CNSIM in the example below)
  • Tick the data table you wish to archive e.g. CNSIM3 in the example below and click the Export button (top right)

  • You can choose to export as a .csv file or as a compressed Opal archive file.

Opal archive files

The .csv is just the data file. The Opal archive file is a .zip file containing the data file (.csv) and the data dictionary for that file all ready formatted to be imported into Opal.

  • (optional) choose the location on the server to archive the file to. 
  • Press submit.

  • If the archive is stored in the default location, click on the folder icon on the far left menu.

  • Click on User export and the archived file will be in this folder where you can download it for secure storage outside of the system should you require it.

Opal API

You can also use the Opal API to upload and manage data. See: Importing data into Opal with the API

DataSHIELD Administration

You can manage and update DataSHIELD packages and functions though the Opal Management Interface. 

Install DataSHIELD packages

  • Click Administration in the top right of the horizontal menu bar

  • Click DataSHIELD located in the Data Analysis column on the right. 

DataSHIELD packages in the training environment

The following DataSHIELD R packages are installed by default in the DataSHIELD training environments.

Environment
Local VMs

dsBase

dsGraphics

dsModelling

dsStats

dsBetaTest

Cloud

dsBase

dsGraphics

dsModelling

dsStats

To install any of the existing DataSHIELD packages you will first need to remove one or more DataSHIELD packages by clicking the  remove  button adjacent to each package. Confirm the removal by selecting yes 

  • To install one new DataSHIELD package, click on the button  Add Package and select install a specific DataSHIELD package . Type the name of the package. In our example will use dsBetaTest to install new DataSHIELD functions currently in beta test.  Click Install .

dsBetaTest

Functions in dsBetaTest have not been fully audited for non-disclosure. They are functions that have been newly developed but not fully tested. Following testing, functions in dsBetaTest will be released in one of the standard DataSHIELD packages.

  • Newly installed packages will now appear in the Packages list. 

  • To make the functions available for all users, click Publish methods for the package and click yes to confirm. 
  • It is also possible to use a Git reference to install a DataSHIELD package. First remove one of DataSHIELD packages in our example we will remove dsBase by clicking the  remove button adjacent to each package. Confirm the removal by selecting yes 
  • dsBase has now been removed from the Packages list.

  • On the DataSHIELD github repository search for dsBase and click on the repository named dsBase.
  • Click on Releases tab - this shows all the package releases.

  • Copy the git commit reference number from the latest release e.g. 804bc5e

  • To install the package from a git reference click on the button  Add Package and select install a specific DataSHIELD package . Type the name of the package e.g dsBase.
  • Click the green link + Advanced Options and type the Git reference into the box.  Click Install.

  • dsBase from that commit (the latest release) will now be installed

Installing DataSHIELD packages from a Git reference

By installing packages from a Git reference it is possible to roll back to a previous version of a DataSHIELD. If you are a DataSHIELD developer, it is possible to install development branches in this way.

Install all DataSHIELD packages

To install all the DataSHIELD packages, click on the button  Add Package and select install all DataSHIELD packages . Click Install

Update all DataSHIELD server side packages

  • First remove existing DataSHIELD packages by clicking the  remove  button adjacent to each package. Confirm the removal by selecting yes 
  • Click on the button  Add Package  and select install a specific DataSHIELD package . Type the name of the package and click Install .
  • Make the functions available by clicking on  Publish methods for each package.

Setting DataSHIELD privacy levels 

DataSHIELD privacy level

DataSHIELD privacy levels are set in Opal and correspond to the minimum cell count for calculations. By default the DataSHIELD privacy level is set to 5, returning no results if data from <5 participants has been used for the calculation as the result may potentially be disclosive. 

DataSHIELD privacy level is applied to all tables held on the Opal.

  • In the DataSHIELD Administration window the existing DataSHIELD privacy level can be viewed in the table under Options.  By default it is set to 5. 

  • To test the current privacy level connect to the Opal server from R studio using a log-in script template as below.  You will need to edit your server name and IP address in section 1.
################################################################################
# 1. build your login in data frame.  
################################################################################
server <- c("name-of-server", "name-of-server")
url <- c("http://XXX.XXX.X.XXX:8080", "http://XXX.XXX.X.XXX:8080")
user <- "administrator" 
password <- "datashield_test&"
table <- c("CNSIM.CNSIM1","CNSIM.CNSIM2")
my_logindata <- data.frame(server,url,user,password,table)


################################################################################
# 2. Load the DataSHIELD Client Libraries
################################################################################
library(opal)
library(dsBaseClient)
library(dsStatsClient)
library(dsGraphicsClient)
library(dsModellingClient)

################################################################################
# 3. Login to DataSHIELD
################################################################################
opals <- datashield.login(logins=my_logindata,assign=TRUE)
  • Create a contingency table of GENDER and DIS_CVA
ds.table2D("D$GENDER","D$DIS_CVA")
  • The results will be a table of NAs as cells in the table are calculated with the data from <5 participants

  • Return to the Opal Management Interface, change the privacy level click Edit on the row for privacy level and set it to 1. 

  • The DataSHIELD privacy level will now be set to 1 - data from 1 or less participants required to return the calculation. 

  • Return to R studio and logout from the current opal connection
datashield.logout(opals)
  • Reconnect to the Opal server as before.  Reconnected, you will now have your privacy level set to 1. 
################################################################################
# 1. build your login in data frame.  
################################################################################
server <- c("name-of-server", "name-of-server")
url <- c("http://XXX.XXX.X.XXX:8080", "http://XXX.XXX.X.XXX:8080")
user <- "administrator" 
password <- "datashield_test&"
table <- c("CNSIM.CNSIM1","CNSIM.CNSIM2")
my_logindata <- data.frame(server,url,user,password,table)


################################################################################
# 2. Load the DataSHIELD Client Libraries
################################################################################
library(opal)
library(dsBaseClient)
library(dsStatsClient)
library(dsGraphicsClient)
library(dsModellingClient)

################################################################################
# 3. Login to DataSHIELD
################################################################################
opals <- datashield.login(logins=my_logindata,assign=TRUE)
  • Repeat the contingency table function, you will now return tables that are not NA.

  • Use the datashield.logout function to disconnect from the Opal server.
  • Return to the Opal Management interface and reset the DataSHIELD privacy level to 5. 


DataSHIELD privacy

The DataSHIELD privacy level is applied to all tables held on the Opal. Should a study belong to multiple consortia requiring different privacy levels, it is recommended the data tables be held in a separate Opal instance.

Where to get support


DataSHIELD news and support is available by the DataSHIELD community in the DataSHIELD forum. Tailored support and training in DataSHIELD is provided on a fee basis, please email us with your enquiry.

Opal is supported by the software creators at Obiba. Opal support is available on the Obiba-users mailing list, where support questions can be posted for free. Opal general enquiries can be sent to info@obiba.org.



DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki