Opal management
- 1 Logging onto the Opal management interface
- 2 Managing data in Opal
- 2.1 Preparing your data for Opal
- 2.1.1 The data dictionary file
- 2.1.1.1 The variables tab
- 2.1.1.2 The categories tab
- 2.1.2 The data file
- 2.1.1 The data dictionary file
- 2.2 Uploading your data and data dictionary files to Opal
- 2.3 Creating the Opal table: the data dictionary
- 2.4 Creating the Opal table: the data
- 2.5 Delete tables from Opal
- 2.6 Archive and export tables from Opal
- 2.1 Preparing your data for Opal
- 3 DataSHIELD Administration
- 4 Update all DataSHIELD server side packages
- 5 Setting DataSHIELD privacy levels
- 6 Where to get support
Logging onto the Opal management interface
If using the cloud training environment your trainer will have switched the training Opal servers on.
If using local VMs Start the Opal servers
Starting the Opal training VMs
Having opened the training Opal VMs, it is necessary to leave them running for a couple of minutes before trying to login. If you do not wait long enough you will find
you may not be given the login window or
after entering username and password the window just hangs
When the training Opal servers are ready the login window appears very rapidly. Once you have entered the user name and password the Opal web interface appears almost immediately.
Your opal servers will sit at an IP address on port :8080 (http) or :8443 (https)
DataSHIELD Training Environment | Description | IP Address |
|---|---|---|
VMs | DataSHIELD training Opal on your local machine | |
Cloud | DataSHIELD training Opal in the cloud | Ask your trainer. |
Navigate to the address in your web browser
Enter the username and password - default training details given below
Training Environment | Default Username | Default Password |
|---|---|---|
VMs | administrator | datashield_test& |
Cloud | administrator | datashield_test& |
Managing data in Opal
Simulated datasets are provided in the training Opal servers. However, users may wish to conduct analysis on their own simulated data. The instructions on this page will allow you to upload your own simulated data to your training Opal servers. You can split one simulated dataset into two - and upload half the dataset onto one Opal server and the rest on the second server.
Preparing your data for Opal
Opal servers require that a formal data dictionary is specified and uploaded in order that a data set can be properly imported.
Opal accepts various formats but in this tutorial a .csv is used for the data file and Microsoft Excel for the data dictionary file (.xls or .xlsx).
The data dictionary file
Download the sample dataset alspacsim.csv or use your own data table in .csv format. This is a simulated dataset from the ALSPAC clinic at participant age 7.
Variable name | Description | Categorical |
|---|---|---|
MALE | codes sex | 1 = male 0 = female |
AGE_YEARS | age in decimal years on the day of the clinic | |
HEIGHT | height age 7 (cm) | |
HEIGHT_SIT | sitting height age 7 (cm) | |
WAIST | waist circumference age 7 (cm) | |
HIP | hip circumference age 7 (cm) | |
WEIGHT | weight age 7 (Kg) | |
SBP | systolic blood pressure age 7 (top of the blood pressure fluctuation) (mm of Hg) | |
DPB | diastolic blood pressure age 7 (bottom of the blood pressure fluctuation) (mm of Hg) | |
PULSE | pulse rate age 7 (beats per minute) | |
BMI | Body Mass Index derived as wt/(ht/100)2 The height variable is divided by 100 |
Download the Opal-dictionary-template.xls template. You will need to follow the instructions below to compile a data dictionary for the new dataset in order to upload it to Opal.
The data dictionary file requires formatting as an Excel spreadsheet (.xls or .xlsx) with two tabs
VariablesandCategories.
The variables tab
The image below shows the variables tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users.
The table below summarises the column names in the variables tab, including examples from the test data built into the training environment in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
|---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. It is critical that the table name appears in every row |
name | the variable name | Column B (e.g. LAB_TSC) | Mandatory field. | |
valueType | the value type of the variable | text | Column C (e.g. decimal, integer) | |
entityType | Opal can store data on different entities | Participant | Column D (e.g. Participant) | Examples: Participant (each row corresponds to a different participant), Instrument, Area, Drug |
referencedEntityType | if the variable values are entity identifiers, this is the type of the entities that are referenced | Column E | Can be left blank | |
mimeType | the mime type of the variable to help applications to display documents | Column F | Examples: image/jpeg, application/excel. Can be left blank | |
unit | the unit in which variables are expressed | Column G (e.g. Participant) | Examples: cm, kg, ml etc. Can be left blank | |
repeatable | repeatable measurements | 0 | Column H (0) | 1 if repeatable, 0 if not (e.g. Three measures of blood pressure) |
occurrenceGroup | name of a repeatable variable group | Column I | Example: [ | |
label:en | label of the variable. | Column J | Can be localized by language e.g. label:en in english, label:fr for french) | |
alias | Alternative name for the variable, usually used for defining a shorter name for the variable | Column K |
Edit the
variabletab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
The categories tab
The image below shows the categories tab for the simulated dataset CNSIM used within the v4 Tutorial for DataSHIELD users. Each category for each variable is represented by a single row in the spreadsheet. For example, in the dictionary file below, 3 rows (rows 12-14 inclusive) are for PM_BMI_CATEGORICAL as it has 3 categories.
The table below summarises the column names in the categories tab, including examples from the simulated datasets built into the DataSHIELD training environments in the spreadsheet image above.
Column Names | Description | Default value | Example value in the test data | Notes |
|---|---|---|---|---|
table | the table name the variable will be added to | Table | Column A (CNSIM) | This is the table name you refer to in your DataSHIELD login details. It is critical that the table name appears in every row |
variable | the variable name (mandatory field) | Column B (e.g. DIS_CVA) | mandatory field. One row per category for each variable. | |
name | the variable category | integer | Column C (e.g. 1) | mandatory field. One row per category for each variable |
code | can be left blank | Column D | Can be left blank | |
missing | Some categories are interpreted as missing answers (e.g. 'Don't know', 'Prefer not to answer'). | 0 | Column E | Use 1 for missing and 0 for not missing (normal answer). |
label:en | label of the variable category | Column F | Human readable text description of the category. Can be localized by language e.g. label:en in english, label:fr for french) |
Edit the
categoriestab of your data dictionary template to reflect the variables in alspacsim.csv (or your own data).
The data file
In the training Opal servers, the data file is a
.csv(comma delimited) fileMissing values are represented in the data file as two consecutive commas
,,
Missing values
Missing values can not be represented as white space or NA
There is no column name for the first column. The first column contains the row number for the data. "Participant" or Opal ID number setting in Opal is using the row number.
See below for an example of what the .csv file looks like when opened in Excel.
Uploading your data and data dictionary files to Opal
Data structure in Opal
Data are held in Opal in what is called a table.
each row contains the data for one of the primary units of data collection (perhaps most commonly, a participant)
each column represents a different variable.
Opal holds all relevant data tables in a project.
Creating a new Opal project
The first step to uploading your data to Opal is to indicate within which project you want to site the new data table. This may either be an existing project or a new one:
To create a new project click on the
Projecttab in the top left (after clicking it appears in green on the dark blue horizontal bar)click
Add Project.
Fill in the details of your project:
You must specify a name and this will be used to point to the data. For convenience do not use a very long name. The example below shows the name as CNSIM.
The "title" can then be a longer explanatory label for the table
The database currently defaults to mongodb but can also be MySQL
You can give an additional description if you wish
Project and table names
In DataSHIELD in order to refer uniquely to a table held in Opal you must specify both the Opal project and table names. For example the table EMISS in the SURVSIM Project is referred to as SURVSIM.EMISS in your DataSHIELD login template. In the case of the training data, it happens to be that both the Project and the Table within it are called CNSIM - and so the table is referred to as CNSIM.CNSIM.
Upload data and data dictionary into Opal
To make data available in Opal, you need to upload the data dictionary (.xls) and the data (.csv) files you have created:
To upload data files from your local computer, click on
Dashboardfrom the top menu bar (the word changes to green)Click
Manage Filesfrom the left hand menu
By default, you will see a list of files currently held in your
Opal Home folder. This is where your dictionary and data files will be saved.
If you want to save the data in a project-specific directory (which is often recommended) then click Projects from the left hand menu and choose the project you will be working in.
You can also create a new folder by navigating to wherever you want the new folder to be (e.g. you may want to navigate to the Project folder and then create a new subfolder within that Project folder). Once you are there click Add Folder (button with blue background on top left) and specify the name of the new folder.
If, when you have finalised where you want to keep the data, you find that the .xls and/or .csv file already exist in that location, you need to decide whether the pre-existing files are current or whether you need to over-write them with the up-to-date version(s). Rather than making a default assumption about this, Opal explicitly asks you what decision you would like to take.
To upload the data dictionary (.xls file) and the data file (.csv file):
Click the grey
Uploadbutton from the top tab
This brings up a new window, click
Choose file
Browse to the .xls file to upload from your your local machine
Select that file and click
open(you can simply double-click the file in Windows)Clicking the dark blue
uploadat the bottom of the windowIf that file already exists in that location then it will ask you whether to replace it or not (see above tip)
Repeat for the .csv (data file)
Creating the Opal table: the data dictionary
Your data dictionary (.xls) and data (.csv) files should now be uploaded into Opal. However, at present they simply exist as stored files, the data cannot be used in Opal until you have converted them into an Opal data table.
Click on the
Projectstab from the top menu (it will turn green) and click on your project name (CNSIMin the example below)Click on the large blue
+Add Tablebutton that sits above the list of tables in the project you have specifiedSelect
Add/update tables from dictionary ...from the drop down menu
Locate the data dictionary (.xls) file you have uploaded by clicking
Browse.
Use
HOMESandSYSTEMon your left menu to navigate to the folder that holds the .xls data dictionary fileClick the small square box to the left of the file name (a tick appears) and then click the dark blue
Selectbutton towards the bottom right
Click the blue
NextbuttonReview the Opal table in the pop up window. If this is a new data table the information in the window should tell you the name of the Table you have asked to be created and the number of New Variables (corresponding to number of columns in your .csv data file).
The table name must match the first column of the two tabs in the .xls data dictionary file.
If the information is correct, click the small box to the left of the table name (CNSIM in the example below). A tick will appear.
Click the dark blue
Finishbutton from the bottom right
This will take you back to the list of all available tables in the chosen project, and after a few seconds this will be refreshed to include the new table you have create
Creating the Opal table: the data
Select your project again by clicking on the
Projectstab from the top menu and click on your project name (CNSIMin this particular example)Your table states as holding 0
Entities(indicating it is empty)
Click on the small box at the left of the table name, and a tick will appear.
Click the grey
Importbutton from the tabs above the table. This opens a window to define file format.
The data file is a .csv file. By default the window should state
CSV. If it does not, choose CSV fromthe drop down menu.Now click the blue
Nextbutton to open theImport Datawindow.
Specify data file location by selecting the grey
Browsebutton. Use the left menu to navigate to the folder that holds the required file.Click the small square box to the left of the file name (a tick appears) and then click the dark blue
Selectbutton on the bottom rightDestination Table: type the data table name. This needs to be the same name as your Opal data dictionary table, in this example CNSIM.
As soon as you start typing the table name, Opal will list you all of the valid Opal tables it currently holds, you only have to click on the correct one rather than typing the whole name.
Entity Type: This is the observational unit that contributes each row in the .csv data file and will be the same as in column D under the
Variablestab in the data dictionary .xls file. The default isParticipant.
You can use Entity Type: Participant in the data dictionary column D even if, as in some survival models, each row in the data set corresponds to something other than a single participant.
Click the blue
Next>button to open up theConfigure Data Importwindow which you can ignore.Click the blue
Next>button to open up theReview and select the data dictionaries you wish to importwindow.Select the box for the correct Table into which Opal will import the data. A tick will appear.
Then click the blue
Next>button to open up theReview the data that will be imported window. This shows you the first few rows and columns of the data in the .csv file you selected to read in.
You can navigate the review data table by clicking the green < and > buttons on the header line. If you want to go down or up the file looking for rows below or above what you can see, use the white on grey DVD-like buttons above the table. If you hover the cursor over each button an explanation will appear.
DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki
