Welcome to the DataSHIELD installation instructions for Linux. This page covers all the stages needed to get set up and ready to go.
DataSHIELD support is freely available from the DataSHIELD community in the DataSHIELD forum. Please use the forum as the first port of call for any problems you may be having; it is monitored closely for new threads.
The minimum computer specification for installing the DataSHIELD training environment is:
- 12GB+ of RAM.
- Each VM is allocated 4GB RAM, and some needs to be spare for other applications.
- Reasonably powerful processor (for example: Intel i5 or i7).
- 16GB space on the hard disk.
- 7GB per VM + VirtualBox installation requirement (80 MB)
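The figures above can be checked from a terminal before you start. The following is a minimal sketch, assuming a Linux machine that provides /proc/meminfo and GNU df; the function names are made up for illustration and are not part of DataSHIELD.

```shell
# Minimums from the specification list above, in GB.
MIN_RAM_GB=12
MIN_DISK_GB=16

# Total installed RAM, rounded to the nearest GB. Linux reports slightly
# less than the nominal size, so rounding avoids a false failure.
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free space, in GB (df rounds up), on the filesystem holding directory $1.
free_disk_gb() {
    df -BG --output=avail "$1" | tail -n 1 | tr -dc '0-9'
}

# Compare a measured value with a minimum.
# $1 = label, $2 = measured value in GB, $3 = required minimum in GB
check_minimum() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: OK ($2 GB, need $3 GB)"
    else
        echo "$1: too low ($2 GB, need $3 GB)"
        return 1
    fi
}
```

For example, run check_minimum RAM "$(total_ram_gb)" "$MIN_RAM_GB" and check_minimum Disk "$(free_disk_gb "$HOME")" "$MIN_DISK_GB". The disk check looks at your home directory, on the assumption that this is where the VMs will be stored.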
Installing the Virtual Servers
To run the virtual servers you will need to install VirtualBox.
There are three ways to install.
- Via web browser: go to the VirtualBox download page for Linux distributions and choose the package corresponding to your machine.
- Via Ubuntu Software app: in the applications search, type "Ubuntu Software" (called "software store" in Ubuntu 18), search for VirtualBox, then click on "Install".
- Via terminal:
$ sudo apt-get install virtualbox
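Whichever route you choose, you may want to confirm that VirtualBox is available before moving on. This is a small sketch, assuming that the VBoxManage command-line tool shipped with VirtualBox is on your PATH; have_command and report_virtualbox are hypothetical helper names.

```shell
# Succeeds if the named program can be found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether VirtualBox's command-line tool is available.
report_virtualbox() {
    if have_command VBoxManage; then
        echo "VirtualBox found, version $(VBoxManage --version)"
    else
        echo "VirtualBox not found"
        return 1
    fi
}
```

Calling report_virtualbox prints the installed version, or returns a non-zero status if the tool cannot be found.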
Download the Virtual Servers
Two virtual servers are available for you to test DataSHIELD with. The virtual servers require 4GB RAM each and about 5GB hard-disk space each.
You can download the VMs from Google Drive:
Import the Virtual Servers into VirtualBox
You will need to import the VMs into VirtualBox. Open VirtualBox and, in the top menu on the left, choose:
File > Import Appliance...
- Navigate to the directory the VM files were saved to (e.g. Downloads...)
- Click "Next", to move onto "Appliance Settings"
- Nothing needs to be changed here, so click "Import"
- The import should take around 1 minute.
- Repeat for the second downloaded VM file.
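The same import can also be done from a terminal with VBoxManage import. The sketch below assumes that the downloaded VM files end in .ova (the file names are not given on this page) and that they sit together in one directory. The VBOXMANAGE variable is a hypothetical convenience, not a VirtualBox setting.

```shell
# Program used to talk to VirtualBox. Set VBOXMANAGE=echo to print the
# commands instead of running them.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Import every .ova file found in directory $1, using default settings.
import_appliances() {
    for appliance in "$1"/*.ova; do
        # Skip the unexpanded pattern when no .ova files are present.
        [ -e "$appliance" ] || continue
        "$VBOXMANAGE" import "$appliance" || return 1
    done
}
```

For example, import_appliances "$HOME/Downloads" imports each appliance in turn, just as "Nothing needs to be changed here" does in the graphical steps.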
Setting up the Virtual Servers
Setting up the VirtualBox network adapter
- On the VirtualBox "Tools" bar at the top of the list of Virtual Machines on the left hand side, right click and select Host Network Manager. A popup window should appear.
- Ensure the Adapter tab at the bottom of the screen has the following settings:
- Configure Adapter Manually option checked
- IPv4 address: 192.168.56.1
- IPv4 Network Mask: 255.255.255.0
- IPv6 Prefix Length: 0
- Ensure the DHCP Server tab (at the bottom of the screen, next to Adapter) has Enable Server unchecked.
- Click "Apply" to save any changed settings.
- You can now safely close this Host Network Manager popup window
- Now on each of your imported Virtual Machines, one at a time, right click the VM and click on "Settings" at the top of the right click menu (icon is orange in colour):
- Check that under the Network tab on the left hand side, the following options are set:
- Adapter 1: Attached to: NAT
- Adapter 2: Attached to: Host-only Adapter
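The settings above can also be applied from a terminal. This is a sketch only, under several assumptions: that the host-only adapter is called vboxnet0 (the usual default on Linux), that the VMs are powered off, and that your VirtualBox version accepts the option spellings used here. These have changed between releases, so check the built-in VBoxManage help if a command is rejected. The IPv6 prefix length is not set by this sketch.

```shell
# Set VBOXMANAGE=echo to print the commands instead of running them.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"
# Assumed name of the host-only adapter.
HOSTONLY_IF="${HOSTONLY_IF:-vboxnet0}"

# Give the host-only adapter a fixed address, then switch off its DHCP
# server. The second command fails if no DHCP server is defined for it.
configure_host_network() {
    "$VBOXMANAGE" hostonlyif ipconfig "$HOSTONLY_IF" \
        --ip 192.168.56.1 --netmask 255.255.255.0 &&
    "$VBOXMANAGE" dhcpserver modify --ifname "$HOSTONLY_IF" --disable
}

# Attach adapter 1 of VM $1 to NAT and adapter 2 to the host-only network.
configure_vm_network() {
    "$VBOXMANAGE" modifyvm "$1" \
        --nic1 nat --nic2 hostonly --hostonlyadapter2 "$HOSTONLY_IF"
}
```

Run configure_host_network once, then configure_vm_network once per imported VM, passing the VM name exactly as it appears in the VirtualBox list.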
Connect to the Virtual Servers
Boot the Virtual Machines
Now that the two virtual machines have been downloaded, imported and configured, it's time to launch them. Launching a VM boots a mini-computer within your computer; it plays the role of an Opal server, as if it were online and you were connecting remotely to the data stored on it.
To start a VM (i.e. a Virtual Opal Server):
- When in VirtualBox:
- left-click on a server
- click on the green arrow that says start
- (or just double click on the Opal server name).
- Repeat for the second VM
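If you prefer the terminal, the VMs can be started with VBoxManage startvm. The sketch below starts them headless, that is, without opening a console window. It is only one option: replace headless with gui to get the same window as the green arrow. The VM names you pass must match the names shown in your VirtualBox list, which this page does not state.

```shell
# Set VBOXMANAGE=echo to print the commands instead of running them.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Start each named VM in the background, stopping at the first failure.
start_vms() {
    for vm in "$@"; do
        "$VBOXMANAGE" startvm "$vm" --type headless || return 1
    done
}
```

Call it with both names at once, for example start_vms "first-vm" "second-vm", where the two names are placeholders for your own.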
Depending on how powerful your computer is, the Opal servers may take a few minutes to boot and for Opal to start.
On first launching the VMs (Virtual Opal Servers) you will see several grey warning messages. These can be dismissed by clicking the "x" in the top right corner. Clicking inside a VM window can cause your mouse pointer to become captured within the mini-computer environment. The default key to release it is Right-Ctrl.
For ordinary use, once the VMs have booted there is no need to type commands within them: they simply act as servers that you connect to while doing your analysis on the web or in R.
However if you are learning DataSHIELD for development purposes, you may wish to log in to the VM. Once the machines have been started you can use the credentials:
- username: administrator
- password: datashield_test&
These credentials are for tasks such as reviewing logs produced on the server.
Now that your VMs are launched you should check they are ready to be used before getting set up in R.
Virtual Servers' IP addresses
By default, machines on the host-only network can be found at 192.168.56.xxx:
Your computer (the host) will be at 192.168.56.1
The virtual servers have been configured as follows:
- dstesting-100 – 192.168.56.100
- dstesting-101 – 192.168.56.101
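A quick way to check that both servers are up is to ping these addresses from the host. This is a minimal sketch, assuming the Linux ping command and assuming that the VMs answer ping requests, which this page does not confirm. The function names are hypothetical.

```shell
# Address of a training server on the host-only network, as listed above.
vm_ip() {
    case "$1" in
        dstesting-100) echo "192.168.56.100" ;;
        dstesting-101) echo "192.168.56.101" ;;
        *) echo "unknown server: $1" >&2; return 1 ;;
    esac
}

# Succeeds if the named server answers a single ping within two seconds.
is_reachable() {
    ping -c 1 -W 2 "$(vm_ip "$1")" >/dev/null 2>&1
}
```

For example, is_reachable dstesting-100 && echo "dstesting-100 is up". A failure here usually points back to the host-only adapter settings rather than to Opal itself.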
Web Access to the Opal Server (How to look inside the VM)
The Opal web interface is accessed in your browser. Simply type the IP address of the VM, followed by the port number.
For example: to access the web interface for dstesting-100, go to:
Please allow up to 2 minutes after launching a VM to gain access to the Opal web interface.
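Rather than reloading the page by hand, the wait can be scripted by retrying a check until it succeeds or a time limit passes. The sketch below is generic and makes no assumption about the Opal address or port; wait_until is a hypothetical helper, and OPAL_URL in the example is a placeholder you would set to the address given above.

```shell
# Retry a command until it succeeds or the time limit is reached.
# $1 = time limit in seconds, $2 = pause between attempts in seconds,
# remaining arguments = the command to run.
wait_until() {
    wait_limit=$1
    wait_pause=$2
    shift 2
    wait_elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$wait_elapsed" -ge "$wait_limit" ]; then
            return 1
        fi
        sleep "$wait_pause"
        wait_elapsed=$((wait_elapsed + wait_pause))
    done
    return 0
}

# Example: allow two minutes, checking every five seconds.
#   wait_until 120 5 curl --silent --fail --output /dev/null "$OPAL_URL"
```

The function returns 0 as soon as the check passes and 1 if the limit is reached first, so it can be chained with && before opening the browser.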
Logging onto the Opal web interface
The following username and password are used to access the web portal of the training servers:
e.g. to read about the metadata of the studies, or to connect to external resources.
When finished: Shutting down the Opal servers
Assuming no changes have been applied to the Virtual Opal Servers (which will be the case for general users), once you have finished your analysis, shut down the Opal server as follows:
Setting up your R Session
These are instructions for installing R on your own machine. That is, the machine you will be using for analysis, not the virtual servers.
These instructions assume you are using Ubuntu on your local machine.
Installing R in Ubuntu
Note: the CRAN R repository you add depends on the Ubuntu release you are running:
How to check your Ubuntu version:
Open your Terminal, and paste the command
And take note of the number (20, 18, 16) next to "Release".
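As an alternative way to find the release, the version can be read from the os-release file that Ubuntu provides. This is a sketch, not the command referred to above; it assumes that /etc/os-release exists and defines VERSION_ID, and both function names are hypothetical.

```shell
# Print VERSION_ID (for example 20.04) from an os-release style file.
# The file is read in a subshell so its variables do not leak out.
release_from_file() {
    ( . "$1" && printf '%s\n' "$VERSION_ID" )
}

# Name of the Ubuntu release, for the three releases covered on this page.
release_name() {
    case "$1" in
        20.04) echo "Focal Fossa" ;;
        18.04) echo "Bionic Beaver" ;;
        16.04) echo "Xenial Xerus" ;;
        *) echo "release $1 is not covered by these instructions" >&2
           return 1 ;;
    esac
}
```

For example, release_name "$(release_from_file /etc/os-release)" prints the release name to look for in the instructions that follow.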
Now install R using the following instructions:
- If you are running Ubuntu 20.04 "Focal Fossa":
- If you are running Ubuntu 18.04 "Bionic Beaver":
- If you are running Ubuntu 16.04 "Xenial Xerus":
Set-up R for DataSHIELD analysis
Install the Opal packages
Open an R Session (whether in terminal, RGui or RStudio), then run:
install.packages(c('fields', 'metafor', 'ggplot2', 'gridExtra', 'data.table'))
Install the DataSHIELD client packages
install.packages('dsBaseClient', repos=c(getOption('repos'), 'http://cran.obiba.org'), dependencies=TRUE)
Be aware that this will place the DataSHIELD packages wherever your R libraries are saved. This may be unhelpful for development, in which case you may have to relocate the files.
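To confirm from a terminal that the packages installed correctly, you can ask R whether each one can be loaded. This is a minimal sketch, assuming Rscript is on your PATH; the RSCRIPT variable and the function name are hypothetical conveniences.

```shell
# Program used to run R code; override for testing.
RSCRIPT="${RSCRIPT:-Rscript}"

# Succeeds if the named R package can be found in your R libraries.
check_r_package() {
    "$RSCRIPT" -e "if (!requireNamespace('$1', quietly = TRUE)) quit(status = 1)"
}

# Report on every package named in the two install steps above.
check_all_packages() {
    for pkg in fields metafor ggplot2 gridExtra data.table dsBaseClient; do
        if check_r_package "$pkg"; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing"
        fi
    done
}
```

Running check_all_packages prints one line per package, which makes it easy to see which install.packages call needs repeating.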
DataSHIELD R Package Manuals
See: Current release
Keeping up to date
Please see our keeping up to date wiki page in the user/analyst support section.
You are now fully set up. To start using the DataSHIELD test environment, you can try our Tutorial for DataSHIELD users. The tutorial teaches you the basics of DataSHIELD including how to:
- run commands to:
- generate descriptive statistics
- subset tables and vectors
- fit some regression models
Further instructions are available for advanced users of the DataSHIELD test environment:
- Follow the instructions in the Opal management tutorial to learn how to upload your own data.
- DataSHIELD R package manuals will be available soon on the release notes page.
- Install non-CRAN R packages to the training Opal servers (coming soon)