How to modify the DataSHIELD packages

The logic

The environment that you have set up places a number of virtual machines on your computer.

The virtual machines play the role of the servers of different cohorts; they have Opal and R installed, and each contains some simulated test data.

Your machine, the host, plays the role of the client; that is, the computer that is conducting the analysis.

The DataSHIELD code is also split into client and server packages. The DataSHIELD client packages are installed in R on your machine (so that it can act as a client), and the DataSHIELD server packages are installed in R on each of the virtual machines.

  • Modifying the client packages therefore involves downloading the DataSHIELD source code to your machine, making your changes, and then installing the changed client packages on your machine.
  • Modifying the server packages also involves downloading the DataSHIELD source code to your machine and making your changes. However, the changed server packages must be installed on each of your virtual machines.

Download the DataSHIELD code

The DataSHIELD code is stored publicly on GitHub; a copy can be downloaded using git.

Install git

sudo apt-get install git

Now read the Using Git page

Clone the DataSHIELD repositories

DataSHIELD code is broken down into a number of client and server packages:

Client side packages
  • dsbaseclient
  • dsmodellingclient
Server side packages
  • dsbase
  • dsmodelling

If you want to download everything, then simply 'clone' all of these repositories. This creates a local copy of the code on your machine.

mkdir ds-dev
cd ds-dev
git clone https://github.com/datashield/dsbaseclient.git
git clone https://github.com/datashield/dsmodellingclient.git
git clone https://github.com/datashield/dsbase.git
git clone https://github.com/datashield/dsmodelling.git
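
The four clone commands above can also be wrapped in a small shell function, so that the step can be re-run without failing on repositories that are already present. This is only a sketch: the function name clone_datashield and the DRY_RUN switch are made up for this example and are not part of DataSHIELD or git.

```shell
#!/bin/sh
# Clone each DataSHIELD repository into a working directory, skipping any
# repository that has already been cloned there.
# Usage: clone_datashield [target_dir]    (target_dir defaults to ds-dev)
# Set DRY_RUN=1 to print the git commands instead of running them.
# Note: clone_datashield and DRY_RUN are hypothetical names for this sketch.
clone_datashield() {
    target_dir="${1:-ds-dev}"
    for repo in dsbaseclient dsmodellingclient dsbase dsmodelling; do
        # A .git directory means an earlier run already cloned this repository.
        if [ -d "$target_dir/$repo/.git" ]; then
            echo "skipping $repo (already cloned)"
            continue
        fi
        url="https://github.com/datashield/$repo.git"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git clone $url $target_dir/$repo"
        else
            mkdir -p "$target_dir" || return 1
            git clone "$url" "$target_dir/$repo" || return 1
        fi
    done
}
```

After loading the function into your shell, `clone_datashield ~/ds-dev` would clone whatever is still missing, and `DRY_RUN=1 clone_datashield ~/ds-dev` would only list what it would do.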

Installing the client side packages

When we installed the DataSHIELD client packages in order to 'play' with DataSHIELD, we ran R and used the install.packages command, telling R to get the DataSHIELD client packages from the OBiBa website.

Since we now want to modify the DataSHIELD client packages, we will instead install our own modified versions of the packages from a local directory on our computer.

First, delete the existing packages, if they are installed:

# R
> remove.packages('dsbaseclient')
> remove.packages('dsmodellingclient')

Now install your local version of the DataSHIELD client source code, using the devtools package:

# R
> library(devtools)
> devtools::install('/home/me/ds-dev/dsbaseclient')
> devtools::install('/home/me/ds-dev/dsmodellingclient')
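
If you reinstall the client packages often while developing, the same step can be run from the shell with Rscript rather than from an interactive R session. The following is a minimal sketch, assuming both client repositories were cloned into the same directory and that devtools is already installed; the function name install_client_packages is hypothetical.

```shell
#!/bin/sh
# Install each locally modified client package with devtools.
# Usage: install_client_packages <source_dir>
# Note: install_client_packages is a hypothetical helper for this sketch.
install_client_packages() {
    # Drop a trailing slash so the paths passed to R look tidy.
    src_dir="${1%/}"
    # dsbaseclient is installed first, on the assumption that
    # dsmodellingclient builds on it.
    for pkg in dsbaseclient dsmodellingclient; do
        # Every R package source directory has a DESCRIPTION file.
        if [ ! -f "$src_dir/$pkg/DESCRIPTION" ]; then
            echo "error: no R package found at $src_dir/$pkg" >&2
            return 1
        fi
        Rscript -e "devtools::install('$src_dir/$pkg')" || return 1
    done
}
```

For example, `install_client_packages /home/me/ds-dev` would install both packages from the directory used on this page.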

You are now able to use your modified client code to run analyses on the virtual machines.

Installing Server Side DataSHIELD packages on the virtual servers

Official Packages

The official DataSHIELD packages can be installed in their current state through the Opal web interface:

Administration > DataSHIELD > Add Package

If you choose to install them all, this will install dsbase and dsmodelling.

Public In-development Packages

Any package in the DataSHIELD GitHub project (each package is its own git repository) can be installed on the virtual servers from within an R session running on the host, that is, from the computer acting as the client or 'analysis computer'.

Packages are installed this way with the dsadmin.install_package function, which is part of the opaladmin package and is run from R on the client.

For example, one could install a package by specifying a repository from the DataSHIELD GitHub project as follows:

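# R
> library('opaladmin')
> # Assumption: 'opals' holds the Opal connections created when you logged in to the virtual machines
> # Assumption: the package is installed from the 'master' branch of the datashield GitHub account
> dsadmin.install_package(opals, 'ag.dev.sv', githubusername='datashield', ref='master')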

This would install the package on all the virtual machines.

Private Packages

Installing your own locally modified versions of the DataSHIELD server code is a little more involved.

Installing modified versions of the server code

Our aim is to take the DataSHIELD server source code that we have been modifying on our own computer and install it on each of the virtual machines. You will therefore need to repeat this process on each virtual machine you are using.

First, we need to get the code from our computer onto the virtual machine. For example, we can place the code in the home directory of 'user' as follows:

# rsync -av /home/me/ds-dev/dsbase user@192.168.56.100:/home/user

Now ssh into the virtual machine in order to install the code:

# ssh user@192.168.56.100

We have ssh'd in as 'user', but the installation must be done as a different user on the system. This is because, if we installed the packages as 'user', Opal's rserver would not be able to access them.

Instead, we will install them as root, so that all users have access to them.

Switch to the root user and use devtools to install from wherever you copied the package source to on the virtual machine:

# sudo su root
# R
> library(devtools)
> devtools::install('/home/user/dsbase')
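
Because the copy and install steps have to be repeated on every virtual machine, it can be convenient to run them in a loop from the host. The sketch below assumes that 'user' may run sudo on each virtual machine and that devtools is installed there, as in the manual steps above. The function name deploy_server_package is hypothetical, and only 192.168.56.100 is an address taken from this page; any further addresses are placeholders for your own virtual machines.

```shell
#!/bin/sh
# Copy a server-side package to each virtual machine and install it as root.
# Usage: deploy_server_package <package_dir> <vm_address>...
# Note: deploy_server_package is a hypothetical helper for this sketch.
deploy_server_package() {
    # Drop a trailing slash so rsync copies the directory itself,
    # not just its contents.
    pkg_dir="${1%/}"
    shift
    pkg_name=$(basename "$pkg_dir")
    for vm in "$@"; do
        echo "deploying $pkg_name to $vm"
        # Copy the package source into the home directory of 'user'.
        rsync -av "$pkg_dir" "user@$vm:/home/user" || return 1
        # Install as root so that Opal's R server can access the package.
        # -t gives sudo a terminal in case it asks for a password.
        ssh -t "user@$vm" "sudo Rscript -e \"devtools::install('/home/user/$pkg_name')\"" || return 1
    done
}
```

A call such as `deploy_server_package /home/me/ds-dev/dsbase 192.168.56.100 192.168.56.101` would then handle two virtual machines in turn, where the second address is only an example.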

This will install the packages in the R system library ( /usr/local/lib/R/site-library ). This is fine, and they will be available to all users. However, you will notice that you cannot delete them using the Opal web interface. Instead, you will have to delete them from the R system library manually.
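
One way to do the manual deletion is to call R's remove.packages function as root on the virtual machine, pointing it at the system library. This is a sketch under the assumption that the package was installed into the library path given above; the function name remove_system_package is hypothetical.

```shell
#!/bin/sh
# Remove a package from the R system library. Run this on the virtual machine.
# Usage: remove_system_package <package> [library_path]
# Note: remove_system_package is a hypothetical helper for this sketch.
remove_system_package() {
    pkg="$1"
    lib="${2:-/usr/local/lib/R/site-library}"
    # An installed package is a directory inside the library.
    if [ ! -d "$lib/$pkg" ]; then
        echo "$pkg is not installed in $lib"
        return 0
    fi
    # Root access is needed because the system library belongs to root.
    sudo Rscript -e "remove.packages('$pkg', lib = '$lib')"
}
```

For example, `remove_system_package dsbase` would remove a modified dsbase from the default system library.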

DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki