The major focuses of the v6.0 release of DataSHIELD are the addition of new analytical functions and integration with the DataSHIELD Interface (DSI).
Changes from DataSHIELD v5.1 to v6.0
DataSHIELD Interface (DSI)
DataSHIELD’s dsBaseClient package now uses the DataSHIELD Interface (DSI) to communicate with the Opal server; this replaces the legacy opal R package. This is a breaking change for code written for DataSHIELD v4 and v5. The main impact on end users of DataSHIELD is the new technique for logging in to and out of the server.
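A minimal sketch of the new login and logout pattern, assuming the DSI, DSOpal and dsBaseClient packages are installed. The server name, URL, credentials and table name are placeholders rather than values from these notes; substitute those of your own Opal server.

```r
library(DSI)
library(DSOpal)
library(dsBaseClient)

# Describe each server to connect to (all values below are placeholders)
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "https://opal.example.org",
               user = "dsuser",
               password = "password",
               table = "PROJECT.TABLE",
               driver = "OpalDriver")
logindata <- builder$build()

# Log in and assign the table to the symbol "D" on the server side
connections <- DSI::datashield.login(logins = logindata,
                                     assign = TRUE,
                                     symbol = "D")

# Run an analysis against the open connections
ds.dim(x = "D", datasources = connections)

# Log out of all servers
DSI::datashield.logout(connections)
```

The connections object returned by the login call is passed to each client function through its datasources argument, and to the logout call at the end of the session.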
The motivation for this change is to give DataSHIELD the ability to connect to other types of server in the future. More information about DSI can be found on its GitHub page at https://github.com/datashield/DSI
New Analytical Functions
The functions ds.completeCases, ds.glmerSLMA, ds.lmerSLMA, ds.rep, ds.sample and ds.table have been added to the suite of analytical functions provided by DataSHIELD.
ds.completeCases: constructs a modified data frame, matrix or vector that contains no missing values
ds.glmerSLMA: fits a Generalized Linear Mixed-Effects Model (GLME) on data from one or multiple sources with pooling via SLMA (Study-Level Meta-Analysis)
ds.lmerSLMA: fits a Linear Mixed-Effects Model (LME), which can include both fixed and random effects, on data from one or multiple sources with pooling via SLMA
ds.rep: creates a repetitive sequence by repeating the specified scalar number, vector or list in each data source
ds.sample: draws a pseudorandom sample from a vector, data frame or matrix on the server side
ds.table: creates 1-dimensional, 2-dimensional and 3-dimensional tables using the table function in native R
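As an illustration of how two of these functions might be called, the sketch below assumes that `connections` is an active DSI login, that a data frame `D` has been assigned on each server, and that `GENDER` and `DIS_DIAB` are categorical columns of it. The column names and the new object name are hypothetical; consult each function's help page for the full argument list.

```r
library(dsBaseClient)

# Assumes `connections` comes from DSI::datashield.login() and that
# a data frame "D" exists on every server.

# Create a server-side copy of D with incomplete rows removed
ds.completeCases(x1 = "D", newobj = "D.complete", datasources = connections)

# 1-dimensional table of a (hypothetical) categorical column
ds.table(rvar = "D.complete$GENDER", datasources = connections)

# 2-dimensional table of two (hypothetical) categorical columns
ds.table(rvar = "D.complete$GENDER",
         cvar = "D.complete$DIS_DIAB",
         datasources = connections)
```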
The functions ds.dim, ds.length, ds.colnames, ds.ls and ds.levels have been reimplemented so that they no longer use the server-side aliases dim, length, colnames, ls and levels (which have now been removed) and instead call the dedicated DataSHIELD server-side functions dimDS, lengthDS, colnamesDS, lsDS and levelsDS. These changes should not affect the behaviour of the functions; they merely reduce the reliance on non-DataSHIELD functions on the server, making it more secure and reliable.
The functions ds.cbind and ds.dataFrame have been modified to remove any “DATAFRAME.NAME$” strings from the column names of the assigned data frames. In addition, the new version of the ds.cbind function generates data frames instead of matrices. We have also fixed a related bug in how the two functions defined the column names of the assigned data frames when the order of the input components differed between studies.
An additional disclosure control has been added to the ds.cov and ds.cor functions. The disclosure control checks that the number of input variables is lower than a pre-specified proportion of the number of individual-level records. To specify the maximum allowed proportion we use the same filter as the one used in the ds.glm function to check that the regression model is not oversaturated. This filter is set to 0.33 by default, which means, for example, that for a data frame of 100 rows (i.e. individual-level records) only the variance-covariance or correlation matrix of up to 33 variables can be returned.
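The arithmetic behind this limit can be sketched as follows. The helper `max_allowed_vars` is hypothetical and is not part of dsBase; rounding down is an assumption chosen to agree with the 100-row example above, and the server-side check may treat the boundary differently.

```r
# Hypothetical helper (not a DataSHIELD function): the largest number of
# variables permitted for a given number of rows and proportion filter.
max_allowed_vars <- function(n_rows, proportion = 0.33) {
  floor(proportion * n_rows)
}

max_allowed_vars(100)   # 33, as in the 100-row example
max_allowed_vars(1000)  # 330
```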
There are a number of functions in DataSHIELD v6.0 which should be regarded as deprecated - i.e. they are still there, but we strongly recommend you stop using them as they will be removed in v6.1. The functions which are deprecated are shown below, along with their replacements which should be used as soon as is practicable.
ds.setDefaultOpal, and should be replaced by datashield.connections_default
ds.listOpals, and should be replaced by datashield.connections
ds.table1DS, and should be replaced by ds.table
ds.table2DS, and should be replaced by ds.table
ds.meanByClass, and should be replaced by ds.meanSdGp
ds.recodeLevels, and should be replaced by ds.recodeValues
ds.subset, and should be replaced by ds.dataFrameSubset
ds.subsetByClass, and should be replaced by ds.dataFrameSubset
ds.vectorCalc, and should be replaced by ds.make
It should be noted that the use of [ and ] should be avoided when performing analysis, especially in conjunction with ds.dataFrameSubset.
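A hedged sketch of what migrating away from two of the deprecated functions might look like. It assumes that `connections` is an active DSI login, that a data frame `D` exists on each server, and that `LAB_TSC` is a numeric column of it; the column name, threshold and new object name are hypothetical.

```r
library(DSI)
library(dsBaseClient)

# Assumes `connections` comes from DSI::datashield.login() and that
# a data frame "D" exists on every server.

# In place of the deprecated ds.subset: keep the rows of D where the
# (hypothetical) column LAB_TSC is greater than 5
ds.dataFrameSubset(df.name = "D",
                   V1.name = "D$LAB_TSC",
                   V2.name = "5",
                   Boolean.operator = ">",
                   newobj = "D.sub",
                   datasources = connections)

# In place of the deprecated ds.listOpals: list the connection objects
# found in the current environment
DSI::datashield.connections()
```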
There are a number of server-side aliases in DataSHIELD v6.0 which should be regarded as deprecated, so should not be used as they will be removed in v6.1. The aliases which are deprecated are:
is.character (aggregate alias)
is.factor (aggregate alias)
is.list (aggregate alias)
is.null (aggregate alias)
is.numeric (aggregate alias)
NROW (aggregate alias)
t.test (aggregate alias)
as.character (assign alias)
as.null (assign alias)
as.numeric (assign alias)
attach (assign alias)
complete.cases (assign alias)
rep (assign alias)
unlist (assign alias)
In addition to the deprecated functions, it should be noted that ds.meanSdGp is planned to be renamed to ds.meanSDByClass in DataSHIELD v6.1.
The documentation of all DataSHIELD functions has been updated. The new documentation follows the same format for every function and includes examples that show logging in according to version 6.0, using the function, and logging out from the server.
We have continued to develop our continuous integration (CI), and now have 6310 tests which are run every day and on every proposed code change.
How to upgrade
Update DataSHIELD server-side package
If you have a suitable version of Opal server and would like to upgrade the DataSHIELD server package (dsBase), this can be done via the Opal Web Portal. Go to the “DataSHIELD” page within the “Administration” section of the Opal Web Portal and remove the old “dsBase”; then use the “+Add Package” button to install the new version of “dsBase”. Select “Install all DataSHIELD packages”, then press the “Install” button.
Update DataSHIELD client-side package
If you have installed the DataSHIELD client package (dsBaseClient) using the function install.packages and specifying the Obiba repository, then you can update the client package as follows:
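One way to do this is sketched below, assuming the OBiBa repository is reachable at https://cran.obiba.org (a URL that is not given in these notes) and that your default repositories should remain available for dependencies:

```r
# Reinstall dsBaseClient from the OBiBa repository (URL is an assumption),
# keeping the default repositories available for its dependencies
install.packages("dsBaseClient",
                 repos = c(getOption("repos"), "https://cran.obiba.org"),
                 dependencies = TRUE)
```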
If you do not have the “DSI” and “DSOpal” packages installed, it is enough to install “DSOpal”, as installing “DSOpal” will also cause the installation of “DSI”.
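A minimal sketch of that installation, assuming both packages are fetched from your default CRAN mirror:

```r
# Installing DSOpal also pulls in DSI as a dependency
install.packages("DSOpal", dependencies = TRUE)
```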
DataSHIELD v6.0 is supported on R 3.5, R 3.6 and R 4.0, and is expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 2.16.0 running on Ubuntu 16.04.