The client side functions
Our client tutorial package will contain four functions, one for each of the four functions in the server side package. To make it obvious which server side function a client function calls, the convention is to give the client a name that matches its counterpart, with a slight difference that distinguishes the two.
- ds.mean is the client that calls meanDS
- ds.replaceNA is the client that calls replaceNaDS
- ds.log is the client that calls log
- ds.length is the client that calls length
Remember, as already explained, that log and length do not have the suffix DS because these two server side functions are called directly from the R package base.
We need some internal functions that carry out the same tasks in all client side functions. Follow the links below and copy the internal functions over to the R directory of your project.
findLoginObjects
isDefined
isAssigned
checkClass
extract
getPooledMean
We also need some login data; follow the link below and copy the data object over to the data directory of your project. To create your own login table see this guide.
logindata.rda
Just as for the server side functions, and indeed for any programming, it is paramount to observe good coding practice. The box below repeats the same advice.
Good practice in programming
As with any programming it is highly important to stick to best practices:
- Avoid dense/compact code for improved legibility: put space between terms.
    # write
    result <- mean(xvect, na.rm=TRUE)
    # rather than
    result<-mean(xvect,na.rm=TRUE)
- Use brackets with 'if' and loop statements; this again improves legibility, particularly when debugging, because we can highlight opening and closing brackets to check a specific section more easily.
    # write
    if(mystuff == 1){
      ...........
    }else{
      ...........
    }
    # rather than the below lines or anything not completely obvious
    if(mystuff == 1) ..........
    else ............
- Comment each line of code unless the line is obvious even to a beginner; not writing comments can prove costly when it comes to debugging after a long period. However, avoid writing so many comments that it becomes difficult to 'see' the lines that are executed. Comments should be in lower case except in the rare situations where you really need to 'shout out' some important information. Using too many capital letters in comments leads to the same problem as having too many comments.
- Separate blocks of lines by an empty line, again for improved legibility.
- Indent properly after an 'if' or a loop statement. This is particularly important when using nested 'if' statements or nested loops and will prove valuable when tracking down errors.
- Use internal functions to avoid extremely long scripts which can prove difficult to debug - see subsetByClassDS which makes use of many internal functions.
For the remainder of this page we are going to show one function and explain the main sections of the script in detail. Nearly all client functions have a similar structure; differences will be highlighted and explained. The sections we are going to explain are signposted so you can follow the explanations more easily in the last chapter of this page.
The function 'ds.mean'
In RStudio, go to the File tab in the top menu, select New File and then choose R Script. This will open a new R script file; copy the code below and paste it into the file, or type the lines in. Then go to the File tab on the RStudio top menu bar and choose Save As, browse to the R folder in the project directory and save the file under the name ds.mean. Always use the same name as the function for the script file; the file extension should always be .R.
    #-------------------------------------- HEADER --------------------------------------------#
    #' @title Computes the statistical mean of a given vector
    #' @description This function is similar to the R function \code{mean}.
    #' @details It is a wrapper for the server side function.
    #' @param x a character, the name of a numerical vector
    #' @param type a character which represents the type of analysis to carry out.
    #' If \code{type} is set to 'combine', a global mean is calculated
    #' if \code{type} is set to 'split', the mean is calculated separately for each study.
    #' @param checks a boolean, if TRUE the function runs checks that verify elements on the server side;
    #' such checks lengthen the run-time so the default is FALSE and one can switch these checks
    #' on (set to TRUE) when faced with some error(s).
    #' @param datasources a list of opal object(s) obtained after login in to opal servers;
    #' these objects hold also the data assign to R, as \code{data frame}, from opal datasources.
    #' @return a numeric
    #' @author Gaye A., Isaeva I.
    #' @seealso \code{ds.quantileMean} to compute quantiles.
    #' @seealso \code{ds.summary} to generate the summary of a variable.
    #' @export
    #' @examples {
    #'
    #' # load the object that contains the login details
    #' data(logindata)
    #'
    #' # login and assign specific variable(s)
    #' myvar <- list('LAB_TSC')
    #' opals <- datashield.login(logins=logindata,assign=TRUE,variables=myvar)
    #'
    #' # Example 1: compute the pooled statistical mean of the variable 'LAB_TSC' - default behaviour
    #' ds.mean(x='D$LAB_TSC')
    #'
    #' # Example 2: compute the statistical mean of each study separately
    #' ds.mean(x='D$LAB_TSC', type='split')
    #'
    #' # clear the Datashield R sessions and logout
    #' datashield.logout(opals)
    #'
    #' }
    ds.mean = function(x=NULL, type='combine', checks=FALSE, datasources=NULL){

      #-------------------------------------- BASIC CHECKS ----------------------------------------------#
      # if no opal login details are provided look for 'opal' objects in the environment
      if(is.null(datasources)){
        datasources <- findLoginObjects()
      }

      if(is.null(x)){
        stop("Please provide the name of the input vector!", call.=FALSE)
      }

      # the input variable might be given as column table (i.e. D$x)
      # or just as a vector not attached to a table (i.e. x)
      # we have to make sure the function deals with each case
      xnames <- extract(x)
      varname <- xnames$elements
      obj2lookfor <- xnames$holders
      #--------------------------------------------------------------------------------------------------#

      #-------------------------------------- SERVER SIDE CHECKS ----------------------------------------#
      if(checks){
        # check if the input object(s) is(are) defined in all the studies
        if(is.na(obj2lookfor)){
          defined <- isDefined(datasources, varname)
        }else{
          defined <- isDefined(datasources, obj2lookfor)
        }

        # call the internal function that checks the input object is of the same class in all studies.
        typ <- checkClass(datasources, x)

        # the input object must be a numeric or an integer vector
        # ('&&' is used because an 'if' condition must be a single logical value)
        if(typ != 'integer' && typ != 'numeric'){
          stop("The input object must be an integer or a numeric vector.", call.=FALSE)
        }
      }
      #----------------------------------------------------------------------------------------------------#

      # number of studies
      num.sources <- length(datasources)

      #-------------------------------------- CALLING SERVER SIDE FUNCTION --------------------------------#
      cally <- paste0("meanDS(", x, ")")
      mean.local <- datashield.aggregate(datasources, as.symbol(cally))

      cally <- paste0("NROW(", x, ")")
      length.local <- datashield.aggregate(datasources, cally)

      # get the number of entries with missing values
      cally <- paste0("numNaDS(", x, ")")
      numNA.local <- datashield.aggregate(datasources, cally)
      #-----------------------------------------------------------------------------------------------------#

      #-------------------------------------- FINALIZING RESULTS -------------------------------------------#
      if (type=='split') {
        return(mean.local)
      } else if (type=='combine') {
        length.total = 0
        sum.weighted = 0
        mean.global = NA

        for (i in 1:num.sources){
          # '&&' stops at the first FALSE, so a NULL length is never compared with 0
          if ((!is.null(length.local[[i]])) && (length.local[[i]]!=0)) {
            completeLength <- length.local[[i]]-numNA.local[[i]]
            length.total = length.total+completeLength
            sum.weighted = sum.weighted+completeLength*mean.local[[i]]
          }
        }

        mean.global = sum.weighted/length.total
        return(list("Global mean"=mean.global))
      } else{
        stop('Function argument "type" has to be either "combine" or "split"')
      }
    }
Main components of a client side function
The header of the script
The header of the script (lines starting with #') is used by Roxygen to produce the documentation files. Except for @examples, all the entries in the header have already been explained at the bottom of this page.
Importance of documentation for users
Writing comprehensive documentation is paramount to help users. The header of the function is used to produce the R documentation. This documentation is far more important for client side functions than it is for server side functions, simply because server side functions are called by DataSHIELD developers who are familiar with the code whilst client functions are run by users with a wide range of R experience (from beginners to experts).
One of the most important parts of the header is the @examples section. Most users need to see a clear example of how to use the function; many will simply copy the example and change the arguments to suit their needs. It is hence critical to provide clear and working examples. The function will not pass standard R checks if the example is not working.
The code in the script
- Checks
- Checking the arguments
These are no different from standard R checks that ensure the user has provided the required input object. Make sure the function throws a clear message and stops if the user has not provided the right argument or has provided it in the wrong type (e.g. if an argument is meant to be a character, other R types should not be allowed). If the user does not specify a login object through the argument datasources, the internal function findLoginObjects looks for login objects in the working environment; if more than one object is found it asks the user to choose one, and if no login object is found the attempted analysis is aborted.
- Checking server side objects
In DataSHIELD the user cannot see data stored on the server side. These checks ensure that the objects to process (1) are actually available on the server side and (2) are of the same class (type) in all the servers of the collaborating studies (e.g. if a variable is meant to be a numeric it must be a numeric in all the servers). Depending on the function, other checks might be required in this section.
About checking server side objects
Previous functions have systematically run these checks. It was however flagged that for some functions the checks were time consuming (just minutes though!). So a new argument, checks, will now be introduced in all new functions (earlier functions will be updated as we go). This argument allows users to run the checks or not. By default the checks are switched off (checks=FALSE). It is important, when faced with errors and/or weird results, to switch on the checks as a first step in trying to understand what is going on.
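The two kinds of checks described above can be sketched in plain R. This is only an illustration under stated assumptions: checkArguments and sameClassInAllStudies are hypothetical helpers, not the internal functions of this tutorial, and the classes reported by each study are supplied as an ordinary list rather than fetched from the servers.

```r
# hypothetical helper: validate the arguments supplied by the user
checkArguments <- function(x=NULL, type='combine'){

  # the name of the input vector is required
  if(is.null(x)){
    stop("Please provide the name of the input vector!", call.=FALSE)
  }

  # the name must be a single character string, e.g. 'D$LAB_TSC'
  if(!is.character(x) || length(x) != 1){
    stop("The argument 'x' must be a single character string.", call.=FALSE)
  }

  # only two types of analysis are allowed
  if(!is.character(type) || length(type) != 1 || !(type %in% c('combine', 'split'))){
    stop("The argument 'type' must be either 'combine' or 'split'.", call.=FALSE)
  }

  return(TRUE)
}

# hypothetical helper: 'classes' is a list with one class name per study
sameClassInAllStudies <- function(classes){

  # keep the distinct class names; a single distinct value means all studies agree
  distinct <- unique(unlist(classes))

  return(length(distinct) == 1)
}

# made-up class reports from three studies
classes.ok  <- list(study1='numeric', study2='numeric', study3='numeric')
classes.bad <- list(study1='numeric', study2='factor', study3='numeric')
```

With these made-up inputs, sameClassInAllStudies(classes.ok) is TRUE and sameClassInAllStudies(classes.bad) is FALSE; a client function would typically stop with a clear message in the second case.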
- Calling server side functions
This is where we actually run the analysis. A command is sent to the servers where the actual data are sitting. The command is processed and either some summary statistics are returned, if the function called is an 'aggregate' function, or the results are stored on the server, if the function called is an 'assign' function.
Calling a server side function
The strategy is to construct a call object and pass it on to the opal functions datashield.aggregate or datashield.assign. There is no ready-to-use formula for the construction of the call object. This is actually the most challenging part for a developer; for certain functions the call might be complex and long. With experience you will sense when to construct a call as a character expression to be coerced by as.symbol and when to construct an object of type 'call' directly. See below the syntax for datashield.aggregate and datashield.assign.
    datashield.aggregate(inputObject, aggregateFunctionName(arguments))
    datashield.assign(inputObject, nameOfOutput, assignFunctionName(arguments))
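The difference between a call written as a character expression and an object of type 'call' can be explored locally in base R, without any server. In the sketch below the data frame D and the local definition of meanDS are stand-ins created only for illustration, and eval is used in place of datashield.aggregate, so this shows how the call is built, not how opal processes it.

```r
# stand-in data and stand-in function so that the example runs locally
D <- data.frame(LAB_TSC=c(4.2, 5.1, NA, 6.3))
meanDS <- function(xvect){
  mean(xvect, na.rm=TRUE)
}

# name of the input vector, as the user would supply it
x <- 'D$LAB_TSC'

# option 1: build the call as a character expression
cally.char <- paste0("meanDS(", x, ")")

# option 2: build an object of type 'call' directly
cally.call <- call("meanDS", quote(D$LAB_TSC))

# both describe the same expression
same.expression <- identical(deparse(cally.call), cally.char)

# base R can also turn the character expression into a call object
cally.parsed <- parse(text=cally.char)[[1]]

# evaluate the calls locally (a real client sends them to the servers instead)
result.call <- eval(cally.call)
result.parsed <- eval(cally.parsed)
```

The character form is convenient when the call is assembled from user-supplied names with paste0; the call object is easier to build up piece by piece when the expression has many arguments.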
- Finalizing the results
- For aggregate functions
It is important to ensure the results are presented to the user in a way that is not confusing. Many client side functions that call an aggregate function offer the option to either return pooled results (overall results across all the servers) or return one result for each server separately. In most functions the default is to return pooled results (type='combine'). Make sure, as in the last lines of the script above, that an error message is thrown if the user attempts to specify anything other than 'combine' or 'split' for the argument type.
- For assign functions
For clients that call an 'assign' function there is no output to finalize because nothing is returned to the user. There is however a final message that informs the user when the sought object has not been generated on the server side. Have a look at the other functions to see those lines of code.
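For an aggregate function such as ds.mean, the 'combine' option amounts to a weighted mean: each study's mean is weighted by its number of non-missing entries. The sketch below reproduces that arithmetic on invented per-study results; finalizeMean is a hypothetical name used only here, and the vectorised form is an alternative to the loop in the script, not a copy of it.

```r
# hypothetical helper: turn per-study results into the output shown to the user
finalizeMean <- function(mean.local, length.local, numNA.local, type='combine'){

  if(type == 'split'){

    # one mean per study, returned as received
    return(mean.local)

  }else if(type == 'combine'){

    # number of complete (non-missing) entries in each study
    complete <- unlist(length.local) - unlist(numNA.local)

    # weighted sum of the study means divided by the total number of complete entries
    mean.global <- sum(complete * unlist(mean.local)) / sum(complete)

    return(list("Global mean"=mean.global))

  }else{
    stop('Function argument "type" has to be either "combine" or "split"', call.=FALSE)
  }
}

# invented results from two studies
mean.local   <- list(study1=10, study2=20)
length.local <- list(study1=110, study2=300)
numNA.local  <- list(study1=10, study2=0)

pooled <- finalizeMean(mean.local, length.local, numNA.local)
separate <- finalizeMean(mean.local, length.local, numNA.local, type='split')
```

Here the studies contribute 100 and 300 complete entries, so the pooled mean is (100*10 + 300*20)/400 = 17.5, closer to the mean of the larger study.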
The other three functions
In the above section we have used ds.mean to illustrate the development of a client side function that calls an 'aggregate' server side function. The structure is not very different for clients that call an 'assign' function. Follow the links below to copy the scripts of the other three functions over to the R directory of your project.
ds.log
ds.length
ds.replaceNA
DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki