The client side functions

Introduction to server side functions

Our client tutorial package will contain four functions, one for each of the four functions in the server side package. To make it obvious which server side function a client function calls, the convention is to give the client function the name of its counterpart with a slight difference that distinguishes the two.

  • ds.mean is the client that calls meanDS
  • ds.replaceNA is the client that calls replaceNaDS
  • ds.log is the client that calls log
  • ds.length is the client that calls length

Remember, as already explained, that log and length do not have the suffix DS because these two server side functions are called directly from the R package base.
We need some internal functions that carry out the same tasks in all client side functions. Follow the links below and copy the internal functions over to the R directory of your project.
findLoginObjects
isDefined
isAssigned
checkClass
extract
getPooledMean

We also need some login data; follow the link below and copy over the data object to the data directory of your project. To create your own login table see this guide.
logindata.rda

Just as for the server side functions, and indeed for any programming, it is paramount to observe good coding practice. So in the box below we repeat the same advice.

Good practice in programming

As with any programming it is highly important to stick to best practices:

  • Avoid dense/compact code for improved legibility: put space between terms.
    # write
    result <- mean(xvect, na.rm=TRUE)
    
    # rather than
    result<-mean(xvect,na.rm=TRUE)
    
  • Use curly brackets with 'if' and loop statements; this is again better for legibility, particularly when debugging, as we can highlight opening and closing brackets to check a specific section more easily.
    # write
    if(mystuff == 1){
      ...........
    }else{
      ...........
    }
    
    # rather than the below lines or anything not completely obvious
    if(mystuff == 1)
    ..........
    else
    ............
    
  • Comment each line of code unless the line is obvious even to a beginner; not writing comments can prove costly when debugging after a long period. However, avoid writing so many comments that it becomes difficult to 'see' the lines that are executed. Comments should be in lower case except in the rare situations where you really need to 'shout out' some important information. Using too many capital letters in comments leads to the same problem as having too many comments.
  • Separate blocks of lines by an empty line, again for improved legibility.
  • Indent properly after an 'if' or a loop statement. This is particularly important when using nested 'if' statements or nested loops and will prove valuable when tracking down errors.
  • Use internal functions to avoid extremely long scripts which can prove difficult to debug - see subsetByClassDS which makes use of many internal functions.
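
The last four points (comments, blank lines, indentation and internal functions) can be seen together in a short sketch. The two function names below, countMissing and summariseMissing, are hypothetical and serve only as an illustration; they are not part of the tutorial package.

```r
# hypothetical internal function: count the missing values in one vector
countMissing <- function(xvect){
  return(sum(is.na(xvect)))
}

# hypothetical main function: report the missing values of each vector in a list
summariseMissing <- function(xlist){

  # one result per element of the input list
  results <- vector("list", length(xlist))

  for(i in seq_along(xlist)){
    if(is.numeric(xlist[[i]])){
      # the counting is delegated to the internal function
      results[[i]] <- countMissing(xlist[[i]])
    }else{
      # non numeric elements are flagged rather than processed
      results[[i]] <- NA
    }
  }

  return(results)
}

summariseMissing(list(c(1, NA, 3), c("a", "b")))
```

Keeping the counting in its own small function means the loop stays short, and each piece can be checked on its own when something goes wrong.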

For the remainder of this page we are going to show one function and explain the main sections of the script in detail. Nearly all client functions have a similar structure; differences will be highlighted and explained. The sections we are going to explain are signposted so you can follow the explanations more easily in the last chapter of this page.

The function 'ds.mean'

In RStudio, go to the File tab in the top menu, select New File and then choose R Script. This will open a new R script file; copy the code below and paste it into the file, or type the lines in. Then go to the File tab on the RStudio top menu bar and choose Save As, browse to the R folder in the project directory and save the file under the name ds.mean. Always use the same name as the function for the script file; the file extension should always be .R.

#-------------------------------------- HEADER --------------------------------------------#
#' @title Computes the statistical mean of a given vector
#' @description This function is similar to the R function \code{mean}.
#' @details It is a wrapper for the server side function.
#' @param x a character, the name of a numerical vector
#' @param type a character which represents the type of analysis to carry out.
#' If \code{type} is set to 'combine', a global mean is calculated
#' if \code{type} is set to 'split', the mean is calculated separately for each study.
#' @param checks a boolean, if TRUE the function runs checks that verify elements on the server side;
#' such checks lengthen the run-time so the default is FALSE and one can switch these checks
#' on (set to TRUE) when faced with some error(s).
#' @param datasources a list of opal object(s) obtained after logging in to opal servers;
#' these objects also hold the data assigned to R, as \code{data frame}, from opal datasources.
#' @return a numeric
#' @author Gaye A., Isaeva I.
#' @seealso \code{ds.quantileMean} to compute quantiles.
#' @seealso \code{ds.summary} to generate the summary of a variable.
#' @export
#' @examples {
#'
#'   # load the table that contains the login details
#'   data(logindata)
#'
#'   # login and assign specific variable(s)
#'   myvar <- list('LAB_TSC')
#'   opals <- datashield.login(logins=logindata,assign=TRUE,variables=myvar)
#'
#'   # Example 1: compute the pooled statistical mean of the variable 'LAB_TSC' - default behaviour
#'   ds.mean(x='D$LAB_TSC')
#'
#'   # Example 2: compute the statistical mean of each study separately
#'   ds.mean(x='D$LAB_TSC', type='split')
#'
#'   # clear the Datashield R sessions and logout
#'   datashield.logout(opals)
#'
#' }

ds.mean = function(x=NULL, type='combine', checks=FALSE, datasources=NULL){

#-------------------------------------- BASIC CHECKS ----------------------------------------------#
  # if no opal login details are provided look for 'opal' objects in the environment
  if(is.null(datasources)){
    datasources <- findLoginObjects()
  }

  if(is.null(x)){
    stop("Please provide the name of the input vector!", call.=FALSE)
  }

  # the input variable might be given as column table (i.e. D$x)
  # or just as a vector not attached to a table (i.e. x)
  # we have to make sure the function deals with each case
  xnames <- extract(x)
  varname <- xnames$elements
  obj2lookfor <- xnames$holders
#--------------------------------------------------------------------------------------------------#
#-------------------------------------- SERVER SIDE CHECKS ----------------------------------------#
  if(checks){
    # check if the input object(s) is(are) defined in all the studies
    if(is.na(obj2lookfor)){
      defined <- isDefined(datasources, varname)
    }else{
      defined <- isDefined(datasources, obj2lookfor)
    }

    # call the internal function that checks the input object is of the same class in all studies.
    typ <- checkClass(datasources, x)

    # the input object must be a numeric or an integer vector
    if(typ != 'integer' & typ != 'numeric'){
      stop("The input object must be an integer or a numeric vector.", call.=FALSE)
    }
  }
#----------------------------------------------------------------------------------------------------#

  # number of studies
  num.sources <- length(datasources)

#-------------------------------------- CALLING SERVER SIDE FUNCTION --------------------------------#

  cally <- paste0("meanDS(", x, ")")
  mean.local <- datashield.aggregate(datasources, as.symbol(cally))

  cally <- paste0("NROW(", x, ")")
  length.local <- datashield.aggregate(datasources, cally)

  # get the number of entries with missing values
  cally <- paste0("numNaDS(", x, ")")
  numNA.local <- datashield.aggregate(datasources, cally)
#-----------------------------------------------------------------------------------------------------#

#-------------------------------------- FINALIZING RESULTS -------------------------------------------#
  if (type=='split') {
    return(mean.local)
  } else if (type=='combine') {
    length.total = 0
    sum.weighted = 0
    mean.global  = NA

    for (i in seq_len(num.sources)){
      # '&&' stops at the first FALSE, so a NULL length is never compared with 0
      if ((!is.null(length.local[[i]])) && (length.local[[i]]!=0)) {
        completeLength <- length.local[[i]]-numNA.local[[i]]
        length.total = length.total+completeLength
        sum.weighted = sum.weighted+completeLength*mean.local[[i]]
      }
    }

    mean.global = sum.weighted/length.total
    return(list("Global mean"=mean.global))

  } else{
    stop('Function argument "type" has to be either "combine" or "split"')
  }

}
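
To see what the 'combine' branch of the script above computes, the weighting can be reproduced locally with made-up numbers. This is only a sketch: the three lists below stand in for the values that datashield.aggregate would return from three hypothetical studies, and no server is contacted.

```r
# made-up results for three hypothetical studies (not real data)
mean.local   <- list(study1=5.0, study2=6.0, study3=4.0)
length.local <- list(study1=100, study2=250, study3=50)
numNA.local  <- list(study1=10,  study2=50,  study3=0)

# number of non missing values in each study
completeLength <- unlist(length.local) - unlist(numNA.local)

# each study mean is weighted by its number of non missing values
mean.global <- sum(completeLength * unlist(mean.local)) / sum(completeLength)
mean.global
```

With these numbers the pooled value is 1850/340, about 5.44, which differs from the plain average of the three study means (5) because the studies contribute different numbers of non missing values.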

Main components of a client side function

The header of the script

The header of the script (lines starting with #') is used by Roxygen to produce the documentation files. Except for @examples, all the entries in the header have already been explained at the bottom of this page.

Importance of documentation for users

Writing comprehensive documentation is paramount to help users. The header of the function is used to produce the R documentation. This documentation is far more important for client side functions than it is for server side functions, simply because server side functions are called by DataSHIELD developers who are familiar with the code whilst client functions are run by users with a wide range of R experience (from beginners to experts).
One of the most important parts of the header is the @examples section. Most users need to see a clear example of how to use the function; many will simply copy the example and change the arguments to suit their needs. It is hence critical to provide clear and working examples. The function will not pass standard R checks if the example is not working.
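
A minimal sketch of such a header is shown below for a hypothetical function, ds.example, which is not part of the tutorial package. Wrapping the example in \dontrun{} is an assumption about how you want the package to be checked: it is standard R documentation markup that displays the example in the help page without running it during the R checks, which may be useful when the example needs a live connection to servers.

```r
#' @title Returns the name of the input object
#' @description A hypothetical function used only to illustrate the header.
#' @param x a character, the name of a numerical vector
#' @return a character
#' @export
#' @examples \dontrun{
#'
#'   # load the table that contains the login details
#'   data(logindata)
#'
#'   # login and assign specific variable(s)
#'   myvar <- list('LAB_TSC')
#'   opals <- datashield.login(logins=logindata,assign=TRUE,variables=myvar)
#'
#'   # Example 1: run the hypothetical function on the variable 'LAB_TSC'
#'   ds.example(x='D$LAB_TSC')
#'
#'   # clear the Datashield R sessions and logout
#'   datashield.logout(opals)
#'
#' }
ds.example <- function(x=NULL){
  # stop with a clear message if the required argument is missing
  if(is.null(x)){
    stop("Please provide the name of the input vector!", call.=FALSE)
  }
  return(x)
}
```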

The code in the script

  • Checks
    • Checking the arguments
      These are no different from standard R checks that ensure the user has provided the required input object. Make sure the function throws a clear message and stops if the user has not provided the right argument or has provided it in the wrong type (e.g. if an argument is meant to be a character, other R types should not be allowed). If the user does not specify a login object through the argument datasources, the internal function findLoginObjects looks for login objects in the working environment. If more than one object is found, it asks the user to choose one; if no login object is found, the attempted analysis is aborted.
    • Checking server side objects
      In DataSHIELD the user cannot see data stored on the server side. These checks ensure that the objects to process (1) are actually available on the server side and (2) are of the same class (type) in all the servers of the collaborating studies (e.g. if a variable is meant to be a numeric it must be a numeric in all the servers). Depending on the function, other checks might be required in this section.

      About checking server side objects

      Previous functions have systematically run these checks. It was however flagged that for some functions the checks were time consuming (just minutes though!). So a new argument, checks, will now be introduced in all new functions (earlier functions will be updated as we go). This argument allows users to choose whether or not to run the checks. By default the checks are switched off (checks=FALSE). It is important, when faced with errors and/or weird results, to switch on the checks as a first step in trying to understand what is going on.

  • Calling server side functions
    This is where we actually run the analysis. A command is sent to the servers where the actual data are sitting. The command is processed and some summary statistics returned if the function called is an 'aggregate' function or the results stored on the server if the function called is an 'assign' function.

    Calling a server side function

    The strategy is to construct a call object and pass it on to the opal functions datashield.aggregate or datashield.assign. There is no ready-to-use formula for the construction of the call object. This is actually the most challenging part for a developer; for certain functions the call might be complex and long. With experience you will sense when to construct the call as a character expression to be coerced by as.symbol and when to construct an object of type 'call' directly. See below the syntax for datashield.aggregate and datashield.assign.

    datashield.aggregate(inputObject, aggregateFunctionName(arguments))
    datashield.assign(inputObject, nameOfOutput, assignFunctionName(arguments))
    
  • Finalizing the results
    • For aggregate functions
      It is important to ensure the results are presented to the user in a way that is not confusing. Many client side functions that call an aggregate function offer the option to return either pooled results (overall results across all the servers) or one separate result for each server. In most functions the default is to return pooled results (type='combine'). Make sure, as in the last part of the ds.mean script above, that an error message is thrown if the user attempts to specify anything other than 'combine' or 'split' for the argument type.
    • For assign functions
      For clients that call an 'assign' function there is no output to finalize because nothing is returned to the user. There is however a final message that informs the user when the sought object has not been generated on the server side. Have a look at the other functions to see those lines of code.
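
The argument checks and the construction of the call described in the list above can be tried out without a server connection. The sketch below uses a hypothetical helper, buildMeanCall, which is not part of the tutorial package; it validates the arguments and then builds the call to meanDS in the two forms mentioned above, as a character expression and as an object of type 'call'. Which form a given server side function needs is something to verify for each function.

```r
# hypothetical helper: validate the arguments and build the server side call
buildMeanCall <- function(x=NULL, type='combine'){

  # the name of the input vector is required
  if(is.null(x)){
    stop("Please provide the name of the input vector!", call.=FALSE)
  }

  # the name must be given as a single character string
  if(!is.character(x) || length(x) != 1){
    stop("The argument 'x' must be a character, e.g. 'D$LAB_TSC'.", call.=FALSE)
  }

  # only the two documented values are allowed for 'type'
  if(length(type) != 1 || !(type %in% c('combine', 'split'))){
    stop('Function argument "type" has to be either "combine" or "split"', call.=FALSE)
  }

  # option 1: the call as a character expression
  cally.char <- paste0("meanDS(", x, ")")

  # option 2: the call as an object of type 'call'
  # parse() turns the name of the input into an R language object
  cally.call <- call("meanDS", parse(text=x)[[1]])

  return(list(char=cally.char, call=cally.call))
}

calls <- buildMeanCall(x='D$LAB_TSC')
calls$char
class(calls$call)
deparse(calls$call)
```

Nothing is evaluated here: meanDS does not need to exist on the client, because the call is only built and would be evaluated on the server side.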

The other three functions

In the above section we have used ds.mean to illustrate the development of a client side function that calls an 'aggregate' server side function. The structure is not very different for clients that call an 'assign' function. Follow the links below and copy the scripts of the other three functions over to the R directory of your project.
ds.log
ds.length
ds.replaceNA

DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki