DataSHIELD Training Part 3: Assign functions and tables


Introduction

This is the third in a 6-part DataSHIELD tutorial series. 

The other parts in this DataSHIELD tutorial series are:

Quick reminder for logging in:
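As a reminder of the general shape of a login, here is a minimal sketch using the DSI login builder. The server label, URL, credentials and table name are all placeholders rather than values from this tutorial, so substitute the details of your own training environment.

```r
# Load the client-side packages (assumed to be installed already).
library(DSI)
library(DSOpal)
library(dsBaseClient)

# Describe the server to connect to. Every value below is a placeholder.
builder <- DSI::newDSLoginBuilder()
builder$append(
  server   = "study1",                    # hypothetical server label
  url      = "https://opal.example.org",  # hypothetical Opal URL
  user     = "your_username",             # placeholder credentials
  password = "your_password",
  table    = "PROJECT.TABLE",             # hypothetical project.table name
  driver   = "OpalDriver"
)
logindata <- builder$build()

# Log in and assign the table to a server-side data frame named "D".
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
```

Additional studies can be added with further calls to builder$append before builder$build. When the session is finished, DSI::datashield.logout(connections) closes the connections.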


Descriptive statistics: assigning variables

So far, all the functions in the tutorial have returned output to the screen. Some functions (assign functions) instead create new objects in the server-side R session that are required for analysis, but return nothing to the client screen. For example, an analysis may require the log values of a variable.

  • By default the function ds.log computes the natural logarithm. A different logarithm can be computed by setting the argument base to a different value. The function returns no output to the screen.
  • If the name of the new object is not specified, it defaults to the name of the input vector followed by the suffix '_log' (e.g. 'LAB_HDL_log' for the input LAB_HDL).
  • The name of the new object can be customised using the newobj argument.
  • The new object is not attached to the assigned-variables data frame (default name "D"). We can check the length of the new LAB_HDL_log vector generated above; the command should return the same figure as the number of rows in the data frame 'D'.
  • Using ds.assign, we subtract the pooled mean calculated earlier from LAB_HDL (mean centring) and assign the result to a new variable called LAB_HDL.c. The function returns no output to the client screen; the newly created variable is stored server-side.
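The steps in the list above can be sketched as follows. This is an illustrative sketch rather than the tutorial's original code: it assumes a connections object created at login and a server-side data frame D containing LAB_HDL. The object name LAB_HDL_log10 is made up for illustration. The pooled mean is recomputed here instead of being typed in, and the way it is extracted (the Global.Mean element and its EstimatedMean column) is an assumption about the structure returned by ds.mean that may differ between dsBaseClient versions.

```r
library(dsBaseClient)

# Natural log of LAB_HDL, stored server-side under an explicit name.
ds.log(x = "D$LAB_HDL", newobj = "LAB_HDL_log", datasources = connections)

# A different base via the base argument; the object name is hypothetical.
ds.log(x = "D$LAB_HDL", base = 10, newobj = "LAB_HDL_log10", datasources = connections)

# The new vector is not part of D, but its length should match the rows of D.
ds.length(x = "LAB_HDL_log", datasources = connections)
ds.dim(x = "D", datasources = connections)

# Pooled mean of LAB_HDL across studies.
# Assumption: the combined estimate sits in Global.Mean[, "EstimatedMean"].
pooled <- ds.mean(x = "D$LAB_HDL", type = "combine", datasources = connections)
pooled_mean <- pooled$Global.Mean[1, "EstimatedMean"]

# Mean centring: build the expression as a string and assign it server-side.
ds.assign(
  toAssign    = paste0("D$LAB_HDL-", pooled_mean),
  newobj      = "LAB_HDL.c",
  datasources = connections
)
```

Passing newobj explicitly avoids relying on the default object name.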

Further DataSHIELD functions can now be run on the new mean-centred variable LAB_HDL.c. For example, the mean of LAB_HDL.c should be approximately 0.
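A minimal sketch of that check, assuming the connections object from login and that LAB_HDL.c already exists on every server:

```r
library(dsBaseClient)

# Pooled mean of the centred variable; expected to be close to 0.
# LAB_HDL.c is a standalone server-side object, so no "D$" prefix is used.
ds.mean(x = "LAB_HDL.c", type = "combine", datasources = connections)
```

Because the centring constant is passed as rounded text, the result is unlikely to be exactly 0.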

Contingency tables

The function ds.table creates contingency tables of categorical variables. By default it runs on pooled data from all studies; to obtain output for each study, set the argument type='split'.

  • A one-dimensional table can be calculated for the variable GENDER. The function returns the counts and the column and row percentages per category, as well as information about the validity of the variable in each study dataset.
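One way to request the one-dimensional table is sketched below. It assumes the connections object from login and a GENDER column in the server-side data frame D. Only the rvar argument is set, and all other arguments keep their defaults.

```r
library(dsBaseClient)

# One-dimensional table of GENDER, passed as the row variable.
ds.table(rvar = "D$GENDER", datasources = connections)
```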

The function ds.table also creates two-dimensional contingency tables of two categorical variables. For example, a two-dimensional table can be constructed as the cross-tabulation of the variables DIS_DIAB (diabetes status) and GENDER.
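The cross-tabulation might be requested as in the sketch below, under the same assumptions (a connections object from login, and DIS_DIAB and GENDER present in D). Which variable goes in rows and which in columns is a choice made here for illustration.

```r
library(dsBaseClient)

# Two-dimensional table: DIS_DIAB in rows, GENDER in columns.
ds.table(rvar = "D$DIS_DIAB", cvar = "D$GENDER", datasources = connections)
```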

The function can additionally compute a chi-squared test for homogeneity on (nc-1)*(nr-1) degrees of freedom (where nc is the number of columns and nr is the number of rows).
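A sketch of the same cross-tabulation with the chi-squared reports switched on, again assuming the connections object from login. The degrees-of-freedom comment is conditional on both variables having two levels, which the text above does not state.

```r
library(dsBaseClient)

# Same table as before, with chi-squared test reports requested.
# If both variables have two levels, the test has (2-1)*(2-1) = 1 degree of freedom.
ds.table(
  rvar               = "D$DIS_DIAB",
  cvar               = "D$GENDER",
  report.chisq.tests = TRUE,
  datasources        = connections
)
```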

The first section of the output, an exact duplicate of the table above, is omitted here; only the chi-squared reports are shown:

Conclusion

The other parts in this DataSHIELD tutorial series are: