DataSHIELD Training Part 2: Basic statistics and data manipulations


Introduction

This is the second in a 6-part DataSHIELD tutorial series.

The other parts in this DataSHIELD tutorial series are:

Quick reminder for logging in:


Basic statistics and data manipulations

Descriptive statistics: variable dimensions and class

You can obtain descriptive or exploratory statistics about the assigned variables held in the server-side R session, such as the number of participants at each data provider, the number of participants across all data providers, and the number of variables. Knowing these parameters of the data will facilitate your analysis.
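A minimal sketch of the command, assuming you are already logged in to the data providers and that the data have been assigned to a server-side data frame named D:

```r
# Load the DataSHIELD client functions (assumed to be installed)
library(dsBaseClient)

# Report the dimensions of D for each study and pooled across all studies.
# type='both' is the default, so it does not need to be given explicitly.
ds.dim(x='D')
```

If more than one connection object exists in your R session, you may also need to pass it through the datasources argument.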


The output of the command shows that in study 1 there are 2163 individuals with 11 variables, in study 2 there are 3088 individuals with 11 variables, and in both studies together there are 5251 individuals with 11 variables.


  • So far, the dimensions of the assigned data frame D have been found using the ds.dim command, for which type='both' is the default argument.
  • Now use the type='combine' argument in the ds.dim function to identify the number of individuals (5251) and variables (11) pooled across all studies:
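As a sketch, assuming an active login, the dsBaseClient package loaded, and the assigned data frame D, the pooled dimensions could be requested like this:

```r
# Return only the dimensions pooled across all studies
ds.dim(x='D', type='combine')
```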


  • To check the variables in each study are identical (as is required for pooled data analysis), use the ds.colnames function on the assigned data frame D:
  • Use the ds.class function to identify the class (type) of a variable, for example whether it is an integer, character, factor, etc. The class determines which analyses you can run on the variable. For example, the argument x='D$LAB_HDL' requests the class of the variable LAB_HDL held in the assigned data frame D.
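The two checks above can be sketched as follows, again assuming an active login, the dsBaseClient package loaded, and the assigned data frame D:

```r
# List the column (variable) names of D in each study,
# so that they can be compared across studies
ds.colnames(x='D')

# Return the class of the variable LAB_HDL in each study
ds.class(x='D$LAB_HDL')
```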

Descriptive statistics: quantiles and mean

As LAB_HDL is a numeric variable, the distribution of the data can be explored.

  • The function ds.quantileMean returns the quantiles and the statistical mean.
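A minimal sketch of the call, assuming an active login, the dsBaseClient package loaded, and the assigned data frame D. The type argument shown in the second call is how current dsBaseClient versions request per-study results; check the help page of your installed version if it behaves differently:

```r
# Quantiles and mean of LAB_HDL, pooled across studies
ds.quantileMean(x='D$LAB_HDL')

# Quantiles and mean of LAB_HDL for each study separately
ds.quantileMean(x='D$LAB_HDL', type='split')
```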


  • To get the statistical mean alone, use the function ds.mean. Use the argument type to request split results:
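As a sketch under the same assumptions (active login, dsBaseClient loaded, assigned data frame D), split results could be requested like this:

```r
# Mean of LAB_HDL reported separately for each study
ds.mean(x='D$LAB_HDL', type='split')
```

The default value of type has differed between dsBaseClient releases, so stating it explicitly keeps the result predictable.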

Conclusion
