# DataSHIELD Training Part 2: Basic statistics and data manipulations

# Introduction

This is the second in a 6-part DataSHIELD tutorial series.

The other parts in this DataSHIELD tutorial series are:

5: Subsetting

6: Modelling

## Quick reminder for logging in:

## Basic statistics and data manipulations

### Descriptive statistics: variable dimensions and class

It is possible to get descriptive or exploratory statistics about the assigned variables held in the server-side R session, such as the number of participants at each data provider, the number of participants across all data providers, and the number of variables. Identifying these parameters of the data will facilitate your analysis.
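As a minimal sketch of the kind of request described above, the dimensions can be asked for with `ds.dim`. This assumes an active DataSHIELD login in the current R session and that the data have been assigned to a server-side data frame named `D`, as in the rest of this section. The variable name `dims_both` is only illustrative.

```r
# Assumes you are already logged in to the DataSHIELD servers and that
# the server-side data frame is called D.
library(dsBaseClient)

# With no type argument, ds.dim uses type = 'both': it reports the
# dimensions for each study and for all studies combined.
dims_both <- ds.dim(x = 'D')
dims_both
```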

The output of the **ds.dim** command on the assigned data frame **D** shows that in study 1 there are 2163 individuals with 11 variables, in study 2 there are 3088 individuals with 11 variables, and in both studies together there are 5251 individuals with 11 variables.

- Up to here, the dimensions of the assigned data frame **D** have been found using the **ds.dim** command, in which **type='both'** is the default argument.
- Now use the **type='combine'** argument in the **ds.dim** function to identify the number of individuals (5251) and variables (11) pooled across all studies.
- To check that the variables in each study are identical (as is required for pooled data analysis), use the **ds.colnames** function on the assigned data frame **D**.
- Use the **ds.class** function to identify the class (type) of a variable - for example if it is an integer, character, factor etc. This will determine what analysis you can run using this variable. To get the class of the variable `LAB_HDL` held in the assigned data frame `D`, pass the argument **x='D$LAB_HDL'**.
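The three checks in the list above might look like the following sketch. It assumes an active DataSHIELD login and a server-side data frame named `D`. The result names (`dims_pooled`, `col_names`, `hdl_class`) are illustrative and not part of the tutorial.

```r
# Assumes an active DataSHIELD login and an assigned data frame D.
library(dsBaseClient)

# Pooled dimensions only: individuals and variables across all studies.
dims_pooled <- ds.dim(x = 'D', type = 'combine')

# Column names of D in each study, to confirm the variables match.
col_names <- ds.colnames(x = 'D')

# Class of the LAB_HDL variable held in D, reported per study.
hdl_class <- ds.class(x = 'D$LAB_HDL')

dims_pooled
col_names
hdl_class
```

Comparing the per-study entries of `col_names` by eye is usually enough for a handful of studies. With many studies, a client-side comparison of the returned elements may be more convenient.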

### Descriptive statistics: quantiles and mean

As **LAB_HDL** is a numeric variable, the distribution of the data can be explored.

- The function **ds.quantileMean** returns the quantiles and the statistical mean.
- To get the statistical mean alone, use the function **ds.mean**. Use the argument **type** to request split results.
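A sketch of the two calls described above, again assuming an active DataSHIELD login and a server-side data frame `D` containing `LAB_HDL`. The `type` values are passed explicitly because the defaults may differ between dsBaseClient versions. The result names are illustrative.

```r
# Assumes an active DataSHIELD login and an assigned data frame D.
library(dsBaseClient)

# Quantiles and mean of LAB_HDL, pooled across all studies.
hdl_quantiles <- ds.quantileMean(x = 'D$LAB_HDL', type = 'combine')

# Mean of LAB_HDL alone, reported separately for each study.
hdl_mean_split <- ds.mean(x = 'D$LAB_HDL', type = 'split')

hdl_quantiles
hdl_mean_split
```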

# Conclusion

The other parts in this DataSHIELD tutorial series are:

5: Subsetting

6: Modelling