DataSHIELD Training Part 3: Assign functions and tables
Introduction
This is the third in a 6-part DataSHIELD tutorial series.
The other parts in this DataSHIELD tutorial series are:
5: Subsetting
6: Modelling
Quick reminder for logging in:
Descriptive statistics: assigning variables
So far, all the functions in the tutorial have returned something to the screen. Some functions (assign functions) create new objects in the server-side R session that are required for analysis but do not return anything to the client screen. For example, an analysis may require the log values of a variable.
- By default the function ds.log computes the natural logarithm. It is possible to compute a different logarithm by setting the argument base to a different value. There is no output to the screen.
- In the above example the name of the new object was not specified. By default the name of the new variable is set to the input vector followed by the suffix '_log' (i.e. 'LAB_HDL_log').
- It is possible to customise the name of the new object by using the newobj argument.
- The new object is not attached to the data frame of assigned variables (default name "D"). We can check the length of the new LAB_HDL_log vector we generated above; the command should return the same figure as the number of rows in the data frame 'D'.
- Using ds.assign we subtract the pooled mean calculated earlier from LAB_HDL (mean centring) and assign the output to a new variable called LAB_HDL.c. The function returns no output to the client screen; the newly created variable is stored server-side.
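The assign steps in the list above can be sketched as follows. This is a minimal illustration rather than the tutorial's original listing. It assumes a connections object returned by the login step and the data frame D assigned on each server; the object name LAB_HDL_log10 is hypothetical, and the layout of the ds.mean result (a Global.Mean matrix with an EstimatedMean column) is an assumption about the client package version in use. The DataSHIELD calls are guarded so that the script also runs outside a session, where a made-up local vector shows the same arithmetic in base R.

```r
# Made-up local values, used only to illustrate the arithmetic.
# These are not DataSHIELD data.
hdl <- c(1.2, 1.5, 1.8, 2.1)

hdl_log   <- log(hdl)             # natural logarithm (the default base)
hdl_log10 <- log(hdl, base = 10)  # a different base
hdl_c     <- hdl - mean(hdl)      # mean centring

# The derived vector has one value per input value.
length(hdl_log) == length(hdl)

# Server-side equivalents. Assumption: `connections` comes from the login
# step and 'D' is the data frame assigned on each server.
if (exists("connections")) {
  library(dsBaseClient)

  # Assign functions: the new objects stay server-side, nothing is printed.
  ds.log(x = "D$LAB_HDL", newobj = "LAB_HDL_log", datasources = connections)
  ds.log(x = "D$LAB_HDL", base = 10, newobj = "LAB_HDL_log10",
         datasources = connections)

  # Length of the new vector, to compare with the number of rows of D.
  ds.length(x = "LAB_HDL_log", datasources = connections)
  ds.dim(x = "D", datasources = connections)

  # Pooled mean of LAB_HDL (assumed result layout, see the note above).
  pooled <- ds.mean(x = "D$LAB_HDL", type = "combine",
                    datasources = connections)
  pooled_mean <- pooled$Global.Mean[, "EstimatedMean"]

  # Mean centring: subtract the pooled mean and store the result server-side.
  ds.assign(toAssign = paste0("D$LAB_HDL-", pooled_mean),
            newobj = "LAB_HDL.c", datasources = connections)
}
```

Building the toAssign string with paste0 avoids retyping the pooled mean by hand; typing the value directly into the string works equally well.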
Further DataSHIELD functions can now be run on this new mean-centred variable LAB_HDL.c. For example, calculating the mean of LAB_HDL.c should give a value of approximately 0.
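A short sketch of this check, under the same assumptions as before (a connections object from the login step, and LAB_HDL.c already created server-side). The local part uses made-up values and shows why the result is only approximately 0: floating-point error, and any rounding of the mean that was subtracted, leave a small residual.

```r
# Made-up values for illustration only (not DataSHIELD data).
hdl <- c(1.2, 1.5, 1.8, 2.1)
centred <- hdl - mean(hdl)

# The mean of a centred variable is zero up to floating-point error.
abs(mean(centred)) < 1e-12

# If the subtracted mean was rounded first, the centred mean is small
# but no longer negligible.
mean(hdl - round(mean(hdl), 1))

# Server-side check. Assumption: `connections` exists and 'LAB_HDL.c' was
# created by the ds.assign step.
if (exists("connections")) {
  library(dsBaseClient)
  ds.mean(x = "LAB_HDL.c", type = "combine", datasources = connections)
}
```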
Contingency tables
The function ds.table creates contingency tables of categorical variables. By default it runs on pooled data from all studies; to obtain output for each study, set the argument type='split'.
- A one-dimensional table can be calculated for the variable GENDER. The function returns the counts and the column and row percentages per category, as well as information about the validity of the variable in each study dataset.
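A minimal sketch of the one-dimensional table, assuming a connections object from the login step and the data frame D on each server. The local part uses a made-up factor to show the kind of counts and proportions involved; it is not the output format of ds.table.

```r
# Made-up categorical values for illustration only.
gender <- factor(c(0, 1, 1, 0, 1, 1))

counts <- table(gender)   # counts per category
counts
prop.table(counts)        # proportion per category

# Server-side equivalent. Assumption: `connections` comes from the login step.
if (exists("connections")) {
  library(dsBaseClient)
  ds.table(rvar = "D$GENDER", datasources = connections)
}
```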
The function ds.table also creates two-dimensional contingency tables of two categorical variables. For example, a two-dimensional table can cross-tabulate the variables DIS_DIAB (diabetes status) and GENDER.
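The cross-tabulation can be sketched as below, again assuming a connections object from the login step. In ds.table the rvar argument gives the row variable and cvar the column variable. The local vectors are made up and only illustrate how row and column proportions are derived from the counts.

```r
# Made-up values for illustration only (not DataSHIELD data).
diab   <- factor(c(0, 0, 1, 0, 1, 0, 0, 1))
gender <- factor(c(0, 1, 1, 0, 0, 1, 1, 1))

crosstab <- table(diab, gender)    # rows: diab, columns: gender
crosstab
prop.table(crosstab, margin = 1)   # row proportions
prop.table(crosstab, margin = 2)   # column proportions

# Server-side equivalent. Assumption: `connections` comes from the login step.
if (exists("connections")) {
  library(dsBaseClient)
  ds.table(rvar = "D$DIS_DIAB", cvar = "D$GENDER", datasources = connections)
}
```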
The function can additionally compute a chi-squared test for homogeneity on (nc-1)*(nr-1) degrees of freedom (where nc is the number of columns and nr is the number of rows). The first section of the output is an exact duplicate of the output above; the chi-squared reports follow it.
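The degrees-of-freedom rule can be checked locally with base R, and the server-side call sketched alongside it. The counts below are made up for illustration, and the DataSHIELD call assumes a connections object from the login step; report.chisq.tests is the ds.table argument that switches the test on.

```r
# Made-up 2 x 3 table of counts, for illustration only.
counts <- matrix(c(20, 30, 25, 15, 35, 25), nrow = 2)

res <- chisq.test(counts)

# Degrees of freedom reported by the test versus the (nc-1)*(nr-1) rule.
df_rule <- (ncol(counts) - 1) * (nrow(counts) - 1)
unname(res$parameter) == df_rule

# Server-side equivalent. Assumption: `connections` comes from the login step.
if (exists("connections")) {
  library(dsBaseClient)
  ds.table(rvar = "D$DIS_DIAB", cvar = "D$GENDER",
           report.chisq.tests = TRUE, datasources = connections)
}
```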