Horizontal DataSHIELD allows the fitting of generalised linear models (GLMs). In Generalised Linear Modelling, the outcome can be modelled as continuous or categorical (binomial or discrete). The error in the model can follow a range of distributions, including Gaussian, binomial, gamma and Poisson. Only one example is shown in this section; for more examples, please see the help page for the function.

...

This section will make more sense with an understanding of Generalised Linear Modelling theory and techniques. More information can always be found online, for example this Colorado University publication.


Basic 1-covariate GLM

We want to examine the relationship between BMI (a continuous variable) and Triglycerides (another continuous variable). Because the response variable in the models below, Triglycerides, is continuous, a Gaussian error distribution is a natural choice.
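To make the Gaussian choice concrete, here is a minimal local sketch in base R (no DataSHIELD connection involved). It assumes simulated data; the variable names bmi and trig and the coefficients used to generate them are hypothetical. It illustrates that a GLM with a Gaussian family and identity link estimates the same coefficients as ordinary linear regression.

```r
# Local illustration only: simulated (hypothetical) data, base R, no DataSHIELD server.
set.seed(1)
bmi  <- rnorm(200, mean = 27, sd = 4)           # hypothetical BMI values
trig <- 0.5 + 0.05 * bmi + rnorm(200, sd = 0.8) # hypothetical triglyceride values

# Gaussian GLM (identity link by default) and ordinary least squares
glm_fit <- glm(trig ~ bmi, family = gaussian)
lm_fit  <- lm(trig ~ bmi)

# Compare the two sets of estimated coefficients
print(coef(glm_fit))
print(coef(lm_fit))
```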

A correlation command will establish how closely linked these two variables might be:

Code Block
ds.cor(x='D$PM_BMI_CONTINUOUS', y='D$LAB_TRIG')

Let's visualise with a scatterplot:

Code Block
ds.scatterPlot(x='D$PM_BMI_CONTINUOUS', y='D$LAB_TRIG')

Regress Triglycerides on BMI using the Individual Participant Data (IPD) approach:

The method for this (ds.glm) is a "pooled analysis", equivalent to placing the individual-level data from all sources in one warehouse.

Note that the link function defaults to the canonical link for each family: binomial <-> logit link, poisson <-> log link, gaussian <-> identity link.

Code Block
ds.glm(formula = "D$LAB_TRIG~D$PM_BMI_CONTINUOUS", family="gaussian", datasources = connections)
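The canonical links mentioned above can be checked locally with base R's family objects. This is a sketch using the stats package only; it does not query DataSHIELD, and it assumes the family names passed to ds.glm correspond to the base R families of the same name.

```r
# Base R (stats package), run locally; no DataSHIELD connection is needed.
families <- list(
  gaussian = gaussian(),
  binomial = binomial(),
  poisson  = poisson()
)

# Each family object stores the name of its default (canonical) link
canonical_links <- sapply(families, function(f) f$link)
print(canonical_links)
```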

Regress Triglycerides on BMI using the Study-Level Meta-Analysis (SLMA) approach:

Code Block
ds.glmSLMA(formula = "D$LAB_TRIG~D$PM_BMI_CONTINUOUS", family="gaussian", newobj = "workshop.obj", datasources = connections)
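The difference between the pooled and study-level approaches can be sketched locally in base R. This is an illustration of the idea under stated assumptions, not a description of how DataSHIELD implements either function: the three "studies" are simulated, the helper simulate_study and all variable names are hypothetical, and the study-level estimates are combined with a simple fixed-effect inverse-variance weighting.

```r
# Local illustration only: simulated (hypothetical) studies, base R, no DataSHIELD server.
set.seed(42)

# Hypothetical helper that simulates one study's individual-level data
simulate_study <- function(n) {
  bmi  <- rnorm(n, mean = 27, sd = 4)
  trig <- 0.5 + 0.05 * bmi + rnorm(n, sd = 0.8)
  data.frame(bmi = bmi, trig = trig)
}

studies <- list(
  study1 = simulate_study(300),
  study2 = simulate_study(500),
  study3 = simulate_study(400)
)

# Pooled analysis: stack all individual-level data and fit one model
pooled_data  <- do.call(rbind, studies)
pooled_fit   <- glm(trig ~ bmi, family = gaussian, data = pooled_data)
pooled_slope <- coef(pooled_fit)[["bmi"]]

# Study-level analysis: fit the same model separately within each study
study_fits <- lapply(studies, function(d) glm(trig ~ bmi, family = gaussian, data = d))
slopes <- sapply(study_fits, function(f) coef(summary(f))["bmi", "Estimate"])
ses    <- sapply(study_fits, function(f) coef(summary(f))["bmi", "Std. Error"])

# Fixed-effect meta-analysis: inverse-variance weighted average of the study slopes
weights    <- 1 / ses^2
slma_slope <- sum(weights * slopes) / sum(weights)
slma_se    <- sqrt(1 / sum(weights))

print(c(pooled = pooled_slope, slma = slma_slope, slma_se = slma_se))
```

With studies generated from the same underlying model, as here, the two slope estimates should be close. With real data they can differ more, for example when the relationship varies between studies.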

Multi-covariate GLM

  • The function ds.glm is used to analyse the outcome variable DIS_DIAB (diabetes status) and the covariates PM_BMI_CONTINUOUS (continuous BMI), LAB_HDL (HDL cholesterol) and GENDER (gender), with an interaction between the latter two. In R this model is represented as:

...