Changing variable class

Occasionally the class of a variable defined in the /wiki/spaces/DSDEV/pages/12943489 may not be the class required by the DataSHIELD function. The examples below summarise the function errors that can arise when an input variable is of the wrong class.


Integer

If a variable is defined as an integer under valueType in the Variables tab and its category levels are named in the Categories tab, it will automatically be assigned the class factor when the table is read into R. Some DataSHIELD functions will not accept a factor, so it will be necessary to coerce the variable to an appropriate class after the data have been read into R. More information on this can be found in Changing variable class for use in a DataSHIELD function.

Numeric

If a numeric variable contains anything other than digits, for example if it contains:

  • decimals
  • negative numbers
  • no value at all (in CSV files this appears as two consecutive commas)
  • NAs (commonly used to denote missing values)

it will be converted into a character or logical variable when imported into R from Opal. If the DataSHIELD function requires a numeric variable, you will need to coerce the variable using the ds.asNumeric function. More information on this can be found in Changing variable class for use in a DataSHIELD function.
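
The effect described above can be reproduced locally as a rough sketch. This is plain base R, not Opal or DataSHIELD: the inline CSV text and the column names height and notes are made up for illustration, and read.csv stands in for the import step.

```r
# Plain-R analogue (not Opal itself): the content of a column decides its class.
# The CSV text and column names below are hypothetical.
csv_text <- "id,height,notes\n1,,n/a\n2,,172\n3,,168"
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(dat$height)  # every field is empty, so the column is all NA and logical
class(dat$notes)   # one non-numeric entry makes the whole column character

# Coerce to numeric; entries that are not numbers become NA
height_num <- as.numeric(dat$height)
notes_num  <- suppressWarnings(as.numeric(dat$notes))
```

On a DataSHIELD server the equivalent coercion is done with ds.asNumeric, because the analyst cannot run as.numeric on the server-side data directly.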

In the example below, the variable named cens - a censoring indicator for a survival model (1=died, 0=censored) - has been read into R. If you declare 1 and 0 as the names of the two levels of the variable in the Categories tab of the /wiki/spaces/DSDEV/pages/12943489, the variable will automatically be read in as a factor.

However, if you want to use this variable as the outcome in a generalized linear model to analyse survival (e.g. a piecewise exponential regression model), it cannot be a factor; it has to be numeric. Variables can be coerced into the class required for analysis, as illustrated in the following example, which coerces a factor variable into a numeric variable.

#What is the class of the variable cens after it has first been imported into the dataframe called EM?
> ds.class("EM$cens")

$study1
[1] "factor"

$study2
[1] "factor"

#Create a new variable called EVENT that is of class numeric
> ds.asNumeric("EM$cens","EVENT")

#Check that EVENT is of class numeric in both studies
> ds.class("EVENT") 

$study1
[1] "numeric"

$study2
[1] "numeric"
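
A class of "numeric" does not by itself confirm that the values are still 0 and 1. As a minimal local sketch in plain base R (not DataSHIELD; the vector cens below is a made-up stand-in for EM$cens), calling as.numeric directly on a factor returns the internal level codes, while converting through character recovers the original values. How ds.asNumeric treats factor levels may depend on the DataSHIELD version in use, so it is worth checking the values of the new variable as well as its class.

```r
# Local stand-in for EM$cens: a factor whose levels are "0" and "1"
cens <- factor(c(1, 0, 0, 1, 1))
class(cens)

# Direct coercion returns the internal level codes, not the original values
codes <- as.numeric(cens)                # 2 1 1 2 2

# Going through character recovers the values stored as level names
event <- as.numeric(as.character(cens))  # 1 0 0 1 1
class(event)
```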

DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki