Dataset for testing


A specific dataset for testing DataSHIELD functions

Functions written for DataSHIELD must be thoroughly tested; otherwise errors, unwanted behaviours and bugs may go unidentified and uncorrected. A testing dataset, referred to as TESTING in the Opal server, provides three tables: DATASET1, DATASET2 and DATASET3. Each column of these tables represents an R data type, one of the number sets N, Z, Q and R, or a factor. These fields provide specific data for checking that functions return the expected, valid results and for assessing the mathematical properties of their outcomes. The granularity of the dataset should help identify issues that a function has with a specific type of data. The table below defines each field of TESTING.DATASET1, TESTING.DATASET2 and TESTING.DATASET3.


| Field name | Description |
| --- | --- |
| CHARACTER | Names of gods from across the world. |
| LOGICAL | Boolean values: TRUE and FALSE. |
| NA_VALUES | Empty (missing) values. |
| NULL_VALUES | NULL values. |
| INTEGER | Represents the set Z: negative, positive and zero integer values. |
| NON_NEGATIVE_INTEGER | Represents the natural numbers and the value 0, that is, n >= 0. |
| POSITIVE_INTEGER | Represents the natural numbers, that is, n > 0. |
| NEGATIVE_INTEGER | Represents a subset of Z where n < 0. |
| NUMERIC | Represents the sets R and Q: negative, positive and zero decimal values. |
| NON_NEGATIVE_NUMERIC | Decimal values greater than or equal to 0 (n >= 0). |
| POSITIVE_NUMERIC | Decimal values greater than 0 (n > 0). |
| NEGATIVE_NUMERIC | Decimal values less than 0 (n < 0). |
| FACTOR_CHARACTER | Various marital statuses, each repeated several times. Suitable for testing functions that return factors. |
| FACTOR_INTEGER | Integer factor values, each repeated several times. Suitable for testing functions that return factors. |
| IDENTIFIER | Repeated integer values. Together with CATEGORY, they can be used to test functions that reshape the data. |
| CATEGORY | Categories for the identifiers given in IDENTIFIER. Together with IDENTIFIER, they can be used to test functions that reshape the data. |
| NUMERIC_ONE_CHANGE | A copy of the NUMERIC field with only one value changed. |
| INTEGER_ONE_CHANGE | A copy of the INTEGER field with only one value changed. |
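The properties listed in the field table can be checked directly on a local copy of a table before it is used to test a function. The sketch below assumes the table has been read into an R data frame whose column names match the field names and whose numeric fields were read as numbers; the function name check_field_properties is hypothetical, and the toy data frame is illustrative only, not the real content of TESTING.

```r
# Hypothetical helper: checks that the numeric fields of a local copy of a
# TESTING table satisfy the properties given in the field table.
# Assumes column names match the field names; missing values are ignored.
check_field_properties <- function(df) {
  # TRUE when every non-missing value is a whole number
  is_whole <- function(x) all(x == round(x), na.rm = TRUE)
  c(
    INTEGER              = is_whole(df$INTEGER),
    NON_NEGATIVE_INTEGER = is_whole(df$NON_NEGATIVE_INTEGER) &&
                           all(df$NON_NEGATIVE_INTEGER >= 0, na.rm = TRUE),
    POSITIVE_INTEGER     = is_whole(df$POSITIVE_INTEGER) &&
                           all(df$POSITIVE_INTEGER > 0, na.rm = TRUE),
    NEGATIVE_INTEGER     = is_whole(df$NEGATIVE_INTEGER) &&
                           all(df$NEGATIVE_INTEGER < 0, na.rm = TRUE),
    NON_NEGATIVE_NUMERIC = all(df$NON_NEGATIVE_NUMERIC >= 0, na.rm = TRUE),
    POSITIVE_NUMERIC     = all(df$POSITIVE_NUMERIC > 0, na.rm = TRUE),
    NEGATIVE_NUMERIC     = all(df$NEGATIVE_NUMERIC < 0, na.rm = TRUE)
  )
}

# Toy data frame for illustration only (not the real TESTING values)
toy <- data.frame(
  INTEGER              = c(-2L, 0L, 3L),
  NON_NEGATIVE_INTEGER = c(0L, 1L, 5L),
  POSITIVE_INTEGER     = c(1L, 2L, 7L),
  NEGATIVE_INTEGER     = c(-1L, -4L, -9L),
  NON_NEGATIVE_NUMERIC = c(0, 0.5, 2.25),
  POSITIVE_NUMERIC     = c(0.1, 1.5, 3.75),
  NEGATIVE_NUMERIC     = c(-0.1, -1.5, -3.75)
)

# Named logical vector: one TRUE/FALSE per checked field
check_field_properties(toy)
```

The result is a named logical vector, so a failing field can be reported by name rather than as a single overall failure.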

TESTING dataset and the DataSHIELD testing framework

The content of the three TESTING tables has been made available in the DataSHIELD testing framework as three comma-separated files, one per table. Each file contains the same values as the corresponding table on the server. Expected values can therefore be computed from the locally stored data and compared with the values returned by DataSHIELD functions, which operate on the same values stored on the server. The two results should agree to within at least 10^-6.
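The comparison described above can be sketched as follows, assuming "accurate to at least 10^-6" means an absolute difference of at most 10^-6. The helper name is_accurate is hypothetical, the local values are toy numbers, and remote_value is a stand-in for whatever number a DataSHIELD function returns for the same field; no particular DataSHIELD call is assumed.

```r
# Hypothetical helper: TRUE when every expected/actual pair differs by at
# most tol. A missing value on either side makes the check fail.
is_accurate <- function(expected, actual, tol = 1e-6) {
  stopifnot(length(expected) == length(actual))
  isTRUE(all(abs(expected - actual) <= tol))
}

# Expected value computed locally (toy numbers, not the real data)
local_numeric <- c(-1.25, 0, 2.5, 3.75)
expected_mean <- mean(local_numeric)   # 1.25

# Stand-in for a value returned by a DataSHIELD function on the server
remote_value <- 1.2500004

is_accurate(expected_mean, remote_value)
```

Base R's all.equal() also takes a tolerance, but by default it compares relative rather than absolute differences, so the two readings of "accurate to 10^-6" can disagree for values far from 1.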

Local and remote storage

The table below relates the data stored locally to the data stored remotely on the virtual machine. Each local file is loaded and converted into a data frame.

| Table | Local file | ds.test_env environment variable |
| --- | --- | --- |
| TESTING.DATASET1 | tests\testthat\data_files\DATASET1.csv | ds.test_env$local.values.1 |
| TESTING.DATASET2 | tests\testthat\data_files\DATASET2.csv | ds.test_env$local.values.2 |
| TESTING.DATASET3 | tests\testthat\data_files\DATASET3.csv | ds.test_env$local.values.3 |
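The loading step can be sketched as below. This is an assumption about how ds.test_env gets populated, not the framework's actual code: load_local_values is a hypothetical name, each CSV is assumed to have a header row, and character columns are kept as character (factor fields would need converting separately). file.path() builds the paths portably, so the same code works on the Windows-style layout shown in the table.

```r
# Hypothetical loader: reads DATASET1.csv to DATASET3.csv from data_dir and
# stores them as local.values.1 to local.values.3 in the given environment
load_local_values <- function(env, data_dir) {
  for (i in 1:3) {
    csv <- file.path(data_dir, paste0("DATASET", i, ".csv"))
    # Each file becomes a data frame; a header row is assumed
    df <- read.csv(csv, header = TRUE, stringsAsFactors = FALSE)
    assign(paste0("local.values.", i), df, envir = env)
  }
  invisible(env)
}

# Possible usage (path relative to the package root):
# ds.test_env <- new.env()
# load_local_values(ds.test_env, file.path("tests", "testthat", "data_files"))
```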


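Each column of these data frames corresponds to a field of the remote storage. The table below summarises each field with examples of its use: the name passed as an argument to a DataSHIELD function, and the matching column of the local data frames.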

In the last column, N is 1, 2 or 3 for DATASET1, DATASET2 or DATASET3 respectively.

| Field name | Description | Argument to DataSHIELD functions | Local data frame column |
| --- | --- | --- | --- |
| CHARACTER | List of gods. | 'D$CHARACTER' | ds.test_env$local.values.N[ ,2] |
| LOGICAL | TRUE and FALSE. | 'D$LOGICAL' | ds.test_env$local.values.N[ ,3] |
| NA_VALUES | Empty values. | 'D$NA_VALUES' | ds.test_env$local.values.N[ ,4] |
| NULL_VALUES | NULL values. | 'D$NULL_VALUES' | ds.test_env$local.values.N[ ,5] |
| INTEGER | The set Z. | 'D$INTEGER' | ds.test_env$local.values.N[ ,6] |
| NON_NEGATIVE_INTEGER | Natural numbers and the value 0. | 'D$NON_NEGATIVE_INTEGER' | ds.test_env$local.values.N[ ,7] |
| POSITIVE_INTEGER | Natural numbers. | 'D$POSITIVE_INTEGER' | ds.test_env$local.values.N[ ,8] |
| NEGATIVE_INTEGER | Subset of Z where n < 0. | 'D$NEGATIVE_INTEGER' | ds.test_env$local.values.N[ ,9] |
| NUMERIC | The sets R and Q. | 'D$NUMERIC' | ds.test_env$local.values.N[ ,10] |
| NON_NEGATIVE_NUMERIC | Subset of R and Q where n >= 0. | 'D$NON_NEGATIVE_NUMERIC' | ds.test_env$local.values.N[ ,11] |
| POSITIVE_NUMERIC | Subset of R and Q where n > 0. | 'D$POSITIVE_NUMERIC' | ds.test_env$local.values.N[ ,12] |
| NEGATIVE_NUMERIC | Subset of R and Q where n < 0. | 'D$NEGATIVE_NUMERIC' | ds.test_env$local.values.N[ ,13] |
| FACTOR_CHARACTER | Marital statuses, each repeated several times. | 'D$FACTOR_CHARACTER' | ds.test_env$local.values.N[ ,14] |
| FACTOR_INTEGER | Integer factor values. | 'D$FACTOR_INTEGER' | ds.test_env$local.values.N[ ,15] |
| IDENTIFIER | Repeated integer values; used with CATEGORY to test functions that reshape the data. | 'D$IDENTIFIER' | ds.test_env$local.values.N[ ,16] |
| CATEGORY | Categories for the identifiers in IDENTIFIER; used with IDENTIFIER to test functions that reshape the data. | 'D$CATEGORY' | ds.test_env$local.values.N[ ,17] |
| NUMERIC_ONE_CHANGE | Copy of the NUMERIC field with only one value changed; intended for disclosure testing. | 'D$NUMERIC_ONE_CHANGE' | ds.test_env$local.values.N[ ,18] |
| INTEGER_ONE_CHANGE | Copy of the INTEGER field with only one value changed; intended for disclosure testing. | 'D$INTEGER_ONE_CHANGE' | ds.test_env$local.values.N[ ,19] |
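Because every field appears both as a server-side argument ('D$FIELD') and as a local column index, a small lookup helps keep the two in step inside a test. The helpers below are hypothetical, not part of the framework: the column indices are copied from the table above, and D is assumed to be the name under which the table is assigned on the server.

```r
# Column index of each field in the local data frames (from the table above)
field_columns <- c(
  CHARACTER = 2, LOGICAL = 3, NA_VALUES = 4, NULL_VALUES = 5,
  INTEGER = 6, NON_NEGATIVE_INTEGER = 7, POSITIVE_INTEGER = 8,
  NEGATIVE_INTEGER = 9, NUMERIC = 10, NON_NEGATIVE_NUMERIC = 11,
  POSITIVE_NUMERIC = 12, NEGATIVE_NUMERIC = 13, FACTOR_CHARACTER = 14,
  FACTOR_INTEGER = 15, IDENTIFIER = 16, CATEGORY = 17,
  NUMERIC_ONE_CHANGE = 18, INTEGER_ONE_CHANGE = 19
)

# Argument string passed to a DataSHIELD function, e.g. "D$NUMERIC"
server_arg <- function(field) {
  stopifnot(field %in% names(field_columns))
  paste0("D$", field)
}

# Matching local values from local.values.N (N = 1, 2 or 3) in env
local_values <- function(env, field, n) {
  stopifnot(field %in% names(field_columns), n %in% 1:3)
  df <- get(paste0("local.values.", n), envir = env)
  df[, field_columns[[field]]]
}
```

For example, server_arg("NUMERIC") gives "D$NUMERIC", and local_values(ds.test_env, "NUMERIC", 1) returns the same vector as ds.test_env$local.values.1[ ,10]. Mistyped field names fail immediately instead of silently selecting the wrong column.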

Download

The datasets and the dictionary are available on GitHub: https://github.com/patRyserWelch8/dsTestData


DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki