Dataset for testing
A specific dataset for testing DataSHIELD functions
Functions written for DataSHIELD must be thoroughly tested; otherwise, errors, unwanted behaviours and bugs may go unidentified and uncorrected. A testing dataset, referred to as TESTING on the Opal server, provides three tables: DATASET1, DATASET2 and DATASET3. Each column of these tables represents either a data type used in R, a numerical set such as N, Z, Q or R, or a factor. These fields provide specific data for checking expected and valid results, as well as for assessing the mathematical properties of the functions' outcomes. The granularity of the dataset should help identify issues that a function has with a specific type of data. The table below defines each field of TESTING.DATASET1, TESTING.DATASET2 and TESTING.DATASET3.
Field name | Description |
---|---|
CHARACTER | Provides a list of gods from across the world |
LOGICAL | Provides some Boolean values: TRUE and FALSE |
NA_VALUES | Some empty values |
NULL_VALUES | NULL values |
INTEGER | Represents the Z mathematical set. It contains negative and positive integers and 0. |
NON_NEGATIVE_INTEGER | Represents the natural numbers and the value 0, that is, n >= 0. |
POSITIVE_INTEGER | Represents the natural numbers, that is, n > 0. |
NEGATIVE_INTEGER | Represents a subset of Z, where n < 0. |
NUMERIC | Represents the R and Q mathematical sets. It contains decimal numbers that are negative, positive and 0. |
NON_NEGATIVE_NUMERIC | Represents decimal values that are greater than or equal to 0. |
POSITIVE_NUMERIC | Represents decimal values that are greater than 0. |
NEGATIVE_NUMERIC | Represents decimal values that are less than 0. |
FACTOR_CHARACTER | Represents various marital statuses, each repeated several times across the values. It is suitable for testing functions that return factors. |
FACTOR_INTEGER | Represents integer factor values, each repeated several times across the values. It is suitable for testing functions that return factors. |
IDENTIFIER | Represents integer values that are repeated. Together with the CATEGORY column, it can be used to test functions that shape the values differently. |
CATEGORY | Represents categories for the identifiers provided by the IDENTIFIER column. Together with that column, it can be used to test functions that shape the values differently. |
NUMERIC_ONE_CHANGE | Represents a copy of the NUMERIC field with only one change. |
INTEGER_ONE_CHANGE | Represents a copy of the INTEGER field with only one change. |
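
The constraints in the table above can be checked mechanically before a field is used in a test. The following is a minimal sketch in base R, assuming a small stand-in data frame; the values are illustrative only and are not taken from the real TESTING tables.

```r
# Stand-in for a few columns of one TESTING table. These values are
# made up for illustration and do not come from the real dataset.
local.values <- data.frame(
  INTEGER              = c(-3L, 0L, 7L, 0L),
  NON_NEGATIVE_INTEGER = c(0L, 4L, 9L, 0L),
  POSITIVE_INTEGER     = c(1L, 4L, 9L, 2L),
  NEGATIVE_INTEGER     = c(-1L, -4L, -9L, -2L),
  NUMERIC              = c(-1.5, 0, 2.25, 3.75),
  NUMERIC_ONE_CHANGE   = c(-1.5, 0, 2.25, 4.75)
)

# Each entry is TRUE when the column respects its documented set.
checks <- c(
  non_negative = all(local.values$NON_NEGATIVE_INTEGER >= 0),
  positive     = all(local.values$POSITIVE_INTEGER > 0),
  negative     = all(local.values$NEGATIVE_INTEGER < 0),
  # A *_ONE_CHANGE column should differ from its source in exactly one row.
  one_change   = sum(local.values$NUMERIC != local.values$NUMERIC_ONE_CHANGE) == 1
)

print(checks)
```

If the real columns contain missing values, `all()` and `sum()` would need `na.rm = TRUE` for these checks to return TRUE or FALSE rather than NA.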
TESTING dataset and the DataSHIELD testing framework
The content of the three tables in TESTING has been made available in the DataSHIELD testing framework. Three comma-separated files are provided, one for each table. Each file contains the same values as the corresponding table on the server. For that reason, expected values can be computed from the data stored locally and then compared with the values obtained from DataSHIELD functions. The latter should use the same values stored on the server, and the results should be accurate to at least 10^-6.
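The comparison described above can be sketched in base R as follows. This is only an illustration: `local.values.1` stands in for the data frame loaded from DATASET1.csv, and `server.mean` is a hypothetical placeholder for a value returned by a DataSHIELD function, not an actual server result.

```r
# Stand-in for the local copy of the data; illustrative values only.
local.values.1 <- data.frame(NUMERIC = c(-1.5, 0, 2.25, 3.75))

# Expected value computed from the locally stored data.
expected.mean <- mean(local.values.1$NUMERIC)

# Hypothetical placeholder for the value that a DataSHIELD function
# would return when run against 'D$NUMERIC' on the server.
server.mean <- 1.1250000004

# The two values should agree to at least 1e-6.
tolerance <- 1e-6
within.tolerance <- abs(server.mean - expected.mean) < tolerance

print(within.tolerance)
```

An absolute difference is used here for simplicity; a test suite may prefer a relative tolerance when the expected values vary widely in magnitude.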
Local and remote storage
The table below relates the data stored locally to the data stored remotely on the virtual machine. Each file is uploaded and converted to a data frame.
Table | Local file | ds.test_env environment variable |
---|---|---|
TESTING.DATASET1 | tests\testthat\data_files\DATASET1.csv | ds.test_env$local.values.1 |
TESTING.DATASET2 | tests\testthat\data_files\DATASET2.csv | ds.test_env$local.values.2 |
TESTING.DATASET3 | tests\testthat\data_files\DATASET3.csv | ds.test_env$local.values.3 |
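
As a rough sketch of how a local file could end up in a `ds.test_env` variable, the example below reads a CSV file into a data frame and stores it in an environment. The helper `load_local_values`, the temporary file, and the first column name `ID` are assumptions made for illustration; the framework's own loading code is not shown on this page.

```r
# An environment standing in for ds.test_env.
ds.test_env <- new.env()

# Hypothetical helper: read one comma-separated file into a data frame.
load_local_values <- function(path) {
  read.csv(path, header = TRUE, stringsAsFactors = FALSE)
}

# In the framework the file would live at a path such as this one.
real.path <- file.path("tests", "testthat", "data_files", "DATASET1.csv")

# For a self-contained demonstration, a small temporary file is used
# instead; the column name "ID" and the rows are made up.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,CHARACTER,LOGICAL", "1,Zeus,TRUE", "2,Odin,FALSE"), tmp)

ds.test_env$local.values.1 <- load_local_values(tmp)
str(ds.test_env$local.values.1)
```

Building the path with `file.path()` avoids hard-coding a platform-specific separator.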
Each column corresponds to a field of the remote storage. The table below provides a summary with some examples.
Field name | Description | Field name used as argument to DataSHIELD function | Local use of data.frame |
---|---|---|---|
CHARACTER | List of gods | 'D$CHARACTER' | ds.test_env$local.values.1[ ,2] ds.test_env$local.values.2[ ,2] ds.test_env$local.values.3[ ,2] |
LOGICAL | TRUE and FALSE | 'D$LOGICAL' | ds.test_env$local.values.1[ ,3] ds.test_env$local.values.2[ ,3] ds.test_env$local.values.3[ ,3] |
NA_VALUES | Some empty values | 'D$NA_VALUES' | ds.test_env$local.values.1[ ,4] ds.test_env$local.values.2[ ,4] ds.test_env$local.values.3[ ,4] |
NULL_VALUES | NULL values | 'D$NULL_VALUES' | ds.test_env$local.values.1[ ,5] ds.test_env$local.values.2[ ,5] ds.test_env$local.values.3[ ,5] |
INTEGER | The Z numerical set. | 'D$INTEGER' | ds.test_env$local.values.1[ ,6] ds.test_env$local.values.2[ ,6] ds.test_env$local.values.3[ ,6] |
NON_NEGATIVE_INTEGER | Natural numbers and the value 0. | 'D$NON_NEGATIVE_INTEGER' | ds.test_env$local.values.1[ ,7] ds.test_env$local.values.2[ ,7] ds.test_env$local.values.3[ ,7] |
POSITIVE_INTEGER | Natural numbers. | 'D$POSITIVE_INTEGER' | ds.test_env$local.values.1[ ,8] ds.test_env$local.values.2[ ,8] ds.test_env$local.values.3[ ,8] |
NEGATIVE_INTEGER | Subset of Z, where n < 0. | 'D$NEGATIVE_INTEGER' | ds.test_env$local.values.1[ ,9] ds.test_env$local.values.2[ ,9] ds.test_env$local.values.3[ ,9] |
NUMERIC | R and Q mathematical sets. | 'D$NUMERIC' | ds.test_env$local.values.1[ ,10] ds.test_env$local.values.2[ ,10] ds.test_env$local.values.3[ ,10] |
NON_NEGATIVE_NUMERIC | Subset of R and Q sets, where n >= 0 | 'D$NON_NEGATIVE_NUMERIC' | ds.test_env$local.values.1[ ,11] ds.test_env$local.values.2[ ,11] ds.test_env$local.values.3[ ,11] |
POSITIVE_NUMERIC | Subset of R and Q sets, where n > 0 | 'D$POSITIVE_NUMERIC' | ds.test_env$local.values.1[ ,12] ds.test_env$local.values.2[ ,12] ds.test_env$local.values.3[ ,12] |
NEGATIVE_NUMERIC | Subset of R and Q sets, where n < 0 | 'D$NEGATIVE_NUMERIC' | ds.test_env$local.values.1[ ,13] ds.test_env$local.values.2[ ,13] ds.test_env$local.values.3[ ,13] |
FACTOR_CHARACTER | Various marital statuses, repeated several times. | 'D$FACTOR_CHARACTER' | ds.test_env$local.values.1[ ,14] ds.test_env$local.values.2[ ,14] ds.test_env$local.values.3[ ,14] |
FACTOR_INTEGER | Integer factor values. | 'D$FACTOR_INTEGER' | ds.test_env$local.values.1[ ,15] ds.test_env$local.values.2[ ,15] ds.test_env$local.values.3[ ,15] |
IDENTIFIER | Integer values that are repeated. Together with the CATEGORY column, it can be used to test functions that shape the values differently. | 'D$IDENTIFIER' | ds.test_env$local.values.1[ ,16] ds.test_env$local.values.2[ ,16] ds.test_env$local.values.3[ ,16] |
CATEGORY | Categories for the identifiers provided by the IDENTIFIER column. Together with that column, it can be used to test functions that shape the values differently. | 'D$CATEGORY' | ds.test_env$local.values.1[ ,17] ds.test_env$local.values.2[ ,17] ds.test_env$local.values.3[ ,17] |
NUMERIC_ONE_CHANGE | Copy of the NUMERIC field with only one change; the purpose of the field is disclosure testing. | 'D$NUMERIC_ONE_CHANGE' | ds.test_env$local.values.1[ ,18] ds.test_env$local.values.2[ ,18] ds.test_env$local.values.3[ ,18] |
INTEGER_ONE_CHANGE | Copy of the INTEGER field with only one change; the purpose of the field is disclosure testing. | 'D$INTEGER_ONE_CHANGE' | ds.test_env$local.values.1[ ,19] ds.test_env$local.values.2[ ,19] ds.test_env$local.values.3[ ,19] |
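
The table above addresses local columns by position. The sketch below shows the same access by position and by name, and how the server-side argument string can be derived from the field name. The data frame is a stand-in with made-up values, and its first column name `ID` is an assumption, since this page does not name the first column.

```r
# Stand-in data frame; values and the "ID" column name are illustrative.
local.values.1 <- data.frame(
  ID        = 1:3,
  CHARACTER = c("Zeus", "Odin", "Ra"),
  LOGICAL   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

field <- "LOGICAL"

# Access by position, as listed in the table above.
by.position <- local.values.1[, 3]

# Access by name, which does not depend on the column order.
by.name <- local.values.1[, field]

# Name passed as an argument to a DataSHIELD function on the server side.
server.name <- paste0("D$", field)

same.values <- identical(by.position, by.name)
print(server.name)
```

Access by name may be the more robust choice if columns are ever added to or reordered in the CSV files.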
Download
The datasets as well as the dictionary can be found on GitHub: https://github.com/patRyserWelch8/dsTestData
DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki