Road map

Newcastle University is involved in several work packages and deliverables across multiple grants, with various stakeholders. This high-level roadmap/work programme groups these work packages and deliverables into broad projects that we are either currently working on or planning to do.


[Gantt-style roadmap, June 2019 to October 2021, with "Now" and DataSHIELD conference markers]

  • Testing: develop testing framework; write the remainder of tests.
  • Releases: write training material; prepare v5.0; v6.0 (DSI); v6.1; update training material; prepare v5.1; v6.0 training material update; future releases.
  • Infrastructure: develop and implement containerisation; develop DS Resources; server reporting and monitoring.
  • GUI: spec GUI; develop GUI.
  • Policy development: initiate steering committee.
  • Non-parametric: develop first version of non-parametric package.
  • dsOmics: dsOmics development.

Current work 

Testing framework 


Key stakeholders: All DataSHIELD users 

Background 

  • We have no standard or automatic way of knowing whether a change to a DataSHIELD function introduces errors, either by failing in unexpected ways or by giving the wrong answer. Because end users never see the raw data, they may not notice when something is wrong, so problems could go unnoticed for a long time.
  • The same argument applies if anything in the tool stack changes (the operating system, C libraries, R libraries, Opal, Java, etc.).
  • We will require tests to be run when new code is developed, and tests to pass before new code is accepted into the master branch.
  • We will require automated continuous integration to run tests regularly (likely daily) to pick up regressions when anything changes.
  • We are using testthat as the testing framework.
  • The number and status of tests are a KPI for EUCAN WP5.

Aims

  • An easy way for DataSHIELD function developers to check that their code works and does not break other functionality downstream.
  • An automatic way to test the entire stack for regressions.
  • Currently planned classes of tests:
    • Test files are syntactically correct.
    • Test that all relevant imports have been declared.
    • Test that answers are mathematically sensible (e.g. no standard deviation returning negative values).
    • Test that answers from DataSHIELD are the same as those from native R (see the sketch after this list).
    • Test that answers are correct on both a single and a multiply partitioned data set.
    • Test behaviour for unexpected input arguments.

    • Test adherence to disclosure control settings.

    • Possibly more...
  • A measure of code coverage.
  • A simple digest of test status across all functions.
  • Runs across a range of (specified) versions of key software (DataSHIELD, Opal, R, others?).
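
As an illustration of the "same as native R" class of test, a minimal testthat sketch is shown below. The connection list `conns`, the local reference copy `local_data` and the client-side function `ds.exampleMean()` are hypothetical stand-ins for whatever the real test harness provides.

```r
library(testthat)

test_that("server-side mean matches mean() on a local copy of the data", {
  # 'conns' (a list of DSConnection objects), the server-side data frame 'D'
  # and its local reference copy 'local_data' are set up by the test harness;
  # ds.exampleMean() is a hypothetical client-side aggregate function.
  ds_result <- ds.exampleMean(x = "D$age", datasources = conns)
  r_result  <- mean(local_data$age, na.rm = TRUE)
  expect_equal(ds_result, r_result, tolerance = 1e-10)
})
```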

Status 

  • We have over 6,000 tests.
  • They can be run with standard testthat on any DataSHIELD install.
  • We use Azure Pipelines for continuous integration, which builds a sample DataSHIELD install from scratch, with bleeding-edge versions of everything, on a vanilla Ubuntu 16.04 VM.
  • There is a public-facing status page at https://datashield.github.io/testStatus/

Planned work 

  • Write documentation to make it easy for function developers to write tests.
  • Roll out tests for all functions.
  • Aim for high test coverage.

Possible future work

  • Add Docker to the testing framework.
  • Add DSLite to the testing framework (see the sketch after this list).
  • Add dsDanger to CI so that settings can be changed, to test how this affects the stack (e.g. disclosure settings).
  • Run matrix builds across different OS versions, Opal versions, R versions and DataSHIELD versions.
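
A minimal sketch of how DSLite could be added as an in-process test target, assuming the login pattern documented for the DSLite and DSI packages and using mtcars as a stand-in data set:

```r
library(DSLite)
library(DSI)

# Serve a stand-in data set from an in-process DSLite server (no Opal needed).
dslite.server <- newDSLiteServer(tables = list(mtcars = mtcars))

builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1", url = "dslite.server",
               table = "mtcars", driver = "DSLiteDriver")
logindata <- builder$build()

# 'conns' can then be passed to client-side functions inside testthat tests.
conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
```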

 


Minor DataSHIELD releases (6.1 onwards)

Key stakeholders: All DataSHIELD users

Background

  • There is a continual stream of requests for new functions which needs to be managed.

Required work/decisions to be made

  • We need a prioritisation mechanism for deciding which functions to develop.


Minor DataSHIELD release policy (v6.1 onwards) 

Key stakeholders: All DataSHIELD users 

Background 

  • After the v6.0 release we will aim to release DataSHIELD functions more often. 

Required work/decisions to be made 

  • Need to have a policy for moving functions into the main DataSHIELD repos. Points to include:
    • Which classes of tests should be required? 
    • Require valid examples? 
    • Should multiple people review the code? 
    • Who decides when it is time to move it? Just the NU team or wider community? 
    • How do we manage upgrades? Replacing a function in e.g. dsBase might break working projects. 
    • How do we inform all the consortia of an upgrade? 
  • We suggest starting to discuss this now, with the intention of having a draft policy in place by the DataSHIELD workshop in September 2020 and enacting the policy from DataSHIELD v6.1. 


dsOmics

Key stakeholders: EUCAN, ATHLETE

Background

  • There is a strong desire to be able to do omics analysis with DataSHIELD.

Status

  • A huge amount of development work has gone into this, making use of the newly developed resources feature in Opal (see the sketch below).
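
As a hedged illustration of the resources workflow that dsOmics builds on: assuming an existing connection list `conns` and a hypothetical resource identifier `RSRC.example_vcf` registered in Opal, a client might assign and coerce the resource roughly as follows.

```r
library(DSI)

# Assign a server-side resource (e.g. a VCF file registered in Opal) to the
# symbol 'res'; 'RSRC.example_vcf' is a hypothetical project.resource name.
datashield.assign.resource(conns, symbol = "res", resource = "RSRC.example_vcf")

# Coerce the resource into a usable R object on the server side;
# as.resource.object() is provided by the resourcer package on the server.
datashield.assign.expr(conns, symbol = "vcf", expr = quote(as.resource.object(res)))
```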

Non-parametric package

Background

  • There is a desire to be able to do non-parametric analysis with DataSHIELD. This is difficult to do in a non-disclosive way, since many of the existing algorithms rely on having access to all the data.

Status

  • This is being actively worked on.

Future work 

  

DataSHIELD Graphical User Interface (GUI) 

Key stakeholders: NU, EUCAN (deliverable), TRUST. 

Background 

  • A GUI has been needed for a long time, for a variety of reasons: 
    • A menu-driven, simple interface. 
    • A reporting interface. 
    • Hooks for piping into VR. 

Status 

  • We submitted two grant applications to get extra funding for this; both were unsuccessful. We are now starting to work with the HCI team at NU to develop a specification.

 

DataSHIELD server status at remote sites 

Key stakeholders: NU, EUCAN. 

Background 

  • It is hard to keep track of which DataSHIELD servers are up, and data providers often do not know when there is a problem (as was demonstrated in BioSHARE). This will be a big problem with the number of sites in EUCAN-CONNECT. 
  • We implemented Nagios as a monitoring solution towards the end of BioSHARE. 
  • Nagios can monitor almost anything; we monitored CPU, RAM, disk usage and Opal accessibility. Status checks can be arbitrary: for example, whether Opal is up, whether a simple R command runs, which version of R is installed on the server, which version of DataSHIELD is installed, etc. (see the sketch below). 
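
As an illustration, a minimal sketch of a Nagios-style check, written in R, that tests whether an Opal server responds over HTTP. The URL is a hypothetical example, and the exit codes follow the standard Nagios convention (0 = OK, 2 = CRITICAL):

```r
#!/usr/bin/env Rscript
# Minimal Nagios-style reachability check for an Opal server.
# Usage: check_opal.R https://opal.example.org   (URL is a hypothetical example)
library(httr)

url <- commandArgs(trailingOnly = TRUE)[1]
exit_code <- tryCatch({
  resp <- GET(url, timeout(10))
  if (status_code(resp) < 400) 0L else 2L  # 0 = OK, 2 = CRITICAL
}, error = function(e) 2L)

cat(sprintf("%s - %s\n", if (exit_code == 0L) "OPAL OK" else "OPAL CRITICAL", url))
quit(status = exit_code, save = "no")
```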

Required work/decisions to be made 

  • Should we resurrect this work? 
  • The checks were run as cron jobs on the VMs; how would this work with containers? 

 

DataSHIELD integration with health text 

Background 

  • We want to integrate natural language processing of health-related text data into DataSHIELD. 
  • This will need new libraries, etc. 

Status 

  • We have not started actively developing this yet. 

 

