Continuous integration
Rationale
DataSHIELD has a complex stack in which problems may creep in at various levels (e.g. a change in the operating system, a shared library, Opal, R, other DataSHIELD functions etc). We also have a range of code authors working on a variety of platforms. With this in mind we like to check that new code works before it is accepted into the main code base, and we like to build the entire stack at regular intervals to make sure nothing has changed underneath us. We do this with continuous integration.
Process overview
In overview, the process is to:
- start with a vanilla VM,
- install dsBase, dsBaseClient and all relevant dependencies (Opal, R, loads of R libraries etc),
- run all the tests,
- check the output.
Azure pipelines
We use Azure Pipelines (part of Azure DevOps) as our continuous integration service. It allows us to spin up a vanilla Ubuntu VM (currently 16.04) with sudo privileges that we can run for an arbitrary length of time. We initially used Travis CI; however, its time limits became a barrier to running our tests, which take ~40 minutes at the time of writing (April 2020).
The behaviour of the pipeline is governed by a combination of settings in the Azure account (this is really just permissions) and the contents of a yml file in the root of the dsBaseClient repository - https://github.com/datashield/dsBaseClient/blob/master/azure-pipelines.yml The yml file is (hopefully) well documented, outlining what each of the stages does.
The Azure pipeline is connected to a GitHub repository from where it pulls in its configuration (via the yml file) and looks out for any changes which might require the pipeline to run.
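To give a feel for the shape of such a file, here is a minimal sketch of an Azure Pipelines yml. The image name, step contents and display names are illustrative, not copied from the real dsBaseClient file:

```yaml
# Minimal sketch of an azure-pipelines.yml - illustrative only; see the real
# file in the dsBaseClient repository for the full configuration.
pool:
  vmImage: 'ubuntu-16.04'        # the vanilla Ubuntu VM the job runs on

steps:
  # Steps run in order; the job stops at the first step that fails.
  - script: |
      sudo apt-get update
      sudo apt-get install -y r-base
    displayName: 'Install base R'

  - script: Rscript -e 'install.packages("devtools")'
    displayName: 'Install devtools'
```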
Triggers
The pipeline is started either manually (logging into the Azure Pipelines account and clicking run) or when one of the predefined trigger criteria is met (the corresponding parts of the yml file are sketched after the list):
- A commit to a dsBaseClient branch.
- A pull request to a dsBaseClient branch.
- A scheduled build (1:32am daily).
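In the yml file these triggers correspond to the trigger, pr and schedules sections. A sketch, with illustrative branch filters and assuming the schedule is expressed in UTC:

```yaml
# Trigger sketch - branch filters are illustrative.
trigger:
  branches:
    include:
      - '*'              # any commit pushed to a dsBaseClient branch

pr:
  branches:
    include:
      - '*'              # any pull request targeting a dsBaseClient branch

schedules:
  - cron: '32 1 * * *'   # 1:32am daily (Azure cron schedules use UTC)
    displayName: 'Nightly build'
    branches:
      include:
        - master
    always: true         # build even if nothing has changed since the last run
```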
Build process
The idea here is to fail fast: we install dsBaseClient first, run some checks on that, and only install Opal/dsBase etc. if those first checks pass. A sketch of how these steps might look in the yml file follows the list.
- Azure starts a vanilla Ubuntu 16.04 VM.
- Remove the MySQL package - this is necessary as the version that comes with the VM is incompatible with what we need.
- Tweak the local R environment - the default VM has two virtual CPUs; R packages need to be compiled when they are installed, so telling R that multiple CPUs are available lets it compile several packages in parallel.
- Install the DataSHIELD client (dsBaseClient and all its dependencies).
- Make sure the man pages have been updated before committing to GitHub (i.e. run devtools::document() and see if anything has changed).
- Run devtools::check().
- Install the DataSHIELD server - i.e. Opal, dsBase and all their dependencies.
- Run devtools::test() on dsBaseClient - this runs the testthat tests, which check that the client side functions behave as they should when they interact with the server side code.
- Parse the devtools::test() output and build an HTML table of results, which is committed back to GitHub (https://datashield.github.io/testStatus/dsBaseClient/master/latest/).
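As a sketch of how this fail-fast ordering might be expressed in the yml file - the package names, script names and function arguments here are simplified assumptions rather than the actual contents of azure-pipelines.yml:

```yaml
# Fail-fast ordering sketch (simplified; details assumed, not copied from the
# real azure-pipelines.yml).
steps:
  - script: sudo apt-get remove -y --purge mysql-server mysql-client
    displayName: 'Remove incompatible MySQL packages'   # package names assumed

  - script: echo 'options(Ncpus = 2)' >> ~/.Rprofile
    displayName: 'Let R compile packages on both virtual CPUs'

  - script: Rscript -e 'devtools::install_deps(dependencies = TRUE)'
    displayName: 'Install dsBaseClient dependencies'

  - script: |
      Rscript -e 'devtools::document()'
      git diff --exit-code    # fail if devtools::document() changed anything
    displayName: 'Check man pages are up to date'

  - script: Rscript -e 'devtools::check(error_on = "error")'
    displayName: 'Run devtools::check() on dsBaseClient'

  # Only reached if everything above passed (fail fast).
  - script: ./install_opal_and_dsbase.sh   # placeholder for the server-side setup
    displayName: 'Install Opal and dsBase'

  - script: Rscript -e 'devtools::test()'
    displayName: 'Run the testthat integration tests'

  - script: Rscript parse_results_to_html.R   # placeholder for the results parser
    displayName: 'Parse test output and publish the HTML results table'
```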
Build status
You can set up CI using your own Azure account
Full instructions with screenshots can be found here.
The key step is to set the endpoints so that they match those defined in the azure-pipelines.yml file; when prompted, pick 'grant authorization'.
A guide on understanding and using CI once set up can be found here.
Future development
- Extend tests to other repos (outside of dsBetaTestClient etc).
- Test against a range of software versions, e.g. run on permutations of the last few releases of R, Opal, DataSHIELD etc.
- Extend the trigger criteria - the pipeline should run if dsBase changes, for instance.
- Extend the azure-pipelines.yml file to have a separate job which could run dsDanger (last), and maybe another job to run DSLite.
DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki