Linux Containers for Reproducible Research
By Kaitlin Logie, January 2017
Why is reproducible research important?
Reproducibility in research is the only thing that an investigator can guarantee about a study. It is regarded as one of the foundations of the scientific method, and it provides a benchmark against which reliability can be tested.
The idea is that for any research project, the investigator or an independent researcher should be able to repeat the experiment under the same conditions and obtain the same results. Reproducibility is therefore a check that the methodology is valid and reliable.
(image by Vanderbilt University)
What is a Linux container?
Linux containers (LXC for short) were developed as a means of isolating a workspace of Linux applications that require a more secure operating environment. They create a layer of separation between the operating system and the application. Linux containers are an operating-system-level virtualisation method: instead of creating a full-fledged virtual machine, each container runs isolated applications with its own processes, filesystem and network stack, all virtualised on top of the host operating system (OS) running on the hardware.
There are two types of Linux containers:
- Operating system containers (run a full operating system userland while sharing the host operating system’s kernel).
- Application containers (package an application service together with the resources it needs, such as required libraries).
Unlike a VM, which runs its own kernel, Linux containers share a single kernel on a central host OS and use namespaces to keep their workloads separate. They are extremely lightweight and can benefit your research workflow in a number of ways:
- Lightweight and resource-friendly.
- Comprehensive process and resource isolation.
- Run multiple Linux distributions, or multiple versions of one, on a single server independently.
- Rapid and easy deployment.
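To make the idea of namespaces a little more concrete, here is a minimal Python sketch that lists the namespaces the current process belongs to. It assumes a Linux system with `/proc` mounted; the function name `current_namespaces` is made up for this illustration. On other systems it simply returns an empty result.

```python
import os

NS_DIR = "/proc/self/ns"  # Linux-only; absent on other systems


def current_namespaces(ns_dir=NS_DIR):
    """Return {namespace_type: identifier} for the calling process.

    Each entry under /proc/self/ns is a symlink whose target names the
    namespace type and an identifier. Two processes share a namespace
    when their identifiers for that type match.
    """
    if not os.path.isdir(ns_dir):
        return {}  # not on Linux, or /proc is not mounted
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may not be readable; skip them.
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:20s} {ident}")
```

Running this once on the host and once inside a container would normally show different identifiers for most namespace types, which is the separation described above.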
What has this got to do with reproducibility?
Using multiple identical Linux containers to compartmentalise your experiments removes the inconsistencies that creep in when software environments differ. This allows independent researchers to attempt to replicate research in an environment identical to that of the original, with far fewer hidden variables.
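One rough way to check whether two environments really are identical is to reduce each one to a fingerprint and compare. The Python sketch below is only an illustration under stated assumptions: the helper names `describe_environment` and `fingerprint` are invented here, and the description covers just the Python version, the platform string and the installed Python packages, not system libraries or data.

```python
import hashlib
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect a small, comparable description of the software environment."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs with missing metadata
            packages.add(f"{name}=={dist.version}")
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": sorted(packages),
    }


def fingerprint(description):
    """Hash a description; equal descriptions give equal fingerprints."""
    # sort_keys makes the JSON text independent of dict ordering.
    canonical = json.dumps(description, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    print(fingerprint(describe_environment()))
```

If the fingerprint printed inside your container matches the one printed inside a collaborator's container, the two Python environments agree on everything this sketch records.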
Where can I begin to use Linux containers?
(image by Saifi Khan)
Docker is an open-source project that automates the deployment of applications inside software containers. Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools and system libraries (anything that can be installed on a server). The aim is that the software runs the same way regardless of the environment it is deployed in.
Docker is currently the world’s leading provider of Linux containers!
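As a small illustration of how this might look in a research script, the Python sketch below builds a `docker run` command line that executes an analysis inside a pinned image. The function names, the `/work` mount point and the example image and script names are assumptions made for this post. The sketch only builds the command; it does not start Docker.

```python
import shlex


def is_pinned(image):
    """Rough check that an image reference carries a tag or digest."""
    name = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    return ":" in name or "@" in name


def docker_run_command(image, command, workdir_on_host, mount_point="/work"):
    """Build an argv list for `docker run` that executes `command` in `image`."""
    if not is_pinned(image):
        # An unpinned image can change between runs, which hurts reproducibility.
        raise ValueError("pin the image with a tag or digest, e.g. 'ubuntu:16.04'")
    return [
        "docker", "run",
        "--rm",                                    # remove the container afterwards
        "-v", f"{workdir_on_host}:{mount_point}",  # share the project directory
        "-w", mount_point,                         # run from inside that directory
        image,
        *command,
    ]


if __name__ == "__main__":
    argv = docker_run_command("ubuntu:16.04", ["python3", "analysis.py"], "/home/me/project")
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` on a machine with Docker installed would launch the container. Insisting on a tag or digest is a design choice here: it stops the image from silently changing underneath you between runs.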
Singularity enables users to have full control of their operating system environment. Singularity was designed around the notion of extreme mobility of compute and reproducible science. This means that a non-privileged user can “swap out” the operating system on the host for one they control. So if the host system is running RHEL6 but your application runs in Ubuntu, you can create an Ubuntu image, install your applications into that image, copy the image to another host, and run your application on that host in its native Ubuntu environment!
Singularity containers are purpose-built, with a main focus on portability (available on many versions of Linux) and reproducibility (encapsulation of the entire user-space environment).
Singularity is currently getting researchers excited and involved for its use on HPC clusters!
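Because a Singularity image is a file that you copy from host to host, it is worth confirming that the copy is identical before you run anything in it. The Python sketch below is one hedged way to do that: it checksums the image and only then builds a `singularity exec` command line. The helper names and the example file names are hypothetical, and the sketch builds the command without running it.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def singularity_exec_command(image_path, command):
    """Build an argv list of the form `singularity exec <image> <command...>`."""
    return ["singularity", "exec", image_path, *command]


def checked_command(image_path, expected_sha256, command):
    """Return the command only if the image matches the recorded checksum."""
    actual = file_sha256(image_path)
    if actual != expected_sha256:
        raise ValueError(
            f"image checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    return singularity_exec_command(image_path, command)
```

In practice you would record the checksum on the host where the image was built, publish it alongside your results, and call something like `checked_command("ubuntu.img", recorded_checksum, ["python3", "analysis.py"])` on the cluster.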
I still need advice/help and a push in the right direction with my research!