Andrew Silver

In 2015, geneticist Guy Reeves was trying to configure a free software system called Galaxy to get his bioinformatics projects off the ground. After a day or two of frustration, he asked members of his IT department for help. They installed Docker, a technology for simulating computational environments, which enabled him to use a special version of Galaxy that came packaged with everything he needed — called a container. A slight tweak to the Galaxy settings, and he was “done before lunch”.

Reeves, at the Max Planck Institute for Evolutionary Biology in Plön, Germany, is one of many scientists adopting containers. As science becomes ever more data intensive, more software is being written to extract knowledge from those data. But few researchers have the time and computational know-how to make full use of it. Containers, packages of software code and the computational environment to run it, can close that gap. They help researchers to use a wider array of software, accelerate experiments and promote reproducibility.

Containers are essentially lightweight, configurable virtual machines — simulated versions of an operating system and its hardware, which allow software developers to share their computational environments. Researchers use them to distribute complicated scientific software systems, thereby allowing others to execute the software under the same conditions that its original developers used. In doing so, containers can remove one source of variability in computational biology. But whereas virtual machines are relatively resource-intensive and inflexible, containers are compact and configurable, says C. Titus Brown, a bioinformatician at the University of California, Davis. Although configuring the underlying containerization software can be tricky, containers can be modified to add or remove tools according to the user’s need — flexibility that has boosted their popularity, he says. “I liked the idea of having something that works out of the box,” says Reeves.

Read more.