Docker best practice recommendations

Tool executions on the Platform rely on Docker as the environment that contains a tool and the dependencies required to run it. Our bioinformaticians have therefore compiled a set of best practices for creating Docker environments that are optimized for size, resource usage, traceability, and reproducibility, while retaining full functionality and the ability to run all your analyses at scale.

Docker Commands Cheat Sheet

The code block below contains the most commonly used Docker commands on the Seven Bridges Platform.

### Log in to the Seven Bridges image registry 
docker login images.sbgenomics.com
# Username: <division-slug>/<username>, and Password: Authentication Token

### Create a Dockerfile and run the following command in the folder where the Dockerfile is located:
docker build -t images.sbgenomics.com/<division-slug>/<repo-name>/<toolkit-name>:[tag] .
# Example: docker build -t images.sbgenomics.com/my_division/my_repo/python3.8:0 .

### Push the Docker image to the Seven Bridges image registry
docker push images.sbgenomics.com/<division-slug>/<repo-name>/<toolkit-name>:[tag]
# Example: docker push images.sbgenomics.com/my_division/my_repo/bwa_biobambam2:0.7.17

### Copy files from host to a Docker container:
docker cp <file> <container-id>:/opt/<file>
# Example: docker cp job.json c5753bf2cd0d:/opt/job.json

### Add Changelog file to a Docker container:
docker cp Changelog <container-id>:/opt/Changelog
# Example: docker cp Changelog c5753bf2cd0d:/opt/Changelog

### Start a stopped container:
# List all containers, including stopped ones:
docker ps -a

# Start a container: 
docker start -i <container-name>

### Run a Docker container from an image:
docker run -it images.sbgenomics.com/<division-slug>/<repo-name>/<toolkit-name>:[tag]
# Example: docker run -it images.sbgenomics.com/my_division/my_repo/python3.8:0

### For mounting: 
docker run -ti -v <mount-folder>:<mount-point>:<options> <image-id>
# Example: docker run -ti -v ~/path/to/mount_folder:/mount-test:rw -w /mount-test <image-id>

### Delete all stopped containers:
docker container prune
# Note: docker rm -f $(docker ps -aq) force-removes ALL containers, including running ones

Dockerfile

Seven Bridges recommends and utilizes the following set of guidelines when creating a Dockerfile:

  • The Dockerfile has to be named Dockerfile.
  • Downloaded or copied files, toolkits, and scripts that are part of a toolkit (not custom scripts, as they shouldn't be part of the wrapper), along with the Dockerfile, should be stored at /opt/ within the Docker image. Keeping the Dockerfile as part of the image helps with reproducibility, as you can easily rebuild the image.
  • The folder containing copied toolkits or scripts should have a version included in the folder name, e.g. /opt/GATK_3.7/.
  • A single Dockerfile should be used for a whole toolkit. Tools that are not part of a toolkit are treated as a single-tool toolkit and have their own Dockerfile. However, Seven Bridges recommends creating separate wrappers for different commands or tools in the toolkit.
  • Order the instructions. You may regularly build an image during the development of your workflow. You can take advantage of build caching to avoid the execution of time-consuming instructions. You should keep instructions in order of least likely to change to most likely to change.
  • Multi-line arguments should be sorted alphanumerically to avoid duplication of packages.
  • Consolidate instructions to keep the number of layers to a minimum. Use && to chain two commands, and \ to split a command across multiple lines. This is especially good practice for the apt-get update command, which should be in the same instruction (layer) as apt-get install:
RUN apt-get update && apt-get install -y <package>

This way, if a package is later added to the install list, the update step is not skipped: the changed instruction invalidates the cached layer, so apt-get update runs again. If the two commands were in separate instructions (layers), the cached result of apt-get update would be reused and the package index could be stale.

📘

Seven Bridges recommends not using apt-get upgrade. It updates all your packages to their latest versions, which is poor practice because it prevents your Dockerfile from producing consistent and reproducible builds.

  • Avoid installing unnecessary packages to reduce complexity, image size, and build time.
  • To reduce the image size by deleting files, perform the deletion in the same line/layer in which the files to be removed were downloaded or added and processed. Deleting a file in a different Docker instruction (layer) will not reduce the image size. For example, after downloading and unpacking an archive, it is good practice to remove the archive in the same instruction.
  • The reproducibility of your Dockerfile heavily depends on how well you define the versions of software to be installed in the image. The more specifically you can define them, the better. The practice of specifying versions of software is called version pinning.
    • For example, here is how to clone a specific release tag (1.6) from the samtools repository:
RUN git clone -b 1.6 https://github.com/samtools/samtools
    • If you want to clone the repository and check out a specific commit, you can use the checkout command:
RUN git clone https://github.com/samtools/samtools && \
    cd samtools && \
    git checkout 1ea60adbf492d0596a8fb01fd44bafe8fcee5fc0
# Install steps go here; chain them onto the RUN instruction above with && \ so they run inside the samtools directory
  • When you install several system libraries, it is good practice to add comments about why the dependencies are needed. This way, if a piece of software is removed from the container, it will be easier to remove the system dependencies that are no longer needed.
  • It can be helpful to include comments about commands that did not work so you do not repeat past mistakes.
  • Seven Bridges currently utilises ubuntu:18.04 as the base image. If a tool works only with a different version of Ubuntu/OS, an image should be created starting from that specific stable release OS base image.
  • When cloning a GitHub repo, it is strongly recommended to add checkout to a specific version, tag, or commit, to ensure version pinning.
  • Package managers are a good option if you need to install packages or dependencies for a specific language. In some cases, the package manager is able to make decisions about what versions to install, which is likely to lead to a non-reproducible build. Because of this, it is necessary to pin the dependency versions:
RUN pip install \
  pandas==0.25.3 \
  seaborn==0.11.1
  • Every instruction should be described by using comments (comments start with #).
  • The Docker image maintainer should be set using LABEL.
# Set maintainer
LABEL description="Dockerfile for Python 2.7 and Sambamba 0.6.6" \
      maintainer="Rosalind Franklin, Seven Bridges, <[email protected]>"

Here is an example of a properly written Dockerfile for a tool intended for use on the Seven Bridges Platform.
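
As an illustration of how several of the guidelines above fit together, a minimal sketch could look like the following. It is not an official Seven Bridges Dockerfile: the choice of samtools 1.6, the release archive URL, the list of build dependencies, and the maintainer details are assumptions made for the example, and the apt package versions are left unpinned for brevity.

```dockerfile
# Base image named in the guidelines above
FROM ubuntu:18.04

# Set maintainer (placeholder values, replace with your own)
LABEL description="Illustrative image for samtools 1.6" \
      maintainer="Your Name <you@example.com>"

# Build dependencies, sorted alphanumerically (assumed list):
#   bzip2, ca-certificates, wget  - download and unpack the release archive
#   gcc, make                     - compile samtools
#   libbz2-dev, liblzma-dev, libncurses5-dev, zlib1g-dev - libraries samtools links against
# apt-get update and apt-get install share one layer, and the apt package
# lists are deleted in that same layer so they do not add to the image size.
RUN apt-get update && apt-get install -y \
        bzip2 \
        ca-certificates \
        gcc \
        libbz2-dev \
        liblzma-dev \
        libncurses5-dev \
        make \
        wget \
        zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*

# Download a pinned release into a versioned folder under /opt, unpack it,
# delete the archive in the same layer, then build and install.
RUN mkdir -p /opt/samtools_1.6 && cd /opt/samtools_1.6 \
    && wget https://github.com/samtools/samtools/releases/download/1.6/samtools-1.6.tar.bz2 \
    && tar -xjf samtools-1.6.tar.bz2 \
    && rm samtools-1.6.tar.bz2 \
    && cd samtools-1.6 \
    && ./configure \
    && make \
    && make install

# Keep the Dockerfile and Changelog inside the image
COPY Dockerfile Changelog /opt/
```

The instructions are ordered from least to most likely to change, so editing the samtools build step does not invalidate the cached dependency layer. The COPY instruction assumes that a Changelog file sits next to the Dockerfile in the build directory.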

Building, naming, versioning, and pushing Docker images

Building

Building an image should be done in the directory containing the relevant Dockerfile. Ideally, this directory should contain only the Dockerfile, the Changelog file, and the files needed for the build. The whole directory is sent to Docker as the build context, so unnecessary files increase build time.

docker build -t images.sbgenomics.com/<division-slug>/<repo-name>/<toolkit-name>:[tag] .

Naming

To push an image to the Seven Bridges Image Registry, the image repo and tag should follow this naming convention:

images.sbgenomics.com/<division-slug>/<repo-name>/<tool-name>:[tag]

In detail, Seven Bridges utilises and recommends the following conventions:

  • <division-slug> is your division name on the Platform, as displayed on the Platform URL when you navigate to one of your projects.

  • <repo-name> (repository name) is the name of your chosen repository, which can be an existing one or will be created at the time of pushing.

  • <tool-name> reflects the tool package (toolkit) name and version, written in lowercase and dash-separated.

  • [tag] represents an (internal) revision of the image which should be incremented every time an image for a specific toolkit is changed. An example would be fixing a bug on the image or adding a missing module.
    Example:

  • First stable build: images.sbgenomics.com/division/repo/some-toolkit-1-0:0

  • New stable build: images.sbgenomics.com/division/repo/some-toolkit-1-0:1

Example - Docker image for GATK 4.1.0.0 tools:

  • images.sbgenomics.com/my_division/gatk/gatk-4-1-0-0:0

Alternatively, if you are using the Platform outside of the Enterprise context, image naming should be in line with the following pattern:

images.sbgenomics.com/<username>/<repository-name>[:tag]

Example:

  • images.sbgenomics.com/rfranklin/picard:2.27

Docker repository names can include only lowercase and uppercase letters of the English alphabet, numbers from 0 to 9, dash (-), and underscore (_). Dots are not allowed in repository names. Tags can contain dots in addition to the characters allowed in repository names.

Versioning

In addition to incrementing the tag whenever there is a new revision of the image, it is advisable to add a revision note to the Changelog file. For example, a revision note in the Changelog file could be:

## [1] - 1.1.2021
### Added
- Module XY

Instructions for writing a Changelog file can be found at https://keepachangelog.com/en/0.3.0/.

It is recommended to copy the Changelog file to the /opt/ directory, together with the Dockerfile, when building an image.

Pushing the Docker image to a registry

Once the image is built, it can be pushed and made ready to be referenced from the Platform. To push images to the Seven Bridges Image Registry, you have to log in using docker login images.sbgenomics.com (or eu-images.sbgenomics.com if you are using the Seven Bridges EU Platform), entering your Platform username as the username and your authentication token as the password. After a successful login, you can push the image using the docker push command:

docker push images.sbgenomics.com/division/repo/toolkit:tag

📘

We strongly advise against re-pushing a different image to the same Docker repository with the same tag. Even if you are the repository owner, your image may be used by someone else at some stage. Instead, it is recommended that you increment the image tag, thus allowing full reproducibility.

External Docker images and registries

We recommend that you avoid pushing to, or using images from, external image registries (e.g. DockerHub), because we cannot guarantee their stability and availability. An alternative to using an image from an external registry is to re-push the image to the Seven Bridges Image Registry and use it from there.

The recommended way to do this is to build a new Docker image from a Dockerfile that uses the external image as its base (FROM). A label with the maintainer and the Changelog file should also be included in the image. This image should then be pushed to the Seven Bridges Image Registry, following the internal naming and tagging conventions.
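
A minimal sketch of such a re-build is shown below. The external image python:3.8-slim and the label values are placeholders chosen for illustration; substitute the image you actually depend on, referenced by a specific tag.

```dockerfile
# External image used as the base (illustrative choice, not a recommendation)
FROM python:3.8-slim

# Set maintainer (placeholder values, replace with your own)
LABEL description="Re-build of python:3.8-slim for the Seven Bridges Image Registry" \
      maintainer="Your Name <you@example.com>"

# Keep the Dockerfile and Changelog inside the image for traceability
COPY Dockerfile Changelog /opt/
```

The resulting image can then be built and pushed with the docker build and docker push commands shown earlier, using a name such as images.sbgenomics.com/<division-slug>/<repo-name>/python-3-8:0 (a hypothetical name that follows the convention above).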

Dockerizing custom scripts

You may implement a custom naming schema to separate repositories that contain runtime environments for custom scripts. For instance, Seven Bridges uses the sbg prefix, separated with a dash, in the repo name to denote an image for custom scripts. The version in the repo name is the script version. Be aware that the script has to reside in Create Files, while the image contains only the environment for running the script. For example:

images.sbgenomics.com/platform_username/sbg-tool:0
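
As a sketch of what such an environment-only image might contain, the Dockerfile below installs pinned Python dependencies and deliberately does not copy the script itself. The base image and the package list (reused from the pip example earlier) are assumptions for illustration; the maintainer label and the Changelog copy are omitted for brevity.

```dockerfile
# Runtime environment only (base image is an illustrative assumption)
FROM python:3.8-slim

# Pinned dependencies the custom script is assumed to need.
# The script is not added here; it is supplied through Create Files.
RUN pip install \
        pandas==0.25.3 \
        seaborn==0.11.1
```

Because the script lives outside the image, it can be revised without rebuilding the image, as long as its dependencies stay the same.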