
Apptainer/Singularity

The advantage of Apptainer or Singularity over other container solutions is its focus on HPC environments, which includes support for parallel execution and GPUs. It is also more feature-rich and user-friendly.

CHPC provides containers for some applications. Users can also bring in their own containers, provided they include a few accommodations for our systems, such as mount points for the home and scratch file systems. Finally, Apptainer/Singularity also allows importing Docker containers, most commonly from container repositories such as DockerHub.

Note: As of late 2022, Singularity is being replaced by Apptainer, provided by the apptainer module. Apptainer started from the same code base as Singularity but is developed independently, so the two code bases are expected to diverge over time. The singularity command is still defined in Apptainer, so users can use either the apptainer or the singularity command to obtain the same functionality.
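
A quick way to confirm this after loading the module (the version numbers printed will depend on the installed release):

module load apptainer
apptainer --version     # prints the Apptainer version
singularity --version   # the compatibility command reports the same Apptainer version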

Importing Docker Containers

Singularity/Apptainer has direct support for Docker containers. The Singularity and Docker page provides a good overview of Singularity's Docker container support. Below we list some basics along with some local caveats. It is assumed that the Singularity or Apptainer module is loaded, i.e. module load singularity or module load apptainer.

Running Docker container directly in Singularity/Apptainer

To start a shell in a Docker container using Singularity/Apptainer, simply point to the DockerHub container URL, e.g.:

singularity shell docker://ubuntu:latest

Singularity scans the host file systems and mounts them into the container automatically, which allows CHPC's non-standard /uufs and /scratch file systems to be visible in the container as well. This obviates the need to create mount points for these file systems manually in the container and makes DockerHub containers very easy to deploy with Singularity.
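
For example, a quick check that the CHPC file systems are indeed visible inside a container (assuming the singularity or apptainer module is loaded):

singularity exec docker://ubuntu:latest ls /uufs /scratch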

Similarly, we can run a program that's in a DockerHub container as

singularity exec docker://biocontainers/blast:2.2.31 blastp -help

Note that the Biocontainers repositories require the version tag (following the colon) for Singularity to pull them correctly. The tag can be found on the container's DockerHub page.

A good strategy for finding a container for a needed program is to go to hub.docker.com and search for the program name.

Converting a Docker image to Singularity

This approach is useful to speed up container startup, because on each pull or exec from DockerHub Singularity builds a new Singularity container file, which may take a while if the container is large. The drawback of this approach is that the Singularity container has to be rebuilt manually whenever the DockerHub image is updated.

The process is also described on the Singularity and Docker page. For example, we can build a local bioBakery container by running:

singularity build bioBakery.sif docker://biobakery/workflows

This newly created bioBakery.sif container can then be run as:

singularity exec bioBakery.sif humann2 --help

This command will start much faster than executing directly from the image pulled from DockerHub:

singularity exec docker://biobakery/workflows humann2 --help
Checking if a container already exists

The container download and build can be automated with a shell script that we wrote, update-container-from-dockerhub.sh. This script can be run before each container run to ensure that the latest container version is used, without unnecessarily re-pulling when no newer version exists.

The approach described above can be wrapped into a SLURM script that checks if the sif file exists, or if there is an updated container on DockerHub. The SLURM script may then look like this:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p ember
#SBATCH -t 1:00:00

# load the Singularity module
module load singularity
# check if the container exists or is newer and pull if needed
/uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh biobakery/workflows bioBakery.sif
# run a program from the container
singularity exec bioBakery.sif humann2 --help
Setting up a module file for a downloaded container

To make the commands in the container easier to use, we can build an Lmod module that wraps the commands to be run in the container. This way the commands and programs located inside the container can be called by their original names, rather than through the singularity command. First we create user-based modules, and in them put a directory named after our container, here my_new_container. Then we copy our template to this new user modules directory and name it based on the container program version, here 1.0.0:

mkdir $HOME/MyModules/my_new_container
cd $HOME/MyModules/my_new_container
cp /uufs/chpc.utah.edu/sys/modulefiles/templates/container-template.lua 1.0.0.lua

Then edit the new module file, 1.0.0.lua, to modify the container name, the command(s) to call from the container, and the module file metadata:

-- required path to the container sif file
local CONTAINER="/uufs/chpc.utah.edu/common/home/u0123456/containers/my_new_container.sif"
-- required text array of commands to alias from the container
local COMMANDS = {"command"}
-- these optional lines provide more information about the program in this module file
whatis("Name : Program name")
whatis("Version : 1.0.0")
whatis("Category : Program's category")
whatis("URL : Program's URL")
whatis("Installed on : 10/05/2021")
whatis("Installed by : Your Name")

When we have the module file created, we can activate the user modules and load the module:

module use $HOME/MyModules
module load my_new_container/1.0.0

This way we can use just the command to run the program inside the container, without needing the long singularity execution line.
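
For illustration, with the module loaded, the placeholder command from the COMMANDS list is roughly equivalent to typing the full Singularity line by hand:

# this short form (the module alias):
command --help
# stands in for something like:
singularity exec /uufs/chpc.utah.edu/common/home/u0123456/containers/my_new_container.sif command --help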

It may be difficult to find which programs in the container need to be placed in the COMMANDS list. Our strategy is to look for the location of the main program inside the container, find the directory where it resides (oftentimes some kind of bin directory), and get a list of all programs in that directory. This directory usually contains the programs/commands supplied by the given package.

To get a list of these programs, first load the newly created container module and execute containerShell to get a shell inside the container. Then run which command, which finds the directory where the command program is located. cd to this directory and run the script /uufs/chpc.utah.edu/sys/modulefiles/templates/grab_path_progs.sh, which produces the list of files in this directory. Scrutinize this list, removing programs that look unneeded, and paste it into the COMMANDS list of the module file. Make sure that the quotes in all the COMMANDS list items are correctly placed.
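
A hypothetical session illustrating this workflow (the program name command and its location /usr/local/bin inside the container are placeholders):

module use $HOME/MyModules
module load my_new_container/1.0.0
containerShell                 # opens a shell inside the container
which command                  # e.g. prints /usr/local/bin/command
cd /usr/local/bin
bash /uufs/chpc.utah.edu/sys/modulefiles/templates/grab_path_progs.sh   # lists the programs in this directory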

NOTE: Some packages execute programs from wrapper scripts, which effectively call a program defined in the COMMANDS list from inside the container. For example, in the medaka/1.7.2 container, the binary medaka is listed in COMMANDS in the medaka/1.7.2 module file, but other commands such as medaka_consensus call medaka, which results in medaka being executed inside the container. This may produce the error "singularity: command not found".

The reason is that the module file defines an alias for the command that uses Singularity to call the program in the container (for medaka/1.7.2 this is singularity exec /uufs/chpc.utah.edu/sys/installdir/r8/medaka/1.7.2/medaka.1.7.2.sif medaka). Inside the container, the singularity command is not defined, yet the aliased command is set up to call singularity, because Singularity by default inherits the parent shell environment (including the re-definition of the medaka command to the long singularity line listed above). To fix this problem, open the module file and locate the following section:

local run_function = 'singularity exec ' .. nvswitch .. CONTAINER .. " 

and change it to:

local run_function = 'singularity exec -e ' .. nvswitch .. CONTAINER .. " 

This prevents the runtime environment from being imported into the container, so the singularity aliases will not be defined in the container. The downside of this approach is that environment variables defined by the user before executing the command, for example to modify how the program should execute, may not be available. The way around that is to either check if the program's behavior can be modified by a runtime argument instead, or, alternatively, use a SINGULARITYENV_name_of_the_variable variable to define the environment variable to be brought into the container (e.g. if we define an environment variable DATA=$HOME/mydata, then we also need to set the environment variable SINGULARITYENV_DATA=$DATA to make DATA available to the program that runs inside the container).
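
A minimal sketch of this workaround, using the DATA variable from the example above and the placeholder command from the module file:

export DATA=$HOME/mydata
export SINGULARITYENV_DATA=$DATA   # DATA will be defined inside the container even with -e
command ...                        # the module alias runs the program in the container, which now sees $DATA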

Example: Finding and running a Docker container

Frequently we get requests to install complex programs that may not even run on the OS that CHPC Linux machines run. Before writing to CHPC support, consider following the example below with your application.

A user wants to install a program called guppy, whose installation and use are described in a blog post. They want to run it on a GPU since it's faster. From the blog post we know the program's name, have a hint about the provider of the program, and know how to install it on Ubuntu Linux. After some web searching we find out that the program is mainly available commercially, so it has no publicly available download section and likely no CentOS version. That leaves us with the need for an Ubuntu-based container.

We could build the container ourselves based on the instructions in the blog post, but we would need to either build with Docker or Singularity on a local machine with root access, or use a DockerHub automated build through a GitHub repository. This can be time consuming and cumbersome, so we leave it as a last resort.

We do some more web searching to see if guppy has a container. First we search for guppy dockerhub; we get lots of hits like this one, but none for the GPU (looking at the Dockerfile, there is no mention of GPU in the base image or in what is being installed). Next we try "guppy gpu" dockerhub and find this container. We don't know yet if it does indeed support the GPU, and since the Dockerfile is missing, we suspect that it is hosted on GitHub. So we search "guppy-gpu" github and find this repository, which based on the repository name and source looks like a match to the DockerHub image. Examining the Dockerfile we see that the container is based on nvidia/cuda9.0, which means it is set up for a GPU. This is looking hopeful, so we get the container and try to run it.

$ ml singularity
$ singularity pull docker://aryeelab/guppy-gpu
$ singularity shell --nv guppy-gpu_latest.sif
$ nvidia-smi
...
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1
... to check if the GPU works
$ guppy_basecaller --help
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited.
Version 2.2.2
... to check that the program is there.

Above we have loaded the Singularity module and used Singularity to pull the Docker container. This downloaded the Docker container image layers and created the Singularity container file guppy-gpu_latest.sif. Then we opened a shell in this container (using the --nv flag to bring the host GPU stack into the container), tested the GPU visibility with nvidia-smi, and ran the command guppy_basecaller to verify that it exists. With these positive outcomes, we can proceed to run the program with our data, which can be done directly with

$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"

As mentioned above, the singularity pull command creates a Singularity container based on a Docker container image. To guarantee that we will always get the latest version, we can use the shell script we have described above, e.g.

$ /uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh aryeelab/guppy-gpu guppy-gpu_latest.sif
$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"

To make this even easier to use, we build an Lmod module and wrap the commands to be run in the container in this module. First we create user-based modules. Then we copy our template to the user modules directory:

mkdir $HOME/MyModules/guppy
cd $HOME/MyModules/guppy
cp /uufs/chpc.utah.edu/sys/modulefiles/templates/container-template.lua 3.2.2.lua

and edit the new module file, 3.2.2.lua, to modify the container name, the command(s) to call from the container, and the module file metadata:

-- required path to the container sif file
local CONTAINER="/uufs/chpc.utah.edu/common/home/u0123456/containers/guppy-gpu_latest.sif"
-- required text array of commands to alias from the container
local COMMANDS = {"guppy_basecaller"}
-- these optional lines provide more information about the program in this module file
whatis("Name : Guppy")
whatis("Version : 3.2.2")
whatis("Category : genomics")
whatis("URL : https://nanoporetech.com/nanopore-sequencing-data-analysis")
whatis("Installed on : 10/05/2021")
whatis("Installed by : Your Name")

When we have the module file created, we can activate the user modules and load the guppy module:

module use $HOME/MyModules
module load guppy/3.2.2

This way we can use just the guppy_basecaller command to run this program inside the container.
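
With the module loaded, the long Singularity line from above reduces to the following (the input and output placeholders are the same as before; the module alias supplies the singularity exec prefix, including the GPU switch, behind the scenes):

guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"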

Running CHPC provided containers

We provide containers for applications that are difficult to build natively on the operating system that our clusters run. Most of these applications are developed on Debian-based Linux systems (Ubuntu and Debian) and rely on their software stack. Some containers are simply DockerHub images converted to the Singularity sif format, while others are built manually by CHPC staff.

Running a CHPC provided container is as simple as running the application command itself. We provide an environment module that sets up aliases for the application's commands, which call the container behind the scenes. We also provide a command to start a shell in the container, containerShell, which allows the user to open a shell in the container and from there call the commands needed to execute their processing pipeline.
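
For example, for a hypothetical CHPC provided containerized module named someprogram, the usage could look like:

module load someprogram
someprogram --help     # the alias runs the program inside the container behind the scenes
containerShell         # or, open an interactive shell inside the container instead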

In the containers, the user can access storage in their home directory or on the scratch file servers.

A list of containerized modules can be found by running grep -R --include "*.lua" singularity /uufs/chpc.utah.edu/sys/modulefiles/CHPC-r8/Core | grep depends_on.

Building your own Singularity container

As of Apptainer version 1.2.5, one can build a container completely in user space, which means that container builds can be done on CHPC Linux systems. Since large containers require more CPU and memory resources, it is recommended to do so in an interactive job.

salloc -N 1 -n 16 -A notchpeak-shared-short -p notchpeak-shared-short -t 2:00:00 --gres=gpu:1080ti:1
module load apptainer
unset APPTAINER_BINDPATH
apptainer build --nv mycontainer.sif Singularity

We first ask for an interactive job session and optionally request a GPU as well. Then we load the Apptainer module and unset the APPTAINER_BINDPATH environment variable pre-set by the module; leaving it set results in the build process erroring out due to a non-existent bind path. Then we build the container based on the definition file called Singularity. The --nv flag, which initializes GPU support during the build, is optional, but it is needed for GPU programs to be set up correctly.
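
For reference, a minimal definition file (saved as Singularity, to match the build command above) could look like the sketch below; the Ubuntu base image and the packages installed in %post are only an illustration:

Bootstrap: docker
From: ubuntu:22.04

%post
    # commands run inside the container at build time
    apt-get update && apt-get install -y --no-install-recommends python3
    rm -rf /var/lib/apt/lists/*

%environment
    export LC_ALL=C

%runscript
    # what runs on "apptainer run mycontainer.sif"
    exec python3 "$@"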

 Note that in our apptainer module, we define two environment variables:

  • APPTAINER_SHELL=/bin/bash - this sets the container shell to bash (easier to use than default sh)
  • APPTAINER_BINDPATH=/scratch,/uufs/chpc.utah.edu - this binds mount points to all the /scratch file servers and to /uufs file servers (sys branch, group spaces). 

If you prefer to use a different shell, or not bind the file servers, set these variables differently or unset them. 
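
For example, to override the default bind paths with a single scratch directory of your choice (the path below is just an illustration), set the variable before running the container:

export APPTAINER_BINDPATH=/scratch/general/vast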

Last Updated: 1/27/24