Singularity
The advantage of Singularity over other container solutions is its focus on HPC environments, including support for parallel execution and GPUs. It is also more feature-rich and user-friendly.
CHPC provides containers for some applications. Users can also bring their own containers, provided they include a few accommodations for our systems, such as mount points for the home and scratch file systems. Finally, Singularity can also import Docker containers, most commonly from container repositories such as DockerHub.
Importing Docker Containers
Singularity has direct support for Docker containers. The Singularity and Docker page provides a good overview of this support. Below we list the basics, along with some local caveats.
Running Docker container directly in Singularity
To start a shell in a Docker container using Singularity, simply point to the DockerHub container URL, e.g.:
singularity shell docker://ubuntu:latest
Singularity scans the host file systems and mounts them into the container automatically, which makes CHPC's non-standard /uufs and /scratch file systems visible in the container as well. This removes the need to manually create mount points for these file systems in the container and makes DockerHub containers very easy to deploy with Singularity.
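As a quick check, one can list these directories from inside a container, for example using the Ubuntu image from above:
singularity exec docker://ubuntu:latest ls /uufs /scratch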
Similarly, we can run a program that's in a DockerHub container as
singularity exec docker://biocontainers/blast:2.2.31 blastp -help
Note that the Biocontainers repositories require the version number tag (following the colon) for Singularity to pull them correctly. The version can be found by looking up the tag on the container's DockerHub page.
A good strategy for finding a container for a needed program is to go to hub.docker.com and search for the program name.
Converting a Docker image to Singularity
This approach speeds up container startup: with each pull or exec from DockerHub, Singularity builds a new Singularity container file, which can take a while if the container is large. The drawback is that the Singularity container has to be rebuilt manually whenever the DockerHub image is updated.
The process is also described on the Singularity and Docker page. For example, we can build a local bioBakery container by running:
singularity build bioBakery.sif docker://biobakery/workflows
This newly created bioBakery.sif container can then be run as:
singularity exec bioBakery.sif humann2 --help
This command executes much faster than running the same program directly from the image pulled from DockerHub:
singularity exec docker://biobakery/workflows humann2 --help
Checking if the container already exists
The container download and build can be automated with a shell script that we wrote, update-container-from-dockerhub.sh. The script can be run before each container run to ensure that the latest container version is used, without re-downloading when no newer version exists.
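As a rough sketch of the idea (this is not the actual CHPC script, which additionally checks DockerHub for a newer image), such a check could be written as:
#!/bin/bash
# build the local sif only if it does not exist yet
SIF=bioBakery.sif
if [ ! -f "$SIF" ]; then
    singularity build "$SIF" docker://biobakery/workflows
fi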
The approach described above can be wrapped into a SLURM script that checks whether the sif file exists, or whether there is an updated container on DockerHub. The SLURM script may then look like this:
#!/bin/bash
#SBATCH -N 1
#SBATCH -p ember
#SBATCH -t 1:00:00
# check if the container exists or is newer and pull if needed
/uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh biobakery/workflows bioBakery.sif
# run a program from the container
singularity exec bioBakery.sif humann2 --help
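Saved to a file, hypothetically named run_biobakery.slurm, the job is then submitted as usual:
sbatch run_biobakery.slurm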
Example: Finding and running a Docker container
Frequently we get requests to install complex programs that may not even run on CentOS 7. Before writing to CHPC support, consider following the example below with your application.
A user wants to install a program called guppy, whose installation and use are described in a blog post, and wants to run it on a GPU since that is faster. From the blog post we know the program's name, have a hint about its provider, and know how to install it on Ubuntu Linux. After some web searching we find out that the program is mainly distributed commercially, so it has no publicly available download section and there is likely no CentOS version. That leaves us needing an Ubuntu-based container.
We could build the container ourselves based on the instructions in the blog post, but we would need to either build with Docker or Singularity on a local machine with root access, or use a DockerHub automated build through a GitHub repository. This can be time consuming and cumbersome, so we leave it as a last resort.
We do some more web searching to see if guppy has a container. First we search for guppy dockerhub; we get lots of hits like this one, but none for the GPU (looking at the Dockerfile, there is no mention of GPU in the base image or in what is being installed). Next we try "guppy gpu" dockerhub and find this container. We don't know yet whether it indeed supports the GPU, and since the Dockerfile is missing, we suspect that it is hosted on GitHub. So we search "guppy-gpu" github and find this repository, which, based on the repository name and source, looks like a match to the DockerHub image. Examining the Dockerfile we see that the container is based on nvidia/cuda 9.0, which means it is set up for a GPU. This looks hopeful, so we get the container and try to run it.
$ ml singularity
$ singularity pull docker://aryeelab/guppy-gpu
$ singularity shell --nv guppy-gpu_latest.sif
$ nvidia-smi
...
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1
... to check if the GPU works
$ guppy_basecaller --help
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited.
Version 2.2.2
... to check that the program is there.
Above we have loaded the Singularity module and used Singularity to pull the Docker container. This downloaded the Docker container image layers and created the Singularity container file guppy-gpu_latest.sif. Then we opened a shell in this container (using the --nv flag to bring the host GPU stack into the container), tested the GPU visibility with nvidia-smi, and ran the command guppy_basecaller to verify that it exists. With these positive outcomes, we can proceed to run the program with our data, which can be done directly with
$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"
As mentioned above, the singularity pull command creates a Singularity container based on a Docker container image. To guarantee that we always get the latest version, we can use the shell script described above, e.g.
$ /uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh aryeelab/guppy-gpu guppy-gpu_latest.sif
$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"
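For batch processing, the same two commands can be wrapped in a SLURM script similar to the bioBakery example above; below is a minimal sketch, with the GPU partition and time limit left as placeholders to adjust for your allocation:
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 4:00:00
#SBATCH -p <gpu-partition>     # placeholder: a GPU partition you have access to
#SBATCH --gres=gpu:1           # request one GPU
module load singularity
# refresh the local sif if DockerHub has a newer image, then run on the GPU
/uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh aryeelab/guppy-gpu guppy-gpu_latest.sif
singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"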
If we want to make this even easier to use, we can build an Lmod module and wrap the commands to be run in the container in this module. First we set up user-based modules. Then we copy our template to the user modules directory:
mkdir $HOME/MyModules/guppy
cd $HOME/MyModules/guppy
cp /uufs/chpc.utah.edu/sys/modulefiles/templates/container-template.lua 3.2.2.lua
and edit the new module file, 3.2.2.lua, to modify the container name, the command(s) to call from the container, and the module file metadata:
-- required path to the container sif file
local CONTAINER="/uufs/chpc.utah.edu/common/home/u0123456/containers/guppy-gpu_latest.sif"
-- required text array of commands to alias from the container
local COMMANDS = {"guppy_basecaller"}
-- these optional lines provide more information about the program in this module file
whatis("Name : Guppy")
whatis("Version : 3.2.2")
whatis("Category : genomics")
whatis("URL : https://nanoporetech.com/nanopore-sequencing-data-analysis")
whatis("Installed on : 10/05/2021")
whatis("Installed by : Your Name")
When we have the module file created, we can activate the user modules and load the guppy module:
module use $HOME/MyModules
module load guppy/3.2.2
This way we can use just the guppy_basecaller command to run this program inside of the container.
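To double-check what the module provides, the usual Lmod commands work, e.g.:
module show guppy/3.2.2       # lists the settings and command aliases the module file defines
guppy_basecaller --help       # the alias now runs the command inside the container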
Running CHPC provided containers
We provide containers for applications that are difficult to build natively on the CentOS 7 that our clusters run. Most of these applications are developed on Debian-based Linux systems (Ubuntu and Debian) and rely on their software stack. Some containers are simply DockerHub images converted to the Singularity sif format, while others are built manually by CHPC staff.
Running a CHPC provided container is as simple as running the application command itself. We provide an environment module that sets up an alias for this command, which calls the container behind the scenes. If the container provides multiple commands, we provide a command to start a shell in the container, from which the user can call the commands needed to execute their processing pipeline.
Inside the containers, users can access storage in their home directories or on the scratch file servers.
Below is a sample of the containers that we provide. A complete list can be found by running:
module -r spider 'singularity$'
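Once a container module of interest is found, module spider with its name shows the available versions and how to load it, for example:
module spider bioBakery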
bioBakery
bioBakery is a set of tools that can be used separately or as a pipeline. After loading the module file with module load bioBakery, we provide a shortcut to run a single command, runbioBakery, e.g. runbioBakery humann2 parameters. To start a shell in the container and run multiple commands in it, run startbioBakery.
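A typical bioBakery session may thus look like:
module load bioBakery
runbioBakery humann2 --help    # run a single command from the container
startbioBakery                 # or start a shell in the container for a multi-step pipeline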
samviewer
Samviewer is an electron microscopy image and analysis program. After loading the module file with module load samviewer, we define a shortcut, sv, which maps to the sv command in the container. We also have a shortcut, sam, which executes any command inside of the container; e.g. sam python runs the python from the container.
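For example, after loading the module:
module load samviewer
sv             # runs the sv command from the container
sam python     # runs python (or any other command) inside the container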
Bringing in your own Singularity container
One can build a Singularity container on their own machine and scp it to CHPC's systems. Singularity runs on Linux; on macOS or Windows, one can create a Linux VM using e.g. VirtualBox and install Singularity in it. For details on how this can be done, see our Building Singularity containers locally page.
For security reasons we don't allow building containers on CHPC systems, as building a container requires sudo access. However, we do have a standalone Linux machine, singularity.chpc.utah.edu, where we allow users to build containers.
Singularity only supports Linux containers, so we do not support importing Windows or macOS containers.
To ensure portability of your Singularity container to CHPC systems, create mount points for the CHPC file systems during the build process:
mkdir /uufs /scratch
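If you build from a Singularity definition file, these mount points can be created in the %post section; a minimal sketch, with the Ubuntu base image chosen just as an example:
Bootstrap: docker
From: ubuntu:20.04

%post
    # install your software here, then create the CHPC mount points
    mkdir -p /uufs /scratch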
Then scp the container to a CHPC file server, e.g. your home directory, and run it e.g. as:
module load singularity
singularity shell my_container.sif
Or, if you have defined the %runscript section in your container, simply execute the container file, or use singularity run:
./my_container.sif
singularity run my_container.sif
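For reference, the %runscript section in the definition file simply lists the command(s) to execute; a minimal sketch, with my_program as a placeholder for whatever the container should launch:
%runscript
    exec my_program "$@"    # my_program is a placeholder; "$@" passes along any arguments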
Note that in our singularity module, we define two environment variables:
- SINGULARITY_SHELL=/bin/bash - this sets the container shell to bash (easier to use than default sh)
- SINGULARITY_BINDPATH=/scratch,/uufs/chpc.utah.edu - this binds mount points to all the /scratch file servers and to /uufs file servers (sys branch, group spaces).
If you prefer to use a different shell, or not bind the file servers, set these variables differently or unset them.
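For example, to use plain sh inside containers and skip the CHPC bind paths, set or unset the variables before running Singularity:
export SINGULARITY_SHELL=/bin/sh    # use sh instead of bash in the container
unset SINGULARITY_BINDPATH          # do not bind the /scratch and /uufs trees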