User Installed Python (and Other Software)
As Python libraries evolve rapidly and may have specific dependencies, it is becoming increasingly difficult to support the Python distribution centrally within the CHPC. As a result, we encourage users to install and maintain their own Python packages with the simple methods we describe below.
Why does the CHPC Suggest Users Maintain their own Python Module?
The CHPC maintains the Module System for all of our users, as it helps users tailor their computational environment to their own needs without worrying about software dependency conflicts. We maintain these modules for the ease of our users.
As the Python ecosystem is growing rapidly, it has become difficult to keep the CHPC Python module up to date. Additionally, some Python packages create version and/or dependency conflicts with other Python packages.
For these reasons, the CHPC has decided to maintain a Python module with select Python packages installed and now encourage users to maintain their own Python software, as described below.
IMPORTANT NOTE: While Anaconda and Miniconda are licensed as free, the "default" software channel that they use is subject to usage restrictions. In particular, it is free to use for Personal, Educational, Open-Source or Small Businesses (all under 200 employees) as set out in the Anaconda license agreement. However, it is NOT free for research in organizations with more than 200 employees. For that reason, we encourage users to use Miniforge/Mambaforge instead of Miniconda/Anaconda, or, in the worst case, use the "conda-forge" channel, which is free.
User Space Python Choices
There are a variety of options for users to pick from when installing and maintaining their own software installations, both within and outside of Python. While there are pros and cons to each solution, which we have summarized below, the CHPC suggests the mamba tool through Miniforge for its lightweight installation, broad range of software support, and ease of use.
Miniforge/Mambaforge is a minimalistic Python packaging distribution which includes the conda and mamba package managers and their dependencies. It uses the "conda-forge" package repository by default, which means that it is free for research and teaching. Additional packages need to be installed manually. The base installation size is about 0.8 GB.
Micromamba is a tiny version of the mamba package manager. It is a statically linked C++ executable with a separate command line interface. It does not need a base environment and does not come with a default version of Python. It is a good candidate for packaging a whole Python environment in a container, as described below.
Miniconda is a minimal Anaconda distribution that comes with base Python and the conda package manager. This makes the base installation rather small at 0.3 GB. Because it defaults to the "defaults" channel that has license restrictions, we discourage users from using miniconda.
Anaconda is the most popular Python distribution. It is well optimized and comes with Intel MKL for fast and threaded numerical calculations. It also comes with a package manager system, conda, and includes many commonly used Python packages. For this reason, it occupies 3.2 GB when installed, which is a sizeable amount given our default 50 GB home directory quota. For this reason it is not our top choice. Like Miniconda, its license has restrictions, which is why we recommend against using it.
Intel Distribution for Python is provided by Intel with performance similar to Anaconda. Also, similarly to Anaconda, it includes select numerical modules. It can be either installed as a standalone package or through the conda package manager. Recently, Intel contributions have been incorporated into the main package repositories which makes it unnecessary to have this installed separately.
Miniforge Installation and Usage
While the CHPC maintains their own modules, which you can view through the module spider command, users can also install and use their own modules within their home directories. Below, we will walk you through installing Miniforge as a module and utilizing that new module to install software.
Miniforge Installation
Download the Miniforge installer using the wget command and run the installer, pointing
it to the directory where you want to install it. We recommend $HOME/software/pkg/miniforge3
for easy integration into user defined environment modules.
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash ./Miniforge3-Linux-x86_64.sh -s -b -p $HOME/software/pkg/miniforge3
The '-b' flag forces unattended installation, the '-p' flag specifies the installation directory, and the '-s' flag prevents Miniforge from being added to your default environment. Miniforge will be added to your environment in the next step, via environment modules.
Upon successfully executing the commands above, you should see a new mamba binary located at $HOME/software/pkg/miniforge3/bin/mamba. Verify that this file exists.
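This check can also be scripted. A minimal sketch, assuming the installation prefix chosen above (the check_install helper is illustrative, not part of Miniforge):

```shell
# Sketch: verify that an install prefix contains an executable mamba binary
check_install() {
    # $1 = Miniforge installation prefix
    if [ -x "$1/bin/mamba" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

check_install "$HOME/software/pkg/miniforge3"
```

If the function prints "missing", re-run the installer and double-check the -p path.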
Creating a Miniforge Module
To set up a Miniforge module, first create a directory where the module file will reside and then copy our Miniforge module template to that directory:
mkdir -p $HOME/MyModules/miniforge3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniforge3/latest.lua $HOME/MyModules/miniforge3
If Miniforge is installed in a different location than $HOME/software/pkg/miniforge3, edit the module file to set the local mymambapath variable to the full path of that Miniforge installation.
To use the new module in your home directory, make the module accessible to your environment with the module use command. After that, load your new miniforge3 module:
module use $HOME/MyModules
module load miniforge3/latest
By using the 'module load' command, you place Miniforge into your environment.
Do not run the conda init or mamba init commands, even though conda/mamba sometimes suggest this. These commands hardcode the Miniforge environment into your default environment, which is exactly what we are trying to avoid by using the module setup.
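Conceptually, loading the module mainly amounts to prepending the installation's bin directory to PATH. A rough sketch of the effect (the real module file sets additional variables; the prefix is the install location assumed above):

```shell
# Rough equivalent of "module load miniforge3/latest" (sketch only)
MINIFORGE_PREFIX="$HOME/software/pkg/miniforge3"   # assumed install location
export PATH="$MINIFORGE_PREFIX/bin:$PATH"
echo "${PATH%%:*}"   # the first PATH entry is now the Miniforge bin directory
```

Because the change is confined to the current shell's environment, unloading the module cleanly removes it again, unlike conda init's edits to your shell startup files.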
To make your mamba installation and/or virtual environments available to be used by others, please look at the documentation here.
To make the user module environment available in all your future sessions, edit ~/.custom.csh (for the tcsh shell) or ~/.custom.sh (for the bash shell) and insert the module use command just below the #!/bin/tcsh or #!/bin/bash line. Do not put the module load miniforge3 command in these files, since custom Python installations can break remote connections using FastX.
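For example, a minimal ~/.custom.sh for bash might read (a sketch; note the "module use" without any "module load"):

```shell
#!/bin/bash
# Make user-defined modules findable in every session.
# Do NOT "module load miniforge3" here -- load it by hand when needed.
module use $HOME/MyModules
```

The tcsh version in ~/.custom.csh is identical apart from the #!/bin/tcsh shebang.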
Mamba Package Manager Basics
When installing software with mamba, there are two general methods that you could use: installing software to the mamba base environment or installing software to separate virtual environments. If you plan to use multiple packages that each require separate software dependencies, the CHPC recommends that you install each package into a separate virtual environment.
To find common use cases outside the scope of this documentation, take a look at the Conda cheat sheet. More detailed documentation is in the Mamba User Guide or Conda User Guide. Mamba is a drop-in replacement for Conda, so the Conda and Mamba syntax is the same.
Miniforge Virtual Environments
Because Miniforge supports virtual environments (venvs), one can leverage multiple environments from a single Miniforge module installation. We can list existing environments with:
mamba env list
Assuming one uses a bash shell, we can, for example, install Python 3.11 into a separate environment:
mamba update -y mamba
mamba create -n py311 python=3.11
'mamba update' updates the mamba package manager itself ("-y" answers "yes" to individual package updates automatically). The 'mamba create' command, in conjunction with the -n flag, creates a new environment named 'py311', which includes Python 3.11.
We can then activate the environment as:
conda activate py311
NOTE: mamba activate does not work; use conda activate instead.
All conda package commands can be used within the activated environment, meaning additional
conda packages can be installed in this environment using the mamba install
command.
To exit from the environment, run:
mamba deactivate
To delete an environment, such as the py311 environment we created above, you would run:
conda env remove --name py311
Installing Additional Packages with Mamba
This section describes installing additional software into the base Miniforge3 installation set up via the commands above. The CHPC does not recommend installing additional software into the base Miniforge3 installation, as this can create software dependency conflicts. One solution to such conflicts is to install a fresh Miniforge3 and create a separate module file for each fresh install, but this takes up unneeded space in your home directory. We therefore recommend creating mamba virtual environments instead.
Miniforge3 only comes with a basic Python distribution, so you need to install additional Python packages to the base environment by hand. After following the instructions for installing and using the Miniforge module above, you can list the currently installed packages with the 'mamba list' command.
To install a new package, run
mamba install [packagename]
For example, to install the latest version of the SciPy module, run
mamba install scipy
The installer will think for a little while and then install SciPy and a number of other packages on which SciPy depends, such as NumPy and the accelerated OpenBLAS library.
To install a specific version of scipy, run:
mamba install scipy=1.14.1
To uninstall a package run
mamba uninstall [packagename]
NOTE: If you notice that the Intel Math Kernel Library (MKL) is installed instead, be aware that MKL by default utilizes all the processor cores on the system. If you are planning to run many independent parallel Python calculations, set the environment variable OMP_NUM_THREADS=1 (setenv OMP_NUM_THREADS 1 for tcsh, or export OMP_NUM_THREADS=1 for bash).
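For example, in a bash job script one might pin each Python process to a single thread before launching the calculations:

```shell
# Limit OpenMP/MKL threading to one core per process before starting Python
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With this setting, running many independent Python processes will not oversubscribe the node's cores.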
Conda Packages in other Channels
If the mamba install command cannot find the requested package, chances are that it is in a non-default conda repository (also called a conda channel). Independent distributors can create their own package channels that house their software. The best approach to finding a package that is not in an official conda channel is to do a web search for it.
For example, to look for a package named Fasta, which is used for biological sequence alignment, we web search for "anaconda fasta". The top search hit suggests the following installation line:
mamba install -c biobuilds fasta
The "-c" option specifies the channel name, in our case biobuilds.
To add a channel to the default channel list, we can:
mamba config --add channels biobuilds
However, this puts a channel at the top of the list, with the highest priority. We can add a new channel to the bottom of the list instead in the following way:
mamba config --append channels biobuilds
If you are using Anaconda or Miniconda, it is better to add the free conda-forge channel and remove the commercial defaults channel with the commands below. Although as of October 2024 Anaconda, Inc., which manages the defaults channel, says that it is free for academic institutions, there have been some controversies around that.
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --remove channels defaults
To verify that the defaults
channel is removed:
conda config --show channels
channels:
- bioconda
- conda-forge
- r
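Equivalently, this channel configuration lives in the ~/.condarc file, which after the commands above might look roughly like the following (a sketch; your channel list may differ):

```yaml
# ~/.condarc (sketch)
channels:
  - conda-forge
channel_priority: strict
```

Editing ~/.condarc directly has the same effect as the conda config commands.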
Installing Python Modules with Pip
When a Python package does not exist as a conda package, one can use Python's pip installer. We recommend using pip only as a last resort, because you lose the flexibility of the conda packaging environment (conda provides automatic conflict resolution between packages and version upgrades/downgrades).
To install a module using pip, we suggest you run:
pip install bibtexparser
Another install option includes:
python -m pip install bibtexparser
For additional methods of installing Python packages, see our older documentation.
Using Conda/Mamba Virtual Environments in Open OnDemand
Just as we can install software packages within mamba virtual environments, we can also install Jupyter notebook into a virtual environment to start a Jupyter notebook session. It is not recommended to run the jupyter notebook command from the terminal, as that requires creating an SSH tunnel to the machine where the terminal runs in order to access Jupyter in the client's web browser. The Open OnDemand Jupyter app simplifies this greatly by launching Jupyter directly in the client's browser.
We can begin this process by first following the mamba installation instructions above, then by creating a virtual environment and installing the software into it. In this example, we will install the software scipy into a Mamba virtual environment that we plan to use in a Jupyter notebook session.
module use $HOME/MyModules
module load miniforge3/latest
mamba update -y mamba
mamba create -n my_venv jupyterlab scipy
conda activate my_venv
pip install jupyter notebook
conda deactivate
The module commands above load the newly created miniforge module into our environment. In the 'mamba create' command, we name our virtual environment 'my_venv' and install the jupyterlab and scipy software. We then activate the virtual environment with conda, use pip to install jupyter notebook tools into the environment, and finish by deactivating the environment.
To use the virtual environment in Open OnDemand's Jupyter app, we choose the "Custom (Environment setup below)" option for the "Jupyter Python version", and in the "Environment Setup for Custom Python" text box, put:
module use $HOME/MyModules
module load miniforge3/latest
conda activate my_venv
Once your Jupyter session begins, you should now be able to successfully connect to a Jupyter notebook and access the commands within your virtual environment.
Setting up Miniforge for Others to Use
Sometimes a researcher needs to install a Miniforge package that they want their co-workers to use as well. This helps with reproducibility, as the same software stack is used by all the researchers. To achieve this, we follow the Miniforge installation and usage instructions, but with a few modifications outlined below.
First, we install Miniforge to a different location and name the directory after the package that we want to install, to make the environment easy to recognize. For example, if we plan to create a Miniforge environment with the BLAST+ bioinformatics package, we could name the directory miniforge3-blast.
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash ./Miniforge3-Linux-x86_64.sh -b -p $HOME/software/pkg/miniforge3-blast -s
Next, we create the module file. We should name it along the same lines as the package name; in this example it would be miniforge3/blast. We get the module template from CHPC as follows:
mkdir -p $HOME/MyModules/miniforge3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniforge3/latest.lua $HOME/MyModules/miniforge3/blast.lua
The copied module file is the same as CHPC's template, so it points to the Miniforge in the owner's $HOME and, in its current state, could not be used by other users. Since we want this Miniforge environment to be available to other users, we need to modify the new module file to use the absolute path to the new miniforge3-blast installation rather than a path relative to your $HOME directory.
To change this, open the module file in a text editor and look at lines 25-30:
-- change myanapath if the installation is in a different place in your home
-- note that this is a relative path from the base of your home directory
local mymambapath = "software/pkg/miniforge3"
-- if you want to share this miniforge installation with others, use the full
-- path
--local mymambapath = "/uufs/chpc.utah.edu/common/home/u0123456/software/pkg/miniforge3"
Notice that mymambapath is a relative path, not an absolute one. The $HOME variable is prepended to this relative path later in the module file. As each user has their own unique $HOME, this prevents other users from using this module file to set up this Miniforge environment. To fix this, we put the absolute path to our new Miniforge in the mymambapath variable, as follows:
-- change myanapath if the installation is in a different place in your home
-- note that this is a relative path from the base of your home directory
-- local mymambapath = "software/pkg/miniforge3"
-- if you want to share this miniforge installation with others, use the full
-- path
local mymambapath = "/uufs/chpc.utah.edu/common/home/u0123456/software/pkg/miniforge3-blast"
Change u0123456 to your uNID. Later in the module file, the logic recognizes this change in the local mymambapath and keeps the absolute path.
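This one-line edit can also be scripted with sed. A sketch, where the set_mamba_path helper and the example paths are illustrative:

```shell
# Sketch: rewrite the mymambapath line in a copied module file to an absolute path
set_mamba_path() {
    # $1 = module (.lua) file, $2 = absolute Miniforge installation path
    sed -i "s|^local mymambapath = .*|local mymambapath = \"$2\"|" "$1"
}
```

For example: set_mamba_path $HOME/MyModules/miniforge3/blast.lua /uufs/chpc.utah.edu/common/home/u0123456/software/pkg/miniforge3-blast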
Once we save the module file, we are ready to use it and initialize the Miniforge environment:
module use $HOME/MyModules
module load miniforge3/blast
Other users can do the same, except that they have to use the full path to your home directory, again replacing u0123456 with your uNID:
module use /uufs/chpc.utah.edu/common/home/u0123456/MyModules
module load miniforge3/blast
Once the miniforge3/blast module is loaded, we can run the conda/mamba commands to install the needed packages, e.g. for BLAST:
mamba install -c bioconda blast
and then make it available to co-workers.
Please note that we recommend not installing Conda or Python virtual environments (venvs) into a single module-based Miniforge installation. A more robust approach is to create multiple Miniforge installations with corresponding module names, and to install into these separate Miniforges the packages that would otherwise go into the venvs. There are several advantages to this approach. First, the packages installed into the separate Miniforges are completely independent from each other, which alleviates possible dependency and versioning problems between the base Conda environment and the virtual environment. Second, the tcsh shell has historically been less well supported by venvs, creating problems for tcsh users. Finally, the modules can be freely loaded and unloaded, allowing users to completely clear the environment; this is more problematic with venvs, as the base environment always remains.
Micromamba in a Container - Installation and Use
Installing the whole conda environment in a container has several benefits: 1) The conda environment is packaged in a single file, so it is easy to move the whole environment somewhere else. 2) The whole environment is static (fixed during the build of the container), so it won't get accidentally changed when trying to install or update a package; updates require building a new container. 3) While it is equally possible to create a container based on Miniconda, the Micromamba installation is smaller and uses the better performing mamba package manager instead of conda.
Below, we outline steps in building a Micromamba container with some bioinformatics tools and a Jupyter Notebook as an example and use this environment on CHPC's Open OnDemand Jupyter app.
Creating a Micromamba Container
We will use Apptainer to create the container. First, we create a recipe file for building the container,
an example of which is linked here, and name it Singularity
:
Bootstrap: docker
From: mambaorg/micromamba
%post
micromamba install --yes --name base -c bioconda -c conda-forge \
python=3.9.1 notebook samtools bwa
micromamba clean -aqy
%runscript
micromamba run -p /opt/conda "$@"
In this recipe, we pull the Micromamba container from DockerHub, install the needed tools in the %post section, and set the micromamba run ... command to execute whenever the container is run.
Python environment packages can also be built into a container by specifying them within an environment.yml file, e.g.
channels:
  - defaults
  - conda-forge
dependencies:
  - matplotlib
  - python=3.9
  - pip
In that case, we can modify the micromamba install
command in the %post section as:
micromamba create --yes --name base --file environment.yml
We build the container by running the following code (for example, in a bash shell):
module load apptainer
unset APPTAINER_BINDPATH
apptainer build mymamba.sif Singularity
Unsetting the APPTAINER_BINDPATH is necessary to avoid a build error that complains about missing mount points in the container. This environment variable ensures /scratch and /uufs get mounted automatically when the container is executed.
The container .sif file has executable permissions, so we can run the container directly along with the command we want to run within the container:
$ ./mymamba.sif bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
...
Micromamba GPU Container
For a container that interacts with GPUs, one has to use the --nv
flag during the container build, which imports the GPU stack from the host into the
container to ensure that the mamba package manager picks up the GPU/CUDA dependencies
and installs the GPU version of PyTorch. To see an example of a container that has
the PyTorch environment installed, visit this link.
We build the container by running the following code (for example, in a bash shell):
module load apptainer
unset APPTAINER_BINDPATH
apptainer build --nv mymamba_gpu.sif Singularity.gpu
To test that the correct GPU version of PyTorch is installed, we export the environment
variable APPTAINER_NV as an alternative to --nv
flag. This ensures that we can directly run the container file and include the correct
GPU environment:
module load apptainer
export APPTAINER_NV=true
./mymamba_gpu.sif /opt/conda/bin/python -c "import torch; print(torch.cuda.is_available())"
True
Using the Micromamba Container in Open OnDemand
Just as we ran the bwa command above, we can also run the jupyter notebook command to start Jupyter. It is not recommended to run this command from the terminal, as that requires creating an SSH tunnel to the machine where the terminal runs in order to access Jupyter in the client's web browser. The Open OnDemand Jupyter app simplifies this greatly by launching Jupyter directly in the client's browser.
To run our container, we choose the "Custom (Environment setup below)" option for the "Jupyter Python version", and in the "Environment Setup for Custom Python" text box, put:
shopt -s expand_aliases
module load apptainer
alias jupyter="$HOME/containers/mymamba.sif jupyter"
The first command is a bash option that enables aliases in shell scripts. We then load the Apptainer module and create an alias for the jupyter command so that it is called from the container instead. This jupyter alias is then passed to the Open OnDemand Jupyter app, which launches the Jupyter server. Notice that we use the full path to the container, as the Open OnDemand app starts at the root of the user's $HOME directory.
As noted above, if we need to use GPUs, we need to add the environment variable APPTAINER_NV=true
to initialize the GPUs in the container:
shopt -s expand_aliases
module load apptainer
export APPTAINER_NV=true
alias jupyter="$HOME/containers/mymamba_gpu.sif jupyter"
Miniforge Installation Examples
Interactive Machine Learning Environment with Tensorflow, Keras and Jupyter Lab
Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Jupyter Lab allows running Jupyter notebooks, e.g. in our Open OnDemand web portal.
As Tensorflow performs best on GPUs, we will be installing the GPU version. Once Miniforge3 and its module are installed, look at the Tensorflow installation requirements to note the CUDA and CUDNN versions that the latest Tensorflow requires. As of this writing (September 2024), the latest Tensorflow requires CUDA 12.1 and CUDNN 8.9.
- Load the CUDA and CUDNN modules, and the newly installed Miniforge module (named tf25.lua):
module load cuda/12.1.0
module load cudnn/8.9.7.29-12-gpu
module use $HOME/MyModules
module load miniforge3/tf25
- Install Jupyter Lab in a virtual environment and activate the virtual environment:
mamba create -n my_venv jupyterlab
conda activate my_venv
- Install Tensorflow. Note that we are using pip, not conda, as Google provides its Tensorflow builds in pip repositories and the conda repositories don't have all of the versions. This pip-installed Tensorflow also includes Keras. Test that Tensorflow can access the GPU(s):
pip install jupyter notebook
pip install tensorflow
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
conda deactivate
- In the Open OnDemand Jupyter Lab app launch window, put the following in the Environment Setup:
module load cuda/12.1.0
module load cudnn/8.9.7.29-12-gpu
module use $HOME/MyModules
module load miniforge3/tf25
conda activate my_venv