
User installed Python

As Python libraries evolve rapidly and may have specific dependencies, it is becoming increasingly difficult to support the Python distribution centrally. Therefore we encourage users to maintain their own Python stack as described below.

Why are we moving away from a central Python installation?

The Python ecosystem is growing rapidly and it has become difficult to keep centrally maintained Python distributions up to date. Furthermore, some Python modules depend on specific versions of other modules, and those versions may be incompatible with what other modules require. Finally, user space Python distributions, and specifically Anaconda/Miniconda, actively incorporate performance improvements and are comparable to or better than hand-tuned Python builds.

For these reasons we are deprecating centrally maintained Python distributions and encourage users to maintain their own Python stack as described below.

However, please be aware that in some corner cases the Anaconda/Miniconda Python library stack is difficult to install, mostly when there are conflicts between dependent libraries. In those cases, we recommend checking whether the particular stack is offered as a Docker container, which can be imported and run in our HPC environment using Singularity.

User space Python choices

Miniconda is a minimal Anaconda distribution that ships with base Python and the conda package manager. This keeps the base installation small, at 0.3 GB. Additional packages need to be installed manually (described below). Its small base size and selectivity make it our choice for user space installation.

Anaconda is the most popular Python distribution. It is well optimized and ships with the Intel MKL for fast and threaded numerical calculations. It also comes with a package manager system, conda, and includes many commonly used Python modules. As a result, it occupies 3.2 GB once installed, which is a sizeable amount given our default 50 GB home directory quota. For this reason it is not our top choice.

Intel Distribution for Python is provided by Intel, with performance similar to Anaconda. Like Anaconda, it includes select numerical modules. It can be installed either standalone or with the conda package manager.

Miniconda installation and usage

Miniconda installation

Download the Miniconda installer using the wget command and run the installer, pointing it to the directory where you want to install it. We recommend $HOME/software/pkg/miniconda3 for easy integration into user defined environment modules.

bash ./ -b -p $HOME/software/pkg/miniconda3 -s

The flag '-b' runs the installer in batch (unattended) mode, which among other things does not add Miniconda to your default environment; we will do that in the next section using an environment module. The '-p' flag sets the installation directory, and '-s' skips running the packages' pre/post-link and install scripts.

Miniconda environment module

To easily set up the Miniconda environment, create a user environment module. First create a directory where the user environment module hierarchy will reside, and then copy our miniconda module file to this directory.

mkdir -p $HOME/MyModules/miniconda3
cp /uufs/ $HOME/MyModules/miniconda3

The user module environment must be loaded into the default module environment with the module use command. After that, we can load the user space miniconda module.

module use $HOME/MyModules
module load miniconda3/latest

To make the user module environment available in all your future sessions, edit ~/.custom.csh (for the tcsh shell) or ~/ (for the bash shell) and insert the module use command just below the #!/bin/tcsh or #!/bin/bash line. Do not put the module load miniconda3 command in these files, since that is known to break remote connections made with FastX.
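After loading the module, it can be worth confirming that python now resolves to your own Miniconda rather than a system Python. A minimal sketch, assuming the recommended install location $HOME/software/pkg/miniconda3 (adjust DEFAULT_ROOT otherwise); the helper uses_miniconda is a hypothetical name, not a CHPC-provided tool:

```python
"""Check whether the running Python comes from a user Miniconda install."""
import sys
from pathlib import Path

# Assumption: Miniconda was installed in the recommended location.
DEFAULT_ROOT = Path.home() / "software" / "pkg" / "miniconda3"


def uses_miniconda(prefix, root=DEFAULT_ROOT) -> bool:
    """Return True if `prefix` is the Miniconda root or lies below it.

    Conda environments normally live in <root>/envs/<name>, so they count too.
    """
    prefix_path = Path(prefix).resolve()
    root_path = Path(root).resolve()
    return prefix_path == root_path or root_path in prefix_path.parents


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    print("Miniconda?  ", uses_miniconda(sys.prefix))
```

Run it with the Python you intend to use, e.g. python check_python.py (a hypothetical file name); if the last line prints False, the miniconda3 module is probably not loaded.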

Conda package manager basics

The conda package manager is recommended for maintaining your user Miniconda Python distribution. Take a look at the Conda cheat sheet, which lists the most commonly used commands. More detailed documentation is in the Conda User Guide.

Installing additional Python packages

Miniconda comes with only a basic Python distribution, so you need to install any additional Python modules yourself. You can list the currently installed packages as follows:

conda list
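From inside Python, the standard-library importlib.metadata module (Python 3.8 or newer) gives a rough list of the Python distributions that the active interpreter can see. It only reports Python distributions, so non-Python libraries installed by conda, such as MKL, will not appear. A sketch; the function name installed_packages is hypothetical:

```python
"""List Python distributions visible to the current interpreter."""
from __future__ import annotations

from importlib.metadata import distributions


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name to its version string."""
    packages: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken entries that lack a Name field
            packages[name] = dist.version
    # Sort case-insensitively by package name.
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    for name, version in installed_packages().items():
        print(f"{name:30} {version}")
```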

To install a new package, run

conda install [packagename]

For example, to install the latest version of the SciPy module, run

conda install scipy

Conda will take a little while to resolve dependencies and then install SciPy and a number of other packages on which SciPy depends, such as NumPy and the accelerated Intel Math Kernel Library (MKL). This will cause the Miniconda distribution to grow to about 1.5 GB, but it will include all the packages needed for high performance numerical analysis with SciPy.

NOTE: Since the MKL library by default uses all the processor cores on the system, if you plan to run many independent Python calculations in parallel, set the environment variable OMP_NUM_THREADS=1 (setenv OMP_NUM_THREADS 1 for tcsh or export OMP_NUM_THREADS=1 for bash).
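If you launch the independent calculations from a Python driver script, the same effect can be had by giving each child process an environment with OMP_NUM_THREADS=1. The variable has to be in place before the child imports NumPy/SciPy, which is why it is passed through the environment of a fresh process. A standard-library sketch; the worker command (a trivial python -c one-liner) is a placeholder for your own script, and the helper names are hypothetical:

```python
"""Run independent Python tasks, each limited to a single OpenMP/MKL thread."""
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def single_thread_env() -> dict:
    """Copy the current environment and limit OpenMP-threaded libraries to one thread."""
    env = os.environ.copy()
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_task(task_id: int) -> str:
    """Run one placeholder worker in a fresh interpreter and return its output."""
    # Placeholder worker: replace with e.g. [sys.executable, "my_analysis.py", str(task_id)]
    cmd = [
        sys.executable,
        "-c",
        "import os, sys; print(sys.argv[1], os.environ['OMP_NUM_THREADS'])",
        str(task_id),
    ]
    result = subprocess.run(cmd, env=single_thread_env(), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    n_tasks = os.cpu_count() or 1
    # The threads only wait on the child processes; the work happens in the children.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        for line in pool.map(run_task, range(n_tasks)):
            print(line)
```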

Another common Python package is Jupyter, which allows one to run Jupyter notebooks. This can be installed as:

conda install jupyter

To uninstall a conda package run

conda uninstall [packagename]

Conda packages in other channels

If the conda install command cannot find the requested package, chances are high that it is in a non-default conda repository (channel). Independent distributors can create their own package channels that house their products. The best way to find a package that is not in an official conda channel is a web search.

For example, to look for a package named Fasta, which is used for biological sequence alignment, we search the web for "anaconda fasta". The top search hit suggests the following installation line:

conda install -c biobuilds fasta

The "-c" option specifies the channel name, in our case biobuilds.

To add a channel to the default channel list, run:

conda config --add channels biobuilds

However, this puts the channel at the top of the list, giving it the highest priority. We can add a new channel to the bottom of the list instead in the following way:

conda config --append channels biobuilds
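The only difference between --add and --append is where the channel lands in the channels list, which conda searches from the top down. The toy model below (hypothetical helper names, not conda's own code) mimics that ordering, including conda's behavior of moving a channel that is already listed instead of duplicating it. It deliberately ignores the dependency solver and the channel_priority setting, which in practice also influence which build gets installed.

```python
"""Toy model of how `conda config --add` and `--append` order channels."""


def add_channel(channels: list, name: str) -> list:
    """Like --add: put `name` at the top (highest priority), without duplicates."""
    return [name] + [c for c in channels if c != name]


def append_channel(channels: list, name: str) -> list:
    """Like --append: put `name` at the bottom (lowest priority), without duplicates."""
    return [c for c in channels if c != name] + [name]


def pick_channel(channels: list, providers: set):
    """Return the highest-priority channel that offers the package, or None.

    `providers` is a hypothetical set of channels that carry the package.
    """
    for channel in channels:
        if channel in providers:
            return channel
    return None


if __name__ == "__main__":
    base = ["defaults"]
    top = add_channel(base, "biobuilds")        # ['biobuilds', 'defaults']
    bottom = append_channel(base, "biobuilds")  # ['defaults', 'biobuilds']
    both = {"defaults", "biobuilds"}  # suppose both channels carry the package
    print(pick_channel(top, both))     # biobuilds
    print(pick_channel(bottom, both))  # defaults
```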

Installing Python modules with pip

When a Python package does not exist as a conda package, one can use the Python pip installer. We recommend using pip only as a last resort since this way one loses the flexibility of the conda packaging environment (automatic conflict resolution and version upgrade/downgrade). To install a module using pip, either run:

python -m pip install bibtexparser

or, as long as the pip on your PATH is the one from your Miniconda installation, simply:

pip install bibtexparser
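If a script or notebook should install a missing module itself, the pattern recommended in the pip documentation is to call python -m pip through subprocess with sys.executable, so the package lands in the interpreter that is actually running (for example your Miniconda one). The sketch below checks with importlib first so nothing is reinstalled; ensure_installed is a hypothetical helper, and it assumes the import name equals the pip package name, which is not true for every package.

```python
"""Install a package with the pip that belongs to the running interpreter."""
import importlib
import importlib.util
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build the equivalent of `python -m pip install <package>` for this interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package: str) -> bool:
    """Install `package` only if it cannot be imported yet; return True if pip ran.

    Assumes the import name equals the pip name (true for bibtexparser).
    """
    if importlib.util.find_spec(package) is not None:
        return False  # already importable, nothing to do
    subprocess.check_call(pip_install_command(package))
    importlib.invalidate_caches()  # let this process find the newly installed package
    return True


if __name__ == "__main__":
    ensure_installed("bibtexparser")
```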

For other ways to install Python packages, see our older document.

Miniconda Python environments

Miniconda supports Python virtual environments (VEs), so you can run multiple independent Python stacks from a single Miniconda installation. As of conda 4.6, both bash and tcsh shells are supported, provided that the miniconda module presented above is used.

We can list existing environments with

conda env list

Assuming you use the bash shell, you can, for example, install the Intel distribution for Python into a separate environment:

conda update -y conda
conda create -c intel -n idp3 intelpython3_core python

This updates conda ("-y" answers yes to the prompt asking whether to proceed with the listed updates) and then creates an environment named "idp3" based on the intelpython3_core package, taking packages from the intel channel ("-c intel") for this command.

We then activate the environment as:

conda activate idp3

All conda package commands work within the activated environment; for example, conda install will install new packages into this environment.

To exit from the environment, run:

conda deactivate
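conda activate records the active environment in environment variables, among them CONDA_DEFAULT_ENV (the environment name) and CONDA_PREFIX (its directory). A job script or notebook can read them to confirm that it runs in the intended environment, such as idp3, before doing real work. A minimal sketch; the helper names and the choice to exit on a mismatch are illustrative assumptions:

```python
"""Stop early if the expected conda environment is not active."""
import os
import sys


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or (None, None)."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV"), environ.get("CONDA_PREFIX")


def require_env(expected: str, environ=None):
    """Return the prefix of `expected`, or exit with a message if another env is active."""
    name, prefix = active_conda_env(environ)
    if name != expected:
        sys.exit(f"expected conda environment '{expected}', found '{name}'")
    return prefix


if __name__ == "__main__":
    print("running in", require_env("idp3"))
```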

While virtual environments provide a convenient way to install different Python modules, their versions and dependencies, we occasionally see conflicts that are hard to figure out and resolve. Furthermore, it is more complicated to wrap the VEs into Lmod modules and Open OnDemand Jupyter notebooks. For that reason we recommend using separate Miniconda installations instead of virtual environments. For each independent Miniconda installation, copy the miniconda module file under a new name (e.g. cp latest.lua mynewconda.lua), and in the new module file, set the myanapath variable to the path where this particular Miniconda is installed.


Interactive machine learning environment with Tensorflow, Keras and Jupyter Lab

Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Jupyter Lab lets you run Jupyter notebooks, for example in our Open OnDemand web portal.

Since Tensorflow performs best on GPUs, we will install the GPU version. Once Miniconda3 and its module are installed, check the Tensorflow installation requirements for the CUDA and CUDNN versions that the latest Tensorflow requires. As of this writing (December 2021), Tensorflow 2.5 requires CUDA 11.2 and CUDNN 8.1.

  • Load the CUDA and CUDNN modules, and the newly installed Miniconda module (named tf25.lua)
    ml cuda/11.2 cudnn/8.1.1
    ml use $HOME/MyModules
    ml miniconda3/tf25
  • Install Jupyter Lab.
    conda install jupyterlab
  • Install Tensorflow. Note that we are using pip, not conda, since Google provides its tensorflow builds in pip repositories, and the conda repositories don't have all the versions. This pip installed Tensorflow also includes Keras. Test that Tensorflow can access the GPU(s).
    pip install tensorflow==2.5
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • In the OpenOnDemand Jupyter Lab app launch window, put the following in the Environment Setup:
    ml use $HOME/MyModules
    ml cuda/11.2 cudnn/8.1.1
    ml miniconda3/tf25
    Note that in order to use GPU with Tensorflow, you need to request a GPU.
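Inside a Jupyter Lab notebook launched this way, a short cell can confirm that Tensorflow sees the GPU you requested before training starts. The sketch uses tf.config.list_physical_devices, a public API in Tensorflow 2.1 and newer, as an alternative to the device_lib call used above; the function name visible_gpus is hypothetical, and it returns an empty list rather than failing when Tensorflow is not installed.

```python
"""Report the GPUs that Tensorflow can use in the current session."""


def visible_gpus() -> list:
    """Return the names of GPUs visible to Tensorflow ([] if none or no Tensorflow)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("Tensorflow sees:", ", ".join(gpus))
    else:
        print("No GPU visible: was a GPU requested and were cuda/cudnn loaded?")
```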

Parallel machine learning environment with Tensorflow, Keras and Horovod

Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Horovod allows for easy and flexible distributed parallelization of Keras/Tensorflow. Installing these packages and their dependencies is a nice but rather complicated example of using a Miniconda environment.

NOTE: These instructions are old, so they may not work exactly as described.

There are a few constraints:

  • We use the defaults conda channel explicitly to install Tensorflow. Other channels, e.g. intel, do not supply the GPU version.
  • Horovod needs to be built with MPI and does not appear to be available in the Anaconda channels.

The steps to set up and test this environment are as follows (valid as of November 2018):

  • Install base Miniconda as shown above
  • Prepare the environment by loading the CUDA and CUDNN modules appropriate for Tensorflow
    ml cuda cudnn
  • Install the appropriate Python version (3.7 as of May 2020), conda-based NumPy to get the accelerated MKL support, and a few other packages Tensorflow needs. Make sure to use the "defaults" channel, since that is the one that has the GPU binaries. Other channels, e.g. "intel", may take precedence if you have added them earlier. Check your ~/.condarc for the list of conda channels and their search order (from the top down).
    conda install -c defaults python=3.7 numpy six wheel
  • Install tensorflow-gpu from the conda "defaults" channel and test whether it can see the GPUs. If the "defaults" channel is not specified, the Tensorflow build from the "intel" channel, which does not support GPUs, is installed instead.
    conda install -c defaults tensorflow-gpu
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • Install Horovod with pip. First load gcc/6.4.0, since Tensorflow requires gcc >= 5.4.0 while the OS default is 4.8.5. (NOTE: we currently have issues with the MPI-based install, so use the default Gloo framework. TODO: install NCCL for GPU communication; for now only Gloo is used.)
    ml gcc/6.4.0
    pip install horovod
  • Check which options Horovod was built with:
    horovodrun --check-build
    Available Controllers:
        [ ] MPI
        [X] Gloo

    Available Tensor Operations:
        [ ] NCCL
        [ ] DDL
        [ ] CCL
        [ ] MPI
        [X] Gloo
  • To run a parallel calculation on multiple GPUs, use a SLURM batch script rather than an interactive job, since interactive jobs on GPU nodes that use gres do not work with mpirun:
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 4
    #SBATCH -A owner-gpu-guest
    #SBATCH -p notchpeak-gpu-guest
    #SBATCH -t 2:00:00
    #SBATCH --gres=gpu:titanv:4
    #SBATCH --mem=0
    ml use $HOME/MyModules
    ml cuda cudnn
    ml miniconda3/latest
    ml gcc/6.4.0
    horovodrun -np 4 python
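The batch script above starts one Python process per GPU through horovodrun. A common pattern from the Horovod Keras documentation is for each process to initialize Horovod and pin itself to one GPU by its local rank, sketched below. This is a hedged illustration, not the script used here: the training part is omitted, and the fallback to a single process when Horovod or Tensorflow is missing is an addition so the file can be tried outside a Horovod job.

```python
"""Per-process setup for a Horovod + Keras job launched with horovodrun."""


def horovod_setup():
    """Initialize Horovod, pin this process to one GPU, and return (rank, size).

    Falls back to (0, 1), i.e. a single process, if Horovod or Tensorflow is missing.
    """
    try:
        import horovod.tensorflow.keras as hvd
        import tensorflow as tf
    except ImportError:
        return 0, 1
    hvd.init()
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        # One GPU per process: local_rank() is this process's index on the node.
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")
    return hvd.rank(), hvd.size()


if __name__ == "__main__":
    rank, size = horovod_setup()
    print(f"worker {rank} of {size}")
    # Build the Keras model here, wrap its optimizer with
    # hvd.DistributedOptimizer(...), and let only rank 0 write checkpoints.
```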


Last Updated: 12/16/21