Slurm node sharing - Center for High Performance Computing

Specifying requested resources

For node sharing on the GPU nodes, see the GPU page.

Node sharing on the non-GPU nodes requires that users explicitly request the number of cores and the amount of memory that should be allocated to the job. The remaining cores and memory are then available for other jobs. The requested number of cores and amount of memory are used to set up the "cgroup" for the job, which is the mechanism used to enforce these limits.

Try our tool for finding which accounts, partitions, and qualities of service you can use when submitting jobs on Center for High Performance Computing systems.

The number of cores requested must be specified using the ntasks sbatch directive. For example, the following requests 2 cores:

#SBATCH --ntasks=2

The amount of memory requested can be specified with the mem sbatch directive:

#SBATCH --mem=32G

This can also be specified in MB (which is the assumed unit if none is specified):

#SBATCH --mem=32000

If no memory directive is used, 2G per core is allocated to the job by default.
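Putting the two directives together, a minimal batch script for a shared-node job might look like the sketch below. The --time value and the report_allocation helper are illustrative assumptions, not part of the original instructions; a real job usually also needs account and partition directives, which are site-specific and omitted here. SLURM_NTASKS and SLURM_MEM_PER_NODE are environment variables that Slurm sets inside a job when --ntasks and --mem are given.

```shell
#!/bin/bash
#SBATCH --ntasks=2          # request 2 cores
#SBATCH --mem=32G           # request 32 GB of memory on the node
#SBATCH --time=01:00:00     # assumed placeholder walltime, adjust as needed

# Hypothetical helper: print what Slurm actually granted to this job.
# Outside of a Slurm job the variables are not set, so "unset" is printed.
report_allocation() {
    echo "tasks=${SLURM_NTASKS:-unset} mem_mb=${SLURM_MEM_PER_NODE:-unset}"
}

report_allocation
```

Printing the granted resources at the top of the job output makes it easy to confirm later that the job ran with the intended core count and memory.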

With node sharing, sinfo reports an additional state for nodes that are only partially allocated: mix.
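As a rough sketch, the partially allocated nodes can be listed together with their idle core counts. This relies on the sinfo %C format field, which reports CPUs as allocated/idle/other/total; the idle_cpus helper name is made up for this example, and the sinfo call is skipped on machines where the command is not available.

```shell
#!/bin/bash
# Hypothetical helper: extract the idle count from an "A/I/O/T" CPU string,
# e.g. "12/16/0/28" means 12 allocated, 16 idle, 0 other, 28 total.
idle_cpus() {
    echo "$1" | cut -d/ -f2
}

# Only query the scheduler when sinfo is actually installed.
if command -v sinfo >/dev/null 2>&1; then
    # -N: one line per node; --states=mix: only partially allocated nodes.
    sinfo -N --states=mix --noheader --format="%N %C" |
    while read -r node cpus; do
        echo "$node: $(idle_cpus "$cpus") idle CPUs"
    done
fi
```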

Task affinity

Node sharing automatically sets task-to-CPU-core affinity, so the job can run only on as many cores as there are requested tasks. To find out which cores the job was pinned to, run

cat /cgroup/cpuset/slurm/uid_$SLURM_JOB_UID/job_$SLURM_JOB_ID/cpuset.cpus

Note that because CPU hyperthreading is enabled (two logical cores per physical core), this command reports a pair of logical cores for each physical core. For example, on a 28-core node, (2,30) corresponds to the third physical core (core 2, since numbering starts from 0) and its associated hyperthread. Node core numbering from the system perspective can be obtained by running numactl -H.
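The cgroup path above depends on how cgroups are mounted on the node. As an alternative sketch, the same information can usually be read from the Cpus_allowed_list field of /proc/self/status on Linux. The helpers count_cpus and physical_core are hypothetical names for this example, and physical_core assumes the sibling numbering described above (logical core number = physical core number + number of physical cores), which may not hold on every system.

```shell
#!/bin/bash
# Count the logical CPUs in a cpuset-style list such as "2,30" or "0-3,28-31".
count_cpus() {
    local list=$1 total=0 part lo hi
    local IFS=,
    for part in $list; do
        lo=${part%-*}    # start of the range (or the single CPU number)
        hi=${part#*-}    # end of the range (or the single CPU number)
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# Map a logical CPU number to its physical core, ASSUMING hyperthread
# siblings are numbered core and core + physical_cores_per_node.
physical_core() {
    echo $(( $1 % $2 ))
}

# Report the affinity of the current shell, if /proc is available.
if [ -r /proc/self/status ]; then
    allowed=$(awk '/^Cpus_allowed_list/ {print $2}' /proc/self/status)
    echo "allowed logical CPUs: $allowed ($(count_cpus "$allowed") total)"
fi
```

Run inside a job, the reported list should contain two logical CPUs per requested task when hyperthreading is on.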

Potential implications for performance

Despite the task affinity, the performance of any job run in a shared manner can be impacted by other jobs run on the same node due to shared infrastructure being used, in particular the I/O to storage and the shared communication paths between the memory and CPUs. If you are doing benchmarking using only a portion of a node, you should not use node sharing but instead request the entire node.

Implications for the amount of allocation used

The allocation usage of a shared job is based on the maximum of (a) the fraction of cores used, (b) the fraction of memory used, and (c) the fraction of GPUs used. On clusters predating granite, GPUs are not allocated, so only (a) and (b) apply for this calculation. We calculate usage in this manner because using one resource—cores, memory, or GPUs—may affect other researchers' ability to use other resources. As an extreme example, if a node had available GPUs but no available CPU cores, a job requiring a GPU would be unable to start; in effect, the other job(s) running on the node are tying up resources, even if they are not used. The same principle applies to memory and GPUs.

In the case of owner resources, the quarterly allocation listed at https://www.chpc.utah.edu/usage/cluster/current-project-general.php is based on the number of cores or GPUs and the number of hours in the quarter. Node sharing can therefore lead to scenarios where more core or GPU hours are used than are available for a given quarter. This will not cause any issues, but it can lead to instances where the usage is greater than the allocated amount, in which case a negative balance will show.

As an example, consider a CPU-only node with 24 cores and 128 GB. Without node sharing, during any 1-hour period, the maximum usage would be 24 core hours. However, with node sharing, one could envision that a serial job which requires a lot of memory requests 1 core and 96 GB of memory along with a second job requesting 20 cores and the remaining 32 GB of memory. As all of the memory has been allocated, the remaining 3 cores will stay idle. If both jobs run simultaneously for 1 hour, a total of 38 core hours of allocation will be used in this hour (18 core hours for the 96 GB memory job, based on 3/4 of the memory of the node being used, along with 20 core hours for the 20 core job).
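The arithmetic in this example can be reproduced with a short sketch. It is only an illustration of the rule as described on this page for a CPU-only node (the larger of the core fraction and the memory fraction, scaled to the node's core count), not the accounting code actually used; the function name charged_core_hours and its argument order are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: core hours charged to a shared job on a CPU-only node.
# Arguments: cores requested, memory requested (GB), cores per node,
#            memory per node (GB), run time (hours).
charged_core_hours() {
    awk -v c="$1" -v m="$2" -v nc="$3" -v nm="$4" -v h="$5" 'BEGIN {
        core_frac = c / nc                     # fraction of the cores
        mem_frac  = m / nm                     # fraction of the memory
        frac = (core_frac > mem_frac) ? core_frac : mem_frac
        printf "%.1f\n", frac * nc * h         # charged core hours
    }'
}

# The two jobs from the example: a 24-core, 128 GB node, 1 hour each.
big_mem=$(charged_core_hours 1 96 24 128 1)    # memory-bound serial job
many_core=$(charged_core_hours 20 32 24 128 1) # core-bound job
echo "high-memory job: $big_mem core hours"
echo "20-core job:     $many_core core hours"
```

The memory-bound job is charged as if it held 3/4 of the node's cores, which is why the two jobs together exceed the 24 core hours the node physically provides in one hour.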
