Using Slurm to Submit Jobs
#SBATCH Directives
To create a batch script, use your favorite text editor to create a text file that details job requirements and instructions on how to run your job.
Job requirements are passed to Slurm through #SBATCH directives placed at the top of the script; Slurm uses these directives to determine what resources to give to your job.
The most common #SBATCH directives for a Slurm job are illustrated below.
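The exact set you need depends on your job; as a sketch (the account, partition, and QOS names here are placeholders), a typical header looks like this:
#SBATCH --account=youraccount     # account to charge the job to (usually your PI's group)
#SBATCH --partition=partition     # partition (set of nodes) to run on
#SBATCH --qos=qos                 # quality of service, if one is required
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks=8                # number of tasks (CPU cores)
#SBATCH --mem=32G                 # memory per node
#SBATCH --time=02:00:00           # walltime (HH:MM:SS)
#SBATCH -o slurmjob-%j.out-%N     # standard output file
#SBATCH -e slurmjob-%j.err-%N     # standard error file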
Note: Try using our tool that describes accounts and partitions for help finding accounts, partitions, and qualities of service you can use when submitting jobs on Center for High Performance Computing systems.
Find out which Slurm Accounts you are in
The easiest way to find the accounts and partitions you have access to at the CHPC is to use the mychpc batch command. This command outputs the cluster, the applicable account and partition for that cluster, and your allocation status for that partition. Example output looks like this:
GENERAL
CPU --partition=kingspeak-shared --qos=kingspeak --account=baggins [21% idle]
The above shows a general (i.e. non-preemptable) allocation on the kingspeak cluster under the baggins account within the kingspeak-shared partition. It also indicates how much of the partition is available without a wait - in this example, 21% of the CPUs within the kingspeak-shared partition are idle and available for jobs.
If you notice anything incorrect in the output from the mychpc batch command that you feel should be changed, please let us know.
Where to Run Your Slurm Job
There are three main places you can run your job: your home directory, /scratch spaces, or group spaces (available if your group has purchased group storage). This choice determines where your job's I/O is handled while it runs. Each has its own benefits, outlined below:
| Home | Scratch | Group Space |
|---|---|---|
| Free | Free | $150/TB without backups |
| Automatically provisioned per user | 60-day automatic deletion of untouched files | $450/TB with backups |
| 50 GB soft limit | Two file systems: vast and nfs1 | Shared among your group |
Due to the storage limits on each user's home directory, we recommend setting up your jobs to run in our scratch file systems. Note that files in the CHPC's scratch file systems will be deleted if untouched for 60 days.
To run jobs in the CHPC scratch file systems (vast or nfs1), place the following commands in your Slurm batch script. The commands that you use depend on what Linux shell you have.
Unsure? Type 'echo $SHELL' in your terminal.
- Replace <file-system> with either vast or nfs1.
- $USER points to your uNID and $SLURM_JOB_ID points to the job ID that Slurm assigned your job.
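As a sketch based on the example script in the next section (the path mirrors the vast path used there, and yourinputfiles is a placeholder for your own input files):
For bash-like shells (bash, zsh):
#create a per-job scratch directory, copy your input files into it, and move there
SCRDIR=/scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp yourinputfiles $SCRDIR
cd $SCRDIR
For csh-like shells (tcsh, csh):
#same steps, using csh-style variable assignment
setenv SCRDIR /scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp yourinputfiles $SCRDIR
cd $SCRDIR
Copy any results you need back to your home or group space before the job ends, since scratch files are subject to the 60-day purge.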
Putting it all Together: An Example Slurm Script
Below is an example job script that combines the information above. In this example, suppose your PI is Frodo Baggins (group ID baggins) and you are requesting general user access to one lonepeak node with at least 8 CPUs and 32 GB of memory. The job will run for two hours.
#!/bin/bash
#SBATCH --account=baggins        # Slurm account (the PI's group)
#SBATCH --partition=lonepeak     # partition to run the job on
#SBATCH --time=02:00:00          # walltime of 2 hours
#SBATCH --ntasks=8               # 8 tasks (CPU cores)
#SBATCH --mem=32G                # 32 GB of memory
#SBATCH -o slurmjob-%j.out-%N    # name of the standard output file
#SBATCH -e slurmjob-%j.err-%N    # name of the standard error file
#set up scratch directory
SCRDIR=/scratch/general/vast/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
#copy input files and move over to the scratch directory
cp inputfile.csv myscript.r $SCRDIR
cd $SCRDIR
#load your module
module load R/4.4.0
#run your script
Rscript myscript.r inputfile.csv
#copy output to your home directory and clean up
cp outputfile.csv $HOME
cd $HOME
rm -rf $SCRDIR
For more examples of Slurm job scripts, see the CHPC MyJobs templates.
Submitting your Job to Slurm
To submit a job, you must be logged in to a CHPC system. Once logged in, submit your batch script with the Slurm sbatch command.
For example, to submit a script named SlurmScript.sh, type:
sbatch SlurmScript.sh
NOTE: sbatch by default passes all environment variables to the compute node, which differs
from the behavior in PBS (which started with a clean shell). If you need to start
with a clean environment, you will need to use the following directive in your batch
script:
#SBATCH --export=NONE
This will still execute .bashrc/.tcshrc scripts, but any changes you make in your
interactive environment will not be present in the compute session. As an additional
precaution, if you are using modules, you should use module purge to guarantee a fresh environment.
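As a sketch, a batch script that starts from a clean environment might begin like this (load whichever modules your job actually needs):
#!/bin/bash
#SBATCH --export=NONE    # do not pass the login environment to the job
module purge             # clear any inherited modules
module load R/4.4.0      # load only what the job needs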
Checking the Status of your Job
To check the status of your job, use the squeue command. Run on its own, squeue lists all jobs currently submitted to the cluster you are logged onto. You can filter the output to just the jobs that pertain to you in a number of ways:
squeue --me
squeue -u uNID
squeue -j job#
Adding -l (for "long" output) gives more details in the squeue output.
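For example, to get the long listing for only your own jobs, the two options can be combined:
squeue --me -l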
Slurm Job Arrays
Slurm arrays enable quick submission of many related jobs. Slurm provides an environment variable, SLURM_ARRAY_TASK_ID, which differentiates the tasks within a job array by their index number.
For example, if we need to run the same program against 30 different samples stored in files named input_1.data through input_30.data, we can use a Slurm array to run the program across all 30 samples with the following script:
#!/bin/bash
#SBATCH -n 1 # Number of tasks
#SBATCH -N 1 # All tasks on one machine
#SBATCH -p PARTITION # Partition on some cluster
#SBATCH -A ACCOUNT # The account associated with the above partition
#SBATCH -t 02:00:00 # 2 hours (HH:MM:SS)
#SBATCH -o myprog%A%a.out # Standard output
#SBATCH -e myprog%A%a.err # Standard error
#SBATCH --array=1-30
./myprogram input_$SLURM_ARRAY_TASK_ID.data
You can also limit the number of jobs that can be running simultaneously to "n" by adding a %n after the end of the array range:
#SBATCH --array=1-30%5
Apart from $SLURM_ARRAY_TASK_ID, Slurm provides a few other variables and filename patterns that are useful with job arrays. These include:
- %A and %a, filename patterns that expand to the job ID and the array task index, respectively. These can be used in the #SBATCH output and error file names to generate unique names.
- SLURM_ARRAY_TASK_COUNT, the number of tasks in the job array.
- SLURM_ARRAY_TASK_MAX, the highest job array index value.
- SLURM_ARRAY_TASK_MIN, the lowest job array index value.
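As a small illustrative sketch (not part of the example script above), these variables can be read inside the job script, for instance to report progress or to run an extra step on the final task:
echo "Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"
if [ "$SLURM_ARRAY_TASK_ID" -eq "$SLURM_ARRAY_TASK_MAX" ]; then
    echo "last array task reached; run any final collection step here"
fi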
When submitting array jobs whose tasks use fewer CPUs than a full node provides, use the shared partitions so that multiple array tasks can run on one node. For more information, see the Node Sharing page.
Depending on the characteristics of your job, there may be a number of other solutions you could use, detailed on the running multiple serial jobs page.