
Using Slurm

What is Slurm?

Slurm is the workload manager on the HPC: it manages the cluster's resources and schedules jobs. It allocates resources on the cluster to start, execute, and monitor your code. You can find documentation on Slurm here and a quick start guide here. The compute nodes, like chela01 and chela-g01, are split into partitions. We have two partitions on the HPC: guru ranges from chela01 to chela05, and mahaguru contains the GPU node, chela-g01.
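You can see both partitions and the current state of their nodes by running sinfo on the login node. The output below is only illustrative; the actual time limits and node states will vary.

sinfo
# Illustrative output, assuming all nodes are idle:
# PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
# guru         up   infinite      5   idle  chela[01-05]
# mahaguru     up   infinite      1   idle  chela-g01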

Running Slurm Jobs

To submit a job to the job queue, use sbatch. We recommend the following format for your batch scripts.

#!/bin/bash
#SBATCH --partition=guru              # The partition to submit to.
#SBATCH --nodes=2                     # The number of nodes to use.
#SBATCH --ntasks=4                    # The total number of tasks to run.
#SBATCH --ntasks-per-node=2           # The number of tasks on a single node.
#SBATCH --mem-per-cpu=1G              # The amount of memory per CPU (1 GB).
#SBATCH --cpus-per-task=1             # The number of CPU cores per task.
#SBATCH --time=00:05:00               # The time limit of the job (HH:MM:SS).
#SBATCH --job-name=Template           # The job name.
#SBATCH --output=slurm-%j.out         # The standard output log; %j expands to the job ID.
#SBATCH --error=slurm_error-%j.out    # The standard error log.

# The commands you want to run go below the #SBATCH directives.
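As a minimal, runnable sketch built from this template, the following script prints the host name of the node running each task; the script and job names are placeholders.

#!/bin/bash
#SBATCH --partition=guru
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=1G
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --job-name=hostnameDemo
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm_error-%j.out

srun hostname   # Runs once per task, so the output file should contain 4 host names.

Submit it with sbatch, and the host names will appear in the slurm-<jobId>.out file once the job completes.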
Next, if you want to run a job on the GPU node, make sure you use the following options.

#SBATCH --partition=mahaguru          # The partition containing the GPU node.
#SBATCH --gres=gpu:4                  # The number of GPUs to use.
#SBATCH --nodes=1                     # The number of nodes to use.
#SBATCH --ntasks=4                    # The total number of tasks to run.
#SBATCH --ntasks-per-node=4           # The number of tasks on a single node.
#SBATCH --mem-per-cpu=1G              # The amount of memory per CPU (1 GB).
#SBATCH --cpus-per-task=1             # The number of CPU cores per task.
#SBATCH --time=00:05:00               # The time limit of the job (HH:MM:SS).
#SBATCH --job-name=Template           # The job name.
#SBATCH --output=slurm-%j.out         # The standard output log; %j expands to the job ID.
#SBATCH --error=slurm_error-%j.out    # The standard error log.
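For example, a short GPU script that verifies which GPUs Slurm assigned to the job might look like the sketch below. This assumes the NVIDIA driver tools (nvidia-smi) are available on chela-g01; adapt the body to your own software.

#!/bin/bash
#SBATCH --partition=mahaguru
#SBATCH --gres=gpu:4
#SBATCH --time=00:05:00
#SBATCH --job-name=gpuCheck

# Slurm sets CUDA_VISIBLE_DEVICES to the GPUs allocated to this job.
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES}"
nvidia-smi   # Lists the GPUs visible to the job.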
You can also run sbatch with additional options on the command line, for example,
  • --cpus-per-gpu=2
  • --job-name=test
Options passed on the command line override the matching #SBATCH directives in the script. Here is an example for a script named myScript.sh.

sbatch --cpus-per-gpu=2 --job-name=test myScript.sh
Submitted batch job 48250

Afterwards, you can find the terminal output at slurm-48250.out.
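While the job is pending or running, you can check on it with squeue, and read the log once it finishes. For example:

squeue -j 48250        # Shows the state of job 48250 while it is in the queue.
cat slurm-48250.out    # Reads the job's standard output after it completes.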

Here is another example: sbatch blastScript.sh

You can find additional documentation for sbatch here. Finally, you can start an interactive Slurm session with
  • salloc
You can find more information on it here; a sketch of a typical session follows.
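As a minimal sketch, an interactive session might look like this; the resource options are placeholders you should adjust to your work.

salloc --partition=guru --nodes=1 --ntasks=1 --time=00:30:00   # Request an allocation and wait for it to be granted.
srun hostname                                                  # Run a command inside the allocation.
exit                                                           # Release the allocation when you are done.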

Additional Slurm Commands

  • sinfo to get a list of partitions on the system, along with their state and the nodes they contain.
  • squeue to get info on all the queued jobs along with their state.
  • scancel <jobId> to stop your job. Example: scancel 48250
  • srun <script> to run a parallel job in Slurm. When run inside a batch script or an salloc session, it uses that job's existing resource allocation, so we recommend using it in your Slurm scripts to launch tasks; see the sketch after this list. Examples: srun myScript.sh, srun ls
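For example, inside a batch script each srun call launches a job step within the job's allocation, and a step can use fewer resources than the job requested. A minimal sketch, assuming the script requested --ntasks=4:

# Inside a batch script submitted with --ntasks=4:
srun hostname        # Step 1: runs on all 4 tasks of the allocation.
srun --ntasks=1 ls   # Step 2: restricts this step to a single task.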