Slurm scheduler
Roar uses Slurm to fairly and efficiently distribute resources (CPUs, memory, and GPUs) among many different compute jobs.
Slurm directives
Slurm directives specify the resources a job needs, such as cores, memory, and execution time. They can also control how a job behaves, with options such as email alerts, job dependencies, and more.
Slurm directives are required for both interactive jobs and batch jobs. Portal sessions also use Slurm directives, which are typically specified via the request form that launches the session.
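The behavioral options mentioned above are ordinary sbatch flags. The sketch below illustrates email alerts (--mail-type, --mail-user) and a job dependency (--dependency=afterok:<jobID>). It makes some assumptions: the email address and the batch scripts first.sh and second.sh are hypothetical placeholders, and the submission step only runs where sbatch is available.

```bash
#!/bin/bash
# Email alerts are usually set as directives inside a batch script, e.g.:
#   #SBATCH --mail-type=END,FAIL
#   #SBATCH --mail-user=user@example.com    (hypothetical address)

# Submit first.sh, then second.sh so that it starts only if first.sh succeeds.
# first.sh and second.sh are hypothetical batch scripts.
submit_chain() {
    local first_id
    # --parsable prints just the job ID (plus ";cluster" on multi-cluster setups)
    first_id=$(sbatch --parsable "$1") || return 1
    first_id=${first_id%%;*}
    # afterok: the second job waits until the first completes with exit code 0
    sbatch --dependency=afterok:"$first_id" "$2"
}

if command -v sbatch >/dev/null 2>&1; then
    submit_chain first.sh second.sh
else
    echo "sbatch not found; run this from a Slurm login node" >&2
fi
```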
The most common directives are:
| Short option | Long option | Description |
|---|---|---|
| -J | --job-name | name the job |
| -A | --account | charge to an account |
| -p | --partition | request a partition |
| -N | --nodes | number of nodes |
| -n | --ntasks | number of tasks (cores) |
| NA | --ntasks-per-node | number of tasks per node |
| NA | --mem | memory per node |
| NA | --mem-per-cpu | memory per core |
| -t | --time | maximum run time |
| NA | --gres | generic resources, such as GPUs |
| -C | --constraint | required node features |
| -e | --error | direct standard error to a file |
| -o | --output | direct standard output to a file |
By default, a batch job writes both standard output and standard error to slurm-<jobID>.out in the directory it was submitted from. Output filenames can be customized: --output=<outFile> directs both standard output and standard error to <outFile>, and adding --error=<errFile> sends standard error to its own file, <errFile>.
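Here is a minimal sketch of separating the two streams. The job name io_demo, the file names, and the log_info/log_error helpers are hypothetical. Under Slurm, anything written to standard output lands in io_demo.out and anything written to standard error lands in io_demo.err. Removing the --error line would send both streams to io_demo.out.

```bash
#!/bin/bash
#SBATCH --job-name=io_demo
#SBATCH --output=io_demo.out
#SBATCH --error=io_demo.err

# Hypothetical helpers: one writes to standard output, the other to standard error.
log_info()  { echo "$*"; }
log_error() { echo "$*" >&2; }

log_info "starting analysis"     # goes to io_demo.out
log_error "input file missing"   # goes to io_demo.err (or io_demo.out without --error)
```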
Specifying Slurm directives
Slurm directives appear at the top of a batch script on lines beginning with #SBATCH, or as command-line options for interactive jobs launched with salloc or srun.
On the Portal, you can use resource directives to further customize your job requests.
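The following sketches a batch-script header built from the directives in the table above, assuming a single-node job. The job name, the resource amounts, and the commented-out account, partition, and GPU lines are hypothetical. Slurm stops reading #SBATCH lines at the first executable command, so the directives go at the top. Lines starting with ##SBATCH are treated as ordinary comments, which is a convenient way to keep optional directives in the file.

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
# Optional directives, disabled with a double "#" (names are hypothetical):
##SBATCH --account=my_account
##SBATCH --partition=my_partition
##SBATCH --gres=gpu:1

# First executable command: no #SBATCH lines are read after this point.
start_host=$(uname -n)
echo "Job started on $start_host"

# Interactive equivalent of the directives above:
#   salloc --nodes=1 --ntasks=4 --mem=8G --time=01:00:00
```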
Environment variables
Slurm sets environment variables for each job. Batch scripts can read them to adapt to the job's resources and context:
| Environment Variable | Description |
|---|---|
| SLURM_JOB_ID | ID of the job |
| SLURM_JOB_NAME | Name of the job |
| SLURM_NNODES | Number of nodes |
| SLURM_NODELIST | List of nodes |
| SLURM_NTASKS | Total number of tasks |
| SLURM_NTASKS_PER_NODE | Number of tasks per node (set only if --ntasks-per-node is specified) |
| SLURM_JOB_PARTITION | Partition (queue) the job runs in |
| SLURM_SUBMIT_DIR | Directory the job was submitted from |
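The sketch below reads a few of these variables in a batch script. The ":-" fallbacks are an assumption added only so the script also runs outside Slurm. The job name env_demo and the results_<jobID>.txt naming scheme are hypothetical.

```bash
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --ntasks=2

# The ":-" fallbacks only matter when this sketch runs outside Slurm.
job_id=${SLURM_JOB_ID:-none}
job_name=${SLURM_JOB_NAME:-env_demo}
ntasks=${SLURM_NTASKS:-1}
nodes=${SLURM_NODELIST:-$(uname -n)}
submit_dir=${SLURM_SUBMIT_DIR:-$PWD}

# Keep results in the directory sbatch was run from, tagged with the job ID
# (results_<jobID>.txt is a hypothetical naming scheme).
results="$submit_dir/results_${job_id}.txt"

echo "Job $job_name ($job_id): $ntasks task(s) on $nodes"
echo "Results will be written to $results"
```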
Replacement symbols
Replacement symbols can be used in filename directives such as --output and --error, so that each job's files are named with details of that job:
| Symbol | Description |
|---|---|
| %j | Job ID |
| %x | Job name |
| %u | Username |
| %N | Hostname where the job is running |
For example, --output=%x.%j.out directs batch output to a file named with the job name and job ID (for a job named sim with ID 12345, sim.12345.out).
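Replacement symbols are expanded by Slurm only in directive filenames, not in ordinary shell commands inside the script. A minimal sketch, assuming you want other files to follow the same naming, rebuilds the name from SLURM_JOB_NAME and SLURM_JOB_ID. The job name sim, the .results suffix, and the fallback values used outside Slurm are hypothetical.

```bash
#!/bin/bash
#SBATCH --job-name=sim
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err

# %x and %j mean nothing to the shell, so rebuild "<job name>.<job ID>"
# from environment variables (fallbacks are only for running outside Slurm).
log_base="${SLURM_JOB_NAME:-sim}.${SLURM_JOB_ID:-0}"
results_file="${log_base}.results"

echo "Log: ${log_base}.out  Results: ${results_file}"
```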
For more information on Slurm directives, environment variables, and replacement symbols, see the Slurm sbatch documentation for batch jobs and the Slurm salloc documentation for interactive jobs.