Slurm scheduler
Roar uses Slurm (Simple Linux Utility for Resource Management) to schedule submitted jobs and allocate compute resources.
Resource directives
Resource directives specify resources needed by a job, including the hardware to use (nodes, cores, GPUs, memory) and the run time. They are required for both interactive jobs and batch jobs. Interactive jobs via the Portal can also use resource directives.
The most common directives are:
Short option | Long option | Description |
---|---|---|
-J |
--job-name |
name the job |
-A |
--account |
charge to an account |
-p |
--partition |
request a partition |
-N |
--nodes |
number of nodes |
-n |
--ntasks |
number of tasks (cores) |
NA | --ntasks-per-node |
number of tasks per node |
NA | --mem |
memory per node |
NA | --mem-per-cpu |
memory per core |
-t |
--time |
maximum run time |
NA | --gres |
GPU request |
-C |
--constraint |
required node features only for paid accounts |
-e |
--error |
direct standard error to a file |
-o |
--output |
direct standard output to a file |
Environment variables
Slurm defines environment variables within the scope of a job:
Environment Variable | Description |
---|---|
SLURM_JOB_ID |
ID of the job |
SLURM_JOB_NAME |
Name of job |
SLURM_NNODES |
Number of nodes |
SLURM_NODELIST |
List of nodes |
SLURM_NTASKS |
Total number of tasks |
SLURM_NTASKS_PER_NODE |
Number of tasks per node |
SLURM_QUEUE |
Queue (partition) |
SLURM_SUBMIT_DIR |
Directory of job submission |
Replacement symbols
Replacement symbols can be used in Slurm directives, to build job names and filenames with information specific to the job being run:
Symbol | Description |
---|---|
%j |
Job ID |
%x |
Job name |
%u |
Username |
%N |
Hostname where the job is running |
For more information on Slurm directives, environment variables, and replacement symbols, see Slurm sbatch documentation for batch jobs and Slurm salloc documentation for interactive jobs.
Job output files
By default, batch job standard output and standard error
are both directed to slurm-%j.out
, where %j
is the jobID.
But output and error filenames can be customized:
#SBATCH -e = <file>
redirects standard error to <file>
,
and #SBATCH -o
likewise redirects standard output.
SLURM variables %x
(job name) and %u
(username)
are useful for this purpose. For example,
#SBATCH -eo = %u_%x.out
<username>_<jobname>.out
.