Hardware requests
Users with paid credit accounts or allocations
can request GPU nodes,
and fine-tune their hardware requests with constraint
directives.
GPUs
GPUs are only available to paid credit accounts,
or allocations that include GPU nodes.
For a batch job paid by a credit account, to request a single GPU:
--partition=standard
--gres=gpu:1
To request n GPUs, replace 1 above by n.
To request a specific model of GPU, use --gres=gpu:a100:1
.
For an interactive job paid by a credit account, use salloc
:
salloc -A <account> -p standard --gres=gpu:a100:1 ...
If the job is paid by an allocation, use -p sla-prio
instead of -p standard
.
For information on available GPU nodes, see Compute hardware. For the names of different GPU types (a100, a40, v100, p100,...) see Hardware info.
Make sure your application is GPU-enabled.
If your application does not use GPUs,
requesting GPUs will do nothing except deplete your accounts.
Hardware info
Even within different hardware partitions, not all nodes on Roar are identical. Often, software compiled for one type of CPU or GPU will not run on another, older type.
To find out about hardware on different nodes, there are several options.
If you are logged onto a compute node with an interactive job,
the command lscpu
displays information about the CPUs;
nvidia-smi
displays information about the GPUs (if present).
The SLURM command sinfo
displays information about all Roar nodes.
Its output is more easily read with some formatting options,
sinfo --Format=features:30,nodelist:20,cpus:5,memory:10,gres:30
On Roar, sinfo
output would look like:
AVAIL_FEATURES NODELIST CPUS MEMORY GRES
bc,basic,broadwell,open p-bc-[5001-5240] 24 126400 (null)
sc,standard,broadwell,open p-hc-[6001-6002] 56 1024000 (null)
sc,standard,haswell,open p-sc-[2337-2569] 24 257800 (null)
bc,basic,sapphirerapids p-bc-[5401-5520] 64 255000 (null)
standard,a100_1g,mig,cascadelap-gc-3037 48 380000 gpu:a100_1g:14(S:0-11,36-47)
sc,standard,icelake p-sc-[2169-2308] 48 512000 (null)
sc,standard,cascadelake p-sc-[2001-2156] 48 380000 (null)
standard,a100,cascadelake p-gc-[3001-3004,300648 380000 gpu:a100:2(S:0-11,36-47)
sc,standard,genoa p-zc-[7001-7035] 64 380000 (null)
standard,a100_3g,mig,cascadelap-gc-3036 48 380000 gpu:a100_3g:4(S:0-11,36-47)
standard,a100_3g,a100_1g,cascap-gc-3005 48 380000 gpu:a100_3g:3(S:0-11,36-47),gp
standard,p100,broadwell p-gc-[3101-3109,311228 256000+ gpu:p100:1(S:0-13)
standard,p100,broadwell p-gc-[3110-3111,311328 256000 gpu:p100:1(S:14-27)
standard,v100,haswell p-gc-[3192-3193] 24 256000 gpu:v100:1(S:0-11)
standard,v100,skylake p-gc-[3201-3202] 28 770000 gpu:v100:4(S:0-27)
standard,a40 p-ic-[4001-4012] 36 500000 gpu:a40:1(S:0-17)
hc,himem,icelake p-sc-[2309-2336] 48 1024000 (null)
ma p-cl-0001 20 3045000 (null)
interactive,p100,broadwell p-gc-[3161-3176] 28 256000 (null)
v100s,mri,mgc p-mc-[3470-3472] 40 1540000 gpu:v100s:10(S:0-39)
t4,nih,mgc p-mc-[3477-3480] 40 1540000 gpu:t4:16(S:0-39)
rtx6000,mri,mgc p-mc-[3473-3474] 40 770000 gpu:rtx_6000:3(S:0-9,20-39)
v100nv,mri,mgc p-mc-[3475-3476] 40 770000 gpu:v100:4(S:0-39)
Evidently, node attributes serve to identify nodes with a given
- CPU type (broadwell, haswell, ...)
- GPU type (a100, a40, v100, p100)
- partition (bc, sc, hc, gc, ic,...)
- specific hardware combinations (p100_256, 3gc20gb, ...)
Constraints
Users with paid credit accounts and allocations can fine-tune their hardware requests
with constraint
directives. In a batch script, constraints take the form:
#SBATCH --constraint=<feature>
where <feature>
is one of the features listed by sinfo
(or multiple features, separated by commas).
For example, to request cascadelake
hardware, use --constraint=cascadelake
.
For an interactive job, constraints are given
with a -C
option to salloc
:
salloc -N 1 -n 4 -A <alloc> -C <feature> -t 1:00:00
Resource requests must match the allocation.
For paid allocations, constraint directives must be consistent with the terms of the allocation. For credit accounts, any hardware can be requested.