Batch jobs
For compute jobs that take hours or days to run, instead of sitting at the terminal waiting for the results, we submit a "batch job" to the workload manager, which runs the job when resources are available.
Slurm commands
On Roar, the queue manager is Slurm (Simple Linux Utility for Resource Management).
Besides `salloc` (for interactive jobs),
the basic Slurm commands are:
| Command | Effect |
|---|---|
| `sbatch <script>` | submit batch job `<script>` |
| `squeue -u <userid>` | check on jobs submitted by `<userid>` |
| `scancel <jobID>` | cancel job `<jobID>` |
When you execute `sbatch myJob.sh`, Slurm responds with something like

```
Submitted batch job 25789352
```

When you execute `squeue -u <userID>`, Slurm responds with something like

```
JOBID     PARTITION NAME      USER   ST TIME    NODES NODELIST(REASON)
25789352  open      myJob.sh  abc123 R  1:18:31 1     p-sc-2008
```

To cancel the job, execute `scancel 25789352`.
Batch scripts
Jobs submitted to Slurm are in the form of a "batch script".
A batch script is a shell script that executes commands,
with a preamble of Slurm resource directives
(`#SBATCH` lines) that specify
- a Slurm account id to charge;
- a partition (type of nodes) to run on;
- nodes, cores, memory, GPUs, and time;
- and other job-related parameters.
For more information on partitions, see Partitions.
For more information on resource requests, see Resource requests.
An example batch script:
```shell
#!/bin/bash
#SBATCH --account=account_id
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=1gb
#SBATCH --time=4:00:00
#SBATCH --job-name=example-job
#SBATCH --output=example-job.%j.out

# load software
module load python/3.11.2

python3 myscript.py
```
To use a paid allocation, specify `--partition=sla-prio`. Jobs under a paid allocation do not use the basic, standard, high-memory, or interactive partitions; instead, `--partition=sla-prio` directs the job to the hardware in your allocation.
The first line, `#!/bin/bash`, is the "shebang", which says the script
should be run under bash (a Linux shell).
Everything after the last `#SBATCH` line consists of commands to be executed;
lines beginning with `#` other than `#SBATCH` are ordinary bash comments.
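One detail worth knowing: `sbatch` only honors `#SBATCH` directives that appear before the first non-comment command in the script; any that come later are treated as ordinary comments and silently ignored. A minimal sketch (the account name is a placeholder):

```shell
#!/bin/bash
#SBATCH --account=account_id   # honored: precedes all commands
#SBATCH --time=0:10:00         # honored

echo "job started"             # first real command

#SBATCH --mem=8gb              # IGNORED by sbatch: after the first
                               # command, this is just a bash comment
```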
Arguments in batch scripts
`sbatch` can pass arguments to batch scripts like this:

```shell
sbatch myScript.sh arg1 arg2
```
In the script, arguments `arg1` and `arg2` can be accessed with `$1` and `$2` as usual:

```shell
#!/bin/bash
#SBATCH --account=account_id
...
python3 myscript.py $1 $2
```
`sbatch` can also pass values by assigning environment variables like this:

```shell
sbatch --export=VAR1=arg1,VAR2=arg2 myScript.sh
```

(There must be no spaces in the `--export` list.)
In the script, `$VAR1` and `$VAR2` are set to `arg1` and `arg2`:
```shell
#!/bin/bash
#SBATCH --account=account_id
...
python3 myscript.py $VAR1 $VAR2
```
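If the script may also be run without `--export`, bash's `${VAR:-default}` expansion gives the variables fallback values (a sketch; the variable names and defaults are illustrative):

```shell
#!/bin/bash
# Fall back to defaults when sbatch did not export the variables
INPUT="${VAR1:-input.dat}"
ITERATIONS="${VAR2:-100}"

echo "Processing $INPUT for $ITERATIONS iterations"
```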
Batch script examples
ICDS offers a curated repository of example submit scripts for many of our most popular software packages, including StarCCM, COMSOL, MATLAB, R, Python, and more.
Estimating resource usage
For credit accounts, it is helpful to estimate how many credits a batch job would use
before you actually run it. For this, use `job_estimate`:

```shell
job_estimate <submit file>
```

which reports the cost in credits.
For more on `job_estimate`, execute `job_estimate --help`.
The Slurm command sacct
reports the resources used by a completed batch job,
which helps users learn what resources to request next time.
At the bottom of a batch script, the command

```shell
sacct -j $SLURM_JOB_ID --format=JobID,JobName,MaxRSS,Elapsed,TotalCPU,State
```

writes a report of the resources used to the batch output file. The
`--format` flag controls which columns are displayed; see `sacct --helpformat`
for more formatting options.
Selecting nodes and cores
Choosing the right number of cores (`--ntasks`) and nodes (`--nodes`) depends on how your software is designed to run in parallel. It's important to understand whether your job is built for a distributed-memory environment or a shared-memory environment.
Most parallel software is multi-threaded, meaning it's designed to run on a single computer and use multiple cores that share the same memory. If this describes your workflow, you should almost always set --nodes=1 and then set --ntasks to the number of independent tasks your job can run at the same time. Requesting more cores than your application can actually use will not speed it up and only wastes resources.
Some advanced applications (often using MPI) are designed to run across multiple, separate
computers at once, communicating over the network. Only if your software is specifically
built for this should you set --nodes to a value greater than one.
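The two cases above might be requested like this (a sketch; note that, depending on how an application launches its threads, a shared-memory job is often expressed as one task with several `--cpus-per-task` rather than several `--ntasks`):

```shell
# Shared-memory (multi-threaded) job: one node, one process, 8 cores
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Many threaded libraries (e.g. OpenMP) size their thread pool from:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Distributed (MPI) job -- only if the software is built for it:
# #SBATCH --nodes=2
# #SBATCH --ntasks-per-node=16
```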
Selecting memory for your job
Correctly estimating memory (--mem) can be tricky, but it is critical for ensuring your job runs successfully. Requesting too little will cause your job to fail, while requesting too much can increase your queue time and cost.
A good starting point is to calculate the size of the data your application needs to load into memory at one time. Once you have an estimate, it is safe practice to request about 20% more memory than you think you need. This extra buffer accommodates the operating system and other side processes that run alongside your job.
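For example, to hold a 20,000 × 20,000 matrix of double-precision (8-byte) values, the data alone needs 20000 × 20000 × 8 bytes ≈ 3.2 GB, so with the 20% buffer you would request about 4 GB. The arithmetic can be checked in the shell (the matrix dimensions here are purely illustrative):

```shell
rows=20000
cols=20000
bytes_per_value=8   # double precision

data_bytes=$((rows * cols * bytes_per_value))
# add a 20% buffer for the OS and side processes
request_bytes=$((data_bytes + data_bytes / 5))

echo "data alone:  $((data_bytes / 1000000)) MB"
echo "with buffer: $((request_bytes / 1000000)) MB"
```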
The most reliable method is to run a short test job with a generous memory allocation and then check the actual peak usage. You can use the sacct command after your job finishes to see the MaxRSS (Maximum Resident Set Size). This will tell you precisely how much memory your job used, allowing you to make very accurate requests for future runs.
Timing jobs
It is good practice to test a new workflow with small, short jobs before submitting big, long ones. To help plan your compute usage, it is helpful to time these test jobs.
Many well-designed applications display timing information
at the end of the log files they generate.
If this is not the case for your application,
you can find out how long a batch job takes
by sandwiching the commands you execute
between `date` commands:

```shell
date
<commands>
date
```

Alternatively, prefix a single command with `time`:

```shell
time <command>
```

which writes timing information to standard error.
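Elapsed time can also be computed inside the script itself, so the result lands in the job's output file (a sketch; `sleep 2` stands in for the real work):

```shell
# Record wall-clock timestamps around the work
start=$(date +%s)

sleep 2   # placeholder for the real commands

end=$(date +%s)
echo "Elapsed: $((end - start)) seconds"
```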