Commit a787c304 authored by Dietz, Colin

Update challenges/slurm/README.md

parent 843c2be0
There are a few key differences to be aware of between Moab/Torque and Slurm:
- Terminology
  - This is largely the same, with a few key differences
    - Moab Queues are referred to as Partitions in Slurm
    - PBS parameters in Moab job scripts are analogous to SBATCH parameters in Slurm
- Scheduler Policy
  - Resources that are not requested are not allocated
    - Resource requests are enforced through cgroups
  - Nodes, cores/tasks, memory, walltime, account, and queue information must be specified in job scripts
    - No default values are set for these resources
  - "Burst" job submission is simplified in Slurm
    - One central burst queue is used, and the user account and QOS no longer need to be specified in your job script
- Commands
  - Functions from multiple Moab/Torque commands are typically combined into single Slurm commands
    - `qsub -> sbatch`
    - `qsub/pbsdsh -> srun`
    - `qstat/showq -> squeue`
    - `checknode/showbf -> sinfo`
    - `checkjob/mschedctl -> scontrol`
- Command Examples
  - Here are a few examples of equivalent commands between the two schedulers; a short Slurm workflow sketch follows this list
    - `qsub test.sh -> sbatch test.sh`
    - `showq -u <uid> -> squeue -u <uid>`
    - `checkjob <job_id> -> scontrol show job <job_id>`
    - `showbf -f gpu -> sinfo -p gpu`
    - `qsub -I -A cades-birthright -w group_list=birthright -q gpu -> srun -A birthright -p gpu --pty /bin/bash`
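
As a quick illustration, a minimal check-then-submit workflow on the Slurm side might look like the sketch below; `test.sh`, `<uid>`, and `<job_id>` are placeholders, and the partition names on your condo may differ.

```
# See which partitions exist and which nodes are idle
sinfo

# Submit a batch script; Slurm prints the new job ID
sbatch test.sh

# List only your jobs in the queue
squeue -u <uid>

# Show the full scheduler record for a single job
scontrol show job <job_id>
```
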
## Slurm Challenge 1
These modifications must be made to existing PBS job scripts to make them compatible with Slurm; a combined example header is sketched after this list:
- `$PBS_O_WORKDIR -> $SLURM_SUBMIT_DIR`
  - Environment variables such as `PBS_O_WORKDIR` will need to be replaced with their Slurm equivalents, or defined manually
- `-A birthright-burst -> -A birthright`
  - The account used to submit the job may or may not need to be updated. Valid Slurm account names can be found using this command:
    - `sacctmgr show assoc where user=<uid> format=account`
- `-q gpu -> -p gpu`
  - The queue (partition) that the job is submitted to may need to be updated. Valid queue names can be found with the `sinfo` command.
- `-l walltime=<time>`
  - A maximum walltime request is required
- `-l mem=<number>[unit]`
  - A memory request is required
- `-l nodes=1:ppn=1`
  - Nodes and ppn requests are required
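
Putting these requirements together, a converted job script header might look like the hedged sketch below. The account, partition, and resource values are illustrative placeholders; substitute the values reported by `sacctmgr` and `sinfo` for your own project.

```
#!/bin/bash
#SBATCH -A birthright          # account (was: #PBS -A birthright-burst)
#SBATCH -p gpu                 # partition (was: #PBS -q gpu)
#SBATCH -N 1                   # node count (was: -l nodes=1)
#SBATCH -n 1                   # task count (roughly the ppn request)
#SBATCH -t 01:00:00            # walltime request (placeholder value)
#SBATCH --mem=4g               # memory request (placeholder value)

cd "$SLURM_SUBMIT_DIR"         # replaces $PBS_O_WORKDIR
```
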
### Challenge
```
/lustre/or-hydra/cades-birthright/<user_id>/cades-spring-training-master/slurm/example1/
```
- This directory contains ex1_job_script.pbs, an example “Hello World” PBS job script
3. Make a copy of the example script and name it ex1_job_script.sbatch
4. Using the previous slide as a reference, update the job script to run under Slurm (a possible result is sketched after step 5):
   - Add a walltime request of 10 minutes to the script
   - Add a memory request of 10 gigabytes to the script
   - Change the queue name from gpu to testing
     - The testing queue is a limited queue for short-running test jobs
5. After updating the script, try to submit it using this command:
```
sbatch ex1_job_script.sbatch
```
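
If you get stuck, the updated script will likely resemble the sketch below. This is not the official solution; the directive values simply follow the challenge instructions (10 minute walltime, 10 gigabytes of memory, the testing partition), and the echo line mirrors the "Hello World" example.

```
#!/bin/bash
#SBATCH -A birthright        # Slurm account instead of birthright-burst
#SBATCH -p testing           # partition changed from gpu to testing
#SBATCH -N 1                 # one node
#SBATCH -n 1                 # one task
#SBATCH -t 00:10:00          # 10 minute walltime
#SBATCH --mem=10g            # 10 gigabyte memory request

cd "$SLURM_SUBMIT_DIR"       # Slurm equivalent of $PBS_O_WORKDIR
echo "Hello World"
```
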
```
/lustre/or-hydra/cades-birthright/<user_id>/cades-spring-training-master/slurm/example2/
```
- This directory contains ex2_job_script.pbs, an example PBS job script for running Quantum Espresso
3. Make a copy of the example script and name it ex2_job_script.sbatch
4. Using the previous two slides as a reference, translate the job script from PBS to SBATCH
5. After converting the job script, try to submit it using this command:
```
sbatch ex2_job_script.sbatch
```
6. If any errors occur when submitting, try to fix the job script and re-submit to test. Feel free to ask for help if you encounter an error you can’t get past.
About this challenge:
- Quantum Espresso is a suite of electronic-structure calculation and material modeling tools. The job script and data files used in the challenge are slightly modified for this training, but are meant to demonstrate how you could run these programs on the CADES condos for a real production run (a rough sketch of such a script follows below).
The solution to this challenge is available in the solutions folder under ex2_job_script.sbatch
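
For orientation only, a converted Quantum Espresso script could be sketched roughly as below. The module name, task count, and input path are guesses for illustration and will not match ex2_job_script.pbs exactly; treat the provided script and its solution as the authoritative versions.

```
#!/bin/bash
#SBATCH -A birthright
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -n 16                 # MPI ranks; match this to the original ppn request
#SBATCH -t 01:00:00
#SBATCH --mem=32g

cd "$SLURM_SUBMIT_DIR"
# The module name below is a guess; load whatever the PBS script loaded
module load espresso
mpirun pw.x -in ../data/in    # same launch pattern as the PBS version
```
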
```
input_files=(in in2)
mpirun pw.x -in "../data/${input_files[$PBS_ARRAYID]}"
```
- Important Notes (a sketch of the converted fragment follows these notes)
  - Slurm uses the parameter `#SBATCH -a` to specify job arrays, but has the same syntax as PBS for specifying the array index range and slot limit.
  - The Slurm equivalent of `PBS_ARRAYID` is `SLURM_ARRAY_TASK_ID`
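
Applying those two notes to the PBS fragment above gives a sketch like the following; only the array-related lines are shown, and the 0-1 index range is an assumption based on the two input files.

```
#SBATCH -a 0-1                 # job array indices, same range syntax as PBS

input_files=(in in2)
# SLURM_ARRAY_TASK_ID replaces PBS_ARRAYID as the per-task index
mpirun pw.x -in "../data/${input_files[$SLURM_ARRAY_TASK_ID]}"
```
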
### Challenge