Commit 588d5341 authored by Papatheodore, Thomas

added comments to example_map.sh and further clarification in the README

parent 9430148c
README: +1 −1
@@ -12,4 +12,4 @@ To run, simply launch the executable with your favorite job launcher.

> NOTE: `HIP_VISIBLE_DEVICES` must be set.

> [OPTIONAL]: On Lyra, the current Slurm doesn't easily allow for fine-grained process/thread placement, so an example mapping script is also included in this repo. It can be modified and called "in front of" `hello_jobstep` (or any other executable, really). The script uses `numactl` to map hardware threads and GPUs to node-local MPI ranks. NOTE: You will need to use the `srun` argument `--ntasks-per-gpu` with this script.
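A hypothetical launch line for this setup might look like the following (the node and rank counts are illustrative assumptions, not values from the repo; the flags follow the comments in the README and mapping script):

```shell
# Illustrative only: 2 nodes, 4 MPI ranks per node, with the
# mapping script running "in front of" the real executable.
srun -N 2 --ntasks-per-node=4 ./example_map.sh ./hello_jobstep
```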
example_map.sh: +34 −4
#!/bin/bash

#------------------------------------------------------
# You'll need to read in more command line args if your
# executable takes arguments
#------------------------------------------------------
APP=$1

#------------------------------------------------------
# The number of node-local MPI ranks
# The `--ntasks-per-node` flag to srun should be used
#------------------------------------------------------
lrank=$(($SLURM_PROCID % $SLURM_NTASKS_PER_NODE))

#------------------------------------------------------
# Ideally, the number of hardware threads set below
# for each rank with numactl should be the same as
# OMP_NUM_THREADS
#------------------------------------------------------
export OMP_NUM_THREADS=4
export OMP_PLACES=cores

#------------------------------------------------------
# Set hardware threads and GPUs for each node-local
# MPI rank. NOTE: For more than 4 MPI ranks per node, 
# additional cases would need to be added.
#------------------------------------------------------
case ${lrank} in
0)
  export HIP_VISIBLE_DEVICES=0
  numactl --physcpubind=64,65,66,67 $APP
  ;;

1)
  export HIP_VISIBLE_DEVICES=1
  numactl --physcpubind=68,69,70,71 $APP
  ;;

2)
  export HIP_VISIBLE_DEVICES=2
  numactl --physcpubind=72,73,74,75 $APP
  ;;

3)
  export HIP_VISIBLE_DEVICES=3
  numactl --physcpubind=76,77,78,79 $APP
  ;;

esac
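As a local sanity check (no `srun` or `numactl` required), the sketch below reproduces the rank-to-GPU and rank-to-hardware-thread arithmetic from the case statement above. The 8 global ranks and 4 ranks per node are illustrative assumptions:

```shell
#!/bin/bash
# Simulate the node-local mapping for 4 ranks per node.
# The GPU index and hardware-thread ranges mirror the case
# statement in example_map.sh; the rank count is illustrative.
NTASKS_PER_NODE=4
for procid in 0 1 2 3 4 5 6 7; do
  lrank=$(( procid % NTASKS_PER_NODE ))   # node-local rank
  gpu=${lrank}                            # one GPU per local rank
  first=$(( 64 + 4 * lrank ))             # first hardware thread
  cpus="${first},$(( first + 1 )),$(( first + 2 )),$(( first + 3 ))"
  echo "rank ${procid} -> local rank ${lrank}, GPU ${gpu}, cpus ${cpus}"
done
```

Each node-local rank lands on a distinct 4-thread block starting at hardware thread 64, matching the `--physcpubind` lists in the script.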