@@ -12,9 +12,36 @@ To compile, you'll need to have HIP and MPI installed, and you'll need to use an
* CC + CrayMPI
> NOTE: When using Cray's MPI, you must set `export MV2_ENABLE_AFFINITY=0` to properly use Slurm's binding flags. Otherwise, the Cray MPI binding will take precedence and might give unexpected/undesired results.
## Usage
To run, simply launch the executable with your favorite job launcher. For example...
> NOTE: Since there are 4 OpenMP threads per MPI rank, I've included `-c 8` so that each MPI rank gets 4 physical CPU cores, one for each of its 4 OpenMP threads. The `-c` option counts hardware threads, not physical CPU cores, and there are 2 hardware threads per physical core, so 4 cores means `-c 8`.
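
The arithmetic behind the `-c` value can be written out as a small script. This is only a minimal sketch: the thread counts come from the note above, while the node count, rank count, and the executable name `./hello_jobstep` are assumptions made for illustration. It prints the job step line rather than running it.

```shell
# Sketch only: node count, rank count, and executable name are assumptions.
OMP_THREADS=4            # OpenMP threads per MPI rank (from the note above)
HW_THREADS_PER_CORE=2    # hardware threads per physical core (from the note above)
CPUS_PER_TASK=$((OMP_THREADS * HW_THREADS_PER_CORE))   # value to pass to -c
EXE=./hello_jobstep      # hypothetical executable name

# Print the job step line instead of running it, so it can be inspected first.
echo "OMP_NUM_THREADS=${OMP_THREADS} srun -N 1 -n 2 -c ${CPUS_PER_TASK} ${EXE} | sort"
```

If the number of OpenMP threads per rank changes, only `OMP_THREADS` needs to be edited and the `-c` value follows from it.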
> NOTE: If the output comes out garbled, you likely don't have `ROCR_VISIBLE_DEVICES` set. This can be set manually before running, or set implicitly with the `--gpus-per-node` flag or `--ntasks-per-gpu` flag (although the latter is currently broken). It is always recommended to add a `| sort` at the end of the job step line for easier parsing (see some examples below).
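
Setting `ROCR_VISIBLE_DEVICES` manually might look like the following. This is a hedged sketch, not a recommended configuration: the device list `0,1,2,3` assumes a node with 4 GPUs, and the snippet keeps any value that is already present in the environment.

```shell
# Sketch only: the device list 0,1,2,3 assumes a node with 4 GPUs.
# Keep any value that is already set; otherwise fall back to the assumed list.
export ROCR_VISIBLE_DEVICES="${ROCR_VISIBLE_DEVICES:-0,1,2,3}"

# Show the value that job steps launched from this shell will inherit.
echo "ROCR_VISIBLE_DEVICES=${ROCR_VISIBLE_DEVICES}"
```

Run this in the batch script or interactive shell before the job step line so the launcher passes the variable on to each rank.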