Commit eb7b58d8 authored by Papatheodore, Thomas's avatar Papatheodore, Thomas
Browse files

Added check for NULL ROCR_VISIBLE_DEVICES value to always output cleanly

parent f1545b52
Loading
Loading
Loading
Loading
+3 −1
Original line number Diff line number Diff line
@@ -41,7 +41,9 @@ MPI 3 - OMP 3 - HWT 80 - Node lyra17 - RT_GPU_ID 0,1,2,3 - GPU_ID 0,1,2,3 -

> NOTE: Since there are 4 OpenMP threads per MPI rank, I've included `-c 8` to make sure each MPI rank has 4 physical CPU cores to spawn the 4 OpenMP threads on. The `-c` option counts hardware threads, not physical CPU cores (there are 2 hardware threads per physical core).

> NOTE: If the output comes out garbled, you likely don't have `ROCR_VISIBLE_DEVICES` set. This can be set manually before running, or set implicitly with the `--gpus-per-node` flag or `--ntasks-per-gpu` flag (although the latter is currently broken - see below for work around). It is always recommended to add a `| sort` at the end of the job step line for easier parsing (see some examples below).
> NOTE: RT_GPU_ID shows the "HIP runtime" numbering of the GPUs and GPU_ID shows the "node-level" numbering of the GPUs. The node-level numbering is what you would intuitively think of - the values 0, 1, 2, and 3 on a Lyra node - but as labeled by the HIP runtime, the GPUs visible to each MPI rank are numbered starting from 0. So if 2 MPI ranks have 2 GPUs available to them, MPI 0 might have GPUs 0 and 1 and MPI 1 might have GPUs 2 and 3. But the HIP runtime (as seen from `hipGetDevice`) will show GPU IDs 0 and 1 for both MPI ranks. This is not to say that both MPI ranks have access to the same 2 GPUs; just that the runtime labels the GPUs this way. In fact, the Bus ID for each GPU can be used to verify the MPI ranks do, in fact, have access to different GPUs.

> NOTE: If the value of GPU_ID is reported as "N/A", the environment varible `ROCR_VISIBLE_DEVICES` is not set. The program will still run fine without it - it's really only there to try to capture the node-level GPU IDs rather than the runtime GPU IDs. But the Bus IDs can be used to verify different GPUs. If desired though, `ROCR_VISIBLE_DEVICES` can be set manually before running, or set implicitly with the `--gpus-per-node` flag or `--ntasks-per-gpu` flag (although the latter is currently broken - see below for work around). It is always recommended to add a `| sort` at the end of the job step line for easier parsing (see some examples below).

### [OPTIONAL] `gpu_map.sh`

+9 −4
Original line number Diff line number Diff line
/**********************************************************

"Hello World"-type program to test different srun layouts.

Written by Tom Papatheodore

**********************************************************/

#include <stdlib.h>
@@ -41,8 +39,15 @@ int main(int argc, char *argv[]){
	int resultlength;
	MPI_Get_processor_name(name, &resultlength);

    // Find how many GPUs are set in environment variable
    const char* gpu_id_list = getenv("ROCR_VISIBLE_DEVICES");
    // If ROCR_VISIBLE_DEVICES is set, capture visible GPUs
    const char* gpu_id_list; 
    const char* rocr_visible_devices = getenv("ROCR_VISIBLE_DEVICES");
    if(rocr_visible_devices == NULL){
        gpu_id_list = "N/A";
    }
    else{
        gpu_id_list = rocr_visible_devices;
    }

	// Find how many GPUs HIP runtime says are available
	int num_devices = 0;