# This Challenge
This example and challenge show you how to submit GPU jobs on the CADES condos. They are not designed to teach you OpenACC in full. For that and other GPU porting techniques, please register for NVIDIA's GPU class, which runs one day a month for the next few months. Register at https://www.olcf.ornl.gov/calendar/cuda-shared-memory/
#### Open Condo Queues for Condo Groups on SLURM
| Name | Max Walltime (D-H:M:S) | # Nodes | Cores | Micro arch. | RAM | Local Scratch | GPU |
| --------------- | ----------------------- | ------- | ----------- | ------------------ | ----------------- | ----------------- | ---------------- |
| `testing` | 4:0:0 | 4 | 32 | Broadwell | 62G | 160G | N/A |
| `batch` | 14-0:0:0 | 38, 25 | 32, 36 | Haswell, Broadwell | 125G | 233G, 2T | N/A |
| `high_mem` | 14-0:0:0 | 52 | 36 | Broadwell | 250G | 2T | N/A |
| `high_mem_cd` | 14-0:0:0 | 6 | 36 | Skylake | 375G | 2T | N/A |
| `gpu` | 14-0:0:0 | 9 | 32 | Haswell | 250G | 233G | 2x K80 (GK210) |
| `gpu_p100` | 14-0:0:0 | 8 | 36 | Broadwell | 500G | 2T | 2x P100 (GP100) |
The birthright nodes have 2 GPUs each. We will use one of them to do a vector addition for this challenge.
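If you want to check the current state of these GPU partitions before submitting, SLURM's `sinfo` can show it. A quick sketch (the format string is just one way to slice the output):
```
$ sinfo -p gpu,gpu_p100 -o "%P %a %D %G"
```
This prints each partition's name, availability, node count, and generic resources (the GPUs).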
# OpenACC on Condo Setup
For this example and challenge you will need to have done the Globus challenge to get the files.
Otherwise, please log in to or-slurm-login01 and copy them with `cp -r /lustre/or-hydra/cades-birthright/world-shared/GPU-Basics .`
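Once the copy finishes, a quick listing should show the challenge files; `VecAdd.c` and `gpu.sbatch` are the ones used below:
```
$ ls GPU-Basics
```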
## VecAdd.c
VecAdd.c adds two vectors A and B to produce C, where C[i] = A[i] + B[i]. This version of the code was brought to your Lustre Birthright scratch space during the Globus challenge.
To see the code:
```
$ cd GPU-Basics
$ vi VecAdd.c
```
It uses OpenACC directives, called pragmas, to copy the two vectors, a and b, to the GPU, add them, and then copy the result back to the CPU. If you want to see a more complete explanation of this, see the CADES self-guided tutorial in the SLURM Migration section of our Condo User Guide:
https://support.cades.ornl.gov/user-documentation/_book/condos/slurm/openacc-tutorial.html
```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main( int argc, char* argv[] )
{
    // Size of vectors
    int n = 10000;

    // Input vectors
    double *restrict a;
    double *restrict b;
    // Output vector
    double *restrict c;

    // Size, in bytes, of each vector
    size_t bytes = n*sizeof(double);

    // Allocate memory for each vector
    a = (double*)malloc(bytes);
    b = (double*)malloc(bytes);
    c = (double*)malloc(bytes);

    // Initialize content of input vectors, a[i] = sin(i)^2, b[i] = cos(i)^2
    int i;
    for(i=0; i<n; i++) {
        a[i] = sin(i)*sin(i);
        b[i] = cos(i)*cos(i);
    }

    // Sum component-wise and save result into vector c
    #pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
    for(i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }

    // Sum up vector c and print result divided by n; this should equal 1 within error
    double sum = 0.0;
    for(i=0; i<n; i++) {
        sum += c[i];
    }
    sum = sum/n;
    printf("final result: %f\n", sum);

    // Release memory
    free(a);
    free(b);
    free(c);

    return 0;
}
```
## Compiling VecAdd.c with OpenACC
The PGI compiler is the only CADES compiler that has full OpenACC support.
To set up your programming environment:
```
$ module purge
$ module load pgi/19.4
```
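You can confirm the compiler is available with `module list`, which should show `pgi/19.4` among your loaded modules.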
To compile this code with the pgcc compiler and OpenACC, use the `-acc` flag. The PGI compiler's `-Minfo` flag lets you see what the compiler is doing with your code.
## Challenge part 1:
Set up your programming environment and fill in the blanks to compile the code with the `-acc` flag and the `-Minfo` flag.
```
$ pgcc -___ -_____ VecAdd.c -o VecAdd.o
```
Did the compiler generate any code for the GPU? How can you tell?
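As a hint: when pgcc generates GPU code, the `-Minfo` output contains accelerator messages roughly of this shape (an illustrative sketch, not captured output; line numbers and exact wording vary with the compiler version):
```
main:
     59, Generating copyin(a[:n],b[:n])
         Generating copyout(c[:n])
     60, Loop is parallelizable
         Accelerator kernel generated
```
If no `Generating` or `Accelerator kernel generated` lines appear, the loop was compiled for the CPU only.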
Run the code. The printed result (the sum of c divided by n) should be 1.
```
$ ./VecAdd.o
```
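If everything worked, the program prints a value very close to 1 (for example `final result: 1.000000`), since every element of c is sin^2(i) + cos^2(i) = 1.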
## GPU Batch Scripting
Standard SBATCH directives for CADES Birthright:
```
#SBATCH -A birthright
#SBATCH -p gpu
#SBATCH --gres=gpu:2
```
You must use `--gres=gpu:2` to tell the job to use the node's GPUs.
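As an aside, `--gres=gpu:2` asks for both of the node's GPUs. Since this challenge only uses one GPU, `#SBATCH --gres=gpu:1` should also work in principle (standard SLURM gres syntax), but the examples below follow the tutorial and request both.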
We will also use nvprof, NVIDIA's built-in profiler. It will show you that your code is running on the GPU and also give you performance information about the code.
You must be on a compute node to use nvprof, and you must load the cuda module.
To use nvprof inside your batch script:
```
. . .
#SBATCH --mail-user=<your_email>

module purge
module load pgi/19.4
module load cuda

nvprof ./VecAdd.o
```
### Challenge 2: Submit a GPU Job with SLURM
Below is a batch script to run the VecAdd.o that you compiled in challenge 1. Note that we are using nvprof. Fill in the blanks using your knowledge of the SLURM batch directives and the CADES software environment. If your compilation failed, copy the VecAdd.o file from the “answers” folder into your work area to do this part.
In your work area the sbatch script is called gpu.sbatch.
Edit the script with vi by filling in the blanks as described below.
```
$ vi gpu.sbatch
```
**gpu.sbatch**
```
#!/bin/bash
. . .
#SBATCH --mail-user=<your_email>

module purge
module load _____
module load _____

nvprof ./VecAdd.o
```
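Submit the script with:
```
$ sbatch gpu.sbatch
```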
To see where you stand in the queue, issue:
```
$ squeue -u <userID>
```
A job state of “PD” means your job is waiting in the queue.
A job state of “R” means that it is running.
To see details about your job, issue:
```
$ scontrol show job <jobID>
```
After the code runs, open your output file and check whether the vectors summed to 1.0, then open your gpu-error.txt file to see whether the code ran on the GPU.
How long did it spend copying data to the GPU?
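A pointer for that last question: nvprof's summary lists GPU activities, including the host-to-device and device-to-host copies. The output looks roughly like the sketch below (times and the kernel name are placeholders, not real measurements; the memcpy labels are nvprof's standard ones):
```
==1234== Profiling result:
            Type  Time(%)      Time  Calls  Name
 GPU activities:      ...       ...      2  [CUDA memcpy HtoD]
                      ...       ...      1  main_60_gpu
                      ...       ...      1  [CUDA memcpy DtoH]
```
The `[CUDA memcpy HtoD]` row shows the time spent copying vectors a and b to the GPU.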