# This Challenge

This example and challenge show you how to submit GPU jobs on the CADES condos. They are not designed to teach you OpenACC in full. For that and other GPU porting techniques, please register for NVIDIA's GPU class, which runs one day a month for the next few months. Register at https://www.olcf.ornl.gov/calendar/cuda-shared-memory/

The birthright nodes have 2 GPUs each. We will use one of them to do a vector addition for this challenge.

For this example and challenge you will need to have done the Globus challenge to get the files.
Otherwise, please log in to `or-slurm-login01` and run `cp -r /lustre/or-hydra/cades-birthright/world-shared/GPU-Basics .`

# VecAdd.c

VecAdd.c adds two vectors A and B to produce C, where Ci = Ai + Bi. 

To see the code:
```
$ cd GPU-Basics
$ vi VecAdd.c
```

It uses OpenACC directives, called pragmas, to copy the two vectors, a and b, to the GPU, add them, and
copy the result back to the CPU. For a more complete explanation,
see the CADES self-guided tutorial in the SLURM Migration section of our Condo User Guide:
https://support.cades.ornl.gov/user-documentation/_book/condos/slurm/openacc-tutorial.html



```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main( int argc, char* argv[] )
{

    // Size of vectors
    int n = 10000;
    
    // Input vectors
    double *restrict a;
    double *restrict b;
    // Output vector 
    double *restrict c;
    
    // Size, in bytes, of each vector
    size_t bytes = n*sizeof(double);
    
    // Allocate memory for each vector
    a = (double*)malloc(bytes); 
    b = (double*)malloc(bytes); 
    c = (double*)malloc(bytes);
    
    // Initialize content of input vectors, vector a[i] = sin(i)^2 vector b[i] = cos(i)^2
    int i;
    for(i=0; i<n; i++) {
        a[i] = sin(i)*sin(i);
        b[i] = cos(i)*cos(i);
    }   
    
    // sum component wise and save result into vector c
    #pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
    for(i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }   
    
    // Sum up vector c and print result divided by n, this should equal 1 within error 
    double sum = 0.0;
    for(i=0; i<n; i++) {
        sum += c[i];
    }   
    sum = sum/n;
    printf("final result: %f\n", sum);
    
    // Release memory
    free(a);
    free(b);
    free(c);
    
    return 0;
}   
```

## Compiling VecAdd.c with OpenACC

The PGI compiler is the only CADES compiler that has full OpenACC support.
To set up your programming environment:

```
$ module purge
$ module load pgi/19.4

```
To compile this code with the pgcc compiler and OpenACC, use the `-acc` flag.

The PGI compiler's `-Minfo` flag lets you see what the compiler is doing with your code.

# Challenge part 1: Compile Using PGI and OpenACC


Set up your programming environment and fill in the blanks to compile the code with the `-acc` flag and the `-Minfo` flag.
```

$ pgcc -___ -_____ VecAdd.c -o VecAdd.o

```

Did the compiler generate any code for the GPU? How can you tell?

Run the code. It prints the sum of the vector divided by n, which should be 1.

```
$ ./VecAdd.o
```
# GPU Batch Scripting


Standard SBATCH directives for CADES Birthright: 
```
#SBATCH -A birthright
#SBATCH -p gpu
#SBATCH --gres=gpu:2
```
You must use `--gres=gpu:2` to tell the job to use the node's GPUs.

We will also use nvprof, NVIDIA's built-in profiler. It shows whether your code is running on the GPU and reports performance information about the code.

You must be on a compute node to use nvprof, and you must load the cuda module.

To use nvprof inside your batch script:
```
. . . 

#SBATCH --mail-user=<your_email>

module purge
module load pgi/19.4
module load cuda


nvprof ./VecAdd.o
```

# Challenge Part 2: Submit a GPU Job with SLURM

Below is a batch script to run the VecAdd.o executable that you compiled in Challenge part 1. Note that we are using nvprof. Fill in the blanks using your knowledge of the SLURM batch directives
and the CADES software environment. If your compilation failed, copy the VecAdd.o file from the “answers” folder into your work area to do this part.

In your work area the sbatch script is called gpu.sbatch.
Edit the script with vi by filling in the blanks as described below. 
```
$ vi gpu.sbatch 

```

**gpu.sbatch**

```
#!/bin/bash

#SBATCH -A _______
#SBATCH -p _______
#SBATCH ___________
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -J gpu-test-job
#SBATCH --mem=00
#SBATCH -t 10:00
#SBATCH -o ./%j-gpu-output.txt
#SBATCH -e ./%j-gpu-error.txt
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=<your_email>

module purge
module load _____
module load _____

nvprof ./VecAdd.o

```
To see where you stand in the queue, issue:
```
squeue -u <userID>

``` 
A job state of “PD” means your job is waiting in the queue. 
A job state of “R” means that it is running. 

To see details about your job, issue:
```
scontrol show job <jobID> 

```
After the job finishes, open your `<jobID>-gpu-error.txt` file, where nvprof writes its profile, and see if the code ran on the GPU.
How long did it spend copying data to the GPU?