CADES Condos have several GPU nodes, available through the `gpu` and `gpu_p100` queues.
#### Open Condo Queues for Condo Groups on SLURM
| Name            | Max Walltime  (D-H:M:S) | # Nodes | Cores       | Micro arch.        | RAM               | Local Scratch     | GPU              |
| --------------- | ----------------------- | ------- | ----------- | ------------------ | ----------------- | ----------------- | ---------------- |
| `testing`       | 4:0:0                   | 4       | 32          | Broadwell          | 62G               | 160G              | N/A              |
| `batch`         | 14-0:0:0                | 38, 25  | 32, 36      | Haswell, Broadwell | 125G              | 233G, 2T          | N/A              |
| `high_mem`      | 14-0:0:0                | 52      | 36          | Broadwell          | 250G              | 2T                | N/A              |
| `high_mem_cd`   | 14-0:0:0                | 6       | 36          | Skylake            | 375G              | 2T                | N/A              |
| `gpu`           | 14-0:0:0                | 9       | 32          | Haswell            | 250G              | 233G              | 2x K80 (GK210)   |
| `gpu_p100`      | 14-0:0:0                | 8       | 36          | Broadwell          | 500G              | 2T                | 2x P100 (GP100)  |

The Birthright nodes have 2 GPUs each. In this challenge we will use one of them to do a vector addition.
## VecAdd.c 

VecAdd.c adds two vectors, A and B, to produce C, where `C[i] = A[i] + B[i]`. This version of the code was brought to your Lustre Birthright scratch space during the Globus challenge.

It uses OpenACC directives to copy two vectors, a and b, to the GPU and add them there,
then copies the result back to the CPU. For a more complete explanation,
see the CADES self-guided tutorial in the SLURM Migration section of our Condo User Guide:
https://support.cades.ornl.gov/user-documentation/_book/condos/slurm/openacc-tutorial.html



```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
int main( int argc, char* argv[] )
{
 
    // Size of vectors
    int n = 10000;
 
    // Input vectors
    double *restrict a;
    double *restrict b;
    // Output vector
    double *restrict c;
 
    // Size, in bytes, of each vector
    size_t bytes = n*sizeof(double);
 
    // Allocate memory for each vector
    a = (double*)malloc(bytes);
    b = (double*)malloc(bytes);
    c = (double*)malloc(bytes);
 
    // Initialize content of input vectors, vector a[i] = sin(i)^2 vector b[i] = cos(i)^2
    int i;
    for(i=0; i<n; i++) {
        a[i] = sin(i)*sin(i);
        b[i] = cos(i)*cos(i);
    }  
 
    // sum component wise and save result into vector c
    #pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
    for(i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
 
    // Sum up vector c and print result divided by n, this should equal 1 within error
    double sum = 0.0;
    for(i=0; i<n; i++) {
        sum += c[i];
    }
    sum = sum/n;
    printf("final result: %f\n", sum);
 
    // Release memory
    free(a);
    free(b);
    free(c);
 
    return 0;
}
```

## Compiling VecAdd.c with OpenACC
The PGI compiler is the only CADES compiler with full OpenACC support.
To set up your programming environment:
```
$ module purge
$ module load pgi/19.4

```

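To compile this code with the pgcc compiler and OpenACC, use the `-acc` flag.

The PGI compiler's `-Minfo` flag prints feedback about what the compiler is doing with your code. For OpenACC code this includes which loops it turned into accelerator kernels and which data it copies to and from the device.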

## Challenge part 1:


Set up your programming environment and fill in the blanks to compile the code with the `-acc` flag and the `-Minfo` flag.
```

$ pgcc -___ -_____ VecAdd.c -o vecAdd.o

```

Did the compiler generate any code for the GPU? How can you tell? 


## GPU Batch Scripting

#### CADES Birthright

Standard SBATCH directives:
```
#SBATCH -A birthright
#SBATCH -p gpu
#SBATCH --gres=gpu:2
```
You must use `--gres=gpu:2` to give the job access to the node's GPUs.
We will also use nvprof, NVIDIA's command-line profiler. It will show you that your code is running on the GPU and also give you performance information about it.

To run the code under nvprof, issue:
```
mpirun nvprof ./vecAdd.o
```

### Challenge 2: Submit a GPU Job with SLURM

Below is a batch script that runs `vecAdd.o`, the executable you compiled in Challenge 1, on the GPU. Note that it runs the code under nvprof. Fill in the blanks using your knowledge of the SLURM batch directives and the CADES software environment.

**run_vecadd.sbatch**
```
#!/bin/bash

#SBATCH -A _______
#SBATCH -p _______
#SBATCH ___________
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -J gpu-test-job
#SBATCH --mem=0
#SBATCH -t 10:00
#SBATCH -o ./%j-gpu-output.txt
#SBATCH -e ./%j-gpu-error.txt
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=<your_email>

module purge
module load pgi/19.4
mpirun nvprof ./vecAdd.o

```
When your job finishes, open your output files and check whether the code ran on the GPU.
How long did it spend copying data to the GPU?