# CADES Condo GPUs

The CADES Condos have several GPU nodes; see the `gpu` and `gpu_p100` queues in the table below.
#### Open Condo Queues for Condo Groups on SLURM
| Name            | Max Walltime  (D-H:M:S) | # Nodes | Cores       | Micro arch.        | RAM               | Local Scratch     | GPU              |
| --------------- | ----------------------- | ------- | ----------- | ------------------ | ----------------- | ----------------- | ---------------- |
| `testing`       | 4:0:0                   | 4       | 32          | Broadwell          | 62G               | 160G              | N/A              |
| `batch`         | 14-0:0:0                | 38, 25  | 32, 36      | Haswell, Broadwell | 125G              | 233G, 2T          | N/A              |
| `high_mem`      | 14-0:0:0                | 52      | 36          | Broadwell          | 250G              | 2T                | N/A              |
| `high_mem_cd`   | 14-0:0:0                | 6       | 36          | Skylake            | 375G              | 2T                | N/A              |
| `gpu`           | 14-0:0:0                | 9       | 32          | Haswell            | 250G              | 233G              | 2x K80 (GK210)   |
| `gpu_p100`      | 14-0:0:0                | 8       | 36          | Broadwell          | 500G              | 2T                | 2x P100 (GP100)  |

The Birthright nodes have 2 GPUs each. We will use one of them to do a vector addition in this challenge.

# OpenACC on Condo setup

For this example and challenge, you need the files from the Globus challenge.
If you have not done that challenge, log in to `or-slurm-login01` and run `cp -r /lustre/or-hydra/cades-birthright/world-shared/GPU-Basics .` to copy them.

## VecAdd.c 

`VecAdd.c` adds two vectors A and B to produce C, where `C[i] = A[i] + B[i]`. This version of the code was copied to your Lustre Birthright scratch space during the Globus challenge.

It uses OpenACC directives to copy two vectors, `a` and `b`, to the GPU and add them,
then copies the result, `c`, back to the CPU (host). For a more complete explanation,
see the CADES self-guided tutorial in the SLURM Migration section of our Condo User Guide:
https://support.cades.ornl.gov/user-documentation/_book/condos/slurm/openacc-tutorial.html



```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
int main( int argc, char* argv[] )
{
 
    // Size of vectors
    int n = 10000;
 
    // Input vectors
    double *restrict a;
    double *restrict b;
    // Output vector
    double *restrict c;
 
    // Size, in bytes, of each vector
    size_t bytes = n*sizeof(double);
 
    // Allocate memory for each vector
    a = (double*)malloc(bytes);
    b = (double*)malloc(bytes);
    c = (double*)malloc(bytes);
 
    // Initialize content of input vectors, vector a[i] = sin(i)^2 vector b[i] = cos(i)^2
    int i;
    for(i=0; i<n; i++) {
        a[i] = sin(i)*sin(i);
        b[i] = cos(i)*cos(i);
    }  
 
    // sum component wise and save result into vector c
    #pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
    for(i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
 
    // Sum up vector c and print result divided by n, this should equal 1 within error
    double sum = 0.0;
    for(i=0; i<n; i++) {
        sum += c[i];
    }
    sum = sum/n;
    printf("final result: %f\n", sum);
 
    // Release memory
    free(a);
    free(b);
    free(c);
 
    return 0;
}
```

## Compiling VecAdd.c with OpenACC
The PGI compiler is the only CADES compiler that has full OpenACC support.
To set up your programming environment:
```
$ module purge
$ module load pgi/19.4

```

To compile this code with the `pgcc` compiler and OpenACC, use the `-acc` flag.

The PGI compiler's `-Minfo` flag lets you see what the compiler is doing with your code.

## Challenge part 1:


Set up your programming environment and fill in the blanks to compile the code with the `-acc` flag and the `-Minfo` flag.
```

$ pgcc -___ -_____ vecAdd.c -o vecAdd.o

```

Did the compiler generate any code for the GPU? How can you tell? 


## GPU Batch Scripting

#### CADES Birthright

Standard SBATCH directives:
```
#SBATCH -A birthright
#SBATCH -p gpu
#SBATCH --gres=gpu:2
```
You must use `--gres=gpu:2` to tell the job to use the node's GPUs.

### Challenge 2: Submit a GPU Job with SLURM

Below is a batch script to run `vecAdd.o`, the executable that you compiled for the GPU in Challenge 1. Note that we are using NVProf. Fill in the blanks using your knowledge of the SLURM batch directives
and the CADES software environment.

In the sample folder the script is called `gpu.sbatch`.

Open the script in an editor:

```
vi gpu.sbatch 
```

Fill in the missing parts of the script as described below. 

**gpu.sbatch**
```
#!/bin/bash

#SBATCH -A _______
#SBATCH -p _______
#SBATCH ___________
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -J gpu-test-job
#SBATCH --mem=0
#SBATCH -t 10:00
#SBATCH -o ./%j-gpu-output.txt
#SBATCH -e ./%j-gpu-error.txt
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=<your_email>

module purge
module load pgi/19.4
./vecAdd.o

```
Submit the script with `sbatch gpu.sbatch`. When your code has run, open your output file and check that the final result is 1.0.