# TITAN: It's a tricky beast, but it was [1st](https://www.top500.org/lists/2012/11/) on the [TOP500](https://www.top500.org)!

And it is still [5th](https://www.top500.org/lists/2017/11/) (November 2017 list).

They call this a Cray XK7.
## 18,688 compute nodes, each with:

* 1 x 16-core 2.2 GHz AMD Opteron 6274 (Interlagos) processor
* 32 GB of RAM
* 1 x NVIDIA Tesla K20X (Kepler) GPU
* Gemini high-speed interconnect
* Total: 299,008 CPU cores, 18,688 Kepler GPUs, 598 TB of CPU memory.
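
As a sanity check, the totals follow directly from the per-node figures. The snippet below is plain Bash arithmetic; the function names are made up for illustration, and the memory figure counts only the 32 GB of host RAM per node (GPU memory is not included).

```shell
#!/usr/bin/env bash
# Recompute the machine-wide totals from the per-node figures.
# Function names are illustrative only, not from any Titan tool.
set -euo pipefail

NODES=18688
CORES_PER_NODE=16   # Opteron 6274 integer cores
GPUS_PER_NODE=1     # one K20X per node
GB_PER_NODE=32      # host (CPU) memory per node, in GB

total_cores()   { echo $(( NODES * CORES_PER_NODE )); }
total_gpus()    { echo $(( NODES * GPUS_PER_NODE )); }
total_host_gb() { echo $(( NODES * GB_PER_NODE )); }

echo "CPU cores: $(total_cores)"
echo "GPUs:      $(total_gpus)"
echo "Host RAM:  $(total_host_gb) GB"   # roughly 598 TB
```

18,688 x 16 gives the 299,008 cores, and 18,688 x 32 GB gives 598,016 GB, which is where the 598 TB figure comes from.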
## Differences from the standard Intel Linux Cluster

### AMD Interlagos-based machine, like OIC phase 5.

The key is that there is only one floating-point unit per two cores: each pair of cores (a "compute unit") shares a single FPU. So from a science perspective there are only 8 cores per node, not 16. Additionally, your code needs to bind to just one core per pair, so that two processes don't end up competing for the same FPU. See [Cray XK7 CPU info](https://www.olcf.ornl.gov/support/system-user-guides/titan-user-guide/#333).
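
A minimal sketch of binding one MPI rank per FPU, assuming ALPS `aprun` (whose `-cc` option accepts an explicit CPU list) and assuming cores 2k and 2k+1 form one compute unit. `./a.out` and `NODES` are placeholders; if `aprun` is not on the `PATH`, the script only prints the command it would run. Check the Titan user guide for the binding options your system actually recommends.

```shell
#!/usr/bin/env bash
# Sketch: one MPI rank per Interlagos compute unit (i.e. per FPU).
# Assumes cores 2k and 2k+1 share an FPU; ./a.out is a placeholder binary.
set -euo pipefail

NODES=${NODES:-1}          # hypothetical: set to the node count you requested
CORES_PER_NODE=16
RANKS_PER_NODE=$(( CORES_PER_NODE / 2 ))   # 8 FPUs per node

# Comma-separated list of the first core of each pair: 0,2,4,...,14
cc_list() {
  local cores=() c
  for (( c = 0; c < CORES_PER_NODE; c += 2 )); do
    cores+=("$c")
  done
  local IFS=,
  echo "${cores[*]}"
}

launch() {
  local total=$(( NODES * RANKS_PER_NODE ))
  local cmd=(aprun -n "$total" -N "$RANKS_PER_NODE" -cc "$(cc_list)" ./a.out)
  if command -v aprun >/dev/null 2>&1; then
    "${cmd[@]}"        # ranks on each node are bound to the listed cores in order
  else
    echo "${cmd[*]}"   # no aprun here (e.g. a laptop): just show the command
  fi
}

launch
```

Listing only the even-numbered cores leaves the second core of every pair idle, which is the point: each rank gets an FPU to itself.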
### Service nodes run the PBS MOM processes and your PBS scripts

This complicates what PBS scripts can do: only the commands launched with `aprun` run on your compute nodes; everything else in the script runs on a shared service node.

* Max 50 `aprun` calls per job, i.e. at most 50 independent MPI applications in one job script.
* Don't run processes that cause much load outside of an `aprun` invocation.
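
One common pattern for several runs in a single job is to start each `aprun` in the background and `wait` for all of them. A minimal sketch, assuming Titan-style PBS directives; the project ID `PRJ123`, the binary `./sim`, and the input names are hypothetical, and the `APRUN` variable is only a hook for dry runs.

```shell
#!/usr/bin/env bash
#PBS -A PRJ123
#PBS -l nodes=4
#PBS -l walltime=01:00:00
# Sketch: several independent MPI runs in one job, staying under the
# 50-aprun-per-job limit. PRJ123, ./sim and the inputs are placeholders.
set -euo pipefail

MAX_APRUNS=50
APRUN=${APRUN:-aprun}   # hypothetical hook so the launcher can be swapped out

# Refuse up front rather than hitting the limit partway through the job.
check_budget() {
  if (( $1 > MAX_APRUNS )); then
    echo "refusing: $1 aprun calls exceeds the limit of $MAX_APRUNS" >&2
    return 1
  fi
}

# One backgrounded aprun per input; each run fills one 16-core node.
run_cases() {
  check_budget "$#" || return 1
  local input
  for input in "$@"; do
    "$APRUN" -n 16 ./sim "$input" &
  done
  wait   # keep the job alive until every run has finished
}

# Launch only inside a real PBS job. Keep any other work here light:
# this script runs on a shared service node, not on your compute nodes.
if [[ -n "${PBS_JOBID:-}" ]]; then
  cd "$PBS_O_WORKDIR"
  run_cases input_a input_b input_c input_d
fi
```

Requesting one node per concurrent run (here 4 nodes for 4 inputs) keeps the backgrounded runs from queuing behind each other inside the job.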
### There is only one GPU per node, so 16 CPU cores, 8 FPUs, 1 GPU.

It's generally on you to manage how your processes share that one GPU.
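
Because each node has a single GPU, the simplest layout is one MPI rank per node, with that rank driving the GPU. A minimal sketch, assuming ALPS `aprun`, where `-n` is the total rank count and `-N` the ranks per node; `./gpu_app` and `NODES` are hypothetical. The warning is only a reminder: if you put several ranks on a node, arranging how they share the GPU is up to you.

```shell
#!/usr/bin/env bash
# Sketch: one MPI rank per node so each rank has the node's only GPU
# to itself. ./gpu_app is a placeholder binary.
set -euo pipefail

GPUS_PER_NODE=1
APRUN=${APRUN:-aprun}   # hypothetical hook so the launcher can be swapped out

# gpu_launch NODES [RANKS_PER_NODE]
gpu_launch() {
  local nodes=$1 per_node=${2:-1}
  if (( per_node > GPUS_PER_NODE )); then
    # Several ranks would share one GPU; nothing here arranges that for you.
    echo "warning: $per_node ranks per node share $GPUS_PER_NODE GPU" >&2
  fi
  "$APRUN" -n $(( nodes * per_node )) -N "$per_node" ./gpu_app
}

if [[ -n "${PBS_JOBID:-}" ]]; then
  cd "$PBS_O_WORKDIR"
  gpu_launch "${NODES:-1}"   # NODES: hypothetical, match your "#PBS -l nodes="
fi
```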
# Chaining jobs with qsub

```shell-session
qsub -W depend=afterok:JOBID next_jobs_pbs.sh
```

Here `JOBID` is the ID that `qsub` printed when you submitted the job this one should wait for.

* _afterok_ if you need the previous job to succeed
* _afterany_ if you don't care whether it succeeded
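
For longer chains it helps to capture the job ID that `qsub` prints and pass it to the next submission instead of copying it by hand. A minimal sketch; the script names in the usage line are hypothetical, and `DEP` selects `afterok` or `afterany`.

```shell
#!/usr/bin/env bash
# Sketch: submit job scripts so each one waits for the previous one.
# qsub prints the new job's ID on stdout; we feed it to the next -W depend.
set -euo pipefail

QSUB=${QSUB:-qsub}
DEP=${DEP:-afterok}   # or afterany if a failed step shouldn't stop the chain

# submit_chain SCRIPT... ; prints the ID of the last job submitted
submit_chain() {
  local prev="" jid script
  for script in "$@"; do
    if [[ -z "$prev" ]]; then
      jid=$("$QSUB" "$script")
    else
      jid=$("$QSUB" -W "depend=${DEP}:${prev}" "$script")
    fi
    echo "submitted $script as $jid" >&2
    prev=$jid
  done
  echo "$prev"
}

if (( $# > 0 )); then
  submit_chain "$@"
fi
```

For example, `./chain.sh first_job_pbs.sh next_jobs_pbs.sh` (with `chain.sh` and `first_job_pbs.sh` as hypothetical names) submits the first job normally and the second with `-W depend=afterok:<first ID>`.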
... | | ... | |