And it is still 5th.
They call this a Cray XK7.
It has 18,688 compute nodes, each with:
- 1 x 16-core 2.2GHz AMD Opteron 6274 (Interlagos) processor
- 32 GB of RAM
- 1 x NVIDIA Kepler K20 GPU
- Gemini high-speed interconnect
Total: 299,008 CPU cores, 18,688 Kepler GPUs, 598 TB of memory.
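The totals follow directly from the per-node figures. A quick sanity check in shell arithmetic, assuming decimal units (1 TB = 1000 GB) and integer rounding down; the variable names are only for illustration:

```shell
# Per-node figures from the list above.
nodes=18688
cores_per_node=16
ram_gb_per_node=32
gpus_per_node=1

# Machine-wide totals (integer arithmetic, so the TB figure is rounded down).
total_cores=$((nodes * cores_per_node))
total_gpus=$((nodes * gpus_per_node))
total_ram_tb=$((nodes * ram_gb_per_node / 1000))   # assumes 1 TB = 1000 GB

echo "CPU cores: $total_cores"
echo "GPUs:      $total_gpus"
echo "Memory:    $total_ram_tb TB"
```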
Differences from the standard Intel Linux Cluster
AMD Interlagos-based machine, like OIC phase 5.
The key is that there is only one floating point unit per two cores. So from a science perspective there are only 8 cores per node, not 16. Additionally, your code needs to bind to just one core per pair. See Cray XK7 CPU info.
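One way to get one rank per core pair is aprun's `-j 1` option (one CPU per compute unit) together with `-N 8` (8 ranks per node). The sketch below only prints the command line instead of running it. The helper name `aprun_one_per_fpu` and the binary `./my_app` are made up for illustration, and the exact flags should be checked against the aprun man page on the machine.

```shell
# Hypothetical helper: print (not run) an aprun line that places one rank
# on each core pair, i.e. one rank per floating point unit.
aprun_one_per_fpu() {
    local nodes=$1; shift
    local ranks_per_node=8                  # 16 cores / 2 cores per FPU
    local total=$((nodes * ranks_per_node))
    # -n: total ranks, -N: ranks per node, -j 1: one CPU per compute unit
    echo aprun -n "$total" -N "$ranks_per_node" -j 1 "$@"
}

# Example: 4 nodes running the placeholder binary ./my_app
aprun_one_per_fpu 4 ./my_app
```

Paste the printed line into the PBS script once it looks right.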
Service nodes run the MPI mom processes and PBS scripts, which complicates what PBS scripts can do:
- Max 50 aprun invocations per job.
- Don't run processes that cause much load outside of an aprun invocation.
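A PBS script that loops over many tasks can hit the aprun limit without noticing. Below is a minimal sketch of a guard, assuming the limit counts every aprun invocation in the job; if it only counts simultaneous apruns, the counter would also need decrementing when one finishes. `limited_aprun` and `APRUN_CMD` are hypothetical names, the latter only so the launcher can be swapped out for a dry run.

```shell
MAX_APRUNS=50      # per-job limit noted above
aprun_count=0      # invocations so far in this script

# Hypothetical wrapper: launch through aprun until the limit is reached,
# then refuse and return non-zero.
limited_aprun() {
    if [ "$aprun_count" -ge "$MAX_APRUNS" ]; then
        echo "aprun limit of $MAX_APRUNS reached; not launching: $*" >&2
        return 1
    fi
    aprun_count=$((aprun_count + 1))
    # APRUN_CMD is an assumption of this sketch; it defaults to the real aprun.
    ${APRUN_CMD:-aprun} "$@"
}
```

Call `limited_aprun -n 8 ./my_app` in place of `aprun -n 8 ./my_app`. Call it directly, not inside `$( )` or a pipeline, otherwise the counter is updated in a subshell and lost.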
There is only one GPU per node, so each node has 16 CPU cores, 8 FPUs, and 1 GPU.
It's generally on you to manage this.
Chaining jobs with qsub
qsub -W depend=afterok:JOBID next_jobs_pbs.sh
- afterok if you need the previous job to succeed
- afterany if you don't care
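To chain more than two jobs, capture the job ID that qsub prints and feed it to the next submission. A minimal sketch, assuming qsub writes only the job ID to stdout; `chain_jobs` and `QSUB_CMD` are hypothetical names (the latter lets you substitute a stub for a dry run), and the script names in the usage line are placeholders.

```shell
# Hypothetical helper: submit the given PBS scripts in order, each one
# depending on the job submitted just before it.
# Usage: chain_jobs afterok|afterany script1.sh script2.sh ...
chain_jobs() {
    local dep_type=$1; shift
    local prev_id="" script
    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # First job has no dependency.
            prev_id=$(${QSUB_CMD:-qsub} "$script") || return 1
        else
            prev_id=$(${QSUB_CMD:-qsub} -W "depend=${dep_type}:${prev_id}" "$script") || return 1
        fi
        echo "$script -> $prev_id"
    done
}
```

For example, `chain_jobs afterok step1_pbs.sh step2_pbs.sh step3_pbs.sh` submits all three at once, and each later step stays queued until the one before it finishes successfully.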