|
|
# CADES CONDO CLUSTER #
|
|
|
|
|
|
**LUSTRE FILE SYSTEM MIGRATION in progress, see below**
|
|
|
|
|
|
This is the CNMS's newest computing resource.
|
|
|
|
|
|
## Make yourself a [slack account](https://cades-condos.slack.com)
|
|
|
Follow the instructions in the getting-started-on-Cades-Condo email that @pdoak should have sent you.
|
|
|
|
|
|
If you're going to run on Cades, log in to Slack and see what is happening in the general channel. This can save you a great deal of time.
|
|
|
|
|
|
|
|
|
* Any current trouble will be discussed there in real time.
|
|
|
* Trouble you cause that is large enough to impact other users will also be discussed there.
|
|
|
* Get slack for your phone.
|
|
|
|
|
|
You wouldn't leave an experiment running in the lab, or food on a hot stove, with no way to be contacted; don't do it on Cades-Condo either.
|
|
|
|
|
|
---
|
|
|
|
|
|
We *own* **1216** cores. Larger amounts can be used in short bursts, but short is not days.
|
|
|
|
|
|
## Policies ##
|
|
|
Walltime Limit: 48 hours
|
|
|
Simultaneous Jobs: 5
|
|
|
|
|
|
* Run test jobs to establish timing and parallelization parameters.
|
|
|
* Try to make your walltimes tight (not always 48:00:00).
|
|
|
* Do not flood the queue with jobs; if you have many small jobs, batch them. Ask how.
|
|
|
|
|
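As a hedged sketch of the batching idea (all names and resource values here are hypothetical placeholders, not a CNMS-provided script), many small independent cases can share a single PBS job instead of each being submitted separately:

```shell
#!/bin/bash
# Sketch only: run many small, independent cases inside ONE job
# rather than flooding the queue. The directory pattern (case_*)
# and per-case script (run_case.sh) are placeholders.
#PBS -q batch
#PBS -l nodes=1:ppn=32
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR"
for case_dir in case_*; do
    # run each small case sequentially inside the one allocation
    (cd "$case_dir" && ./run_case.sh)
done
```

Ask on Slack before adopting a batching scheme; other fan-out styles may suit some workloads better.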
|
If your usage is deemed excessive and you do not respond promptly to queries, your jobs will be held or killed.
|
|
|
|
|
|
## Login
|
|
|
```shell-session
|
|
|
bash $ ssh or-condo-login.ornl.gov
|
|
|
[you@or-condo-login02 ~]$ module load env/cades-cnms
|
|
|
```
|
|
|
|
|
|
### Check current usage
|
|
|
```shell-session
|
|
|
[you@or-condo-login02 ~]$ $SOFTWARECNMS/allCNMS.sh
|
|
|
```
|
|
|
|
|
|
|
|
|
## Environment ##
|
|
|
* Haswell/Broadwell based nodes

* MOAB/Torque cluster
|
|
|
|
|
|
## Job Submission ##
|
|
|
There is only one queue: batch
|
|
|
|
|
|
See this GitLab repo's examples for the PBS commands; there are more of them than on most clusters and they matter.
|
|
|
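As a minimal, hedged sketch of a PBS header (the job name, node counts, walltime, and executable are placeholders; take real values from the repo examples for your code):

```shell
#!/bin/bash
#PBS -q batch                 # the only queue
#PBS -N my_job                # job name (placeholder)
#PBS -l nodes=2:ppn=32        # placeholder node/core layout
#PBS -l walltime=12:00:00     # keep walltimes tight
#PBS -j oe                    # merge stdout and stderr

cd "$PBS_O_WORKDIR"
mpirun my_code input.in       # placeholder executable and input
```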
|
|
|
## Sample PBS Scripts ##
|
|
|
There are examples for most of the installed codes in the repo.
|
|
|
```shell
|
|
|
[you@or-condo-login02 ~]$ cd your-code-dir
|
|
|
[you@or-condo-login02 ~]$ git clone git@code.ornl.gov:CNMS/CNMS_Computing_Resources.git
|
|
|
```
|
|
|
**You can contribute to the examples.**
|
|
|
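Submission and monitoring use the standard Torque commands; the script name and job id below are placeholders:

```shell-session
[you@or-condo-login02 ~]$ qsub my_job.pbs
[you@or-condo-login02 ~]$ qstat -u $USER
[you@or-condo-login02 ~]$ qdel 1234567
```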
|
|
|
## File System ##
|
|
|
**This has changed.**
|
|
|
Run your jobs from
|
|
|
```
|
|
|
/lustre/or-hydra/cades-cnms/you
|
|
|
```
|
|
|
The old Lustre file system *pfs1* is full and very laggy as a result. Migrate your old data soon.
|
|
|
**Use a PBS job or an interactive job; do not use the login nodes.**
|
|
|
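For hands-on work on a compute node, an interactive job can be requested with the standard Torque `qsub -I` flag (resource values below are placeholders; adjust to your needs):

```shell-session
[you@or-condo-login02 ~]$ qsub -I -q batch -l nodes=1:ppn=32,walltime=01:00:00
```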
|
|
|
## CODES ##
|
|
|
These are the codes that have been installed so far. You can request additional codes.
|
|
|
|
|
|
Instructions for each code are linked below.

Please read them; you can waste a great deal of resources if you do not understand how to run even familiar codes optimally in this hardware environment.
|
|
|
|
|
|
[VASP](VASP) -- Be careful with this one: the vanilla optimized version can experience matrix issues. If you hit them, try the 5.4.1.2 build or the debug build. This is a known issue with high optimizations and the Intel compiler.
|
|
|
|
|
|
[ESPRESSO](ESPRESSO) -- Performs well on Cades; consider it as an alternative to VASP.
|
|
|
|
|
|
[ABINIT](ABINIT) -- New build; use caution.
|
|
|
|
|
|
---
|
|
|
|
|
|
## Advanced
|
|
|
In theory there are two QOS levels usable on Cades:

* **condo** -- allows use of up to our purchased number of nodes.

* **burst** -- allows use of more nodes, but burst jobs can be preempted by condo QOS jobs.

At the moment the core counting is not working properly, so just use **condo** and ignore **burst**.
Use them by adding:
|
```shell
#PBS -l qos=condo
|
|
|
```
|
|
|