
Cades

CADES CONDO CLUSTER

This is the CNMS's newest computing resource.

Gaining Access

Questions? email.

If your usage is abusive, purposefully or not, and you do not respond promptly to queries, your jobs will be held or killed.

Login

You will need to use your UCAMS password, or, if you are an XCAMS-only user, that password. Logging into CADES CONDO itself is not an RSA token login.

Onsite login

$ ssh or-slurm-login.ornl.gov
[you@or-slurm-login01 ~]$ module load env/cades-cnms

The env/cades-cnms module sets some standard environment variables and puts the CNMS modules in your $MODULEPATH.
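To confirm the module took effect, you can check $MODULEPATH and list the modules it makes visible (standard module commands, shown here only as a quick sanity check):

[you@or-slurm-login01 ~]$ echo $MODULEPATH
[you@or-slurm-login01 ~]$ module avail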

Offsite login

If you have a UCAMS account and an RSA token, you should be getting to CADES via the VPN or login1.ornl.gov.

If you are using the VPN, your experience should be just as onsite plus some additional latency. This can make X11 forwarding require patience.

If you don't have an ORNL machine offsite to run the VPN on, you'll need to come through login1.ornl.gov. The best way to do this is by jumping through the login1 node:

$ ssh -X -J your-uid@login1.ornl.gov your-uid@or-slurm-login.ornl.gov

To copy files:

$ scp -J your-uid@login1.ornl.gov your-uid@or-slurm-login:/path/to/files ./your/local/path

The facts

We own ~2400 cores. Many users must share these, so think before you submit.

  • Run test jobs to establish timing and parallelization parameters.
  • Try to make your walltimes tight (not always 48:00:00).
  • Do not flood the queue with jobs; if you have many small jobs, batch them (see the sketch after this list). Ask how.
  • We will actively thwart gaming the scheduling policies.
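
A minimal sketch of batching several small runs into a single job; the your_code binary and case_*.in inputs are placeholders, and depending on the Slurm version you may need --exclusive (older) or --exact (newer) on each srun so the steps run side by side:

#!/bin/bash -l
#SBATCH -J small_batch
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 32
#SBATCH -p batch
#SBATCH -A cnms
#SBATCH -t 02:00:00

# each srun is a job step inside this one allocation; steps that do not fit
# in the 32 allocated tasks wait for earlier ones to finish
for inp in case_*.in; do
    srun -n 8 your_code "$inp" > "${inp%.in}.out" 2>&1 &
done
wait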

Policies

Subject to change at any time.

Walltime Limit: 48 hours

Simultaneous Jobs: 6

Max processors * remaining seconds running at any time: 36495360, i.e. 640 cores for ~16 hours.

You may notice some variation from this. Since we experience frequent changes in the number of users and intensity of use, policies are adjusted to maximize utilization and responsiveness.

For up-to-the-minute policies:

[you@or-slurm-login01 ~]$ sacctmgr list Qos | grep cnms

Check current qos=std condo with

[you@or-slurm-login01 ~]$ squeue -q cnms-batch

Check current qos=std class=high_mem condo with

[you@or-slurm-login01 ~]$ squeue -q cnms-high_mem

Environment

32-36 core Haswell/Broadwell based nodes

We have two guaranteed blocks of compute:

  1. 1216 cores on -p batch (includes both hw32 and bw36)
  2. 1152 cores on -p high_mem (these are all bw36)

  • Unless stated otherwise, modules are optimized for hw32 but run just as well on bw36.
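
To see the current state of the nodes in these partitions (standard Slurm sinfo):

[you@or-slurm-login01 ~]$ sinfo -p batch,high_mem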

CNMS CADES resources have moved to the Slurm scheduler, read below!

Using the old PBS headnode will just waste your time.

Slurm Cluster

Job Submission

Queues

There are now two partitions for cnms jobs.

  • batch
  • high_mem (to use high_mem, replace -p batch with -p high_mem)

Quality of service (QOS)

  • std - generally this is what you want
  • devel - short debug, build and experimental runs
  • burst - preemptable jobs that run on unused CONDO resources (you must request access from @epd)
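
The QOS is requested in the job header with the --qos flag; this assumes the QOS names above are accepted verbatim, so check the sacctmgr output if in doubt:

#SBATCH --qos=std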

This is the obligatory Slurm header for a job.

Basic job header -- for CCSD

#!/bin/bash -l
#SBATCH -J test2
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 32
#SBATCH --cpus-per-task 1
#SBATCH --exclusive
#SBATCH --mem=100g
#SBATCH -p batch
#SBATCH -A cnms
#SBATCH -t 00:30:00
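
To submit, append your run commands to a script containing this header and hand it to sbatch (the script name here is just an example):

[you@or-slurm-login01 ~]$ sbatch my_job.slurm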

Sample Slurm Scripts

So far only the VASP example is updated!

There are examples for most of the installed codes in the repo.

[you@or-condo-login02 ~]$ cd your-code-dir
[you@or-condo-login02 ~]$ git clone git@code.ornl.gov:CNMS/CNMS_Computing_Resources.git

You can contribute to the examples.

File System

Run your jobs from

/lustre/or-hydra/cades-cnms/you

If your directory is missing, ask @michael.galloway or another Cades admin in #general for it to be created.
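
For example, to create and enter a run directory there (assuming your personal directory is named after your UCAMS id, i.e. $USER):

[you@or-slurm-login01 ~]$ mkdir -p /lustre/or-hydra/cades-cnms/$USER/test_run
[you@or-slurm-login01 ~]$ cd /lustre/or-hydra/cades-cnms/$USER/test_run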

Interactive Jobs

salloc -A cnms -p batch --nodes=1 --mem=80G --exclusive -t 00:30:00

Then wait. Try -p high_mem if the wait is too long. Once the allocation is granted, you can run jobs interactively by entering the commands from your submission script. If a command fails, you can correct it and try again.

Examples

nwchem

module load PE-gnu/3.0
module load nwchem/6.6_p3
srun --cpu-bind=cores nwchem input >nwchem_out 2>&1 &
tail -f nwchem_out

CODES

These are the codes that have been installed so far. You can request additional codes.

Instructions for codes: these are all being revised due to the Slurm migration.

VASP -- Much greater care needs to be taken to get a proper distribution of tasks with Slurm; recompilation should eventually ease this (see the launch sketch after this list of codes).

ESPRESSO -- Pending slurm instructions

LAMMPS -- Pending slurm instructions

ABINIT -- Pending slurm instructions
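
A minimal sketch of launching VASP under Slurm with explicit task placement; the module name and the vasp_std binary are illustrative, so check module avail vasp for what is actually installed:

# module and binary names are assumptions, verify with `module avail vasp`
module load vasp
srun -n ${SLURM_NTASKS} --cpu-bind=cores vasp_std >vasp_out 2>&1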

Advanced

The burst QOS will work somewhat differently with Slurm; see the Cades docs.

The default action when a burst job is preempted is to resubmit the job. If your code cannot recover from a dirty halt, this method should not be used. In the near future it will be possible to alter this behavior.

Benchmarking

You can contribute here

VASP MD

Who's who on Slack

Slack to uid
