Commit f9ab44d9 authored by Juve, Gideon

Add setup script

parent 059baf0e
......@@ -6,77 +6,45 @@ Pegasus workflow for ACME climate models.
Consult the [CESM User's Guide](http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/book1.html)
for more information about the climate code used by this workflow.
On Hopper
---------
You need to create the case while logged into Hopper.
In the following, assume that these variables are set to the values below:
ACMEPATH = /global/project/projectdirs/m2187/acme_code/ACME
CASENAME = F1850.g37.case2
SCRATCHDIR = /scratch/scratchdirs/juve/
CASEDIR = $SCRATCHDIR/$CASENAME
Note that the "shared-scratch" directory in your site catalog must match the
value for SCRATCHDIR above. This is because we use CASENAME as the --relative-dir
option for the Pegasus planner so that CASEDIR will become Pegasus' scratch dir.
1. Create the case directory. From $ACMEPATH/scripts run:
$ ./create_newcase -case $CASEDIR -mach hopper -compset F1850 -res T31_g37 -project m2187
In this case "m2187" is our Hopper project ID, and "mach" means "machine"
and should be set to "hopper". The "compset" defines the component set, or the
set of model components and configuration you want to use, and "res" specifies
the resolution, or grid setting. Possible compsets include: F1850, B1850.
Possible grids include: ne30_g16, T31_g37.
2. Make any manual changes to the case that are required for your simulation.
3. Setup the case. From $CASEDIR run:
$ ./cesm_setup
4. Compile the code. This should take about 20 minutes. From $CASEDIR run:
$ ./$CASENAME.build
On Submit Host
--------------
Steps to Run the Workflow
-------------------------
1. Create/edit the configuration file (e.g. test.cfg)
a. Set "casename" to match the name of your case.
b. Set "mppwidth" to match the $CASENAME.run script in your case dir
b. Set "mppwidth" to the number of cores that your run requires.
c. Set "stop_n" and "walltime" to create the number of stages you want
the workflow to have.
2. Generate the DAX
2. Create/edit the setup script (e.g. setup.sh)
a. Set the create_newcase parameters
3. Generate the DAX
$ python daxgen.py test.cfg $CASENAME
$ python daxgen.py test.cfg setup.sh DIRNAME
3. Edit the site catalog, sites.xml:
4. Edit the site catalog, sites.xml:
a. Update the "shared-scratch" directory entry to have your username
b. Update the "shared-storage" directory entry
4. Plan the DAX
5. Plan the DAX
$ ./plan.sh $CASENAME
$ ./plan.sh DIRNAME
5. Get NERSC grid proxy using:
6. Get NERSC grid proxy using:
$ myproxy-logon -s nerscca.nersc.gov:7512 -t 24 -T -l YOUR_NERSC_USERNAME
6. Follow output of plan.sh to submit workflow
7. Follow output of plan.sh to submit workflow
$ pegasus-run $CASENAME/submit/$CASENAME
$ pegasus-run DIRNAME/path/to/submit/dir
7. Monitor the workflow:
8. Monitor the workflow:
$ pegasus-status -l $CASENAME/submit/$CASENAME
$ pegasus-status -l DIRNAME/path/to/submit/dir
#!/bin/bash
# This script creates a new case, runs cesm_setup, and builds the code
set -e
if [ -z "$ACMEROOT" ]; then
echo "ERROR: Set the ACMEROOT environment variable in sites.xml"
exit 1
fi
if ! [ -d "$ACMEROOT" ]; then
echo "ERROR: ACMEROOT does not point to a valid path"
exit 1
fi
function usage () {
echo "Usage: $0 -case CASENAME -setup SETUP_SCRIPT"
}
if [ $# -eq 0 ]; then
usage
exit 1
fi
CONTINUE_RUN=FALSE
while [ "$#" -ne 0 ]; do
case "$1" in
-case)
shift
CASENAME=$1
;;
-setup)
shift
SETUP_SCRIPT=$1
;;
*)
usage
exit 1
;;
esac
shift
done
if [ -z "$CASENAME" ]; then
echo "ERROR: Specify -case"
exit 1
fi
if [ -z "$SETUP_SCRIPT" ]; then
echo "ERROR: Specify -setup"
exit 1
fi
SCRATCHDIR=$PWD
export CASEROOT=$SCRATCHDIR/$CASENAME
echo "CASENAME is $CASENAME"
echo "Setting CASEROOT to $CASEROOT"
echo "Setup script is $SETUP_SCRIPT"
# Clean up after any previous failed run
if [ -d "$CASEROOT" ]; then
echo "WARNING: CASEROOT already exists: $CASEROOT"
echo "Removing existing CASEROOT"
rm -rf "$CASEROOT"
fi
# Mark the setup script as executable
chmod 755 $SETUP_SCRIPT
# Run the setup script
./$SETUP_SCRIPT
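The option loop in acme-setup.sh above can be restated as a minimal Python 3 sketch. The function name `parse_setup_args` is hypothetical and not part of this commit, and the sketch is slightly stricter than the shell loop: it rejects a flag that has no value instead of relying on the later empty-string checks.

```python
def parse_setup_args(argv):
    """Parse '-case CASENAME -setup SETUP_SCRIPT' (hypothetical helper).

    Mirrors the while/case loop in acme-setup.sh: each known flag
    consumes the next argument, and any unknown flag is an error.
    """
    options = {"-case": None, "-setup": None}
    args = list(argv)
    while args:
        flag = args.pop(0)
        if flag not in options:
            raise ValueError("unknown option: %s" % flag)
        if not args:
            # Assumption: treat a trailing flag with no value as an error.
            raise ValueError("missing value for %s" % flag)
        options[flag] = args.pop(0)
    if not options["-case"]:
        raise ValueError("Specify -case")
    if not options["-setup"]:
        raise ValueError("Specify -setup")
    return options["-case"], options["-setup"]
```

For example, `parse_setup_args(["-case", "F1850.g37.case2", "-setup", "setup.sh"])` returns the case name and the setup script name as a pair.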
......@@ -2,6 +2,7 @@
import os
import sys
import re
import shutil
import string
from ConfigParser import ConfigParser
from Pegasus.DAX3 import *
......@@ -27,9 +28,10 @@ def format_template(name, outfile, **kwargs):
f.close()
class ACMEWorkflow(object):
def __init__(self, outdir, config):
def __init__(self, outdir, setup, config):
"'outdir' is the directory where the workflow is written, and 'config' is a ConfigParser object"
self.outdir = outdir
self.setup = setup
self.config = config
self.daxfile = os.path.join(self.outdir, "dax.xml")
self.replicas = {}
......@@ -71,6 +73,18 @@ class ACMEWorkflow(object):
f = open(path, "w")
try:
f.write("""
tr acme-setup {
site local {
pfn "file://%s/bin/acme-setup.sh"
arch "x86_64"
os "linux"
type "STAGEABLE"
profile globus "count" "1"
profile globus "jobtype" "single"
profile hints "globusScheduler" "auxiliary"
}
}
tr acme-run {
site local {
pfn "file://%s/bin/acme-run.sh"
......@@ -107,7 +121,7 @@ tr acme-amwg {
profile pegasus "exitcode.failuremsg" "Segmentation fault"
}
}
""" % (DAXGEN_DIR, self.mppwidth, DAXGEN_DIR, DAXGEN_DIR))
""" % (DAXGEN_DIR, DAXGEN_DIR, self.mppwidth, DAXGEN_DIR, DAXGEN_DIR))
finally:
f.close()
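The hunk above has to insert another DAXGEN_DIR at exactly the right position in the format tuple, which is easy to get wrong as more `tr` entries are added. One possible alternative, shown here only as a Python 3 sketch and not as what the commit does, is to use named placeholders. The names `TEMPLATE` and `render_catalog` are hypothetical, and the placement of `mppwidth` in the `count` profile of acme-run is an assumption, because that part of the template is not visible in this diff.

```python
# Abbreviated template: only two entries, with most profiles left out.
TEMPLATE = """
tr acme-setup {
    site local {
        pfn "file://%(daxgen_dir)s/bin/acme-setup.sh"
        type "STAGEABLE"
    }
}
tr acme-run {
    site local {
        pfn "file://%(daxgen_dir)s/bin/acme-run.sh"
        profile globus "count" "%(mppwidth)s"
    }
}
"""


def render_catalog(daxgen_dir, mppwidth):
    """Fill the template by name, so argument order no longer matters."""
    return TEMPLATE % {"daxgen_dir": daxgen_dir, "mppwidth": mppwidth}
```

With named keys, adding a new `tr` block that reuses `daxgen_dir` requires no change to the argument list.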
......@@ -129,23 +143,32 @@ tr acme-amwg {
"Generate a workflow (DAX, config files, and replica catalog)"
dax = ADAG(self.casename)
last = None
if self.stop_option in ["nyear", "nyears"]:
amwg = True
else:
print "WARNING: Diagnostics not added to workflow unless stop option is 'nyears'. Current setting is '%s'" % self.stop_option
amwg = False
# Add the setup stage
setupscript = File(self.setup)
setup = Job(name="acme-setup")
setup.addArguments("-case", self.casename, "-setup", setupscript)
setup.uses(setupscript, link=Link.INPUT, register=False, transfer=True)
dax.addJob(setup)
self.add_replica(self.setup, os.path.join(self.outdir, self.setup))
last = None
tot_years = 0
i = 1
for stop_n, walltime in zip(self.stop_n, self.walltime):
stage = Job(name="acme-run")
if i > 1:
stage.addArguments("-continue")
stage.addArguments("-stage %s -stop %s -n %s" % (i, self.stop_option, stop_n))
stage.addProfile(Profile(namespace="globus", key="maxwalltime", value=walltime))
dax.addJob(stage)
if i == 1:
dax.depends(stage, setup)
else:
stage.addArguments("-continue")
if last is not None:
dax.depends(stage, last)
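The dependency logic in this hunk (stage 1 depends on the setup job, every later stage depends on the stage before it) can be illustrated without Pegasus. The sketch below is Python 3, uses plain strings instead of `Job` objects, and assumes that the part of the loop not shown in the diff ends with `last = stage`; the function name `chain_dependencies` is hypothetical.

```python
def chain_dependencies(num_stages):
    """Return (child, parent) edges for setup -> stage 1 -> stage 2 -> ..."""
    edges = []
    last = None
    for i in range(1, num_stages + 1):
        stage = "acme-run-%d" % i
        if i == 1:
            # The first stage waits for the setup job to build the case.
            edges.append((stage, "acme-setup"))
        if last is not None:
            # Later stages continue from the previous stage.
            edges.append((stage, last))
        last = stage  # assumption: the elided loop tail does this
    return edges
```

Because `last` is `None` on the first iteration, the first stage gets exactly one parent (the setup job), and the resulting graph is a simple chain.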
......@@ -198,34 +221,42 @@ tr acme-amwg {
dax.writeXMLFile(self.daxfile)
def generate_workflow(self):
if os.path.isdir(self.outdir):
raise Exception("Directory exists: %s" % self.outdir)
# Create the output directory
self.outdir = os.path.abspath(self.outdir)
os.makedirs(self.outdir)
self.generate_dax()
self.generate_replica_catalog()
self.generate_transformation_catalog()
self.generate_env()
def main():
if len(sys.argv) != 3:
raise Exception("Usage: %s CONFIGFILE OUTDIR" % sys.argv[0])
if len(sys.argv) != 4:
raise Exception("Usage: %s CONFIGFILE SETUP OUTDIR" % sys.argv[0])
configfile = sys.argv[1]
outdir = sys.argv[2]
setup = sys.argv[2]
outdir = sys.argv[3]
if not os.path.isfile(configfile):
raise Exception("No such file: %s" % configfile)
raise Exception("Invalid CONFIGFILE: No such file: %s" % configfile)
if not os.path.isfile(setup):
raise Exception("Invalid SETUP script: No such file: %s" % setup)
outdir = os.path.abspath(outdir)
if os.path.isdir(outdir):
raise Exception("Directory exists: %s" % outdir)
# Read the config file
config = ConfigParser()
config.read(configfile)
# Create the output directory
os.makedirs(outdir)
# Save a copy of the config file and setup script
shutil.copy(configfile, outdir)
shutil.copy(setup, outdir)
# Generate the workflow in outdir based on the config file
workflow = ACMEWorkflow(outdir, config)
workflow = ACMEWorkflow(outdir, os.path.basename(setup), config)
workflow.generate_workflow()
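The checks that the new main() performs before building the workflow can be collected into one function. This is a Python 3 sketch under the assumption that raising `ValueError` is acceptable in place of the bare `Exception` used in the diff; `prepare_outdir` is a hypothetical name.

```python
import os
import shutil


def prepare_outdir(configfile, setup, outdir):
    """Validate the inputs, create outdir, and copy both files into it.

    Returns the absolute outdir and the basename of the setup script,
    which is what the diff passes on to ACMEWorkflow.
    """
    if not os.path.isfile(configfile):
        raise ValueError("Invalid CONFIGFILE: No such file: %s" % configfile)
    if not os.path.isfile(setup):
        raise ValueError("Invalid SETUP script: No such file: %s" % setup)
    outdir = os.path.abspath(outdir)
    if os.path.isdir(outdir):
        raise ValueError("Directory exists: %s" % outdir)
    os.makedirs(outdir)
    # Keep a copy of both inputs next to the generated workflow.
    shutil.copy(configfile, outdir)
    shutil.copy(setup, outdir)
    return outdir, os.path.basename(setup)
```

Validating everything before `os.makedirs` means a bad argument never leaves a half-created output directory behind.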
......
......@@ -36,7 +36,6 @@ pegasus-plan \
--conf $PP \
--dax $DAX \
--dir $SUBMIT_DIR \
--relative-dir $CASENAME \
--sites $SITE \
--output-site $OUTPUT_SITE \
--cleanup none \
......
#!/bin/bash
# This is an example ACME setup script. The workflow runs this script with
# several environment variables:
#
# ACMEROOT: This is the path to the ACME/CESM source code
# CASENAME: The name of the case
# CASEROOT: This is the path to the case directory
#
# This script should run create_newcase, cesm_setup and $CASENAME.build.
#
set -e
$ACMEROOT/scripts/create_newcase -case $CASEROOT -mach hopper -compset F1850 -res T31_g37
cd $CASEROOT
./cesm_setup
./$CASENAME.build
......@@ -10,14 +10,16 @@
<site handle="hopper" arch="x86_64" os="LINUX">
<grid type="gt5" contact="hoppergrid.nersc.gov/jobmanager" scheduler="Fork" jobtype="auxillary"/>
<grid type="gt5" contact="hoppergrid.nersc.gov/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<!-- Your case dir must be in shared-scratch. Put your username here instead of "juve". -->
<!-- This is where the casedir goes. Put your username here instead of "juve". -->
<directory type="shared-scratch" path="/scratch/scratchdirs/juve">
<file-server operation="all" url="gsiftp://hoppergrid.nersc.gov/scratch/scratchdirs/juve"/>
</directory>
<!-- This is where output files go. -->
<directory type="shared-storage" path="/project/projectdirs/m2187/pegasus">
<file-server operation="all" url="gsiftp://hoppergrid.nersc.gov/project/projectdirs/m2187" />
</directory>
<profile namespace="env" key="PEGASUS_HOME">/project/projectdirs/m2187/pegasus/pegasus-4.4.0</profile>
<profile namespace="env" key="ACMEROOT">/global/project/projectdirs/m2187/acme_code/ACME</profile>
<profile namespace="env" key="DIAG_HOME">/project/projectdirs/m2187/amwg/amwg_diagnostics</profile>
<profile namespace="env" key="MAGICK_HOME">/project/projectdirs/m2187/ImageMagick-6.9.0.4</profile>
<profile namespace="globus" key="project">m2187</profile>
......
[acme]
# This is the name of the case. This should be the name of the directory
# relative to "shared-scratch" from the site catalog. For example, if
# your shared-scratch is /scratch/juve, and casename is mycase, then the
# case should be set up in /scratch/juve/mycase
# This is the name of the case.
casename = F1850.g37.case2
# This is the number of cores to use for each stage
# This is the number of cores to use for each stage. For now, this needs to
# be set manually. Eventually we might be able to derive it automatically
# from the compset and grid.
mppwidth = 24
# This is the unit of simulation time
......
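As a quick check of the two settings shown above, the `[acme]` section can be read with the standard library. This is a Python 3 sketch (daxgen.py in this commit uses the Python 2 `ConfigParser` module); `read_acme_settings` and `SAMPLE` are hypothetical names, and only `casename` and `mppwidth` are read because the rest of the file is elided in the diff.

```python
from configparser import ConfigParser

SAMPLE = """
[acme]
# This is the name of the case.
casename = F1850.g37.case2
# This is the number of cores to use for each stage.
mppwidth = 24
"""


def read_acme_settings(text):
    """Return (casename, mppwidth) from the [acme] section of a config string."""
    config = ConfigParser()
    config.read_string(text)
    return config.get("acme", "casename"), config.getint("acme", "mppwidth")
```

Calling `read_acme_settings(SAMPLE)` yields the case name as a string and the core count as an integer.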