experiments/opal.yaml
Models the OPAL Experiment Steering Vision across Aurora/Frontier/Perlmutter:
- Aurora: instrument driver, S3M data streaming, data preparation
- Frontier: ViT segmentation inference, plant analysis
- Perlmutter: data lakehouse ingest, provenance, optional ViT training
- Dynamic routing: LLM interpretation job goes to site with most free nodes
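
The dynamic routing rule can be sketched as a small selection function. This is only an illustration, not the MetaScheduler code: `pick_site`, the `free_nodes` mapping, and the alphabetical tie-break are assumptions made for the example.

```python
def pick_site(free_nodes: dict[str, int]) -> str:
    """Return the site with the most free nodes.

    `free_nodes` maps a site name to its current free-node count.
    Ties are broken alphabetically (an assumption, chosen so the
    result is deterministic).
    """
    if not free_nodes:
        raise ValueError("no sites available")
    # Negate the count so that min() prefers the largest value,
    # then fall back to the site name to break ties.
    return min(free_nodes, key=lambda site: (-free_nodes[site], site))


# Hypothetical snapshot of free nodes per site.
snapshot = {"aurora": 120, "frontier": 340, "perlmutter": 95}
chosen = pick_site(snapshot)
print(chosen)
```

With the hypothetical snapshot above, the LLM interpretation job would be routed to `frontier`.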
Key additions:
- FedJob: target_site and depends_on fields for DAG-based submission
- MetaScheduler: submit_workflow() + WorkflowHandle for dependency release
- site_worker: emits JOB_COMPLETED events; respects gpu_fraction in meta
- raps/workloads/opal.py: per-iteration DAG factory (8-9 jobs/iteration)
- raps/fed_config.py: OpalConfig, WorkloadConfig
- run_fed.py: _run_opal_workflow() with pipelined iteration and dashboard support
- experiments/opal.yaml: 6 iterations, pipeline_depth=2, training every 3rd iter
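
A minimal sketch of DAG-based submission with dependency release, assuming a job is released once every job named in its `depends_on` has completed. `Job` and `Workflow` are hypothetical stand-ins, not the real `FedJob` or `WorkflowHandle`; only `target_site` and `depends_on` come from the description above, and the job names and edges in the example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical stand-in for FedJob; models only the two new fields.
    name: str
    target_site: str
    depends_on: list[str] = field(default_factory=list)


class Workflow:
    """Tracks completions and releases jobs whose dependencies are met."""

    def __init__(self, jobs: list[Job]) -> None:
        self.jobs = {job.name: job for job in jobs}
        self.done: set[str] = set()
        self.released: set[str] = set()

    def ready(self) -> list[Job]:
        """Release and return every unreleased job with all deps done."""
        newly_released = []
        for job in self.jobs.values():
            if job.name in self.released:
                continue
            if all(dep in self.done for dep in job.depends_on):
                self.released.add(job.name)
                newly_released.append(job)
        return newly_released

    def on_completed(self, name: str) -> list[Job]:
        """Handle a completion event and return newly released jobs."""
        self.done.add(name)
        return self.ready()


# Invented four-job DAG spanning the three sites.
wf = Workflow([
    Job("stream", "aurora"),
    Job("prepare", "aurora", ["stream"]),
    Job("segment", "frontier", ["prepare"]),
    Job("ingest", "perlmutter", ["prepare"]),
])
first = [job.name for job in wf.ready()]
after_stream = [job.name for job in wf.on_completed("stream")]
after_prepare = [job.name for job in wf.on_completed("prepare")]
print(first, after_stream, after_prepare)
```

In this sketch the completion event plays the role that JOB_COMPLETED plays for site_worker: each event can release zero or more downstream jobs, possibly on different sites.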
Run: raps run-fed experiments/opal.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>