Verified Commit fde21fba authored by Hines, Jesse

Merge branch 'develop' into rl3

parents c80d121d 15aacee4
+1 −1
[flake8]
exclude = .git, __pycache__, venv*, simulation_results, third_party, models
exclude = .git, __pycache__, venv*, simulation_results, third_party, models, .venv
max-line-length = 120
+1 −0
@@ -5,3 +5,4 @@ venv
*.npz
*.prof
simulation_results/
models/*.fmu
+62 −37
@@ -19,37 +19,37 @@ Note: Requires python3.12 or greater.

## Usage and help menu

    python main.py -h
    raps run -h

## Run simulator with default synthetic workload

    python main.py
    raps run

## Run simulator with telemetry replay

    # Frontier
    DATEDIR="date=2024-01-18"
    DPATH=~/data/frontier-sample-2024-01-18
    python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR
    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR

## Open Telemetry dataset

For the Marconi100 supercomputer, download `job_table.parquet` from https://zenodo.org/records/10127767

    # Marconi100
    python main.py --system marconi100 -f ~/data/marconi100/job_table.parquet
    raps run --system marconi100 -f ~/data/marconi100/job_table.parquet

For the Adastra MI250 supercomputer, download `AdastaJobsMI250_15days.parquet` from https://zenodo.org/records/14007065

    # Adastra MI250
    python main.py --system adastraMI250 -f AdastaJobsMI250_15days.parquet
    raps run --system adastraMI250 -f AdastaJobsMI250_15days.parquet

For the Google cluster trace v2 dataset

    python main.py --system gcloudv2 -f ~/data/gcloud/v2/google_cluster_data_2011_sample --ff 600
    raps run --system gcloudv2 -f ~/data/gcloud/v2/google_cluster_data_2011_sample --ff 600

    # analyze dataset
    python -m raps.telemetry --system gcloudv2 -f ~/data/gcloud/v2/google_cluster_data_2011_sample -v
    raps telemetry --system gcloudv2 -f ~/data/gcloud/v2/google_cluster_data_2011_sample -v

For MIT Supercloud

@@ -62,29 +62,29 @@ For MIT Supercloud
    python -m raps.dataloaders.mit_supercloud.cli download --start 2021-05-21T13:00 --end 2021-05-21T14:00

    # Load data and run simulation - will save data as part-cpu.npz and part-gpu.npz files
    python multi-part-sim.py -x mit_supercloud -f $DPATH --start 2021-05-21T13:00 --end 2021-05-21T14:00
    raps run-parts -x mit_supercloud -f $DPATH --start 2021-05-21T13:00 --end 2021-05-21T14:00
    # or simply
    python multi-part-sim.py experiments/mit.yaml
    raps run-parts experiments/mit-replay-25hrs.yaml
    # Note: if no start/end dates are provided, the run defaults to the 24 hours between
    # 2021-05-21T00:00 and 2021-05-22T00:00, as set in raps/dataloaders/mit_supercloud/utils.py

    # Re-run simulation using npz files (much faster load)
    python multi-part-sim.py -x mit_supercloud -f part-*.npz
    raps run-parts -x mit_supercloud -f part-*.npz

    # Synthetic tests for verification studies:
    python multi-part-sim.py -x mit_supercloud -w multitenant
    raps run-parts -x mit_supercloud -w multitenant

For Lumi

    # Synthetic test for lumi multi-part-sim:
    python multi-part-sim.py -x lumi/*
    # Synthetic test for Lumi:
    raps run-parts -x lumi

## Perform Network Simulation

Lassen is one of the few datasets that include networking data. See `raps/dataloaders/lassen.py` for how to
obtain the dataset. To run a network simulation, use the following command:

    python main.py -f ~/data/lassen/Lassen-Supercomputer-Job-Dataset --system lassen --policy fcfs --backfill firstfit --ff 365d -t 12h --arrival poisson --net
    raps run -f ~/data/lassen/Lassen-Supercomputer-Job-Dataset --system lassen --policy fcfs --backfill firstfit --ff 365d -t 12h --arrival poisson --net

## Snapshot of extracted workload data

@@ -92,8 +92,7 @@ To reduce the expense of extracting the needed data from the telemetry parquet f
RAPS saves a snapshot of the extracted data in NPZ format. The NPZ file can be
given instead of the parquet files to run subsequent simulations much more quickly, e.g.:

    python main.py -f jobs_2024-02-20_12-20-39.npz

    raps run -f jobs_2024-02-20_12-20-39.npz

## Cooling models

@@ -104,37 +103,29 @@ We provide several cooling models in the repo https://code.ornl.gov/exadigit/POW
This will install the POWER9CSM in the models folder. To activate cooling when running RAPS,
use the `--cooling` or `-c` argument, e.g.,

    python main.py --system marconi100 -c
    raps run --system marconi100 -c

    python main.py --system lassen -c
    raps run --system lassen -c

    python main.py --system summit -c
    raps run --system summit -c
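
A minimal sketch combining cooling with telemetry replay, assuming the `-c` flag composes with `-f` the same way as in the replay examples above (paths as in the Frontier example):

    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR -c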

## Support for multiple system partitions

Multi-partition systems are supported by running the `multi-part-sim.py` script, where a list of configurations can be specified using the `-x` flag as follows:
Multi-partition systems are supported by running the `raps run-parts ...` command, where a list of partitions can be specified using the `-x` flag as follows:

    python multi-part-sim.py -x setonix/part-cpu setonix/part-gpu
    raps run-parts -x setonix/part-cpu setonix/part-gpu

or simply:

    python multi-part-sim.py -x setonix/* # bash

    python multi-part-sim.py -x 'setonix/*' # zsh

To run this in parallel use:

    mpiexec -n 2 python multi-part-sim-mpi.py -x setonix/part-cpu setonix/part-gpu

*Note: first install `mpi4py` via pip or conda.*
    raps run-parts -x setonix

This will simulate synthetic workloads on two partitions as defined in `config/setonix-cpu` and `config/setonix-gpu`. To replay telemetry workloads from another system, e.g., Marconi100's PM100 dataset, first create a .npz snapshot of the telemetry data, e.g.,

    python main.py --system marconi100 -f /path/to/marconi100/job_table.parquet
    raps run --system marconi100 -f /path/to/marconi100/job_table.parquet

This will dump a .npz file with a randomized name, e.g. ac23db.npz. Let's rename this file to pm100.npz for clarity. Note: can control-C when the simulation starts. Now, this pm100.npz file can be used with `multi-part-sim.py` as follows:
This will dump a .npz file with a randomized name, e.g., ac23db.npz. Rename this file to pm100.npz for clarity. Note: you can Ctrl-C once the simulation starts. Now, this pm100.npz file can be used as follows:

    python multi-part-sim.py -x setonix/* -f pm100.npz --arrival poisson --scale 192
    raps run-parts -x setonix -f pm100.npz --arrival poisson --scale 192

## Modifications to telemetry replay

@@ -142,7 +133,8 @@ There are three ways to modify replaying of telemetry data:

1. `--arrival`. Changing the arrival time distribution - replay cases will default to `--arrival prescribed`, where the jobs will be submitted exactly as they were submitted on the physical machine. This can be changed to `--arrival poisson` to change when the jobs arrive, which is especially useful in cases where there are gaps in time, e.g., when the system goes down for several days or is underutilized.
python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --arrival poisson
2. `--policy`. Changing the way the jobs are scheduled. The `--policy` flag will be set by default to `replay` in cases where a telemetry file is provided, in which case the jobs will be scheduled according to the start times provided. Changing the `--policy` to `fcfs` or `backfill` will use the internal scheduler.

2. `--policy`. Changing the way the jobs are scheduled. The `--policy` flag will be set by default to `replay` in cases where a telemetry file is provided, in which case the jobs will be scheduled according to the start times provided. Changing the `--policy` to `fcfs` or `backfill` will use the internal scheduler, e.g.:

    python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --policy fcfs --backfill firstfit -t 12h

@@ -152,11 +144,11 @@ python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --pol

## Job-level power output example for replay of single job

    python main.py -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --jid 1234567 -o
    raps run -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR --jid 1234567 -o

## Compute stats on telemetry data, e.g., average job arrival time

    python -m raps.telemetry -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR
    raps telemetry -f $DPATH/slurm/joblive/$DATEDIR,$DPATH/jobprofile/$DATEDIR

## Build and run Docker container

@@ -176,6 +168,39 @@ See instructions in [server/README.md](https://code.ornl.gov/exadigit/simulation

See instructions in [dashboard/README.md](https://code.ornl.gov/exadigit/simulation-dashboard)

## Running Tests

RAPS uses [pytest](https://docs.pytest.org/) for its test suite.  
Before running tests, ensure that you have a valid data directory available (e.g., `/opt/data`) and set the environment variable `RAPS_DATA_DIR` to point to it.

### Run all tests
```bash
RAPS_DATA_DIR=/opt/data pytest -n auto -x
```

By default, tests are parallelized with `pytest-xdist` (`-n auto`) to speed up execution.
The `-x` flag stops execution after the first failure. Add `-v` to run in verbose mode.
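
For example, keeping the parallel run and stop-on-first-failure behavior while adding verbose output:

```bash
RAPS_DATA_DIR=/opt/data pytest -n auto -x -v
```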

### Run tests on multi-partition systems

```bash
pytest -v -k "multi_part_sim"
```

### Run only network-related tests

```bash
RAPS_DATA_DIR=/opt/data pytest -n auto -x -m network
```

See `pytest.ini` for the different options for `-m`.
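
As a rough sketch, markers such as `network` are typically registered in `pytest.ini` along the following lines (only the `network` marker is confirmed above; the description text is illustrative):

```ini
[pytest]
markers =
    network: tests that exercise the network simulation
```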

### Run a specific test file

```bash
RAPS_DATA_DIR=/opt/data pytest tests/systems/test_engine.py
```
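
pytest also accepts a single test within a file via its `::` node-id syntax; the test name below is purely illustrative:

```bash
RAPS_DATA_DIR=/opt/data pytest tests/systems/test_engine.py::test_hypothetical_case
```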

### Contributing Code

Install pre-commit hooks as set by the project:
+3 −1
@@ -49,7 +49,9 @@ scheduler:
    NODE_FAIL: 0.01
network:
  topology: torus3d
  network_max_bw: 9600000000.0
  #topology: capacity
  #network_max_bw: 9.6E9
  network_max_bw: 1E7
  torus_x: 24
  torus_y: 24
  torus_z: 24

config/kestrel.yaml

0 → 100644
+53 −0
system:
  num_cdus: 6
  racks_per_cdu: 6
  nodes_per_rack: 80
  rectifiers_per_rack: 6
  chassis_per_rack: 1
  nodes_per_blade: 1
  switches_per_chassis: 5
  nics_per_node: 2
  rectifiers_per_chassis: 5
  nodes_per_rectifier: 4
  missing_racks: []
  down_nodes: []
  cpus_per_node: 1
  gpus_per_node: 4
  cpu_peak_flops: 396800000000.0
  gpu_peak_flops: 7800000000000.0
  cpu_fp_ratio: 0.69
  gpu_fp_ratio: 0.69

power:
  power_gpu_idle: 75
  power_gpu_max: 300
  power_cpu_idle: 100
  power_cpu_max: 800
  power_mem: 74.26
  power_nic: 21
  power_nvme: 45
  power_switch: 250
  power_cdu: 0
  power_update_freq: 20
  rectifier_peak_threshold: 13670
  sivoc_loss_constant: 0
  sivoc_efficiency: 1
  rectifier_loss_constant: 0
  rectifier_efficiency: 1
  power_cost: 0.094

scheduler:
  seed: 42
  job_arrival_time: 20
  mtbf: 11
  trace_quanta: 20
  min_wall_time: 3600
  max_wall_time: 43200
  ui_update_freq: 3600
  max_nodes_per_job: 3000
  job_end_probs:
    COMPLETED: 0.63
    FAILED: 0.13
    CANCELLED: 0.12
    TIMEOUT: 0.11
    NODE_FAIL: 0.01
 No newline at end of file