Commit e26481ad authored by Ferreira Da Silva, Rafael's avatar Ferreira Da Silva, Rafael

Update README.md

parent b35d0d65
```
pip install -r requirements-torch.txt
pip install -r requirements.txt
```

### Software Dependencies

Dependencies for the benchmark are listed in the following files:

```
env.sh
requirements-torch.txt 
requirements.txt 
```

Specifically, the benchmark depends on:
- GCC 10.3+
- ROCm 6.0.0
- Python 3.8+
    - torch 2.3.1+rocm6.0
    - torchvision 0.18.1+rocm6.0
    - torchaudio 2.3.1+rocm6.0
    - pytorch_lightning 2.3.0
    - pytorch_forecasting
    - numpy 1.26.4
    - pandas 1.5.3
    - pyyaml
    - pyzmq
    - matplotlib
    - scikit-learn
    - optuna_integration

## Mechanics of Running the Benchmark

All benchmark tests were performed on the Frontier supercomputer at Oak Ridge National Laboratory (ORNL).

- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on 64 ([4, 4, 4]) voxels, and the level 2 rank operates on [2, 2, 2] mean voxels of level 1. Consequently, for an input with dimensions 8 × 8 × 8, a total of 9 ranks (eight in level 1 and one in level 2) are required. For a larger input of 64 × 64 × 64, a total of 4608 ranks are needed, divided into 4096 in level 1 and 512 in level 2.

- _Strong Scaling Experiments:_ The input dimensions are fixed at 32 × 32 × 32. The level 1 mapping can be varied among [2, 2, 2], [4, 2, 2], and [4, 4, 4], while the level 2 mapping is kept at [2, 2, 2]. Under these conditions, 4608 ranks are required for the [2, 2, 2] mapping, while 576 ranks suffice for the [4, 4, 4] configuration.
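The rank counts quoted above follow directly from the input dimensions and the two mapping levels. A minimal sketch of the arithmetic (the helper `count_ranks` is illustrative, not part of the benchmark code):

```python
def count_ranks(input_dims, level1_map, level2_map):
    """Compute (level-1 ranks, level-2 ranks, total) for a given input.

    Each level-1 rank trains on a block of level1_map voxels; each
    level-2 rank operates on a block of level2_map level-1 ranks.
    """
    # Grid of level-1 ranks: input dimensions divided by the level-1 mapping.
    l1_grid = [d // m for d, m in zip(input_dims, level1_map)]
    level1 = l1_grid[0] * l1_grid[1] * l1_grid[2]
    # Level-2 ranks partition the level-1 grid by the level-2 mapping.
    level2 = 1
    for g, m in zip(l1_grid, level2_map):
        level2 *= g // m
    return level1, level2, level1 + level2

# Weak scaling: 8x8x8 input -> 8 + 1 = 9 ranks; 64x64x64 -> 4096 + 512 = 4608.
# Strong scaling: 32x32x32 with [2,2,2] -> 4608 ranks; with [4,4,4] -> 576.
```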


## Figure of Merit

The primary figure of merit for the ML4NSE workflow benchmark (defined in detail in https://doi.org/10.1615/JMachLearnModelComput.2023048607) is the number of voxels processed per second: `(#voxels * #replicas) / workflow_makespan`, where the workflow makespan is the total time to run all replicas in the workflow.
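As a sketch, the figure of merit can be computed as follows (function and variable names are illustrative, not from the benchmark code):

```python
def voxels_per_second(num_voxels, num_replicas, workflow_makespan_s):
    """Primary figure of merit: (#voxels * #replicas) / workflow_makespan.

    workflow_makespan_s: total wall-clock time, in seconds, to run all
    replicas in the workflow.
    """
    return (num_voxels * num_replicas) / workflow_makespan_s
```

For example, a 32 × 32 × 32 input (32768 voxels) with 4 replicas and a 600 s makespan would score 32768 × 4 / 600 ≈ 218.5 voxels per second (illustrative numbers).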

Useful secondary figures of merit include: