Dependencies for the benchmark are listed in the following files:
```
env.sh
requirements-torch.txt
requirements.txt
```
Specifically, the benchmark depends on:
- GCC 10.3+
- ROCm 6.0.0
- Python 3.8+
- torch 2.3.1+rocm6.0
- torchvision 0.18.1+rocm6.0
- torchaudio 2.3.1+rocm6.0
- pytorch_lightning 2.3.0
- pytorch_forecasting
- numpy 1.26.4
- pandas 1.5.3
- pyyaml
- pyzmq
- matplotlib
- scikit-learn
- optuna_integration
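
As a quick sanity check before launching, the sketch below verifies that the core Python-side dependencies are importable and match the versions listed above. The version strings are taken from this list; the script itself is illustrative and not part of the benchmark.

```python
# Quick sanity check that the Python-side dependencies listed above are
# importable and match the expected versions (strings taken from this README).
import numpy, pandas, pytorch_lightning, torch, torchaudio, torchvision

expected = [
    (torch, "2.3.1"),
    (torchvision, "0.18.1"),
    (torchaudio, "2.3.1"),
    (pytorch_lightning, "2.3.0"),
    (numpy, "1.26.4"),
    (pandas, "1.5.3"),
]

for module, want in expected:
    have = module.__version__
    status = "ok" if have.startswith(want) else f"expected {want}"
    print(f"{module.__name__:20s} {have:20s} {status}")
```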
## Mechanics of Running the Benchmark
All benchmark tests were performed on the Frontier supercomputer at Oak Ridge National Laboratory (ORNL).
The benchmark's scale can be adjusted by altering the number of replicas in the workflow:
- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on 64 ([4, 4, 4]) voxels, and each level 2 rank operates on [2, 2, 2] mean voxels of level 1. Consequently, for an input with dimensions 8 × 8 × 8, a total of 9 ranks (eight in level 1 and one in level 2) are required. For a larger input of 64 × 64 × 64, a total of 4608 ranks are needed, divided into 4096 in level 1 and 512 in level 2.
- _Strong Scaling Experiments:_ The input dimensions are fixed at 32 × 32 × 32. The level 1 mapping can be varied among [2, 2, 2], [4, 2, 2], and [4, 4, 4], while the level 2 mapping is kept at [2, 2, 2]. Under these conditions, 4608 ranks are necessary for the [2, 2, 2] mapping, while 576 ranks are required for the [4, 4, 4] configuration (see the sketch after this list for the rank arithmetic).
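
The rank counts above follow directly from the level 1 and level 2 mappings. The sketch below reproduces that arithmetic; the function and variable names are illustrative and not part of the benchmark code.

```python
# Illustrative rank-count arithmetic for the two-level hierarchy described above.
# Function and variable names are ours, not part of the benchmark code.
import math

def total_ranks(input_dims, level1_map, level2_map):
    # Level-1 ranks form a grid obtained by tiling the input volume with
    # level1_map-sized blocks of voxels.
    l1_grid = [d // m for d, m in zip(input_dims, level1_map)]
    l1 = math.prod(l1_grid)
    # Level-2 ranks tile the level-1 grid with level2_map-sized blocks of mean voxels.
    l2 = math.prod(g // m for g, m in zip(l1_grid, level2_map))
    return l1, l2, l1 + l2

# Weak scaling: 8x8x8 input -> 8 + 1 = 9 ranks; 64x64x64 -> 4096 + 512 = 4608 ranks.
print(total_ranks((8, 8, 8), (4, 4, 4), (2, 2, 2)))        # (8, 1, 9)
print(total_ranks((64, 64, 64), (4, 4, 4), (2, 2, 2)))     # (4096, 512, 4608)

# Strong scaling: 32x32x32 input, level-1 mapping [2,2,2] vs. [4,4,4].
print(total_ranks((32, 32, 32), (2, 2, 2), (2, 2, 2)))     # (4096, 512, 4608)
print(total_ranks((32, 32, 32), (4, 4, 4), (2, 2, 2)))     # (512, 64, 576)
```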
## Figure of Merit
The primary figure of merit for the ML4NSE workflow benchmark (defined in detail in https://doi.org/10.1615/JMachLearnModelComput.2023048607) is the number of voxels processed per second: `(#voxels * #replicas) / (workflow_makespan)`, where the workflow makespan is the total time to run all replicas in the workflow.
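
As an illustration, a minimal sketch of the figure-of-merit calculation is shown below; the variable names and sample numbers are placeholders, not measured values.

```python
# Minimal sketch of the figure-of-merit computation described above.
# num_voxels, num_replicas, and workflow_makespan_s are placeholders, not measured values.
num_voxels = 32 * 32 * 32        # voxels in the input volume
num_replicas = 4                 # replicas run in the workflow
workflow_makespan_s = 1800.0     # total wall-clock time to run all replicas, in seconds

fom_voxels_per_second = (num_voxels * num_replicas) / workflow_makespan_s
print(f"FOM: {fom_voxels_per_second:.2f} voxels/s")
```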