Dependencies for the benchmark are listed in the following files:
```
env.sh
requirements-torch.txt
requirements.txt
```
Specifically, the benchmark depends on:
- GCC 10.3+
- ROCm 6.0.0
- Python 3.8+
- torch 2.3.1+rocm6.0
- torchvision 0.18.1+rocm6.0
- torchaudio 2.3.1+rocm6.0
- pytorch_lightning 2.3.0
- pytorch_forecasting
- numpy 1.26.4
- pandas 1.5.3
- pyyaml
- pyzmq
- matplotlib
- scikit-learn
- optuna_integration
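
As a quick sanity check before launching, the sketch below verifies that the core Python-side dependencies are importable and match the versions listed above. The version strings are taken from this list; the script itself is illustrative and not part of the benchmark.

```python
# Quick sanity check that the Python-side dependencies listed above are
# importable and match the expected versions (strings taken from this README).
import numpy, pandas, pytorch_lightning, torch, torchaudio, torchvision

expected = [
    (torch, "2.3.1"),
    (torchvision, "0.18.1"),
    (torchaudio, "2.3.1"),
    (pytorch_lightning, "2.3.0"),
    (numpy, "1.26.4"),
    (pandas, "1.5.3"),
]

for module, want in expected:
    have = module.__version__
    status = "ok" if have.startswith(want) else f"expected {want}"
    print(f"{module.__name__:20s} {have:20s} {status}")
```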
## Mechanics of Running the Benchmark
All benchmark tests were performed on the Frontier supercomputer at Oak Ridge National Laboratory (ORNL).
The benchmark's scale can be adjusted by altering the number of replicas in the workflow:
- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on 64 ([4, 4, 4]) voxels, and each level 2 rank operates on [2, 2, 2] mean voxels of level 1. Consequently, for an input with dimensions 8 × 8 × 8, a total of 9 ranks (eight in level 1 and one in level 2) are required. For a larger input of 64 × 64 × 64, a total of 4608 ranks are needed, divided into 4096 in level 1 and 512 in level 2.
- _Strong Scaling Experiments:_ The input dimensions are fixed at 32 × 32 × 32. The level 1 mapping can be varied among [2, 2, 2], [4, 2, 2], and [4, 4, 4], while the level 2 mapping is kept at [2, 2, 2]. Under these conditions, 4608 ranks are necessary for the [2, 2, 2] mapping, while 576 ranks are required for the [4, 4, 4] configuration (see the sketch after this list for the rank arithmetic).
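
The rank counts above follow directly from the level 1 and level 2 mappings. The sketch below reproduces that arithmetic; the function and variable names are illustrative and not part of the benchmark code.

```python
# Illustrative rank-count arithmetic for the two-level hierarchy described above.
# Function and variable names are ours, not part of the benchmark code.
import math

def total_ranks(input_dims, level1_map, level2_map):
    # Level-1 ranks form a grid obtained by tiling the input volume with
    # level1_map-sized blocks of voxels.
    l1_grid = [d // m for d, m in zip(input_dims, level1_map)]
    l1 = math.prod(l1_grid)
    # Level-2 ranks tile the level-1 grid with level2_map-sized blocks of mean voxels.
    l2 = math.prod(g // m for g, m in zip(l1_grid, level2_map))
    return l1, l2, l1 + l2

# Weak scaling: 8x8x8 input -> 8 + 1 = 9 ranks; 64x64x64 -> 4096 + 512 = 4608 ranks.
print(total_ranks((8, 8, 8), (4, 4, 4), (2, 2, 2)))        # (8, 1, 9)
print(total_ranks((64, 64, 64), (4, 4, 4), (2, 2, 2)))     # (4096, 512, 4608)

# Strong scaling: 32x32x32 input, level-1 mapping [2,2,2] vs. [4,4,4].
print(total_ranks((32, 32, 32), (2, 2, 2), (2, 2, 2)))     # (4096, 512, 4608)
print(total_ranks((32, 32, 32), (4, 4, 4), (2, 2, 2)))     # (512, 64, 576)
```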
## Figure of Merit
The primary figure of merit for the ML4NSE workflow benchmark (defined in detail in https://doi.org/10.1615/JMachLearnModelComput.2023048607) is the number of voxels processed per second: `(#voxels * #replicas) / (workflow_makespan)`, where the workflow makespan is the total time to run all replicas in the workflow.
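
As an illustration, a minimal sketch of the figure-of-merit calculation is shown below; the variable names and sample numbers are placeholders, not measured values.

```python
# Minimal sketch of the figure-of-merit computation described above.
# num_voxels, num_replicas, and workflow_makespan_s are placeholders, not measured values.
num_voxels = 32 * 32 * 32        # voxels in the input volume
num_replicas = 4                 # replicas run in the workflow
workflow_makespan_s = 1800.0     # total wall-clock time to run all replicas, in seconds

fom_voxels_per_second = (num_voxels * num_replicas) / workflow_makespan_s
print(f"FOM: {fom_voxels_per_second:.2f} voxels/s")
```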