Update README.md (751c60ba) · Commits · workflow / OLCF-6 Workflow Benchmark

README.md

+7 −7

		@@ -53,6 +53,10 @@ Once the job starts running, it will hold waiting for the data stream. To start
		python sender.py
		```

		### Changing the Scale of the Benchmark

		The benchmark’s scale is set by the number of replicas in the workflow execution. At its smallest scale, the benchmark uses 9 nodes (a single replica), so each additional replica adds 9 nodes. To increase the scale, edit the `job.sb` submission script and update both the node count and the number of replicas. For example, to run 100 replicas on 900 nodes, set `#SBATCH -N 900` and `REPLICAS=100`.
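
The relationship between replicas and nodes can be sketched in a few lines of Python. This is only an illustration, assuming the 9 nodes per replica stated above; the helper name `job_settings` and its return format are hypothetical and not part of the benchmark.

```python
# Assumption taken from the text: a single replica uses 9 nodes.
NODES_PER_REPLICA = 9


def job_settings(replicas):
    """Return the two job.sb lines to edit for a given replica count.

    Hypothetical helper for illustration; it only formats strings and
    does not modify job.sb.
    """
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    nodes = NODES_PER_REPLICA * replicas
    return {
        "nodes": nodes,
        "sbatch_line": f"#SBATCH -N {nodes}",
        "replicas_line": f"REPLICAS={replicas}",
    }


if __name__ == "__main__":
    settings = job_settings(100)
    print(settings["sbatch_line"])    # #SBATCH -N 900
    print(settings["replicas_line"])  # REPLICAS=100
```

Keeping the node count derived from the replica count avoids the two values drifting apart when the script is edited by hand.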

		## Run Rules

		- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on a [4, 4, 4] block of 64 voxels, and each level 2 rank operates on a [2, 2, 2] block of mean voxels from level 1. Consequently, an input with dimensions 8 × 8 × 8 requires 9 ranks (eight in level 1 and one in level 2). A larger input of 64 × 64 × 64 requires 4608 ranks: 4096 in level 1 and 512 in level 2.
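
The rank arithmetic in the weak scaling rule can be checked with a short sketch. It assumes a cubic input whose edge length is a multiple of 8, and that each level 2 rank covers eight level 1 ranks, which is consistent with both examples in the text. The function name `rank_counts` is hypothetical.

```python
LEVEL1_EDGE = 4   # each level 1 rank trains on a [4, 4, 4] block (64 voxels)
LEVEL2_EDGE = 2   # each level 2 rank covers [2, 2, 2] level 1 mean voxels


def rank_counts(n):
    """Return (level1, level2, total) ranks for an n x n x n input.

    Assumes n is a multiple of LEVEL1_EDGE * LEVEL2_EDGE = 8 so that both
    levels tile the input exactly.
    """
    if n < 8 or n % (LEVEL1_EDGE * LEVEL2_EDGE) != 0:
        raise ValueError("n must be a positive multiple of 8")
    level1 = (n // LEVEL1_EDGE) ** 3
    level2 = level1 // LEVEL2_EDGE ** 3
    return level1, level2, level1 + level2


if __name__ == "__main__":
    print(rank_counts(8))    # (8, 1, 9)
    print(rank_counts(64))   # (4096, 512, 4608)
```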
		@@ -70,10 +74,6 @@ The primary figure of merit for the ML4NSE (defined in detail in https://doi.org

		An additional potential figure of merit, one that could demonstrate the robustness of the system, concerns any active guidance between the Compute and Services Clusters, where the latency of control operations becomes crucial. Specifically, it is essential to measure how long a compute job is held while new control information is disseminated. In addition, end-to-end time-to-solution could be affected as additional input streams and output consumers are added.

		| Dataset | Dimension | # Nodes | Sending Transfer Rate | Avg. Receiving Transfer Rate (per rank) | Throughput |
		| ------ | ------ | ------ | ------ | ------ | ------ |
		| p_322_data_np_res_16.npy | 16 x 16 x 16 | 9 | 62.79 Gbps | 10.43 Mbps | 1.144 × 10<sup>-3</sup> |
		| p_322_data_np_res_32.npy | 16 x 16 x 16 | 9 | 68.68 Gbps | 125.72 Mbps | 1.131 × 10<sup>-3</sup> |
		| p_322_data_np_res_64.npy | 16 x 16 x 16 | 9 | 62.17 Gbps | 555.57 Mbps | 1.112 × 10<sup>-3</sup> |
		| p_322_data_np_res_32.npy | 32 x 32 x 32 | 72 | 67.17 Gbps | 60.26 Mbps | 0.927 × 10<sup>-3</sup> |
		| p_322_data_np_res_64.npy | 32 x 32 x 32 | 72 | 70.08 Gbps | 95.51 Mbps | 0.636 × 10<sup>-3</sup> |
		| Dataset | Dimension | # Nodes | # Replicas | Sending Transfer Rate | Avg. Receiving Transfer Rate (per rank) | Throughput |
		| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
		| p_322_data_np_res_16.npy | 16 x 16 x 16 | 900 | 100 | 62.79 Gbps | 10.43 Mbps | 1.144 × 10<sup>-3</sup> |