Commit 751c60ba authored by Ferreira Da Silva, Rafael

Update README.md

parent 6f728426
+7 −7
@@ -53,6 +53,10 @@ Once the job starts running, it will hold waiting for the data stream. To start
python sender.py
```

### Changing the Scale of the Benchmark

The benchmark’s scale can be adjusted by changing the number of replicas in the workflow execution. At its smallest scale, the benchmark uses 9 nodes (a single replica). To increase the number of replicas, update both the node count and the number of replicas in the `job.sb` submission script. For example, to use 900 nodes and 100 replicas, set `#SBATCH -N 900` and `REPLICAS=100`.
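
The relationship between replicas and node count can be sketched as follows. This is a minimal illustration, not part of the benchmark: it assumes the node count grows linearly at 9 nodes per replica (consistent with the 9-node and 900-node examples above), and the helper names `nodes_for_replicas` and `job_settings` are hypothetical.

```python
# Assumption: every replica needs the same 9 nodes as the smallest configuration.
NODES_PER_REPLICA = 9


def nodes_for_replicas(replicas):
    """Return the node count to request for a given number of replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return NODES_PER_REPLICA * replicas


def job_settings(replicas):
    """Return the two lines to update in job.sb (hypothetical helper)."""
    return [
        f"#SBATCH -N {nodes_for_replicas(replicas)}",
        f"REPLICAS={replicas}",
    ]


if __name__ == "__main__":
    for line in job_settings(100):
        print(line)
```

For 100 replicas this yields the `#SBATCH -N 900` and `REPLICAS=100` values from the example above.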

## Run Rules

- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on 64 ([4, 4, 4]) voxels, and the level 2 rank operates on [2, 2, 2] mean voxels of level 1. Consequently, for an input with dimensions 8 × 8 × 8, a total of 9 ranks (eight in level 1 and one in level 2) are required. For a larger input of 64 × 64 × 64, a total of 4608 ranks are needed, divided into 4096 in level 1 and 512 in level 2.
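
The rank counts quoted in the weak scaling rule can be reproduced with a short calculation. The sketch below assumes a cubic input whose side length is a multiple of 8, and the function name `ranks_for_input` is hypothetical; it is an illustration of the arithmetic, not code from the benchmark.

```python
def ranks_for_input(side):
    """Return (level 1 ranks, level 2 ranks, total) for a side x side x side input.

    Assumes each level 1 rank covers a 4 x 4 x 4 block of voxels and each
    level 2 rank covers a 2 x 2 x 2 block of level 1 mean voxels.
    """
    level1_block = 4
    level2_block = 2
    if side < 1 or side % (level1_block * level2_block) != 0:
        raise ValueError("side must be a positive multiple of 8")
    level1_per_side = side // level1_block
    level1 = level1_per_side ** 3
    level2 = (level1_per_side // level2_block) ** 3
    return level1, level2, level1 + level2


if __name__ == "__main__":
    print(ranks_for_input(8))
    print(ranks_for_input(64))
```

An 8 × 8 × 8 input gives 8 + 1 = 9 ranks, and a 64 × 64 × 64 input gives 4096 + 512 = 4608 ranks, matching the figures above.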
@@ -70,10 +74,6 @@ The primary figure of merit for the ML4NSE (defined in detail in https://doi.org

An additional potential figure of merit that could demonstrate the robustness of the system applies when there is active guidance between the Compute and Services Clusters, where the latency of control operations becomes crucial. Specifically, it is essential to assess how long a compute job is held while new control information is disseminated. Also, as additional input streams and output consumers are added, the end-to-end time-to-solution could be affected.

| Dataset | Dimension | # Nodes | Sending Transfer Rate | Avg. Receiving Transfer Rate (per rank) | Throughput |
| ------ | ------ | ------ | ------ | ------ | ------ |
| p_322_data_np_res_16.npy | 16 x 16 x 16 | 9 | 62.79 Gbps | 10.43 Mbps | 1.144 × 10<sup>-3</sup> |
| p_322_data_np_res_32.npy | 16 x 16 x 16 | 9 | 68.68 Gbps | 125.72 Mbps | 1.131 × 10<sup>-3</sup> |
| p_322_data_np_res_64.npy | 16 x 16 x 16 | 9 | 62.17 Gbps | 555.57 Mbps | 1.112 × 10<sup>-3</sup> |
| p_322_data_np_res_32.npy | 32 x 32 x 32 | 72 | 67.17 Gbps | 60.26 Mbps | 0.927 × 10<sup>-3</sup> |
| p_322_data_np_res_64.npy | 32 x 32 x 32 | 72 | 70.08 Gbps | 95.51 Mbps | 0.636 × 10<sup>-3</sup> |
| Dataset | Dimension | # Nodes | # Replicas | Sending Transfer Rate | Avg. Receiving Transfer Rate (per rank) | Throughput |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| p_322_data_np_res_16.npy | 16 x 16 x 16 | 900 | 100 | 62.79 Gbps | 10.43 Mbps | 1.144 × 10<sup>-3</sup> |