Commit 137af47d authored by Ferreira Da Silva, Rafael

Update README.md

parent 3d89a820

The objective of this workflow benchmark is to assess the capability of a High-Performance Computing (HPC) system to support dynamic workloads originating from various data stream sources. These workloads are processed within the compute nodes, and the processed data is then made available to a wide array of consumers, each potentially consuming a unique abstraction of the data. We target an ML4NSE (Machine Learning for Neutron Scattering Experiment) application that employs a Temporal Fusion Transformer (TFT) model to both train on and predict the measurement time for a distinct cluster of peaks. This cluster includes a robust nuclear peak along with six weaker satellite peaks resulting from the magnetic ordering within a single-crystal sample. The primary objective of this code is to enable near real-time decision-making by leveraging the combined power of Machine Learning (ML) and High-Performance Computing (HPC). This benchmark provides a self-contained, end-to-end evaluation of a coupled compute/data problem.

<img src="docs/workflow-benchmark.png" />

_Abstract overview of a reference implementation of the workflow benchmark._

## Characteristics of Benchmark

Data is channeled from multiple sources to a gateway node using an open-source streaming transport layer capable of handling structured data, not just byte sequences (e.g., ZeroMQ, RabbitMQ). One input stream to the gateway node may serve as a control stream that directs how to multiplex, filter, window, or otherwise service the other data streams. The gateway thus produces a single stream of structured data, possibly distinct in structure from the input streams, which is forwarded to a listener at the Services Cluster, ensuring efficient information flow.
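The multiplex/filter step described above can be sketched as a toy, single-shot gateway function. All names here (`gateway`, `det_a`, the `select` control field) are illustrative assumptions, not part of the benchmark; the real implementation would run continuously over a transport such as ZeroMQ or RabbitMQ:

```python
import json

def gateway(control_msg, streams):
    """Toy gateway step: a control message selects which named input
    streams to forward; their records are merged into a single stream
    of structured data tagged with the originating source."""
    selected = control_msg.get("select", sorted(streams))
    merged = []
    for name in selected:
        for record in streams.get(name, []):
            merged.append({"source": name, **record})
    return merged

# Two hypothetical detector streams; the control stream selects only one.
streams = {
    "det_a": [{"t": 0, "counts": 12}, {"t": 1, "counts": 9}],
    "det_b": [{"t": 0, "counts": 3}],
}
out = gateway({"select": ["det_a"]}, streams)
print(json.dumps(out))
```

Note the output records share one schema regardless of source, matching the requirement that the forwarded stream may differ in structure from its inputs.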

## Run Rules

- _Weak Scaling Experiments:_ Each rank at level 1 (refer to Figure 2) trains a TFT model on 64 ([4, 4, 4]) voxels, and each level 2 rank operates on [2, 2, 2] mean voxels of level 1. Consequently, an input with dimensions 8 × 8 × 8 (512 voxels) requires a total of 9 ranks: eight in level 1 and one in level 2. A larger input of 64 × 64 × 64 requires a total of 4608 ranks: 4096 in level 1 and 512 in level 2.

- _Strong Scaling Experiments:_ The input dimensions are fixed at 32 × 32 × 32. The level 1 mapping can be varied from [2, 2, 2] through [4, 2, 2] to [4, 4, 4], while the level 2 mapping is held at [2, 2, 2]. Under these conditions, 4608 ranks are necessary for the [2, 2, 2] mapping, while 576 ranks are required for the [4, 4, 4] configuration.
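The rank counts in both run rules follow the same arithmetic: total input voxels divided by the voxels per level 1 rank, then level 1 ranks grouped by the level 2 mapping. A small helper (the function name `rank_counts` is hypothetical) reproduces the figures quoted above:

```python
def rank_counts(input_dim, l1_map, l2_map=(2, 2, 2)):
    """Return (level-1 ranks, level-2 ranks, total) for a given input
    size and voxel mappings, following the run-rule arithmetic."""
    voxels = input_dim[0] * input_dim[1] * input_dim[2]
    per_l1 = l1_map[0] * l1_map[1] * l1_map[2]   # voxels per level-1 rank
    l1 = voxels // per_l1
    per_l2 = l2_map[0] * l2_map[1] * l2_map[2]   # level-1 ranks per level-2 rank
    l2 = l1 // per_l2
    return l1, l2, l1 + l2

print(rank_counts((8, 8, 8), (4, 4, 4)))      # weak scaling:   (8, 1, 9)
print(rank_counts((64, 64, 64), (4, 4, 4)))   #                 (4096, 512, 4608)
print(rank_counts((32, 32, 32), (2, 2, 2)))   # strong scaling: (4096, 512, 4608)
print(rank_counts((32, 32, 32), (4, 4, 4)))   #                 (512, 64, 576)
```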


## Figure of Merit