Commit 0edd9410 authored by Brewer, Wes

Add communication patterns and message size to network model

- Add CommunicationPattern enum (ALL_TO_ALL, STENCIL_3D) to Job class
- Add message_size parameter (MESSAGE_SIZE_64K, MESSAGE_SIZE_1M constants)
- Implement stencil-3d pattern with virtual 3D grid mapping (factorize_3d)
- Add message size overhead calculation based on header costs
- Update NetworkModel.simulate_network_utilization() to use patterns
- Update simulate_inter_job_congestion() for pattern-aware routing
- Add documentation report with test results (md and tex)

Test results show stencil-3d reduces congestion by 70-97% on torus
topology vs all-to-all, while fat-tree shows mixed results depending
on job size.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 19470f4e
+164 −0
# Communication Patterns and Message Size Implementation Report

## Overview

This report documents the implementation of configurable communication patterns and message sizes in the RAPS network model, enabling more realistic simulation of HPC application network behavior.

## What Was Implemented

### 1. Communication Patterns

Two communication patterns are now supported:

| Pattern | Description | Traffic Distribution |
|---------|-------------|---------------------|
| `ALL_TO_ALL` | Every node sends to every other node | `tx / (N-1)` per peer |
| `STENCIL_3D` | Each node sends to 6 neighbors (±x, ±y, ±z) | `tx / 6` per neighbor |

**Implementation Details:**

- **`CommunicationPattern` enum** (`raps/job.py`): Defines pattern types
- **`factorize_3d(n)`**: Maps N nodes to a virtual 3D grid (e.g., 8→2×2×2, 64→4×4×4)
- **`get_stencil_3d_neighbors()`**: Computes 6 neighbors with periodic boundary conditions
- **`link_loads_for_job_stencil_3d()`**: Routes traffic only to stencil neighbors
- **`link_loads_for_pattern()`**: Dispatcher that selects the correct routing function
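
The grid-mapping helpers can be sketched as follows. This is illustrative only: the actual implementations may differ in signature and in how they break ties between factorizations. It reproduces the mappings quoted above (8→2×2×2, 64→4×4×4) and the 6-neighbor periodic stencil.

```python
import math

def factorize_3d(n):
    """Factor n into a near-cubic (x, y, z) grid with x*y*z == n.
    Sketch: take the largest factor <= the cube root, then split the rest."""
    x = max(f for f in range(1, round(n ** (1 / 3)) + 1) if n % f == 0)
    rem = n // x
    y = max(f for f in range(1, math.isqrt(rem) + 1) if rem % f == 0)
    return x, y, rem // y

def get_stencil_3d_neighbors(rank, dims):
    """Return the 6 neighbor ranks (+/-x, +/-y, +/-z) with periodic boundaries."""
    X, Y, Z = dims
    x, y, z = rank % X, (rank // X) % Y, rank // (X * Y)
    to_rank = lambda i, j, k: i + j * X + k * X * Y
    return [
        to_rank((x + 1) % X, y, z), to_rank((x - 1) % X, y, z),
        to_rank(x, (y + 1) % Y, z), to_rank(x, (y - 1) % Y, z),
        to_rank(x, y, (z + 1) % Z), to_rank(x, y, (z - 1) % Z),
    ]
```

For example, rank 0 in a 4×4×4 grid has neighbors 1 and 3 in x, 4 and 12 in y, and 16 and 48 in z; the wrap-around terms are what give the torus its locality.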

### 2. Message Size

Jobs can now specify a message size, which affects network overhead:

| Constant | Value | Use Case |
|----------|-------|----------|
| `MESSAGE_SIZE_64K` | 64 KiB | Small messages, higher overhead |
| `MESSAGE_SIZE_1M` | 1 MiB | Large messages, lower overhead |
| `None` | N/A | Raw bandwidth model (no overhead) |

**Overhead Model:**
```
effective_traffic = raw_traffic + (num_messages × header_overhead)
num_messages = ceil(bytes_per_peer / message_size) × num_peers
header_overhead = 64 bytes per message
```
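
A minimal sketch of this overhead model in Python (the function name, `num_peers` parameter, and return convention are assumptions for illustration; the real helper in the network model may be shaped differently):

```python
import math

HEADER_OVERHEAD = 64  # bytes of header cost per message, per the model above

def apply_message_size_overhead(raw_traffic, message_size, num_peers):
    """Add per-message header overhead to a node's raw traffic volume."""
    if message_size is None:
        return raw_traffic  # raw bandwidth model: no overhead
    bytes_per_peer = raw_traffic / num_peers
    num_messages = math.ceil(bytes_per_peer / message_size) * num_peers
    return raw_traffic + num_messages * HEADER_OVERHEAD
```

With 5 GB/s split over 6 stencil neighbors and 1 MiB messages, this yields 4,770 messages and about 305 KB of header traffic, i.e. roughly 0.006% overhead, consistent with the results below.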

### 3. Files Modified

- `raps/job.py`: Added `CommunicationPattern` enum, `message_size` and `comm_pattern` to Job
- `raps/network/base.py`: Added pattern routing functions and message overhead calculation
- `raps/network/__init__.py`: Updated `NetworkModel.simulate_network_utilization()` to use patterns

## Why These Choices

### Communication Patterns

**All-to-all** represents collective operations like `MPI_Alltoall`, common in FFT, matrix transpose, and some machine learning workloads. It creates high network load, with total traffic scaling as O(N²).

**3D Stencil** represents nearest-neighbor communication patterns used in:
- Computational fluid dynamics (CFD)
- Weather/climate modeling
- Finite difference methods
- Many physics simulations

Stencil patterns are O(N) in traffic and exhibit strong locality, making them ideal for torus topologies.
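
The scaling gap is visible just from counting directed sender→receiver pairs (a quick illustration, not code from the model):

```python
def num_pairs(n, pattern):
    """Directed communicating pairs per pattern."""
    if pattern == "all-to-all":
        return n * (n - 1)  # every node sends to every other node: O(N^2)
    if pattern == "stencil-3d":
        return n * 6        # each node sends to its 6 grid neighbors: O(N)
    raise ValueError(f"unknown pattern: {pattern}")

for n in (8, 64, 512):
    print(n, num_pairs(n, "all-to-all"), num_pairs(n, "stencil-3d"))
```

At 64 nodes the all-to-all pattern has 4,032 pairs versus 384 for the stencil, and the ratio keeps growing with job size.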

### Message Size

Message size affects real network performance through:
1. **Protocol overhead**: Each message incurs fixed header costs
2. **Latency effects**: Smaller messages amortize fixed per-message latency poorly, yielding worse effective bandwidth
3. **Congestion dynamics**: More messages mean more contention for network resources

The 64 KiB and 1 MiB sizes represent common HPC message sizes—64K is typical for latency-sensitive applications, while 1M is common for bulk data transfers.

## Test Results

### Fat-Tree Topology (k=8, 128 nodes)

| Nodes | Pattern | Congestion | Change vs All-to-All |
|-------|---------|------------|---------------------|
| 8 | all-to-all | 14.63 | — |
| 8 | stencil-3d | 17.07 | +17% (worse) |
| 27 | all-to-all | 43.32 | — |
| 27 | stencil-3d | 51.20 | +18% (worse) |
| 64 | all-to-all | 78.02 | — |
| 64 | stencil-3d | 68.27 | **-12% (better)** |

**Analysis**: On fat-tree, stencil shows *worse* congestion for small jobs because it concentrates traffic on fewer links (creating hotspots). At 64 nodes, stencil becomes beneficial as the traffic spreads more evenly.

### 3D Torus Topology (4×4×4)

| Nodes | Pattern | Congestion | Change vs All-to-All |
|-------|---------|------------|---------------------|
| 8 | all-to-all | 44.80 | — |
| 8 | stencil-3d | 12.80 | **-71% (better)** |
| 27 | all-to-all | 166.40 | — |
| 27 | stencil-3d | 26.67 | **-84% (better)** |
| 64 | all-to-all | 403.20 | — |
| 64 | stencil-3d | 12.80 | **-97% (better)** |

**Analysis**: On torus topology, stencil shows *dramatic* congestion reduction because:
1. Torus is optimized for nearest-neighbor communication
2. Stencil traffic stays local (1-hop to neighbors)
3. All-to-all must traverse many hops, creating bottlenecks

### Message Size Impact

| Message Size | Overhead |
|--------------|----------|
| None (raw) | 0% |
| 64 KiB | ~0.1% |
| 1 MiB | ~0.006% |

The current header overhead model (64 bytes/message) produces minimal impact. This is realistic for large transfers but may underestimate overhead for latency-bound small-message workloads.
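
The table's percentages follow directly from amortizing the 64-byte header over one full-sized message (partial final messages would nudge these slightly upward):

```python
HEADER = 64  # bytes per message

for name, size in [("64 KiB", 64 * 1024), ("1 MiB", 1024 * 1024)]:
    pct = 100 * HEADER / size  # per-byte overhead when messages are full
    print(f"{name}: {pct:.4f}% overhead")  # ~0.0977% and ~0.0061%
```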

## Usage Example

```python
from raps.job import job_dict, CommunicationPattern, MESSAGE_SIZE_64K, MESSAGE_SIZE_1M

# CFD simulation with stencil pattern and 1 MiB messages
cfd_job = job_dict(
    nodes_required=64,
    name='cfd_simulation',
    account='physics',
    id=1,
    scheduled_nodes=list(range(64)),
    cpu_trace=[0.8],
    gpu_trace=[0.9],
    ntx_trace=[5e9],  # 5 GB/s per node
    nrx_trace=[5e9],
    comm_pattern=CommunicationPattern.STENCIL_3D,
    message_size=MESSAGE_SIZE_1M,
)

# ML training with all-to-all pattern and 64 KiB messages
ml_job = job_dict(
    nodes_required=32,
    name='ml_training',
    account='ai',
    id=2,
    scheduled_nodes=list(range(32, 64)),
    cpu_trace=[0.5],
    gpu_trace=[0.95],
    ntx_trace=[10e9],  # 10 GB/s per node
    nrx_trace=[10e9],
    comm_pattern=CommunicationPattern.ALL_TO_ALL,
    message_size=MESSAGE_SIZE_64K,
)
```

## Key Findings

1. **Topology-pattern matching matters**: Stencil patterns on torus show 70-97% congestion reduction vs all-to-all, while fat-tree shows mixed results.

2. **Job placement affects pattern efficiency**: Stencil benefits require nodes to be allocated in a way that preserves locality.

3. **Message size overhead is minimal** with the current model (~0.1% for 64K messages). Consider increasing header overhead or adding latency-based penalties for more impact.

4. **Pattern choice significantly affects congestion**: it can matter more than message size in determining network performance.

## Future Enhancements

1. **Additional patterns**: Ring, tree, butterfly, 2D stencil, custom patterns
2. **Latency modeling**: Small messages should incur latency penalties beyond just overhead
3. **Topology-aware stencil**: Use actual torus coordinates when available instead of virtual grid
4. **Adaptive message sizing**: Allow message size to vary with communication phase
+247 −0
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{hyperref}

\lstset{
    language=Python,
    basicstyle=\ttfamily\small,
    keywordstyle=\color{blue},
    stringstyle=\color{red},
    commentstyle=\color{gray},
    frame=single,
    breaklines=true,
    showstringspaces=false
}

\title{Communication Patterns and Message Size Implementation Report}
\author{RAPS Network Model}
\date{\today}

\begin{document}

\maketitle

\section{Overview}

This report documents the implementation of configurable communication patterns and message sizes in the RAPS network model, enabling more realistic simulation of HPC application network behavior.

\section{What Was Implemented}

\subsection{Communication Patterns}

Two communication patterns are now supported:

\begin{table}[h]
\centering
\begin{tabular}{@{}lll@{}}
\toprule
Pattern & Description & Traffic Distribution \\
\midrule
\texttt{ALL\_TO\_ALL} & Every node sends to every other node & $tx / (N-1)$ per peer \\
\texttt{STENCIL\_3D} & Each node sends to 6 neighbors ($\pm x, \pm y, \pm z$) & $tx / 6$ per neighbor \\
\bottomrule
\end{tabular}
\caption{Supported communication patterns}
\end{table}

\textbf{Implementation Details:}

\begin{itemize}
    \item \textbf{\texttt{CommunicationPattern} enum} (\texttt{raps/job.py}): Defines pattern types
    \item \textbf{\texttt{factorize\_3d(n)}}: Maps $N$ nodes to a virtual 3D grid (e.g., $8 \rightarrow 2 \times 2 \times 2$, $64 \rightarrow 4 \times 4 \times 4$)
    \item \textbf{\texttt{get\_stencil\_3d\_neighbors()}}: Computes 6 neighbors with periodic boundary conditions
    \item \textbf{\texttt{link\_loads\_for\_job\_stencil\_3d()}}: Routes traffic only to stencil neighbors
    \item \textbf{\texttt{link\_loads\_for\_pattern()}}: Dispatcher that selects the correct routing function
\end{itemize}

\subsection{Message Size}

Jobs can now specify a message size, which affects network overhead:

\begin{table}[h]
\centering
\begin{tabular}{@{}lll@{}}
\toprule
Constant & Value & Use Case \\
\midrule
\texttt{MESSAGE\_SIZE\_64K} & 64 KiB & Small messages, higher overhead \\
\texttt{MESSAGE\_SIZE\_1M} & 1 MiB & Large messages, lower overhead \\
\texttt{None} & N/A & Raw bandwidth model (no overhead) \\
\bottomrule
\end{tabular}
\caption{Message size constants}
\end{table}

\textbf{Overhead Model:}
\begin{align}
\text{effective\_traffic} &= \text{raw\_traffic} + (\text{num\_messages} \times \text{header\_overhead}) \\
\text{num\_messages} &= \lceil \text{bytes\_per\_peer} / \text{message\_size} \rceil \times \text{num\_peers} \\
\text{header\_overhead} &= 64 \text{ bytes per message}
\end{align}

\subsection{Files Modified}

\begin{itemize}
    \item \texttt{raps/job.py}: Added \texttt{CommunicationPattern} enum, \texttt{message\_size} and \texttt{comm\_pattern} to Job
    \item \texttt{raps/network/base.py}: Added pattern routing functions and message overhead calculation
    \item \texttt{raps/network/\_\_init\_\_.py}: Updated \texttt{NetworkModel.simulate\_network\_utilization()} to use patterns
\end{itemize}

\section{Why These Choices}

\subsection{Communication Patterns}

\textbf{All-to-all} represents collective operations like \texttt{MPI\_Alltoall}, common in FFT, matrix transpose, and some machine learning workloads. It creates high network load, with total traffic scaling as $O(N^2)$.

\textbf{3D Stencil} represents nearest-neighbor communication patterns used in:
\begin{itemize}
    \item Computational fluid dynamics (CFD)
    \item Weather/climate modeling
    \item Finite difference methods
    \item Many physics simulations
\end{itemize}

Stencil patterns are $O(N)$ in traffic and exhibit strong locality, making them ideal for torus topologies.

\subsection{Message Size}

Message size affects real network performance through:
\begin{enumerate}
    \item \textbf{Protocol overhead}: Each message incurs fixed header costs
    \item \textbf{Latency effects}: Smaller messages amortize fixed per-message latency poorly, yielding worse effective bandwidth
    \item \textbf{Congestion dynamics}: More messages mean more contention for network resources
\end{enumerate}

The 64 KiB and 1 MiB sizes represent common HPC message sizes---64K is typical for latency-sensitive applications, while 1M is common for bulk data transfers.

\section{Test Results}

\subsection{Fat-Tree Topology (k=8, 128 nodes)}

\begin{table}[h]
\centering
\begin{tabular}{@{}llll@{}}
\toprule
Nodes & Pattern & Congestion & Change vs All-to-All \\
\midrule
8 & all-to-all & 14.63 & --- \\
8 & stencil-3d & 17.07 & +17\% (worse) \\
27 & all-to-all & 43.32 & --- \\
27 & stencil-3d & 51.20 & +18\% (worse) \\
64 & all-to-all & 78.02 & --- \\
64 & stencil-3d & 68.27 & \textbf{-12\% (better)} \\
\bottomrule
\end{tabular}
\caption{Congestion results on fat-tree topology}
\end{table}

\textbf{Analysis}: On fat-tree, stencil shows \textit{worse} congestion for small jobs because it concentrates traffic on fewer links (creating hotspots). At 64 nodes, stencil becomes beneficial as the traffic spreads more evenly.

\subsection{3D Torus Topology ($4 \times 4 \times 4$)}

\begin{table}[h]
\centering
\begin{tabular}{@{}llll@{}}
\toprule
Nodes & Pattern & Congestion & Change vs All-to-All \\
\midrule
8 & all-to-all & 44.80 & --- \\
8 & stencil-3d & 12.80 & \textbf{-71\% (better)} \\
27 & all-to-all & 166.40 & --- \\
27 & stencil-3d & 26.67 & \textbf{-84\% (better)} \\
64 & all-to-all & 403.20 & --- \\
64 & stencil-3d & 12.80 & \textbf{-97\% (better)} \\
\bottomrule
\end{tabular}
\caption{Congestion results on 3D torus topology}
\end{table}

\textbf{Analysis}: On torus topology, stencil shows \textit{dramatic} congestion reduction because:
\begin{enumerate}
    \item Torus is optimized for nearest-neighbor communication
    \item Stencil traffic stays local (1-hop to neighbors)
    \item All-to-all must traverse many hops, creating bottlenecks
\end{enumerate}

\subsection{Message Size Impact}

\begin{table}[h]
\centering
\begin{tabular}{@{}ll@{}}
\toprule
Message Size & Overhead \\
\midrule
None (raw) & 0\% \\
64 KiB & $\sim$0.1\% \\
1 MiB & $\sim$0.006\% \\
\bottomrule
\end{tabular}
\caption{Message size overhead impact}
\end{table}

The current header overhead model (64 bytes/message) produces minimal impact. This is realistic for large transfers but may underestimate overhead for latency-bound small-message workloads.

\section{Usage Example}

\begin{lstlisting}
from raps.job import (job_dict, CommunicationPattern,
                      MESSAGE_SIZE_64K, MESSAGE_SIZE_1M)

# CFD simulation with stencil pattern and 1 MiB messages
cfd_job = job_dict(
    nodes_required=64,
    name='cfd_simulation',
    account='physics',
    id=1,
    scheduled_nodes=list(range(64)),
    cpu_trace=[0.8],
    gpu_trace=[0.9],
    ntx_trace=[5e9],  # 5 GB/s per node
    nrx_trace=[5e9],
    comm_pattern=CommunicationPattern.STENCIL_3D,
    message_size=MESSAGE_SIZE_1M,
)

# ML training with all-to-all pattern and 64 KiB messages
ml_job = job_dict(
    nodes_required=32,
    name='ml_training',
    account='ai',
    id=2,
    scheduled_nodes=list(range(32, 64)),
    cpu_trace=[0.5],
    gpu_trace=[0.95],
    ntx_trace=[10e9],  # 10 GB/s per node
    nrx_trace=[10e9],
    comm_pattern=CommunicationPattern.ALL_TO_ALL,
    message_size=MESSAGE_SIZE_64K,
)
\end{lstlisting}

\section{Key Findings}

\begin{enumerate}
    \item \textbf{Topology-pattern matching matters}: Stencil patterns on torus show 70--97\% congestion reduction vs all-to-all, while fat-tree shows mixed results.

    \item \textbf{Job placement affects pattern efficiency}: Stencil benefits require nodes to be allocated in a way that preserves locality.

    \item \textbf{Message size overhead is minimal} with the current model ($\sim$0.1\% for 64K messages). Consider increasing header overhead or adding latency-based penalties for more impact.

    \item \textbf{Pattern choice significantly affects congestion}: it can matter more than message size in determining network performance.
\end{enumerate}

\section{Future Enhancements}

\begin{enumerate}
    \item \textbf{Additional patterns}: Ring, tree, butterfly, 2D stencil, custom patterns
    \item \textbf{Latency modeling}: Small messages should incur latency penalties beyond just overhead
    \item \textbf{Topology-aware stencil}: Use actual torus coordinates when available instead of virtual grid
    \item \textbf{Adaptive message sizing}: Allow message size to vary with communication phase
\end{enumerate}

\end{document}
+27 −2
@@ -24,6 +24,17 @@ class JobState(Enum):
    TIMEOUT = 'TO'


class CommunicationPattern(Enum):
    """Communication patterns for network traffic modeling."""
    ALL_TO_ALL = "all-to-all"      # Every node sends to every other node
    STENCIL_3D = "stencil-3d"      # Each node sends to 6 neighbors (±x, ±y, ±z)


# Standard message sizes for testing (in bytes)
MESSAGE_SIZE_64K = 64 * 1024      # 64 KiB
MESSAGE_SIZE_1M = 1024 * 1024     # 1 MiB


def job_dict(*,
             nodes_required,
             name,
@@ -57,9 +68,16 @@ def job_dict(*,
             trace_end_time: int | None = 0,
             trace_quanta: int | None = None,
             trace_missing_values: bool | None = False,
             downscale: int = 1
             downscale: int = 1,
             # Communication parameters
             comm_pattern: CommunicationPattern | str = CommunicationPattern.ALL_TO_ALL,
             message_size: int | None = None,  # bytes per message (None = raw bandwidth model)
             ):
    """ Return job info dictionary """
    # Normalize comm_pattern to enum
    if isinstance(comm_pattern, str):
        comm_pattern = CommunicationPattern(comm_pattern)

    return {
        'nodes_required': nodes_required,
        'name': name,
@@ -94,7 +112,10 @@ def job_dict(*,
        'trace_quanta': trace_quanta,
        'trace_missing_values': trace_missing_values,
        'dilated': False,
        'downscale': downscale
        'downscale': downscale,
        # Communication parameters:
        'comm_pattern': comm_pattern,
        'message_size': message_size,
    }


@@ -181,6 +202,9 @@ class Job:
        self.trace_end_time = None    # Relative end time of the trace
        self.trace_quanta = None  # Trace quanta associated with the job # None means single value!
        self.current_run_time = 0     # Current running time updated when simulating
        # Communication parameters:
        self.comm_pattern = CommunicationPattern.ALL_TO_ALL
        self.message_size = None  # None = raw bandwidth model (no message overhead)

        # If a job dict was given, override the values from the job_dict:
        for key, value in job_dict.items():
@@ -222,6 +246,7 @@ class Job:
                f"allocated_gpu_units={self.allocated_gpu_units}, "
                f"cpu_trace={self.cpu_trace}, gpu_trace={self.gpu_trace}, "
                f"ntx_trace={self.ntx_trace}, nrx_trace={self.nrx_trace}, "
                f"comm_pattern={self.comm_pattern}, message_size={self.message_size}, "
                f"end_state={self.end_state}, "
                f"current_state={self.current_state}, "
                f"submit_time={self.submit_time}, time_limit={self.time_limit}, "
+42 −8
import os
import warnings

from raps.job import CommunicationPattern

from .base import (
    all_to_all_paths,
    apply_job_slowdown,
    compute_system_network_stats,
    link_loads_for_job,
    link_loads_for_job_stencil_3d,
    link_loads_for_pattern,
    get_effective_traffic,
    apply_message_size_overhead,
    factorize_3d,
    stencil_3d_pairs,
    network_congestion,
    network_slowdown,
    network_utilization,
@@ -31,6 +39,12 @@ __all__ = [
    "network_slowdown",
    "all_to_all_paths",
    "link_loads_for_job",
    "link_loads_for_job_stencil_3d",
    "link_loads_for_pattern",
    "get_effective_traffic",
    "apply_message_size_overhead",
    "factorize_3d",
    "stencil_3d_pairs",
    "worst_link_util",
    "build_fattree",
    "build_torus3d",
@@ -115,11 +129,25 @@ class NetworkModel:

        net_tx = get_current_utilization(job.ntx_trace, job)
        net_rx = get_current_utilization(job.nrx_trace, job)
        net_util = network_utilization(net_tx, net_rx, max_throughput)

        # Get communication pattern and message size from job
        comm_pattern = getattr(job, 'comm_pattern', CommunicationPattern.ALL_TO_ALL)
        message_size = getattr(job, 'message_size', None)

        # Apply message size overhead if specified
        num_hosts = len(job.scheduled_nodes)
        effective_tx = get_effective_traffic(net_tx, job, num_hosts)
        effective_rx = get_effective_traffic(net_rx, job, num_hosts)

        net_util = network_utilization(effective_tx, effective_rx, max_throughput)

        if debug:
            print(f"  comm_pattern: {comm_pattern}, message_size: {message_size}")
            print(f"  raw tx/rx: {net_tx}/{net_rx}, effective tx/rx: {effective_tx}/{effective_rx}")

        if self.topology == "fat-tree":
            host_list = [node_id_to_host_name(n, self.fattree_k) for n in job.scheduled_nodes]
            loads = link_loads_for_job(self.net_graph, host_list, net_tx)
            loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            net_cong = worst_link_util(loads, max_throughput)
            if debug:
                print("  fat-tree hosts:", host_list)
@@ -134,7 +162,7 @@ class NetworkModel:
                print("  dragonfly hosts:", host_list)
                print("Example nodes in graph:", list(self.net_graph.nodes)[:10])
                print("Contains h_0_9_0?", "h_0_9_0" in self.net_graph)
            loads = link_loads_for_job(self.net_graph, host_list, net_tx)
            loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            net_cong = worst_link_util(loads, max_throughput)

        elif self.topology == "torus3d":
@@ -142,18 +170,24 @@ class NetworkModel:
            Y = self.config["TORUS_Y"]
            Z = self.config["TORUS_Z"]
            hosts_per_router = self.config["HOSTS_PER_ROUTER"]
            #host_list = [self.id_to_host[n] for n in job.scheduled_nodes]
            host_list = [
                torus_host_from_real_index(n, X, Y, Z, hosts_per_router)
                for n in job.scheduled_nodes
            ]
            loads = link_loads_for_job_torus(self.net_graph, self.meta, host_list, net_tx)
            # For torus3d, use the specialized torus routing
            # but still apply the communication pattern for traffic distribution
            if comm_pattern == CommunicationPattern.STENCIL_3D:
                # Use pattern-aware loading for stencil on torus
                loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            else:
                # Use torus-specific routing for all-to-all
                loads = link_loads_for_job_torus(self.net_graph, self.meta, host_list, effective_tx)
            net_cong = worst_link_util(loads, max_throughput)
            if debug:
                print("  torus3d hosts:", host_list)

        elif self.topology == "capacity":
            net_cong = network_congestion(net_tx, net_rx, max_throughput)
            net_cong = network_congestion(effective_tx, effective_rx, max_throughput)

        else:
            raise ValueError(f"Unsupported topology: {self.topology}")
+248 −2


Preview size limit exceeded, changes collapsed.