Commit 0edd9410 authored by Brewer, Wes

Add communication patterns and message size to network model

- Add CommunicationPattern enum (ALL_TO_ALL, STENCIL_3D) to Job class
- Add message_size parameter (MESSAGE_SIZE_64K, MESSAGE_SIZE_1M constants)
- Implement stencil-3d pattern with virtual 3D grid mapping (factorize_3d)
- Add message size overhead calculation based on header costs
- Update NetworkModel.simulate_network_utilization() to use patterns
- Update simulate_inter_job_congestion() for pattern-aware routing
- Add documentation report with test results (md and tex)

Test results show stencil-3d reduces congestion by 70-97% on torus
topology vs all-to-all, while fat-tree shows mixed results depending
on job size.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 19470f4e
+164 −0
# Communication Patterns and Message Size Implementation Report

## Overview

This report documents the implementation of configurable communication patterns and message sizes in the RAPS network model, enabling more realistic simulation of HPC application network behavior.

## What Was Implemented

### 1. Communication Patterns

Two communication patterns are now supported:

| Pattern | Description | Traffic Distribution |
|---------|-------------|---------------------|
| `ALL_TO_ALL` | Every node sends to every other node | `tx / (N-1)` per peer |
| `STENCIL_3D` | Each node sends to 6 neighbors (±x, ±y, ±z) | `tx / 6` per neighbor |

**Implementation Details:**

- **`CommunicationPattern` enum** (`raps/job.py`): Defines pattern types
- **`factorize_3d(n)`**: Maps N nodes to a virtual 3D grid (e.g., 8→2×2×2, 64→4×4×4)
- **`get_stencil_3d_neighbors()`**: Computes 6 neighbors with periodic boundary conditions
- **`link_loads_for_job_stencil_3d()`**: Routes traffic only to stencil neighbors
- **`link_loads_for_pattern()`**: Dispatcher that selects the correct routing function
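
The grid-mapping helpers can be sketched as follows. This is illustrative only: the actual implementations may differ in signature and in how they break ties between factorizations. It reproduces the mappings quoted above (8→2×2×2, 64→4×4×4) and the 6-neighbor periodic stencil.

```python
import math

def factorize_3d(n):
    """Factor n into a near-cubic (x, y, z) grid with x*y*z == n.
    Sketch: take the largest factor <= the cube root, then split the rest."""
    x = max(f for f in range(1, round(n ** (1 / 3)) + 1) if n % f == 0)
    rem = n // x
    y = max(f for f in range(1, math.isqrt(rem) + 1) if rem % f == 0)
    return x, y, rem // y

def get_stencil_3d_neighbors(rank, dims):
    """Return the 6 neighbor ranks (+/-x, +/-y, +/-z) with periodic boundaries."""
    X, Y, Z = dims
    x, y, z = rank % X, (rank // X) % Y, rank // (X * Y)
    to_rank = lambda i, j, k: i + j * X + k * X * Y
    return [
        to_rank((x + 1) % X, y, z), to_rank((x - 1) % X, y, z),
        to_rank(x, (y + 1) % Y, z), to_rank(x, (y - 1) % Y, z),
        to_rank(x, y, (z + 1) % Z), to_rank(x, y, (z - 1) % Z),
    ]
```

For example, rank 0 in a 4×4×4 grid has neighbors 1 and 3 in x, 4 and 12 in y, and 16 and 48 in z; the wrap-around terms are what give the torus its locality.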

### 2. Message Size

Jobs can now specify a message size, which affects network overhead:

| Constant | Value | Use Case |
|----------|-------|----------|
| `MESSAGE_SIZE_64K` | 64 KiB | Small messages, higher overhead |
| `MESSAGE_SIZE_1M` | 1 MiB | Large messages, lower overhead |
| `None` | N/A | Raw bandwidth model (no overhead) |

**Overhead Model:**
```
effective_traffic = raw_traffic + (num_messages × header_overhead)
num_messages = ceil(bytes_per_peer / message_size) × num_peers
header_overhead = 64 bytes per message
```
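
A minimal sketch of this overhead model in Python (the function name, `num_peers` parameter, and return convention are assumptions for illustration; the real helper in the network model may be shaped differently):

```python
import math

HEADER_OVERHEAD = 64  # bytes of header cost per message, per the model above

def apply_message_size_overhead(raw_traffic, message_size, num_peers):
    """Add per-message header overhead to a node's raw traffic volume."""
    if message_size is None:
        return raw_traffic  # raw bandwidth model: no overhead
    bytes_per_peer = raw_traffic / num_peers
    num_messages = math.ceil(bytes_per_peer / message_size) * num_peers
    return raw_traffic + num_messages * HEADER_OVERHEAD
```

With 5 GB/s split over 6 stencil neighbors and 1 MiB messages, this yields 4,770 messages and about 305 KB of header traffic, i.e. roughly 0.006% overhead, consistent with the results below.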

### 3. Files Modified

- `raps/job.py`: Added `CommunicationPattern` enum, `message_size` and `comm_pattern` to Job
- `raps/network/base.py`: Added pattern routing functions and message overhead calculation
- `raps/network/__init__.py`: Updated `NetworkModel.simulate_network_utilization()` to use patterns

## Why These Choices

### Communication Patterns

**All-to-all** represents collective operations like `MPI_Alltoall`, common in FFT, matrix transpose, and some machine learning workloads. It creates high network load, with total traffic scaling as O(N²).

**3D Stencil** represents nearest-neighbor communication patterns used in:
- Computational fluid dynamics (CFD)
- Weather/climate modeling
- Finite difference methods
- Many physics simulations

Stencil patterns are O(N) in traffic and exhibit strong locality, making them ideal for torus topologies.
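
The scaling gap is visible just from counting directed sender→receiver pairs (a quick illustration, not code from the model):

```python
def num_pairs(n, pattern):
    """Directed communicating pairs per pattern."""
    if pattern == "all-to-all":
        return n * (n - 1)  # every node sends to every other node: O(N^2)
    if pattern == "stencil-3d":
        return n * 6        # each node sends to its 6 grid neighbors: O(N)
    raise ValueError(f"unknown pattern: {pattern}")

for n in (8, 64, 512):
    print(n, num_pairs(n, "all-to-all"), num_pairs(n, "stencil-3d"))
```

At 64 nodes the all-to-all pattern has 4,032 pairs versus 384 for the stencil, and the ratio keeps growing with job size.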

### Message Size

Message size affects real network performance through:
1. **Protocol overhead**: Each message incurs fixed header costs
2. **Latency effects**: Smaller messages amortize fixed per-message latency poorly, yielding worse effective bandwidth
3. **Congestion dynamics**: More messages mean more contention for network resources

The 64 KiB and 1 MiB sizes represent common HPC message sizes—64K is typical for latency-sensitive applications, while 1M is common for bulk data transfers.

## Test Results

### Fat-Tree Topology (k=8, 128 nodes)

| Nodes | Pattern | Congestion | Change vs All-to-All |
|-------|---------|------------|---------------------|
| 8 | all-to-all | 14.63 | — |
| 8 | stencil-3d | 17.07 | +17% (worse) |
| 27 | all-to-all | 43.32 | — |
| 27 | stencil-3d | 51.20 | +18% (worse) |
| 64 | all-to-all | 78.02 | — |
| 64 | stencil-3d | 68.27 | **-12% (better)** |

**Analysis**: On fat-tree, stencil shows *worse* congestion for small jobs because it concentrates traffic on fewer links (creating hotspots). At 64 nodes, stencil becomes beneficial as the traffic spreads more evenly.

### 3D Torus Topology (4×4×4)

| Nodes | Pattern | Congestion | Change vs All-to-All |
|-------|---------|------------|---------------------|
| 8 | all-to-all | 44.80 | — |
| 8 | stencil-3d | 12.80 | **-71% (better)** |
| 27 | all-to-all | 166.40 | — |
| 27 | stencil-3d | 26.67 | **-84% (better)** |
| 64 | all-to-all | 403.20 | — |
| 64 | stencil-3d | 12.80 | **-97% (better)** |

**Analysis**: On torus topology, stencil shows *dramatic* congestion reduction because:
1. Torus is optimized for nearest-neighbor communication
2. Stencil traffic stays local (1-hop to neighbors)
3. All-to-all must traverse many hops, creating bottlenecks

### Message Size Impact

| Message Size | Overhead |
|--------------|----------|
| None (raw) | 0% |
| 64 KiB | ~0.1% |
| 1 MiB | ~0.006% |

The current header overhead model (64 bytes/message) produces minimal impact. This is realistic for large transfers but may underestimate overhead for latency-bound small-message workloads.
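
The table's percentages follow directly from amortizing the 64-byte header over one full-sized message (partial final messages would nudge these slightly upward):

```python
HEADER = 64  # bytes per message

for name, size in [("64 KiB", 64 * 1024), ("1 MiB", 1024 * 1024)]:
    pct = 100 * HEADER / size  # per-byte overhead when messages are full
    print(f"{name}: {pct:.4f}% overhead")  # ~0.0977% and ~0.0061%
```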

## Usage Example

```python
from raps.job import job_dict, CommunicationPattern, MESSAGE_SIZE_64K, MESSAGE_SIZE_1M

# CFD simulation with stencil pattern and 1 MiB messages
cfd_job = job_dict(
    nodes_required=64,
    name='cfd_simulation',
    account='physics',
    id=1,
    scheduled_nodes=list(range(64)),
    cpu_trace=[0.8],
    gpu_trace=[0.9],
    ntx_trace=[5e9],  # 5 GB/s per node
    nrx_trace=[5e9],
    comm_pattern=CommunicationPattern.STENCIL_3D,
    message_size=MESSAGE_SIZE_1M,
)

# ML training with all-to-all pattern and 64 KiB messages
ml_job = job_dict(
    nodes_required=32,
    name='ml_training',
    account='ai',
    id=2,
    scheduled_nodes=list(range(32, 64)),
    cpu_trace=[0.5],
    gpu_trace=[0.95],
    ntx_trace=[10e9],  # 10 GB/s per node
    nrx_trace=[10e9],
    comm_pattern=CommunicationPattern.ALL_TO_ALL,
    message_size=MESSAGE_SIZE_64K,
)
```

## Key Findings

1. **Topology-pattern matching matters**: Stencil patterns on torus show 70-97% congestion reduction vs all-to-all, while fat-tree shows mixed results.

2. **Job placement affects pattern efficiency**: Stencil benefits require nodes to be allocated in a way that preserves locality.

3. **Message size overhead is minimal** with the current model (~0.1% for 64K messages). Consider increasing header overhead or adding latency-based penalties for more impact.

4. **Pattern choice significantly affects congestion**: it can matter more than message size in determining network performance.

## Future Enhancements

1. **Additional patterns**: Ring, tree, butterfly, 2D stencil, custom patterns
2. **Latency modeling**: Small messages should incur latency penalties beyond just overhead
3. **Topology-aware stencil**: Use actual torus coordinates when available instead of virtual grid
4. **Adaptive message sizing**: Allow message size to vary with communication phase
+247 −0
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{hyperref}

\lstset{
    language=Python,
    basicstyle=\ttfamily\small,
    keywordstyle=\color{blue},
    stringstyle=\color{red},
    commentstyle=\color{gray},
    frame=single,
    breaklines=true,
    showstringspaces=false
}

\title{Communication Patterns and Message Size Implementation Report}
\author{RAPS Network Model}
\date{\today}

\begin{document}

\maketitle

\section{Overview}

This report documents the implementation of configurable communication patterns and message sizes in the RAPS network model, enabling more realistic simulation of HPC application network behavior.

\section{What Was Implemented}

\subsection{Communication Patterns}

Two communication patterns are now supported:

\begin{table}[h]
\centering
\begin{tabular}{@{}lll@{}}
\toprule
Pattern & Description & Traffic Distribution \\
\midrule
\texttt{ALL\_TO\_ALL} & Every node sends to every other node & $tx / (N-1)$ per peer \\
\texttt{STENCIL\_3D} & Each node sends to 6 neighbors ($\pm x, \pm y, \pm z$) & $tx / 6$ per neighbor \\
\bottomrule
\end{tabular}
\caption{Supported communication patterns}
\end{table}

\textbf{Implementation Details:}

\begin{itemize}
    \item \textbf{\texttt{CommunicationPattern} enum} (\texttt{raps/job.py}): Defines pattern types
    \item \textbf{\texttt{factorize\_3d(n)}}: Maps $N$ nodes to a virtual 3D grid (e.g., $8 \rightarrow 2 \times 2 \times 2$, $64 \rightarrow 4 \times 4 \times 4$)
    \item \textbf{\texttt{get\_stencil\_3d\_neighbors()}}: Computes 6 neighbors with periodic boundary conditions
    \item \textbf{\texttt{link\_loads\_for\_job\_stencil\_3d()}}: Routes traffic only to stencil neighbors
    \item \textbf{\texttt{link\_loads\_for\_pattern()}}: Dispatcher that selects the correct routing function
\end{itemize}

\subsection{Message Size}

Jobs can now specify a message size, which affects network overhead:

\begin{table}[h]
\centering
\begin{tabular}{@{}lll@{}}
\toprule
Constant & Value & Use Case \\
\midrule
\texttt{MESSAGE\_SIZE\_64K} & 64 KiB & Small messages, higher overhead \\
\texttt{MESSAGE\_SIZE\_1M} & 1 MiB & Large messages, lower overhead \\
\texttt{None} & N/A & Raw bandwidth model (no overhead) \\
\bottomrule
\end{tabular}
\caption{Message size constants}
\end{table}

\textbf{Overhead Model:}
\begin{align}
\text{effective\_traffic} &= \text{raw\_traffic} + (\text{num\_messages} \times \text{header\_overhead}) \\
\text{num\_messages} &= \lceil \text{bytes\_per\_peer} / \text{message\_size} \rceil \times \text{num\_peers} \\
\text{header\_overhead} &= 64 \text{ bytes per message}
\end{align}

\subsection{Files Modified}

\begin{itemize}
    \item \texttt{raps/job.py}: Added \texttt{CommunicationPattern} enum, \texttt{message\_size} and \texttt{comm\_pattern} to Job
    \item \texttt{raps/network/base.py}: Added pattern routing functions and message overhead calculation
    \item \texttt{raps/network/\_\_init\_\_.py}: Updated \texttt{NetworkModel.simulate\_network\_utilization()} to use patterns
\end{itemize}

\section{Why These Choices}

\subsection{Communication Patterns}

\textbf{All-to-all} represents collective operations like \texttt{MPI\_Alltoall}, common in FFT, matrix transpose, and some machine learning workloads. It creates high network load, with total traffic scaling as $O(N^2)$.

\textbf{3D Stencil} represents nearest-neighbor communication patterns used in:
\begin{itemize}
    \item Computational fluid dynamics (CFD)
    \item Weather/climate modeling
    \item Finite difference methods
    \item Many physics simulations
\end{itemize}

Stencil patterns are $O(N)$ in traffic and exhibit strong locality, making them ideal for torus topologies.

\subsection{Message Size}

Message size affects real network performance through:
\begin{enumerate}
    \item \textbf{Protocol overhead}: Each message incurs fixed header costs
    \item \textbf{Latency effects}: Smaller messages amortize fixed per-message latency poorly, yielding worse effective bandwidth
    \item \textbf{Congestion dynamics}: More messages mean more contention for network resources
\end{enumerate}

The 64 KiB and 1 MiB sizes represent common HPC message sizes---64K is typical for latency-sensitive applications, while 1M is common for bulk data transfers.

\section{Test Results}

\subsection{Fat-Tree Topology (k=8, 128 nodes)}

\begin{table}[h]
\centering
\begin{tabular}{@{}llll@{}}
\toprule
Nodes & Pattern & Congestion & Change vs All-to-All \\
\midrule
8 & all-to-all & 14.63 & --- \\
8 & stencil-3d & 17.07 & +17\% (worse) \\
27 & all-to-all & 43.32 & --- \\
27 & stencil-3d & 51.20 & +18\% (worse) \\
64 & all-to-all & 78.02 & --- \\
64 & stencil-3d & 68.27 & \textbf{-12\% (better)} \\
\bottomrule
\end{tabular}
\caption{Congestion results on fat-tree topology}
\end{table}

\textbf{Analysis}: On fat-tree, stencil shows \textit{worse} congestion for small jobs because it concentrates traffic on fewer links (creating hotspots). At 64 nodes, stencil becomes beneficial as the traffic spreads more evenly.

\subsection{3D Torus Topology ($4 \times 4 \times 4$)}

\begin{table}[h]
\centering
\begin{tabular}{@{}llll@{}}
\toprule
Nodes & Pattern & Congestion & Change vs All-to-All \\
\midrule
8 & all-to-all & 44.80 & --- \\
8 & stencil-3d & 12.80 & \textbf{-71\% (better)} \\
27 & all-to-all & 166.40 & --- \\
27 & stencil-3d & 26.67 & \textbf{-84\% (better)} \\
64 & all-to-all & 403.20 & --- \\
64 & stencil-3d & 12.80 & \textbf{-97\% (better)} \\
\bottomrule
\end{tabular}
\caption{Congestion results on 3D torus topology}
\end{table}

\textbf{Analysis}: On torus topology, stencil shows \textit{dramatic} congestion reduction because:
\begin{enumerate}
    \item Torus is optimized for nearest-neighbor communication
    \item Stencil traffic stays local (1-hop to neighbors)
    \item All-to-all must traverse many hops, creating bottlenecks
\end{enumerate}

\subsection{Message Size Impact}

\begin{table}[h]
\centering
\begin{tabular}{@{}ll@{}}
\toprule
Message Size & Overhead \\
\midrule
None (raw) & 0\% \\
64 KiB & $\sim$0.1\% \\
1 MiB & $\sim$0.006\% \\
\bottomrule
\end{tabular}
\caption{Message size overhead impact}
\end{table}

The current header overhead model (64 bytes/message) produces minimal impact. This is realistic for large transfers but may underestimate overhead for latency-bound small-message workloads.

\section{Usage Example}

\begin{lstlisting}
from raps.job import (job_dict, CommunicationPattern,
                      MESSAGE_SIZE_64K, MESSAGE_SIZE_1M)

# CFD simulation with stencil pattern and 1 MiB messages
cfd_job = job_dict(
    nodes_required=64,
    name='cfd_simulation',
    account='physics',
    id=1,
    scheduled_nodes=list(range(64)),
    cpu_trace=[0.8],
    gpu_trace=[0.9],
    ntx_trace=[5e9],  # 5 GB/s per node
    nrx_trace=[5e9],
    comm_pattern=CommunicationPattern.STENCIL_3D,
    message_size=MESSAGE_SIZE_1M,
)

# ML training with all-to-all pattern and 64 KiB messages
ml_job = job_dict(
    nodes_required=32,
    name='ml_training',
    account='ai',
    id=2,
    scheduled_nodes=list(range(32, 64)),
    cpu_trace=[0.5],
    gpu_trace=[0.95],
    ntx_trace=[10e9],  # 10 GB/s per node
    nrx_trace=[10e9],
    comm_pattern=CommunicationPattern.ALL_TO_ALL,
    message_size=MESSAGE_SIZE_64K,
)
\end{lstlisting}

\section{Key Findings}

\begin{enumerate}
    \item \textbf{Topology-pattern matching matters}: Stencil patterns on torus show 70--97\% congestion reduction vs all-to-all, while fat-tree shows mixed results.

    \item \textbf{Job placement affects pattern efficiency}: Stencil benefits require nodes to be allocated in a way that preserves locality.

    \item \textbf{Message size overhead is minimal} with the current model ($\sim$0.1\% for 64K messages). Consider increasing header overhead or adding latency-based penalties for more impact.

    \item \textbf{Pattern choice significantly affects congestion}: it can matter more than message size in determining network performance.
\end{enumerate}

\section{Future Enhancements}

\begin{enumerate}
    \item \textbf{Additional patterns}: Ring, tree, butterfly, 2D stencil, custom patterns
    \item \textbf{Latency modeling}: Small messages should incur latency penalties beyond just overhead
    \item \textbf{Topology-aware stencil}: Use actual torus coordinates when available instead of virtual grid
    \item \textbf{Adaptive message sizing}: Allow message size to vary with communication phase
\end{enumerate}

\end{document}
+27 −2
@@ -24,6 +24,17 @@ class JobState(Enum):
    TIMEOUT = 'TO'


class CommunicationPattern(Enum):
    """Communication patterns for network traffic modeling."""
    ALL_TO_ALL = "all-to-all"      # Every node sends to every other node
    STENCIL_3D = "stencil-3d"      # Each node sends to 6 neighbors (±x, ±y, ±z)


# Standard message sizes for testing (in bytes)
MESSAGE_SIZE_64K = 64 * 1024      # 64 KiB
MESSAGE_SIZE_1M = 1024 * 1024     # 1 MiB


def job_dict(*,
             nodes_required,
             name,
@@ -57,9 +68,16 @@ def job_dict(*,
             trace_end_time: int | None = 0,
             trace_quanta: int | None = None,
             trace_missing_values: bool | None = False,
             downscale: int = 1
             downscale: int = 1,
             # Communication parameters
             comm_pattern: CommunicationPattern | str = CommunicationPattern.ALL_TO_ALL,
             message_size: int | None = None,  # bytes per message (None = raw bandwidth model)
             ):
    """ Return job info dictionary """
    # Normalize comm_pattern to enum
    if isinstance(comm_pattern, str):
        comm_pattern = CommunicationPattern(comm_pattern)

    return {
        'nodes_required': nodes_required,
        'name': name,
@@ -94,7 +112,10 @@ def job_dict(*,
        'trace_quanta': trace_quanta,
        'trace_missing_values': trace_missing_values,
        'dilated': False,
        'downscale': downscale
        'downscale': downscale,
        # Communication parameters:
        'comm_pattern': comm_pattern,
        'message_size': message_size,
    }


@@ -181,6 +202,9 @@ class Job:
        self.trace_end_time = None    # Relative end time of the trace
        self.trace_quanta = None  # Trace quanta associated with the job # None means single value!
        self.current_run_time = 0     # Current running time updated when simulating
        # Communication parameters:
        self.comm_pattern = CommunicationPattern.ALL_TO_ALL
        self.message_size = None  # None = raw bandwidth model (no message overhead)

        # If a job dict was given, override the values from the job_dict:
        for key, value in job_dict.items():
@@ -222,6 +246,7 @@ class Job:
                f"allocated_gpu_units={self.allocated_gpu_units}, "
                f"cpu_trace={self.cpu_trace}, gpu_trace={self.gpu_trace}, "
                f"ntx_trace={self.ntx_trace}, nrx_trace={self.nrx_trace}, "
                f"comm_pattern={self.comm_pattern}, message_size={self.message_size}, "
                f"end_state={self.end_state}, "
                f"current_state={self.current_state}, "
                f"submit_time={self.submit_time}, time_limit={self.time_limit}, "
+42 −8
import os
import warnings

from raps.job import CommunicationPattern

from .base import (
    all_to_all_paths,
    apply_job_slowdown,
    compute_system_network_stats,
    link_loads_for_job,
    link_loads_for_job_stencil_3d,
    link_loads_for_pattern,
    get_effective_traffic,
    apply_message_size_overhead,
    factorize_3d,
    stencil_3d_pairs,
    network_congestion,
    network_slowdown,
    network_utilization,
@@ -31,6 +39,12 @@ __all__ = [
    "network_slowdown",
    "all_to_all_paths",
    "link_loads_for_job",
    "link_loads_for_job_stencil_3d",
    "link_loads_for_pattern",
    "get_effective_traffic",
    "apply_message_size_overhead",
    "factorize_3d",
    "stencil_3d_pairs",
    "worst_link_util",
    "build_fattree",
    "build_torus3d",
@@ -115,11 +129,25 @@ class NetworkModel:

        net_tx = get_current_utilization(job.ntx_trace, job)
        net_rx = get_current_utilization(job.nrx_trace, job)
        net_util = network_utilization(net_tx, net_rx, max_throughput)

        # Get communication pattern and message size from job
        comm_pattern = getattr(job, 'comm_pattern', CommunicationPattern.ALL_TO_ALL)
        message_size = getattr(job, 'message_size', None)

        # Apply message size overhead if specified
        num_hosts = len(job.scheduled_nodes)
        effective_tx = get_effective_traffic(net_tx, job, num_hosts)
        effective_rx = get_effective_traffic(net_rx, job, num_hosts)

        net_util = network_utilization(effective_tx, effective_rx, max_throughput)

        if debug:
            print(f"  comm_pattern: {comm_pattern}, message_size: {message_size}")
            print(f"  raw tx/rx: {net_tx}/{net_rx}, effective tx/rx: {effective_tx}/{effective_rx}")

        if self.topology == "fat-tree":
            host_list = [node_id_to_host_name(n, self.fattree_k) for n in job.scheduled_nodes]
            loads = link_loads_for_job(self.net_graph, host_list, net_tx)
            loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            net_cong = worst_link_util(loads, max_throughput)
            if debug:
                print("  fat-tree hosts:", host_list)
@@ -134,7 +162,7 @@ class NetworkModel:
                print("  dragonfly hosts:", host_list)
                print("Example nodes in graph:", list(self.net_graph.nodes)[:10])
                print("Contains h_0_9_0?", "h_0_9_0" in self.net_graph)
            loads = link_loads_for_job(self.net_graph, host_list, net_tx)
            loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            net_cong = worst_link_util(loads, max_throughput)

        elif self.topology == "torus3d":
@@ -142,18 +170,24 @@ class NetworkModel:
            Y = self.config["TORUS_Y"]
            Z = self.config["TORUS_Z"]
            hosts_per_router = self.config["HOSTS_PER_ROUTER"]
            #host_list = [self.id_to_host[n] for n in job.scheduled_nodes]
            host_list = [
                torus_host_from_real_index(n, X, Y, Z, hosts_per_router)
                for n in job.scheduled_nodes
            ]
            loads = link_loads_for_job_torus(self.net_graph, self.meta, host_list, net_tx)
            # For torus3d, use the specialized torus routing
            # but still apply the communication pattern for traffic distribution
            if comm_pattern == CommunicationPattern.STENCIL_3D:
                # Use pattern-aware loading for stencil on torus
                loads = link_loads_for_pattern(self.net_graph, host_list, effective_tx, comm_pattern)
            else:
                # Use torus-specific routing for all-to-all
                loads = link_loads_for_job_torus(self.net_graph, self.meta, host_list, effective_tx)
            net_cong = worst_link_util(loads, max_throughput)
            if debug:
                print("  torus3d hosts:", host_list)

        elif self.topology == "capacity":
            net_cong = network_congestion(net_tx, net_rx, max_throughput)
            net_cong = network_congestion(effective_tx, effective_rx, max_throughput)

        else:
            raise ValueError(f"Unsupported topology: {self.topology}")
+248 −2


Preview size limit exceeded, changes collapsed.