Commit de244b55 authored by Vacaliuc, Bogdan

slides: add slide 8 — A/B equivalence harness proposal



Two-tier slide showing the test harness that guards every refactor.

Top half: pipeline diagram.
  reference fixtures (1) →
      ├─ Path A: drive GUI via pytest-qt, capture .dat
      └─ Path B: invoke ReductionProcess.reduce(), capture .dat
    → compare Q/R/dR/dQ + sample-log checksum (2)
    → PASS (equivalent) or FAIL (divergence with plot + JSON delta)
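
The compare step above could look roughly like the sketch below. It is
illustrative only, not the harness in this commit. Assumptions: each .dat
file holds whitespace-separated columns in the order Q, R, dR, dQ with
'#' comment lines; Q is compared exactly while the other columns use a
relative tolerance; the names parse_dat and compare_reductions and the
default tolerance are hypothetical. The sample-log checksum and the
plot are not covered here.

```python
import json
import math

# Assumed column order of the .dat output; adjust to the real file layout.
COLUMNS = ("Q", "R", "dR", "dQ")


def parse_dat(text):
    """Parse whitespace-separated numeric rows, skipping blanks and '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(field) for field in line.split()[:len(COLUMNS)]])
    return rows


def compare_reductions(dat_a, dat_b, rel_tol=1e-6):
    """Compare Path A and Path B output; return (passed, per_bin_delta).

    rel_tol is a placeholder; the real thresholds are to be agreed with
    the scientific lead.
    """
    rows_a, rows_b = parse_dat(dat_a), parse_dat(dat_b)
    if len(rows_a) != len(rows_b):
        return False, [{"error": "bin count differs",
                        "a": len(rows_a), "b": len(rows_b)}]
    delta = []
    for i, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for name, va, vb in zip(COLUMNS, row_a, row_b):
            # Assumption: Q must match exactly, bin by bin.
            tol = 0.0 if name == "Q" else rel_tol
            if not math.isclose(va, vb, rel_tol=tol, abs_tol=0.0):
                delta.append({"bin": i, "column": name, "a": va, "b": vb})
    return not delta, delta


if __name__ == "__main__":
    path_a = "# Q R dR dQ\n0.01 1.0 0.1 0.001\n0.02 0.5 0.05 0.001\n"
    path_b = "0.01 1.0 0.1 0.001\n0.02 0.5000001 0.05 0.001\n"
    passed, delta = compare_reductions(path_a, path_b, rel_tol=1e-9)
    print("PASS" if passed else "FAIL")
    print(json.dumps(delta, indent=2))  # the per-bin delta emitted on FAIL
```

Returning the delta as a list of plain dicts keeps the FAIL artifact
trivially serializable to JSON for the CI job.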

Bottom half: three panels showing how the harness pays off across
the hack-a-thon timeline.
  - Before (Day 1): baseline measurement. Expect failures on every
    fixture — each failure quantifies an active slide-3 tension,
    producing a debt manifest for Day-4 EWM tickets.
  - During (Days 2–5): every PR gated by the harness. Each refactor
    (build_mrr_kwargs, _as_ints unification, QuickNXS scale, etc.)
    turns more fixtures green. "Done" means bit-for-bit equivalent
    OR explicitly signed-off divergent reductions.
  - After (permanent): CI gate forever. Would have caught commit
    9e67585 (NX vs NY pixel clipping) and 2029db1 (stitching
    overwrites run number) the day they landed.

Builds on existing pytest-qt / pytest-xvfb / LFS infrastructure;
no new CI resources required.

This is item 10 on slide 7 — the Tier-2 investment that makes every
other refactor safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent b73f56c8
+202 −0
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 1080" width="1920" height="1080" font-family="Helvetica, Arial, sans-serif">
  <defs>
    <linearGradient id="headerGrad" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#002E5D"/>
      <stop offset="100%" stop-color="#004B8D"/>
    </linearGradient>
    <linearGradient id="guiLane" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#FFF3E6"/>
      <stop offset="100%" stop-color="#FEE4CB"/>
    </linearGradient>
    <linearGradient id="autoLane" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#F0F7EE"/>
      <stop offset="100%" stop-color="#DDECD8"/>
    </linearGradient>
    <linearGradient id="diffBox" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#FFF8E7"/>
      <stop offset="100%" stop-color="#FAEBC0"/>
    </linearGradient>
    <linearGradient id="passBox" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#E2F1D9"/>
      <stop offset="100%" stop-color="#C9E4BB"/>
    </linearGradient>
    <linearGradient id="failBox" x1="0" x2="0" y1="0" y2="1">
      <stop offset="0%" stop-color="#FEEADF"/>
      <stop offset="100%" stop-color="#FBC8A6"/>
    </linearGradient>
    <marker id="flowArrow" viewBox="0 0 12 9" refX="11" refY="4.5" markerWidth="9" markerHeight="7" orient="auto">
      <path d="M 0 0 L 12 4.5 L 0 9 z" fill="#002E5D"/>
    </marker>
    <marker id="greenArrow" viewBox="0 0 12 9" refX="11" refY="4.5" markerWidth="9" markerHeight="7" orient="auto">
      <path d="M 0 0 L 12 4.5 L 0 9 z" fill="#1F5B1F"/>
    </marker>
    <marker id="redArrow" viewBox="0 0 12 9" refX="11" refY="4.5" markerWidth="9" markerHeight="7" orient="auto">
      <path d="M 0 0 L 12 4.5 L 0 9 z" fill="#C8102E"/>
    </marker>
    <filter id="softShadow" x="-10%" y="-10%" width="120%" height="120%">
      <feGaussianBlur in="SourceAlpha" stdDeviation="2"/>
      <feOffset dx="1" dy="2" result="offset"/>
      <feComponentTransfer><feFuncA type="linear" slope="0.22"/></feComponentTransfer>
      <feMerge><feMergeNode/><feMergeNode in="SourceGraphic"/></feMerge>
    </filter>
  </defs>

  <rect width="1920" height="1080" fill="#FAFAFA"/>

  <!-- Header -->
  <rect x="0" y="0" width="1920" height="100" fill="url(#headerGrad)"/>
  <text x="960" y="55" font-size="34" font-weight="700" fill="white" text-anchor="middle">A/B Equivalence Harness — Catch Disagreement Before and After the Refactor</text>
  <text x="960" y="85" font-size="17" fill="#B8D4E8" text-anchor="middle">One fixture set, two paths through the code, one diff: the regression barrier that lets the hack-a-thon team refactor with confidence</text>

  <!-- Sub-heading -->
  <text x="40" y="145" font-size="16" fill="#2A2A2A">The debt items in slide 3 persist because <tspan font-style="italic">nothing catches them</tspan> today.  Building this harness is Tier-2 work (item 10 on slide 7) and is</text>
  <text x="40" y="167" font-size="16" fill="#2A2A2A">the highest-leverage investment for the hack-a-thon: every later refactor runs through it, every drift shows up as a failed diff.</text>

  <!-- ─── Input fixture stage ─── -->
  <g>
    <rect x="40" y="200" width="320" height="200" rx="12" fill="#FFFFFF" stroke="#002E5D" stroke-width="2.5" filter="url(#softShadow)"/>
    <rect x="40" y="200" width="320" height="40" rx="12" fill="#002E5D"/>
    <text x="200" y="226" font-size="16" font-weight="700" fill="white" text-anchor="middle">1.  Reference fixture set</text>
    <g font-size="13" fill="#2A2A2A">
      <text x="60" y="265">●  ~10 REF_M runs chosen by scientific lead:</text>
      <text x="75" y="284">- polarized and unpolarized</text>
      <text x="75" y="302">- single-peak and multi-peak</text>
      <text x="75" y="320">- known-good, published reductions</text>
      <text x="60" y="345">●  Pinned template files / Configurations</text>
      <text x="60" y="365">●  Under LFS in the repo</text>
      <text x="60" y="385">●  One expected-output file per run</text>
    </g>
  </g>

  <!-- Branching arrow from fixtures to both paths -->
  <path d="M 370 270 C 420 270 420 295 475 295" stroke="#002E5D" stroke-width="2.5" fill="none" marker-end="url(#flowArrow)"/>
  <path d="M 370 330 C 420 330 420 425 475 425" stroke="#002E5D" stroke-width="2.5" fill="none" marker-end="url(#flowArrow)"/>

  <!-- Split label -->
  <text x="415" y="365" font-size="13" fill="#606060" font-style="italic" text-anchor="middle">(same input)</text>

  <!-- ─── Path A: GUI reduction ─── -->
  <g>
    <rect x="490" y="230" width="540" height="150" rx="12" fill="url(#guiLane)" stroke="#E87722" stroke-width="2" filter="url(#softShadow)"/>
    <text x="760" y="265" font-size="18" font-weight="700" fill="#8C3A00" text-anchor="middle">Path A — GUI reduction</text>
    <text x="510" y="293" font-size="13" fill="#3E2C00">Drive the quicknxsv2 main window with <tspan font-family="monospace" font-weight="700">pytest-qt</tspan>:</text>
    <text x="530" y="313" font-size="13" fill="#3E2C00">load run → set ROIs from fixture → click Reduce</text>
    <text x="510" y="340" font-size="13" fill="#3E2C00">Capture the .dat file from the output directory.</text>
    <text x="510" y="360" font-size="12" fill="#8C3A00" font-style="italic">exercises every Qt signal, every deepcopy, the CSD/NexusData path, QuickNXS post-scaling…</text>
  </g>

  <!-- Path B: autoreduce reduction -->
  <g>
    <rect x="490" y="400" width="540" height="150" rx="12" fill="url(#autoLane)" stroke="#6DA64F" stroke-width="2" filter="url(#softShadow)"/>
    <text x="760" y="435" font-size="18" font-weight="700" fill="#1F5B1F" text-anchor="middle">Path B — autoreduce reduction</text>
    <text x="510" y="463" font-size="13" fill="#1F3A1F">Invoke <tspan font-family="monospace" font-weight="700">mr_reduction.ReductionProcess(...).reduce()</tspan></text>
    <text x="530" y="483" font-size="13" fill="#1F3A1F">with the same input file and matched options.</text>
    <text x="510" y="510" font-size="13" fill="#1F3A1F">Capture the .dat file from <tspan font-family="monospace">shared/autoreduce/</tspan>.</text>
    <text x="510" y="530" font-size="12" fill="#1F5B1F" font-style="italic">exercises the template contract, the autoreduce-specific default set</text>
  </g>

  <!-- Arrows merging into diff box -->
  <path d="M 1040 295 C 1080 295 1080 370 1115 370" stroke="#002E5D" stroke-width="2.5" fill="none" marker-end="url(#flowArrow)"/>
  <path d="M 1040 475 C 1080 475 1080 400 1115 400" stroke="#002E5D" stroke-width="2.5" fill="none" marker-end="url(#flowArrow)"/>

  <!-- ─── Diff stage ─── -->
  <g>
    <rect x="1125" y="300" width="330" height="180" rx="12" fill="url(#diffBox)" stroke="#D2A000" stroke-width="2.5" filter="url(#softShadow)"/>
    <rect x="1125" y="300" width="330" height="40" rx="12" fill="#D2A000"/>
    <text x="1290" y="327" font-size="16" font-weight="700" fill="white" text-anchor="middle">2.  Compare R(Q) curves</text>
    <g font-size="13" fill="#3A2C00">
      <text x="1145" y="365">●  Q values must match bin-by-bin</text>
      <text x="1145" y="385">●  R and dR compared within relative tol.</text>
      <text x="1145" y="405">●  Plus sample-log checksum:</text>
      <text x="1160" y="423" font-family="monospace" font-size="12">two_theta · lambda_min/max</text>
      <text x="1160" y="441" font-family="monospace" font-size="12">specular_pixel · scatt_peak_*</text>
      <text x="1145" y="463" font-style="italic">Thresholds agreed by scientific lead</text>
    </g>
  </g>

  <!-- Arrows to PASS / FAIL -->
  <path d="M 1460 360 L 1510 360" stroke="#1F5B1F" stroke-width="3" fill="none" marker-end="url(#greenArrow)"/>
  <path d="M 1460 420 L 1510 420" stroke="#C8102E" stroke-width="3" fill="none" marker-end="url(#redArrow)"/>

  <!-- PASS -->
  <g>
    <rect x="1520" y="310" width="360" height="90" rx="12" fill="url(#passBox)" stroke="#53A548" stroke-width="2.5" filter="url(#softShadow)"/>
    <text x="1700" y="345" font-size="20" font-weight="700" fill="#1F5B1F" text-anchor="middle">✓  PASS  —  equivalent</text>
    <text x="1700" y="370" font-size="13" fill="#1F5B1F" text-anchor="middle">refactor preserved the reduction</text>
    <text x="1700" y="389" font-size="13" fill="#1F5B1F" text-anchor="middle">merge the PR; the contract holds</text>
  </g>

  <!-- FAIL -->
  <g>
    <rect x="1520" y="410" width="360" height="130" rx="12" fill="url(#failBox)" stroke="#C8102E" stroke-width="2.5" filter="url(#softShadow)"/>
    <text x="1700" y="444" font-size="20" font-weight="700" fill="#8C0D24" text-anchor="middle">✗  FAIL  —  divergence</text>
    <text x="1540" y="470" font-size="13" fill="#3E2C00">●  Plot both curves side-by-side</text>
    <text x="1540" y="490" font-size="13" fill="#3E2C00">●  Emit the per-bin delta as JSON</text>
    <text x="1540" y="510" font-size="13" fill="#3E2C00">●  Fail the CI job and block merge</text>
    <text x="1540" y="530" font-size="12" fill="#8C3A00" font-style="italic">→ refactor needs review or rollback</text>
  </g>

  <!-- ═════════════ BOTTOM HALF — Uses of the harness ═════════════ -->
  <text x="960" y="605" font-size="22" font-weight="700" fill="#002E5D" text-anchor="middle">How this harness pays off across the hack-a-thon</text>

  <!-- Three use cases -->
  <g>
    <rect x="40" y="630" width="600" height="370" rx="12" fill="#FFFFFF" stroke="#002E5D" stroke-width="2" filter="url(#softShadow)"/>
    <rect x="40" y="630" width="600" height="40" rx="12" fill="#002E5D"/>
    <text x="340" y="656" font-size="16" font-weight="700" fill="white" text-anchor="middle">Before the refactor (Day 1)</text>
    <text x="60" y="695" font-size="14" fill="#2A2A2A" font-weight="700">Baseline measurement:</text>
    <text x="60" y="720" font-size="13" fill="#2A2A2A">Run the harness on all fixtures <tspan font-style="italic">as-is</tspan>.</text>
    <text x="60" y="742" font-size="13" fill="#2A2A2A">Expect <tspan fill="#C8102E" font-weight="700">failures</tspan> on every fixture.</text>
    <text x="60" y="775" font-size="14" fill="#2A2A2A" font-weight="700">What the failures tell you:</text>
    <text x="60" y="800" font-size="13" fill="#2A2A2A">●  which slide-3 tensions are active</text>
    <text x="60" y="820" font-size="13" fill="#2A2A2A">●  how big the numerical impact actually is</text>
    <text x="60" y="840" font-size="13" fill="#2A2A2A">●  which fixtures diverge most — the best</text>
    <text x="85" y="858" font-size="13" fill="#2A2A2A">candidates for deeper investigation</text>
    <text x="60" y="890" font-size="14" fill="#2A2A2A" font-weight="700">Outcome:</text>
    <text x="60" y="915" font-size="13" fill="#2A2A2A">A <tspan font-weight="700">debt manifest</tspan> with concrete numbers</text>
    <text x="60" y="935" font-size="13" fill="#2A2A2A">to attach to the Day-4 EWM tickets.</text>
    <text x="60" y="970" font-size="12" fill="#606060" font-style="italic">Duration: ~half a day to set up + populate fixtures.</text>
  </g>

  <g>
    <rect x="660" y="630" width="600" height="370" rx="12" fill="#FFFFFF" stroke="#E87722" stroke-width="2" filter="url(#softShadow)"/>
    <rect x="660" y="630" width="600" height="40" rx="12" fill="#E87722"/>
    <text x="960" y="656" font-size="16" font-weight="700" fill="white" text-anchor="middle">During the refactor (Days 2–5)</text>
    <text x="680" y="695" font-size="14" fill="#2A2A2A" font-weight="700">Every PR is gated by the harness:</text>
    <text x="680" y="720" font-size="13" fill="#2A2A2A">●  <tspan font-family="monospace">build_mrr_kwargs()</tspan> lands → same diff</text>
    <text x="680" y="740" font-size="13" fill="#2A2A2A">●  default-value pinning lands → some fixtures pass</text>
    <text x="680" y="760" font-size="13" fill="#2A2A2A">●  _as_ints unification lands → more fixtures pass</text>
    <text x="680" y="780" font-size="13" fill="#2A2A2A">●  QuickNXS-scale consolidation → all fixtures pass</text>
    <text x="680" y="815" font-size="14" fill="#2A2A2A" font-weight="700">Each green PR is earned:</text>
    <text x="680" y="840" font-size="13" fill="#2A2A2A">the refactor is not "done" until both paths</text>
    <text x="680" y="860" font-size="13" fill="#2A2A2A">produce <tspan font-weight="700">bit-for-bit equivalent</tspan> (or explicitly</text>
    <text x="680" y="880" font-size="13" fill="#2A2A2A">signed-off divergent) reductions.</text>
    <text x="680" y="915" font-size="14" fill="#2A2A2A" font-weight="700">Living contract:</text>
    <text x="680" y="940" font-size="13" fill="#2A2A2A">Each tension in slide 3 gets an explicit decision</text>
    <text x="680" y="960" font-size="13" fill="#2A2A2A">recorded in the fixture's expected-output file.</text>
  </g>

  <g>
    <rect x="1280" y="630" width="600" height="370" rx="12" fill="#FFFFFF" stroke="#6DA64F" stroke-width="2" filter="url(#softShadow)"/>
    <rect x="1280" y="630" width="600" height="40" rx="12" fill="#6DA64F"/>
    <text x="1580" y="656" font-size="16" font-weight="700" fill="white" text-anchor="middle">After the refactor (permanent)</text>
    <text x="1300" y="695" font-size="14" fill="#2A2A2A" font-weight="700">CI gate forever:</text>
    <text x="1300" y="720" font-size="13" fill="#2A2A2A">●  Runs on every push and every PR</text>
    <text x="1300" y="740" font-size="13" fill="#2A2A2A">●  Runs on every Mantid bump</text>
    <text x="1300" y="760" font-size="13" fill="#2A2A2A">●  Runs when instrument geometry changes (IDF)</text>
    <text x="1300" y="795" font-size="14" fill="#2A2A2A" font-weight="700">Catches what slipped through:</text>
    <text x="1300" y="820" font-size="13" fill="#2A2A2A">Commit 9e67585 (NX vs NY pixel clipping) would</text>
    <text x="1300" y="840" font-size="13" fill="#2A2A2A">have been caught the day it landed.</text>
    <text x="1300" y="860" font-size="13" fill="#2A2A2A">Commit 2029db1 (stitching overwrites run #) too.</text>
    <text x="1300" y="895" font-size="14" fill="#2A2A2A" font-weight="700">Audit trail:</text>
    <text x="1300" y="920" font-size="13" fill="#2A2A2A">Each fixture pass/fail is a recorded event.</text>
    <text x="1300" y="940" font-size="13" fill="#2A2A2A">Scientists can point at a known-good reduction</text>
    <text x="1300" y="960" font-size="13" fill="#2A2A2A">date in the CI history when defending a paper.</text>
  </g>

  <!-- Footer -->
  <rect x="0" y="1020" width="1920" height="60" fill="#001A35"/>
  <text x="40" y="1058" font-size="13" fill="#6A90B0">Builds on existing pytest-qt / pytest-xvfb infrastructure — no new CI runner required.  All fixture runs fit under the existing LFS quota.</text>
  <text x="1880" y="1058" font-size="13" fill="#6A90B0" text-anchor="end">Reflectometry Hack-a-thon 2026 · slide 8 of 8</text>
</svg>