claude finished before I left home... so I commit (d2d45c54) · Commits · Vacaliuc, Bogdan / tasking

Hack-A-Thon-2026-EWM-Estimation-Baseline.md

		# Hack-A-Thon 2026 — EWM-Calibrated Estimation Baseline

		Prepared 2026-04-25 (Saturday). Reviewable Monday morning.

		---

		## TL;DR

Using 1,076 historical EWM-tracked work items (`Resolved-REF-Work-2026-04-25` tab,
2023-02-02 through 2026-04-24) plus 14 git-merge ↔ EWM-ID cross-references
(both `quicknxsv2` and `mr_reduction` were scanned; all 14 matches are in
`quicknxsv2`), the back-end refactor backlog identified at the close of Day 3
(39 line items) comes to:

| Quantity | Median | p80 |
|---|---:|---:|
| Focused effort (developer-hours) | 358 h | 816 h |
| ↳ in person-weeks @ 40 h/wk | 8.9 pw | 20.4 pw |
| ↳ in person-months @ 160 h/mo | 2.24 pm | 5.10 pm |
| Calendar at current REF capacity (≈136 h/mo team), process-tax 2× | 5.3 mo | 12.0 mo |
| Calendar at current REF capacity, process-tax 3× (central) | 7.9 mo | 18.0 mo |
| Calendar at current REF capacity, process-tax 5× (Glass's "real-world" anecdote) | 13.1 mo | 30.0 mo |

		Bottom line for Bogdan, ahead of the Monday deep-dive:

- Glass's verbal estimate on Day 3 (≈3 mo optimistic / ≈1 yr realistic) is **consistent**
with the empirical numbers: the "optimistic" bound matches median effort with no
process tax (2.6 mo at full REF capacity), and "realistic" matches the p80-at-2×-tax
band (12.0 mo).
		- The dominant risk is process tax, not under-estimation of the work itself.
		The team's median `Time Spent / Estimate` ratio is **1.00 across every work
		class** — they do not under-quote. The variance lives in the calendar gap
		between EWM-creation and first git commit, where the median attention lag
		is 50.6 days.
		- **The single highest-leverage item is `MG6` (build_mrr_kwargs / consolidate
		the MRR call)** — 29 h median / 62 h p80. This one PR retires the five Day-1
		silent default bugs and is the natural Tier-1 of the refactor.

		---

		## 1 · Inputs

		- `~/Desktop/Hack-A-Thon-2026/Focus-REF-Work-2026-04-25.xlsx`
- `Resolved-REF-Work-2026-04-25` tab: 1,076 historical work items
		(Task / Story / Defect / Release / Capability) with `Estimate`,
		`Corrected Estimate`, `Time Spent`, `Creation Date`, `Resolution Date`.
		Time-window: 2023-02-02 to 2026-04-24.
		- `Focus-REF-Work-2026-04-25` tab — 94 currently in-flight items;
		used here for capacity (the per-developer "X Time" placeholder rows).
		- Git history of `quicknxsv2/` (1,170 commits) and `mr_reduction/`
		(417 commits), scanned for merge subjects matching `ewm<NNNN>_<descr>`
		to obtain branch boundaries.
- Day-3 backlog, sourced from the Consensus & decisions section of
`Hack-A-Thon-2026-Day3-Summary.md` (15 tech-debt items + 14 migration-table
rows + 7 developer-insight cleanups + 3 scientist TODOs + 2 cross-cutting
infra items = 41 items, working from 39 logical line items as TD8 and IN1
each carry an internal multiplier).

		---

		## 2 · Phase 1 — Empirical baselines from EWM

		All times normalized to focused hours (1 day = 8 h, 1 week = 40 h).
		Per-class statistics on the Resolved sheet (n=1,076):

		### 2.1 · Time Spent (focused hours per item) by work-class

| class | n | median (h) | mean (h) | p20 (h) | p80 (h) |
|---|---:|---:|---:|---:|---:|
| admin | 4 | 1.0 | 2.5 | 0.0 | 8.0 |
| bugfix | 36 | 5.5 | 7.0 | 2.0 | 12.0 |
| ci_devops | 32 | 2.0 | 3.5 | 1.0 | 5.0 |
| docs | 19 | 2.0 | 3.7 | 2.0 | 5.0 |
| investig | 20 | 4.5 | 7.3 | 3.0 | 16.0 |
| newfeat | 53 | 3.0 | 5.7 | 1.0 | 8.0 |
| other | 161 | 4.0 | 5.5 | 2.0 | 8.0 |
| refactor | 24 | 4.0 | 4.8 | 2.0 | 8.0 |
| release | 28 | 1.5 | 2.5 | 1.0 | 3.0 |
| review | 73 | 1.0 | 1.9 | 1.0 | 2.0 |
| test | 75 | 5.0 | 6.4 | 2.0 | 11.0 |

The five classes used for the Day-3 backlog are refactor / bugfix / test /
investig / newfeat (plus docs for SC1). The classifier is Summary-keyword based
(see `/tmp/phase1_extract.py`); a spot-check of 25 refactor-classified items
confirmed the classification.
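The Summary-keyword classifier can be sketched as a first-match lookup over ordered keyword lists. The keyword lists and their ordering below are hypothetical placeholders; the real ones live in `/tmp/phase1_extract.py` and are not reproduced in this document.

```python
import re

# Hypothetical keyword patterns; dict order encodes precedence (first match wins).
CLASS_KEYWORDS = {
    "release": [r"\brelease\b", r"\btag\b"],
    "review": [r"\breview\b"],
    "bugfix": [r"\bfix\b", r"\bbug\b", r"\bdefect\b"],
    "refactor": [r"\brefactor\b", r"\bremove\b", r"\bconsolidate\b", r"\bclean ?up\b"],
    "test": [r"\btests?\b", r"\bpytest\b"],
    "docs": [r"\bdocs?\b", r"\bdocumentation\b"],
    "ci_devops": [r"\bci\b", r"\bpipeline\b", r"\bconda\b"],
    "investig": [r"\binvestigate\b", r"\bspike\b"],
    "newfeat": [r"\badd\b", r"\bnew\b", r"\bimplement\b", r"\bsupport\b"],
    "admin": [r"\bmeeting\b", r"\bplanning\b"],
}


def classify(summary: str) -> str:
    """Return the first class whose keywords match the Summary, else 'other'."""
    text = summary.lower()
    for cls, patterns in CLASS_KEYWORDS.items():
        if any(re.search(p, text) for p in patterns):
            return cls
    return "other"
```

Because the first matching class wins, a Summary such as "Fix flaky test" lands in bugfix rather than test with this ordering. Spot-checks are meant to catch precedence cases like this one.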

		### 2.2 · Honesty of estimation (Time Spent ÷ Estimate)

| class | n | median | mean | p80 |
|---|---:|---:|---:|---:|
| admin | 3 | 1.00 | 1.17 | 2.00 |
| bugfix | 36 | 1.00 | 0.97 | 1.00 |
| ci_devops | 29 | 1.00 | 0.89 | 1.00 |
| docs | 19 | 1.00 | 0.95 | 1.00 |
| investig | 19 | 1.00 | 1.01 | 1.50 |
| newfeat | 51 | 1.00 | 1.09 | 1.33 |
| refactor | 24 | 1.00 | 0.86 | 1.00 |
| test | 71 | 1.00 | 0.88 | 1.00 |

Median ratio = 1.00 across every work-class. This team does not
systematically under-quote. The slightly-below-1 means on test/refactor/ci_devops
(0.86–0.89) suggest they over-quote slightly and finish ahead. Apart from admin
(n=3), investig and newfeat are the only classes with p80 > 1.0 (1.50× and
1.33×), reflecting the open-ended nature of investigation and new-feature work.
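Below is a minimal sketch of the per-class ratio summary. It assumes each item is a `(time_spent_h, estimate_h)` pair and that p80 uses inclusive linear interpolation (`statistics.quantiles(..., method="inclusive")`). This document does not state the percentile method `/tmp/phase1_extract.py` actually uses, so small differences from the tables are possible.

```python
import statistics


def ratio_stats(pairs):
    """Summarise Time Spent / Estimate for one work-class.

    `pairs` holds (time_spent_h, estimate_h) tuples. Items whose estimate is
    zero or missing are skipped (an assumption about how the script treats them).
    At least two usable items are needed for the p80 quantile.
    """
    ratios = [spent / est for spent, est in pairs if est]
    # quantiles(n=5) returns the 20th/40th/60th/80th percentile cut points.
    p80 = statistics.quantiles(ratios, n=5, method="inclusive")[3]
    return {
        "n": len(ratios),
        "median": statistics.median(ratios),
        "mean": statistics.mean(ratios),
        "p80": p80,
    }


# Hypothetical class with five items: ratios 1.0, 1.0, 0.75, 1.0, 1.5.
example = ratio_stats([(4, 4), (2, 2), (3, 4), (8, 8), (6, 4)])
```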

		### 2.3 · Calendar gap (Creation → Resolution)

| class | n | median (days) | p80 (days) |
|---|---:|---:|---:|
| admin | 5 | 4.1 | 23.0 |
| bugfix | 71 | 7.1 | 36.9 |
| ci_devops | 56 | 5.5 | 24.4 |
| docs | 27 | 4.4 | 11.8 |
| investig | 23 | 4.9 | 27.2 |
| newfeat | 97 | 12.8 | 104.5 |
| refactor | 47 | 12.9 | 29.2 |
| release | 236 | 14.2 | 50.2 |
| review | 70 | 1.2 | 4.9 |
| test | 94 | 6.5 | 20.8 |

		**Refactor work takes ≈4 h of focused effort (median) but spans ≈13 calendar
days from EWM creation to resolution.** The 77× ratio (12.9 d × 24 h ÷ 4 h)
		is not the developer working slowly; it's the EWM-ticket-sat-in-the-queue
		phenomenon Bogdan flagged. Phase 3 below decomposes this into attention-lag
		and focused-work components.
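The two quantities behind that ratio can be computed as below. This sketch assumes EWM dates arrive as `datetime` objects. It uses the same 24 h-per-calendar-day conversion as the text, so the ratio compares wall-clock time with focused time rather than like with like.

```python
from datetime import datetime


def calendar_gap_days(created: datetime, resolved: datetime) -> float:
    """EWM Creation -> Resolution gap in calendar days (fractional)."""
    return (resolved - created).total_seconds() / 86400


def wallclock_per_focused_hour(gap_days: float, focused_hours: float) -> float:
    """Wall-clock hours elapsed (24 h per calendar day) per focused hour spent."""
    return gap_days * 24 / focused_hours


# Refactor medians from §2.1 and §2.3: 12.9 calendar days, 4.0 focused hours.
ratio = wallclock_per_focused_hour(12.9, 4.0)
print(round(ratio))  # 77
```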

		### 2.4 · Per-developer throughput, trailing 6 months (Nov-2025–Apr-2026)

| Developer | Resolved items / 6 mo | EWM-recorded h / 6 mo | h / mo (recorded) | h / mo (≈actual)¹ |
|---|---:|---:|---:|---:|
| Glass Elsarboukh | 70 | 138 | 23.0 | 50.0 |
| Marie Backman | 78 | 170 | 28.3 | 61.5 |
| Kevin Tactac | 43 | 68 | 11.3 | 24.6 |
| Jose Borreguero | 4 | 0 | 0.0 | 0.0 (rotated off after Aug 2025) |
| 3-dev team total | — | — | 62.6 | ≈136 h / mo |

¹ Only 46% of recent Resolved items have `Time Spent` populated. Inflating the
recorded total by 1/0.46 ≈ 2.17× gives the "actual" column. As a cross-check,
Glass's `Glass Time` placeholder task in the Focus tab (30 h estimated for the
2026 2.2 cycle, ≈6 weeks → 5 h/week → ≈22 h/mo on REF) matches the *recorded*
figure (23.0 h/mo) rather than the inflated one. That confirms the recorded
hours are self-consistent. It does not by itself confirm the ≈2× under-count,
which rests on the 46% `Time Spent` coverage alone.

		The 136 h/mo figure is REF-only capacity; the same three developers also
		work other projects.
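The capacity figures in the §2.4 table reduce to two divisions per developer. Here is a minimal sketch using the recorded hours and the 46% coverage figure. The names `monthly_capacity` and `recorded_h` are illustrative, not taken from the scripts.

```python
MONTHS_IN_WINDOW = 6         # trailing window, Nov-2025 to Apr-2026
TIME_SPENT_COVERAGE = 0.46   # share of recent Resolved items with Time Spent

# EWM-recorded REF hours over the window (§2.4); Jose Borreguero logged 0 h.
recorded_h = {"Glass Elsarboukh": 138, "Marie Backman": 170, "Kevin Tactac": 68}


def monthly_capacity(hours, months=MONTHS_IN_WINDOW, coverage=TIME_SPENT_COVERAGE):
    """Map developer -> (recorded h/mo, coverage-inflated h/mo)."""
    return {dev: (h / months, h / months / coverage) for dev, h in hours.items()}


per_dev = monthly_capacity(recorded_h)
team_recorded = sum(rec for rec, _ in per_dev.values())  # about 62.7 h/mo
team_actual = team_recorded / TIME_SPENT_COVERAGE         # about 136 h/mo
```

Summing the unrounded per-developer values gives ≈62.7 h/mo recorded, not the table's 62.6, which sums rounded values. Both inflate to ≈136 h/mo.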

		---

		## 3 · Phase 2 — Day-3 backlog ledger

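Each item is hand-classified into one or more Phase-1 work-classes, with a
size multiplier applied to both the class median and the class p80. Items
spanning several classes sum the per-class figures before the multiplier, and
the Count column then multiplies the result:

- `trivial` × 0.5
- `small` × 1.0 (class median and p80 used directly)
- `medium` × 1.5
- `large` × 2.0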

		The script `/tmp/phase2_ledger.py` is reproducible — change a class
		assignment or size and rerun.
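The ledger arithmetic can be sketched as follows. `BASE` mirrors the per-class median/p80 dict that §7 says `phase2_ledger.py` uses, filled with the §2.1 values. Its exact shape inside the script is an assumption, as is the function name `ledger_row`.

```python
# Per-class (median, p80) focused hours, taken from the §2.1 table.
BASE = {
    "refactor": (4.0, 8.0),
    "bugfix": (5.5, 12.0),
    "test": (5.0, 11.0),
    "investig": (4.5, 16.0),
    "newfeat": (3.0, 8.0),
    "docs": (2.0, 5.0),
}
SIZE = {"trivial": 0.5, "small": 1.0, "medium": 1.5, "large": 2.0}


def ledger_row(classes, size, count=1):
    """(median, p80) hours for one ledger item.

    Multi-class items sum the per-class figures; the size multiplier and the
    count then scale both columns.
    """
    med = sum(BASE[c][0] for c in classes) * SIZE[size] * count
    p80 = sum(BASE[c][1] for c in classes) * SIZE[size] * count
    return med, p80
```

For example, MG6 (refactor+bugfix+test, large) gives (4.0 + 5.5 + 5.0) × 2 = 29.0 h median and (8 + 12 + 11) × 2 = 62.0 h p80, matching its row in the ledger.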

		### 3.1 · Per-item ledger

| ID | Item | Class | Size | Count | Median (h) | p80 (h) | Note |
|---|---|---|---|---:|---:|---:|---|
| TD1 | Remove duplicated `DeadTimeCorrection` | refactor | small | 1 | 4.0 | 8.0 | QuickNXS version not used; concrete delete + rewire |
| TD2 | Reconcile `peak_finding.py` | refactor | small | 1 | 4.0 | 8.0 | Diff is whitespace-only per Day-1 deck |
| TD3 | Remove two `_as_ints` in `data_set.py` | refactor | trivial | 1 | 2.0 | 4.0 | Local helper consolidation |
| TD4 | Investigate per-cross-section MRR call | investig | medium | 1 | 6.8 | 24.0 | Pure analysis spike before refactor |
| TD5 | De-duplicate `quicknxs_scaling_factor` | refactor+bugfix | medium | 1 | 14.2 | 30.0 | 3 places; 2 with `+1` div-by-zero hack; root-cause fix |
| TD6 | Remove `email_test` key from `DEFAULT_OPTIONS` | refactor | trivial | 1 | 2.0 | 4.0 | Pure delete |
| TD7 | Remove GenX templates | refactor | small | 1 | 4.0 | 8.0 | Confirmed Day-2 decision |
| TD8 | Reconcile MRR 5 silent default mismatches | investig+bugfix | small × 5 | 5 | 50.0 | 140.0 | Each requires scientist sign-off |
| TD9 | Split `Configuration` into 3 classes (Global / Run / Plot) | refactor | large | 1 | 8.0 | 16.0 | God-class refactor; touches every consumer |
| TD10 | Rationalise `DEFAULT_OPTIONS` vs `ReductionDialog.get_options()` | refactor | small | 1 | 4.0 | 8.0 | Add missing keys + lint test |
| TD11 | Turn off `ErrorWeighting` consistently in `RefRoi` | bugfix | small | 1 | 5.5 | 12.0 | Audit + fix; Tim's ruling needed first |
| TD12 | Thread off file-I/O and long calc | newfeat | large | 1 | 6.0 | 16.0 | Architectural; QThread + signal-slot rewrite |
| TD13 | Deprecate `MRInspectData` (Mantid) | refactor | large | 1 | 8.0 | 16.0 | Cross-repo; replace with instrument-specific info |
| TD14 | Define `Enum` for `off_spec_x_axis` | refactor | trivial | 1 | 2.0 | 4.0 | 4 lines + grep-replace |
| TD15 | Use `scipy` for Planck / neutron mass | refactor | trivial | 1 | 2.0 | 4.0 | Constants swap |
| MG1 | GISANS calc → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | New back-end home; tests must follow |
| MG2 | Off-spec reflectivity calc → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | Back-end home + parity tests |
| MG3 | Python script generation | refactor | medium | 1 | 6.0 | 12.0 | Target package TBD |
| MG4 | Reduced data output (`.dat` files) | refactor | medium | 1 | 6.0 | 12.0 | Target package TBD; format-stable |
| MG5 | Unit conversion → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Centralize; small surface |
| MG6 | Consolidate MRR call (single `build_mrr_kwargs()`) | refactor+bugfix+test | large | 1 | 29.0 | 62.0 | The flagship; retires the 5 silent defaults |
| MG7 | Consolidate `DataInfo` class | refactor | large | 1 | 8.0 | 16.0 | 413 vs 1101 lines; neither is a subset |
| MG8 | Pre-processing for plot-facing data | refactor | medium | 1 | 6.0 | 12.0 | Move shaping logic out of UI |
| MG9 | Move `interfaces/data_handling/` → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | Wholesale move per Glass's note |
| MG10 | Rebinning → back-end | refactor | medium | 1 | 6.0 | 12.0 | Replace 2× non-Mantid `Rebin` sites |
| MG11 | File summing → back-end | refactor | small | 1 | 4.0 | 8.0 | Move via `data_handling/` first |
| MG12 | Stitching → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Day-1 deck row 7 |
| MG13 | Direct beam matching → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Procedure not yet determined; investig component implicit |
| MG14 | Peak finding → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | After TD2 reconciliation |
| DI1 | Remove obsolete `#pylint` directives | refactor | trivial | 1 | 2.0 | 4.0 | grep-replace + verify |
| DI2 | Replace `auto_change_active` with `blockSignals()` | refactor | small | 1 | 4.0 | 8.0 | Qt idiom upgrade |
| DI3 | Consolidate reduction params `CrossSectionData` → `NexusData` | refactor | medium | 1 | 6.0 | 12.0 | Touches `Configuration`; partially overlaps TD9 |
| DI4 | Rename `gui.py` → `main.py` | refactor | trivial | 1 | 2.0 | 4.0 | Pure rename + import sweep |
| DI5 | Clean up bare `except`s (Mantid throws `RuntimeError`) | refactor | small | 1 | 4.0 | 8.0 | Per-except audit |
| DI6 | Look through TODOs, prune | refactor | small | 1 | 4.0 | 8.0 | Inventory + delete |
| DI7 | Make variable naming consistent | refactor | small | 1 | 4.0 | 8.0 | `number` vs `run_number` |
| SC1 | Physics-name ↔ PV mapping doc | investig+docs | small | 1 | 6.5 | 21.0 | `tthi` = `SampleAngle`, etc. |
| SC2 | Find test data for GISANS and off-specular | investig | medium | 1 | 6.8 | 24.0 | No GISANS data today |
| SC3 | Find better test data for QuickNXS / `mr_reduction` | investig+test | small | 1 | 9.5 | 27.0 | Wider coverage |
| IN1 | Whole-workflow regression tests (spec / off-spec / 4xs) | test | large × 3 | 3 | 30.0 | 66.0 | 3 representative blocks per Day-3 p.3 |
| IN2 | `build_mrr_kwargs()` shared parameter-builder (Day-1 stretch) | refactor+test | medium | 1 | 13.5 | 28.5 | The Tier-1 PR per the Day-1 deck (overlaps MG6) |
| TOTAL | | | | | 358 | 816 | |

Notes on overlaps and duplications. TD8 (5 silent-default reconciliations,
50/140 h), MG6 (build_mrr_kwargs, 29/62 h) and IN2 (Day-1 stretch goal,
13.5/28.5 h) all attack the same root cause from different angles; once MG6
lands, TD8 collapses to a one-line PR per default. The TOTAL row above sums
all three, so don't double-count them in detailed planning. A sensible scope
for the first hack-a-thon outcome is MG6 first → the 5 silent defaults retire
as one-line PRs under TD8 → then MG7.

		Likewise TD9 (split `Configuration`) and DI3 (consolidate
		`CrossSectionData` → `NexusData` reduction params) are partially the same
		architectural move; sequence DI3 inside TD9.

		### 3.2 · Capacity vs. backlog

		At ≈136 h/mo of REF capacity from the three primary developers (§2.4):

| Total effort | Calendar, 100% of REF capacity on this backlog | Calendar, 50% of REF capacity on this backlog |
|---|---:|---:|
| 358 h (median) | 2.6 mo | 5.3 mo |
| 816 h (p80) | 6.0 mo | 12.0 mo |

These are focused-effort months; they do not include the process tax
described in §4.3 (each story needs review, reviews usually take multiple PR
rounds, and merged work then waits for a release cut every ≈2–3 weeks, per
Glass on Day 3).

		---

		## 4 · Phase 3 — git cross-validation

		Branches in `quicknxsv2` are named `ewm<NNNN>_<descr>` so the EWM ID is
		recoverable from the merge commit subject. Scanning all merge commits in
		both repos found 14 EWM↔git records (all in `quicknxsv2`;
		`mr_reduction` does not use the `ewm<NNNN>` branch convention as
consistently). None of those had `Time Spent` filled in on the parent EWM
record (the parent items are usually Stories; `Time Spent` lives on the
Tasks underneath, which don't carry the `ewm<NNNN>` branch tag).
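Recovering the EWM ID from a merge subject is a regular-expression match. The sketch below assumes the branch name appears verbatim in the subject, as in `Merge branch 'ewm12806_direct_beam' into next` (a made-up subject). The subjects themselves can be listed with `git log --merges --format=%s`. This document does not show the exact pattern used by `/tmp/phase3_git_v2.py`.

```python
import re

# Assumption: the ewm<NNNN>_<descr> branch name appears verbatim in the subject.
EWM_BRANCH = re.compile(r"\bewm(\d+)_(\w+)", re.IGNORECASE)


def ewm_from_subject(subject: str):
    """Return (ewm_id, description), or None if no ewm<NNNN>_ tag is present."""
    m = EWM_BRANCH.search(subject)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)
```

Subjects from bots such as dependabot simply return `None`. This is consistent with the much lower recovery rate the caveats in §6 describe for `mr_reduction`.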

		### 4.1 · Per-record table

| Repo | EWM | EWM summary | Lag (d) | Work (d) | Active days | Commits | Diff |
|---|---:|---|---:|---:|---:|---:|---|
| quicknxs | 6004 | `[QuickNXS] Functionality to save ORSO` | 301.7 | 1.1 | 2 | 8 | 6 files, +1* |
| quicknxs | 9367 | `[QuickNXS] Reflectivity plot for direc...` | 97.7 | 0.0 | 1 | 3 | 2 files, +1* |
| quicknxs | 11653 | `[quicknxs] Reflectivity plot not updat...` | 82.9 | 6.0 | 1 | 2 | 8 files, +1* |
| quicknxs | 12806 | `[quicknxs] Direct beam not being corre...` | 55.1 | 1.7 | 2 | 8 | 15 files |
| quicknxs | 12788 | `[QuickNXS/mr_reduction] Port deadtime ...` | 84.2 | 0.7 | 2 | 3 | 8 files, +5* |
| quicknxs | 13661 | `[quicknxs] reflectivity calculation ca...` | 0.8 | 0.1 | 1 | 1 | 5 files, +2* |
| quicknxs | 14138 | `[quicknxs] Stitching overwrites reduct...` | 0.0 | 1.0 | 1 | 1 | 3 files |
| quicknxs | 14846 | `[QuickNXS] Mismatch in active run betw...` (×2) | 54.8, 68.8 | 0.9, 0.0 | 1, 1 | 3, 8 | — |
| quicknxs | 15204 | `[CIS] [QuickNXS] Fix functionality to ...` (×4) | -6.1 to 46.5 | 0.0–31.4 | 1–9 | 1–20 | up to 38 files |
| quicknxs | 15832 | `[QuickNXS] Diagnostic widget not openi...` | 0.9 | 1.0 | 2 | 3 | 8 files, +6* |

\* Diff sizes marked with an asterisk were truncated in the raw script output.

		### 4.2 · Calibration constants

		From the 14 records:

| Quantity | Median | p80 | Range |
|---|---:|---:|---|
| Attention lag (EWM Creation → first commit), days | 50.6 | 84.2 | -6.1 .. 301.7 |
| Focused calendar (first commit → merge), days | 1.0 | 3.1 | 0.0 .. 31.4 |
| Active commit-days per branch | 1.0 | 2.0 | 1 .. 9 |

		Interpretation. EWM-creation date is not an honest start signal —
		it precedes actual work by ≈7 weeks (median). Once a developer starts a
		branch, work is concentrated: the median branch is closed in 1 calendar
		day with 1–2 active commit-days. The few outliers (EWM 6004 at 301-day
		lag, EWM 15204 at 31-day work-window) are the long-tail work that you
		should plan for explicitly, not statistically.

		This calibration does not change the focused-effort estimates in
		§3.1 — those come from `Time Spent` which is already a focused-hours
		measure. What it tells us is the process tax between *deciding to
		do something* and getting it merged on this team's current cadence.
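Both calibration quantities are date differences per record, summarised by their medians. The minimal sketch below assumes each record is an `(ewm_created, first_commit, merged)` triple of `datetime`s. A negative lag, as for EWM 15204, simply means the first commit predates the EWM item.

```python
from datetime import datetime
from statistics import median

DAY_S = 86400.0


def lag_and_window(ewm_created, first_commit, merged):
    """Return (attention lag, focused calendar window), both in days."""
    lag = (first_commit - ewm_created).total_seconds() / DAY_S
    window = (merged - first_commit).total_seconds() / DAY_S
    return lag, window


def calibration(records):
    """Median attention lag and median work-window over all records."""
    lags, windows = zip(*(lag_and_window(*r) for r in records))
    return median(lags), median(windows)


# Two hypothetical records: a 50-day lag, and a commit 6 days before the EWM item.
records = [
    (datetime(2026, 1, 1), datetime(2026, 2, 20), datetime(2026, 2, 21)),
    (datetime(2026, 1, 10), datetime(2026, 1, 4), datetime(2026, 1, 5)),
]
```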

		### 4.3 · Process tax — the multiplier from focused effort to calendar months

		Three scenarios for the calendar projection:

| Process tax | Effective effort (median) | Effective effort (p80) | Calendar median | Calendar p80 |
|---|---:|---:|---:|---:|
| 2× (low: single bundled PR per item) | 716 h | 1,633 h | 5.3 mo | 12.0 mo |
| 3× (mid: central estimate) | 1,073 h | 2,450 h | 7.9 mo | 18.0 mo |
| 5× (high: Glass's "real-world" anecdote) | 1,789 h | 4,082 h | 13.1 mo | 30.0 mo |

The 3× central estimate corresponds to: each refactor item becomes 1
story → ~2 PRs after splitting per reviewer feedback → 1–2 review
rounds each → each PR waits ~2 weeks for the next release cut.
This is consistent with the §4.2 constants: a long attention lag (median
50.6 days) followed by a short working window (median 1.0 day), averaged
over many tasks running concurrently.
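The calendar columns in the TL;DR, §3.2 and the table above all follow one formula: calendar months = effort × process tax ÷ (team capacity × share of capacity on this backlog). This sketch assumes capacity = 62.6 ÷ 0.46 h/mo and the unrounded ledger totals (357.75 h median, 816.5 h p80, obtained by summing the §3.1 rows). With those inputs it reproduces the published figures to one decimal.

```python
CAPACITY_H_PER_MO = 62.6 / 0.46               # about 136 h/mo REF-only team capacity (§2.4)
EFFORT_H = {"median": 357.75, "p80": 816.5}   # unrounded §3.1 totals


def calendar_months(effort_h, tax=1.0, allocation=1.0, capacity=CAPACITY_H_PER_MO):
    """Calendar months = effort * process tax / (capacity * allocation)."""
    return effort_h * tax / (capacity * allocation)


for tax in (2, 3, 5):
    med = calendar_months(EFFORT_H["median"], tax)
    p80 = calendar_months(EFFORT_H["p80"], tax)
    print(f"{tax}x tax: {med:.1f} mo median, {p80:.1f} mo p80")
```

A 2× tax and a 50% allocation enter the formula identically. That is why the 50% column in §3.2 and the 2× row here both read 5.3 / 12.0 mo.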

		The 5× "Glass refl1d anecdote" scenario is the case where every PR
		needs Brian-and-Paul-class sign-off. It is not the working hypothesis
		for this team — Glass's developers self-review faster than refl1d's
		upstream — but it is the right number to use if scope expands to include
		non-REF package compatibility (pinned Mantid version migration,
		upstream dependency conflicts, etc.).

		---

		## 5 · Synthesis — what to plan for

		### 5.1 · Bands for the back-end refactor (39 line items)

| Scenario | Effort | Calendar (REF-only capacity) |
|---|---:|---:|
| Best plausible (median effort, low process-tax) | 716 h equiv. | 5.3 mo |
| Central (median effort, mid process-tax) | 1,073 h equiv. | 7.9 mo |
| Realistic upper bound (p80 effort, mid process-tax) | 2,450 h equiv. | 18.0 mo |

		The team's empirical estimation honesty (`Spent / Estimate` median = 1.00)
		makes me confident that the effort-band is tight; the calendar-band
		spread is dominated by the process-tax assumption and by how much of
		each developer's month actually lands on REF.

		### 5.2 · Sequencing recommendation (data-driven)

		Order items so each one removes more code than it adds, and each one
		unblocks something downstream. Concretely:

		1. MG6 first (29 h / 62 h, large): `build_mrr_kwargs()` central
		parameter-builder. This is the Day-1 stretch goal and the
		single highest-leverage move. After MG6 lands:
		- TD8 collapses from 50/140 h to ~5 h (one PR per default that
		adjusts a single dict entry now).
		- The "GUI vs autoreduce produce different R(Q)" Day-1 bullet is
		directly retired.
		2. TD13 + MG7 (8/16 h + 8/16 h): deprecate `MRInspectData` and
		consolidate `DataInfo`. Both remove cross-repo coupling that every
		later migration rests on.
		3. MG9 (18/38 h): lift `interfaces/data_handling/` wholesale into
		`mr_reduction`. After this, the Qt-import test (zero today) becomes
		trivially preserved and most of MG3–MG14 become internal to
		`mr_reduction` and lose their cross-repo overhead.
		4. TD9 + DI3 (8/16 h + 6/12 h, with overlap): split `Configuration`
		into `Global / Run / Plot`. This unblocks the per-run/global
		parameter discrimination that Becky's xlsx (Day-2 deliverable) needs.
		5. TD12 (6/16 h): thread off file-I/O. Independent of the back-end
		move; can run in parallel.
6. **The remaining MG items** (MG1–MG5, MG8, MG10–MG14) can run
concurrently in any order once MG9 lands, because each is then an
internal-to-`mr_reduction` reorganization.
		7. IN1 (30/66 h, spec/off-spec/4xs): regression tests, run as a
		gating check for everything above. Build these first if you want
		the safety net to bite during MG6.
		8. Trivial cleanups (DI1, DI4, TD3, TD6, TD14, TD15): 12 h total
		median. Bundle into a single PR-of-cleanups during a slow week.
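The standard library's `graphlib.TopologicalSorter` (Python 3.9+) can check the ordering constraints in the list above mechanically. The dependency map below encodes what the list states: TD8 after MG6, MG9 after TD13 and MG7, and the remaining MG items after MG9. It also encodes the list order MG6 → TD13/MG7, which is an assumption. TD12 and IN1 are left unconstrained. `static_order()` returns one valid order, not necessarily the one in the list.

```python
from graphlib import TopologicalSorter

REMAINING_MG = ["MG1", "MG2", "MG3", "MG4", "MG5", "MG8",
                "MG10", "MG11", "MG12", "MG13", "MG14"]

# item -> set of items that must land first
DEPENDS_ON = {
    "MG6": set(),
    "TD8": {"MG6"},            # TD8 collapses once MG6 lands
    "TD13": {"MG6"},           # list order only (assumption)
    "MG7": {"MG6"},            # list order only (assumption)
    "MG9": {"TD13", "MG7"},    # cross-repo coupling removed first
    "TD12": set(),             # independent; can run in parallel
    "IN1": set(),              # gating tests; may be built first
    **{mg: {"MG9"} for mg in REMAINING_MG},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
```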

		### 5.3 · How this maps to the Day-2 outage commitment

The team committed on Day 2 to having the **back-end plan** ready
before the SNS beam outage (2026-06-25 → 2026-08-04, ≈6 weeks). The
plan is exactly what this document plus Glass's Google Doc constitute,
and it is already in hand.

The **back-end refactor** is a different deliverable. At median
effort (358 h) and 50% of REF capacity (≈68 h/mo), even the optimistic
case does not finish before 2026-08-04 (358 h ÷ 68 h/mo ≈ 5.3 mo).
Under the 3× process-tax assumption, the median lands around the end of
2026 (7.9 mo from late April 2026) and the p80 around late 2027 (18.0 mo).

This matches Glass's verbal "≈3 mo optimistic / ≈1 yr realistic"
estimate from Day 3: *3 months ≈ median effort / no process tax /
single-developer-pair / no other commitments (2.6 mo at full REF capacity);
1 year ≈ p80 effort / 2× tax / current allocation (12.0 mo)*.
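Turning calendar months into dates is a simple offset from the preparation date. This rough sketch assumes an average month of 365.25 ÷ 12 days and a start of 2026-04-25. It ignores leave and other calendar effects that a real schedule would account for.

```python
from datetime import date, timedelta

AVG_DAYS_PER_MONTH = 365.25 / 12  # 30.4375


def finish_date(start: date, calendar_months: float) -> date:
    """Rough finish date: start plus calendar_months average-length months."""
    return start + timedelta(days=round(calendar_months * AVG_DAYS_PER_MONTH))


start = date(2026, 4, 25)        # date this baseline was prepared
print(finish_date(start, 5.3))   # median, 50% of REF capacity
print(finish_date(start, 7.9))   # median, 3x tax
print(finish_date(start, 18.0))  # p80, 3x tax
```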

		### 5.4 · "Could Claude estimate?" — Valeria's question, answered

		Yes — for this kind of work, on this team, with this much historical
		data, the per-item median is reliable to within ~30%. The realistic
		calendar bound is reliable to within a factor of ≈2× and is dominated
		by how much of each developer's month is allocated to REF, not by
		how long any individual item takes.

The single highest-impact thing the team can do for predictability
is fill in `Time Spent` on every Resolved item (currently only 46% have
it). Doing that for one calendar quarter would tighten the
calendar bands by ~30%.

		---

		## 6 · Caveats

		- The Phase-1 refactor sample is small (n=24 items with both
		Estimate and Time Spent filled in). The median ratio of 1.00 is
		reassuring but the underlying class is heterogeneous. If you want
		tighter bounds on a specific Day-3 item, look up 2–3 EWM items
		whose Summary closely resembles it and use their actual Time Spent.
		- Phase-3 git data does not include `mr_reduction` records (0 of
		the 14 cross-validated branches were in that repo). The
		`mr_reduction` history uses dependabot-flavored merge subjects more
		often than `ewm<NNNN>_<descr>`, so the pattern recovery is biased
		toward `quicknxs`. The calibration constants in §4.2 should be
		treated as upper-bound estimates of the lag and lower-bound
		estimates of the work-window for `mr_reduction` work.
		- The size multipliers (trivial/small/medium/large) are subjective.
		Each item can be re-graded; the script `/tmp/phase2_ledger.py`
		re-rolls the totals automatically. Suggested grading rule of thumb:
		small if it touches one file and changes < 50 lines; medium if
		it touches up to 5 files; large if it crosses a package boundary
		or rewrites a class.
- Capacity is REF-only. The 136 h/mo is the 62.6 h/mo the three developers
actually log to REF tickets, scaled up for the ≈46% `Time Spent` coverage;
the same developers are also working other projects. If non-REF priorities
spike, REF capacity will drop, not the other way around.
		- Process-tax is empirical for this team (Story → Review → PR
		rounds → Release cycle 2–3 weeks per cut). It will be higher if
		the refactor needs sign-off from outside the immediate team
		(Mantid maintainers, Brian/Paul on refl1d, the NDIP/NOVA platform
		group). Day-3 already flagged this risk.
- Concurrency assumed = 3 developers. If Marie, Glass, and Kevin
are not all assigned to REF for the duration, stretch the calendar
bands accordingly: multiply them by 136 ÷ (the remaining developers'
combined ≈actual h/mo from §2.4).
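Two of the caveats above are simple enough to encode. The sketch below turns the size rule of thumb into a function. It also rescales a calendar band by the capacity that remains when a developer leaves REF, using the ≈actual h/mo column from §2.4. The rule covers neither `trivial` nor changes that span more than 5 files within one package. Leaving the first to judgement and grading the second `large` are assumptions.

```python
def grade(files_touched: int, lines_changed: int,
          crosses_package: bool = False, rewrites_class: bool = False) -> str:
    """Rule-of-thumb size grade from the caveat above.

    'trivial' is not covered by the rule and stays a judgement call; changes
    touching more than 5 files without crossing a package are graded 'large'
    here (an assumption; the rule does not say).
    """
    if crosses_package or rewrites_class:
        return "large"
    if files_touched == 1 and lines_changed < 50:
        return "small"
    if files_touched <= 5:
        return "medium"
    return "large"


# Approximate actual REF h/mo per developer (§2.4, "≈actual" column).
REF_H_PER_MO = {"Glass": 50.0, "Marie": 61.5, "Kevin": 24.6}


def rescale_months(months: float, available: list[str]) -> float:
    """Stretch a calendar band when only some developers stay on REF."""
    full = sum(REF_H_PER_MO.values())
    remaining = sum(REF_H_PER_MO[d] for d in available)
    return months * full / remaining
```

For example, the 7.9-month central band becomes about 9.6 months if only Glass and Marie remain on REF.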

		---

		## 7 · Reproducibility

		All calculations live in three scripts under `/tmp/` (rerun anytime to
		get the same numbers):

| Script | What it does | Output |
|---|---|---|
| `/tmp/phase1_extract.py` | Reads the xlsx, classifies items, computes per-class stats | `/tmp/phase1_out.json` |
| `/tmp/phase2_ledger.py` | Per-item ledger with class + size, totals and calendar bands | `/tmp/phase2_ledger.json` |
| `/tmp/phase3_git_v2.py` | Mines EWM↔git mappings, computes lag and work-window | `/tmp/phase3_records.json` |

To redo with a different baseline window (e.g. trailing 12 months instead
of trailing 6):

		1. In `phase1_extract.py`, filter `R` by `_resolved >= datetime(2025,5,1)` etc.
		2. In `phase2_ledger.py`, edit `BASE` (the per-class median/p80 dict)
		and / or any item's class / size assignment.
		3. Re-run all three scripts.
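Step 1 amounts to a date cut-off. This minimal sketch assumes the Resolved items are dicts carrying a `datetime` under the `_resolved` key, mirroring the name in step 1; the real structure of `R` in the script is not shown. If `R` is a pandas DataFrame with a `_resolved` column, the equivalent is `R[R["_resolved"] >= cutoff]`.

```python
from datetime import datetime, timedelta


def trailing_window(items, as_of: datetime, months: int = 12, key: str = "_resolved"):
    """Keep items resolved in the `months` before `as_of` (365.25/12-day months)."""
    cutoff = as_of - timedelta(days=months * 365.25 / 12)
    return [item for item in items if cutoff <= item[key] <= as_of]


as_of = datetime(2026, 4, 25)
R = [  # hypothetical items
    {"id": 1, "_resolved": datetime(2026, 1, 15)},
    {"id": 2, "_resolved": datetime(2025, 6, 1)},
    {"id": 3, "_resolved": datetime(2024, 12, 1)},
]
last_12 = trailing_window(R, as_of, months=12)
last_6 = trailing_window(R, as_of, months=6)
```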

		---

		*End of baseline. If on Monday you want this re-cast against a
		different process-tax assumption, or with a different sequencing
		order, the scripts above are the place to start. — Claude*