Commit d2d45c54 authored by Bogdan Vacaliuc

claude finished before I left home... so I commit

parent e2d789f2
# Hack-A-Thon 2026 — EWM-Calibrated Estimation Baseline

*Prepared 2026-04-25 (Saturday). Reviewable Monday morning.*

---

## TL;DR

Using **1,076 historical EWM-tracked work items** (`Resolved-REF-Work-2026-04-25` tab,
2023-02-02 through 2026-04-24) plus **14 git-merge ↔ EWM-ID cross-references**
(all from `quicknxsv2`; `mr_reduction` was scanned but yielded none), the back-end
refactor backlog identified at the close of Day 3 (39 line items) comes to:

| Quantity | Median | p80 |
|---|---:|---:|
| **Focused effort (developer-hours)**            | **358 h**   | **816 h**   |
| ↳ in person-weeks @ 40 h/wk                     | 8.9 pw      | 20.4 pw     |
| ↳ in person-months @ 160 h/mo                   | 2.24 pm     | 5.10 pm     |
| **Calendar at current REF-cap (≈136 h/mo team)**, process-tax 2× | **5.3 mo** | **12.0 mo** |
| **Calendar at current REF-cap, process-tax 3×** *(central)* | **7.9 mo** | **18.0 mo** |
| **Calendar at current REF-cap, process-tax 5×** *(Glass's "real-world" anecdote)* | **13.1 mo** | **30.0 mo** |

**Bottom line for Bogdan, ahead of the Monday deep-dive:**

- Glass's verbal estimate on Day 3 (≈3 mo optimistic / ≈1 yr realistic) is **broadly
  consistent** with the empirical numbers: the "optimistic" bound is close to the
  median effort with no process tax (2.6 mo, §3.2), and "realistic" matches the
  p80-at-2×-tax band (12.0 mo, §4.3).
- The dominant risk is **process tax**, not under-estimation of the work itself.
  The team's median `Time Spent / Estimate` ratio is **1.00 across every work
  class** — they do not under-quote. The variance lives in the calendar gap
  between *EWM-creation* and *first git commit*, where the median attention lag
  is **50.6 days**.
- **The single highest-leverage item is `MG6` (build_mrr_kwargs / consolidate
  the MRR call)** — 29 h median / 62 h p80. This one PR retires the five Day-1
  silent default bugs and is the natural Tier-1 of the refactor.

---

## 1 · Inputs

- `~/Desktop/Hack-A-Thon-2026/Focus-REF-Work-2026-04-25.xlsx`
  - **`Resolved-REF-Work-2026-04-25` tab**: 1,076 historical work items
    (Task / Story / Defect / Release / Capability) with `Estimate`,
    `Corrected Estimate`, `Time Spent`, `Creation Date`, `Resolution Date`.
    Time-window: 2023-02-02 to 2026-04-24.
  - **`Focus-REF-Work-2026-04-25` tab** — 94 currently in-flight items;
    used here for capacity (the per-developer "X Time" placeholder rows).
- **Git history** of `quicknxsv2/` (1,170 commits) and `mr_reduction/`
  (417 commits), scanned for merge subjects matching `ewm<NNNN>_<descr>`
  to obtain branch boundaries.
- **Day-3 backlog**, sourced from the Consensus & decisions section of
  `Hack-A-Thon-2026-Day3-Summary.md` (15 tech-debt items + 14 migration-
  table rows + 7 developer-insight cleanups + 3 scientist TODOs + 2
  cross-cutting infra items = 41 items, working from 39 logical line
  items as TD8 and IN1 each carry an internal multiplier).

---

## 2 · Phase 1 — Empirical baselines from EWM

All times normalized to **focused hours** (1 day = 8 h, 1 week = 40 h).
Per-class statistics on the Resolved sheet (n=1,076):

### 2.1 · Time Spent (focused hours per item) by work-class

| class      | n   | median (h) | mean (h) | p20 | p80 |
|---|---:|---:|---:|---:|---:|
| admin       | 4   | 1.0  | 2.5  | 0.0  | 8.0  |
| **bugfix**  | 36  | **5.5**  | 7.0  | 2.0  | **12.0** |
| ci_devops   | 32  | 2.0  | 3.5  | 1.0  | 5.0  |
| docs        | 19  | 2.0  | 3.7  | 2.0  | 5.0  |
| **investig**| 20  | **4.5**  | 7.3  | 3.0  | **16.0** |
| **newfeat** | 53  | **3.0**  | 5.7  | 1.0  | **8.0**  |
| other       | 161 | 4.0  | 5.5  | 2.0  | 8.0  |
| **refactor**| 24  | **4.0**  | 4.8  | 2.0  | **8.0**  |
| release     | 28  | 1.5  | 2.5  | 1.0  | 3.0  |
| review      | 73  | 1.0  | 1.9  | 1.0  | 2.0  |
| **test**    | 75  | **5.0**  | 6.4  | 2.0  | **11.0** |

Bolded rows are the five classes used for the Day-3 backlog (refactor /
bugfix / test / investig / newfeat). The classifier is Summary-keyword based
(see `/tmp/phase1_extract.py`); spot-checked on 25 refactor-classified items
and confirmed.

### 2.2 · Honesty of estimation (Time Spent ÷ Estimate)

| class      | n   | median | mean | p80  |
|---|---:|---:|---:|---:|
| admin       | 3   | 1.00 | 1.17 | 2.00 |
| bugfix      | 36  | 1.00 | 0.97 | 1.00 |
| ci_devops   | 29  | 1.00 | 0.89 | 1.00 |
| docs        | 19  | 1.00 | 0.95 | 1.00 |
| investig    | 19  | 1.00 | 1.01 | 1.50 |
| newfeat     | 51  | 1.00 | 1.09 | 1.33 |
| refactor    | 24  | 1.00 | 0.86 | 1.00 |
| test        | 71  | 1.00 | 0.88 | 1.00 |

**Median ratio = 1.00 across every work-class listed.** This team does not
systematically under-quote. The means slightly below 1.0 on refactor, test,
ci_devops and docs (0.86–0.95) suggest they over-quote slightly and finish ahead.
Setting aside admin (n=3), investig and newfeat are the only two classes with
p80 > 1.0 (1.5× and 1.33×), reflecting the open-ended nature of investigation
and new-feature work.
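
As a rough illustration of how a ratio table like this could be computed, here is a minimal sketch. The row layout (`cls`, `estimate_h`, `spent_h`) and the sample values are assumptions made for the example; they are not taken from `/tmp/phase1_extract.py`.

```python
from collections import defaultdict
from statistics import mean, median

# Illustrative rows only; field names and values are hypothetical.
rows = [
    {"cls": "refactor", "estimate_h": 4.0, "spent_h": 4.0},
    {"cls": "refactor", "estimate_h": 8.0, "spent_h": 6.0},
    {"cls": "refactor", "estimate_h": 2.0, "spent_h": 2.0},
    {"cls": "investig", "estimate_h": 4.0, "spent_h": 6.0},
    {"cls": "investig", "estimate_h": 8.0, "spent_h": 8.0},
    {"cls": "investig", "estimate_h": None, "spent_h": 3.0},  # skipped: no estimate
]

def ratio_stats(rows):
    """Return {class: (n, median ratio, mean ratio)} for rows with both fields."""
    by_class = defaultdict(list)
    for r in rows:
        # A ratio is only defined when both fields are present and Estimate > 0.
        if r["estimate_h"] and r["spent_h"] is not None:
            by_class[r["cls"]].append(r["spent_h"] / r["estimate_h"])
    return {c: (len(v), median(v), mean(v)) for c, v in by_class.items()}

stats = ratio_stats(rows)
for cls, (n, med, avg) in stats.items():
    print(f"{cls:10s} n={n}  median={med:.2f}  mean={avg:.2f}")
```

Rows without an `Estimate` drop out of the ratio, which is one plausible reason why `n` here can be smaller than in §2.1 for the same class.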

### 2.3 · Calendar gap (Creation → Resolution)

| class      | n   | median (days) | p80 (days) |
|---|---:|---:|---:|
| admin       | 5   | 4.1   | 23.0  |
| bugfix      | 71  | 7.1   | 36.9  |
| ci_devops   | 56  | 5.5   | 24.4  |
| docs        | 27  | 4.4   | 11.8  |
| investig    | 23  | 4.9   | 27.2  |
| newfeat     | 97  | 12.8  | 104.5 |
| refactor    | 47  | 12.9  | 29.2  |
| release     | 236 | 14.2  | 50.2  |
| review      | 70  | 1.2   | 4.9   |
| test        | 94  | 6.5   | 20.8  |

**Refactor work takes ≈4 h of focused effort (median) but spans ≈13 calendar
days from EWM creation to resolution.** The ≈77× ratio (12.9 d × 24 h ÷ 4 h)
is *not* the developer working slowly; it's the EWM-ticket-sat-in-the-queue
phenomenon Bogdan flagged. Phase 3 below decomposes this into attention-lag
and focused-work components.

### 2.4 · Per-developer throughput, trailing 6 months (Nov-2025–Apr-2026)

| Developer        | Resolved items / 6 mo | EWM-recorded h / 6 mo | h / mo (recorded) | h / mo (≈actual)¹ |
|---|---:|---:|---:|---:|
| Glass Elsarboukh | 70  | 138 | 23.0 | **50.0** |
| Marie Backman    | 78  | 170 | 28.3 | **61.5** |
| Kevin Tactac     | 43  | 68  | 11.3 | **24.6** |
| Jose Borreguero  | 4   | 0   | 0.0  | 0.0 *(rotated off after Aug 2025)* |
| **3-dev team total** | — | — | **62.6** | **≈136 h / mo** |

¹ *Only 46% of recent Resolved items have `Time Spent` populated.* Inflating the
recorded total by 1/0.46 ≈ 2.17× gives the "actual" column. Cross-checked against
Glass's `Glass Time` placeholder task in the Focus tab (30 h estimated for the
2026 2.2 cycle, ≈6 weeks → 5 h/week → 22 h/mo on REF) — this matches the
*recorded* hours, supporting the interpretation that recorded hours under-count
about 2× because some items don't get logged.

The 136 h/mo figure is **REF-only capacity**; the same three developers also
work other projects.
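
The coverage correction in footnote ¹ can be reproduced with a few lines. This is a sketch under the footnote's own assumption that unlogged items carry, on average, the same hours as logged ones; the function and variable names are illustrative.

```python
# EWM-recorded REF hours over the trailing 6 months (from the table above).
recorded_6mo = {"Glass": 138, "Marie": 170, "Kevin": 68}
MONTHS = 6
COVERAGE = 0.46  # share of recent Resolved items with Time Spent populated

def monthly_capacity(recorded_h, months=MONTHS, coverage=COVERAGE):
    """Return (recorded h/mo, coverage-corrected h/mo)."""
    per_month = recorded_h / months
    return per_month, per_month / coverage

per_dev = {dev: monthly_capacity(h) for dev, h in recorded_6mo.items()}
team_recorded = sum(rec for rec, _ in per_dev.values())
team_actual = sum(act for _, act in per_dev.values())

for dev, (rec, act) in per_dev.items():
    print(f"{dev:6s} recorded {rec:5.1f} h/mo  ~actual {act:5.1f} h/mo")
print(f"team   recorded {team_recorded:5.1f} h/mo  ~actual {team_actual:5.1f} h/mo")
```

The results agree with the table to within rounding; the table rounds each developer's recorded h/mo before dividing by the coverage, so the last digit can differ by 0.1.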

---

## 3 · Phase 2 — Day-3 backlog ledger

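Each item is hand-classified into one or more Phase-1 work-classes, with a
**size multiplier** applied to both the class median and the class p80. For
items spanning several classes the class values are summed before scaling,
and the `Count` column multiplies the result:

- `trivial` × 0.5
- `small`   × 1.0  *(use class median and p80 directly)*
- `medium`  × 1.5
- `large`   × 2.0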

The script `/tmp/phase2_ledger.py` is reproducible — change a class
assignment or size and rerun.
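
The ledger rows in §3.1 are consistent with a simple rule: sum the class values, then scale by the size multiplier and the count. A minimal sketch of that rule follows. `BASE` is the name the document gives to the per-class dict, but its layout here, along with `SIZE` and `estimate()`, is an assumption rather than the actual contents of `/tmp/phase2_ledger.py`.

```python
# Per-class (median, p80) focused hours, copied from §2.1.
BASE = {
    "refactor": (4.0, 8.0),
    "bugfix": (5.5, 12.0),
    "test": (5.0, 11.0),
    "investig": (4.5, 16.0),
    "newfeat": (3.0, 8.0),
    "docs": (2.0, 5.0),
}
SIZE = {"trivial": 0.5, "small": 1.0, "medium": 1.5, "large": 2.0}

def estimate(classes, size, count=1):
    """Return (median_h, p80_h) for one ledger item."""
    factor = SIZE[size] * count
    median_h = factor * sum(BASE[c][0] for c in classes)
    p80_h = factor * sum(BASE[c][1] for c in classes)
    return median_h, p80_h

print(estimate(["refactor", "bugfix", "test"], "large"))   # MG6 -> (29.0, 62.0)
print(estimate(["investig", "bugfix"], "small", count=5))  # TD8 -> (50.0, 140.0)
print(estimate(["refactor", "test"], "medium"))            # IN2 -> (13.5, 28.5)
```

Re-grading an item then amounts to changing its class list or size label and re-summing the column.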

### 3.1 · Per-item ledger

| ID  | Item | Class | Size | Count | Median (h) | p80 (h) | Note |
|---|---|---|---|---:|---:|---:|---|
| TD1  | Remove duplicated `DeadTimeCorrection` | refactor | small | 1 | 4.0 | 8.0 | QuickNXS version not used; concrete delete + rewire |
| TD2  | Reconcile `peak_finding.py` | refactor | small | 1 | 4.0 | 8.0 | Diff is whitespace-only per Day-1 deck |
| TD3  | Remove two `_as_ints` in `data_set.py` | refactor | trivial | 1 | 2.0 | 4.0 | Local helper consolidation |
| TD4  | Investigate per-cross-section MRR call | investig | medium | 1 | 6.8 | 24.0 | Pure analysis spike before refactor |
| TD5  | De-duplicate `quicknxs_scaling_factor` | refactor+bugfix | medium | 1 | 14.2 | 30.0 | 3 places; 2 with `+1` div-by-zero hack; root-cause fix |
| TD6  | Remove `email_test` key from `DEFAULT_OPTIONS` | refactor | trivial | 1 | 2.0 | 4.0 | Pure delete |
| TD7  | Remove GenX templates | refactor | small | 1 | 4.0 | 8.0 | Confirmed Day-2 decision |
| TD8  | Reconcile MRR 5 silent default mismatches | investig+bugfix | small × 5 | 5 | 50.0 | 140.0 | Each requires scientist sign-off |
| TD9  | Split `Configuration` into 3 classes (Global / Run / Plot) | refactor | large | 1 | 8.0 | 16.0 | God-class refactor; touches every consumer |
| TD10 | Rationalise `DEFAULT_OPTIONS` vs `ReductionDialog.get_options()` | refactor | small | 1 | 4.0 | 8.0 | Add missing keys + lint test |
| TD11 | Turn off `ErrorWeighting` consistently in `RefRoi` | bugfix | small | 1 | 5.5 | 12.0 | Audit + fix; Tim's ruling needed first |
| TD12 | Thread off file-I/O and long calc | newfeat | large | 1 | 6.0 | 16.0 | Architectural; QThread + signal-slot rewrite |
| TD13 | Deprecate `MRInspectData` (Mantid) | refactor | large | 1 | 8.0 | 16.0 | Cross-repo; replace with instrument-specific info |
| TD14 | Define `Enum` for `off_spec_x_axis` | refactor | trivial | 1 | 2.0 | 4.0 | 4 lines + grep-replace |
| TD15 | Use `scipy` for Planck / neutron mass | refactor | trivial | 1 | 2.0 | 4.0 | Constants swap |
| MG1  | GISANS calc → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | New back-end home; tests must follow |
| MG2  | Off-spec reflectivity calc → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | Back-end home + parity tests |
| MG3  | Python script generation | refactor | medium | 1 | 6.0 | 12.0 | Target package TBD |
| MG4  | Reduced data output (`.dat` files) | refactor | medium | 1 | 6.0 | 12.0 | Target package TBD; format-stable |
| MG5  | Unit conversion → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Centralize; small surface |
| MG6  | **Consolidate MRR call (single `build_mrr_kwargs()`)** | refactor+bugfix+test | large | 1 | **29.0** | **62.0** | The flagship — retires the 5 silent defaults |
| MG7  | Consolidate `DataInfo` class | refactor | large | 1 | 8.0 | 16.0 | 413 vs 1101 lines; neither is a subset |
| MG8  | Pre-processing for plot-facing data | refactor | medium | 1 | 6.0 | 12.0 | Move shaping logic out of UI |
| MG9  | Move `interfaces/data_handling/` → `mr_reduction` | refactor+test | large | 1 | 18.0 | 38.0 | Wholesale-move per Glass's note |
| MG10 | Rebinning → back-end | refactor | medium | 1 | 6.0 | 12.0 | Replace 2× non-Mantid `Rebin` sites |
| MG11 | File summing → back-end | refactor | small | 1 | 4.0 | 8.0 | Move via `data_handling/` first |
| MG12 | Stitching → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Day-1 deck row 7 |
| MG13 | Direct beam matching → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | Procedure not yet determined; investig component implicit |
| MG14 | Peak finding → `mr_reduction` | refactor | medium | 1 | 6.0 | 12.0 | After TD2 reconciliation |
| DI1  | Remove obsolete `#pylint` directives | refactor | trivial | 1 | 2.0 | 4.0 | grep-replace + verify |
| DI2  | Replace `auto_change_active` with `blockSignals()` | refactor | small | 1 | 4.0 | 8.0 | Qt idiom upgrade |
| DI3  | Consolidate reduction params `CrossSectionData` → `NexusData` | refactor | medium | 1 | 6.0 | 12.0 | Touches `Configuration`; partially overlaps TD9 |
| DI4  | Rename `gui.py` → `main.py` | refactor | trivial | 1 | 2.0 | 4.0 | Pure rename + import sweep |
| DI5  | Clean up bare `except`s (Mantid throws `RuntimeError`) | refactor | small | 1 | 4.0 | 8.0 | Per-except audit |
| DI6  | Look through TODOs, prune | refactor | small | 1 | 4.0 | 8.0 | Inventory + delete |
| DI7  | Make variable naming consistent | refactor | small | 1 | 4.0 | 8.0 | `number` vs `run_number` |
| SC1  | Physics-name ↔ PV mapping doc | investig+docs | small | 1 | 6.5 | 21.0 | `tthi` = `SampleAngle`, etc. |
| SC2  | Find test data for GISANS and off-specular | investig | medium | 1 | 6.8 | 24.0 | No GISANS data today |
| SC3  | Find better test data for QuickNXS / `mr_reduction` | investig+test | small | 1 | 9.5 | 27.0 | Wider coverage |
| IN1  | Whole-workflow regression tests (spec / off-spec / 4xs) | test | large × 3 | 3 | 30.0 | 66.0 | 3 representative blocks per Day-3 p.3 |
| IN2  | `build_mrr_kwargs()` shared parameter-builder *(Day-1 stretch)* | refactor+test | medium | 1 | 13.5 | 28.5 | The Tier-1 PR per the Day-1 deck *(overlaps MG6)* |
| **TOTAL** | | | | | **358** | **816** | |

**Notes on overlaps and duplications.** TD8 (5 silent-default reconciliations,
50/140 h), MG6 (`build_mrr_kwargs`, 29/62 h) and IN2 (Day-1 stretch goal,
13.5/28.5 h) all attack the same root cause from different angles:
once MG6 lands, TD8 collapses to a 1-line PR per default. The 358/816 h totals
count all three in full, so don't double-count in detailed planning; a sensible
scope for the first hack-a-thon outcome is
**MG6 first → 5 silent-defaults retire as one-line PRs in TD8 → then MG7**.

Likewise TD9 (split `Configuration`) and DI3 (consolidate
`CrossSectionData` → `NexusData` reduction params) are partially the same
architectural move; sequence DI3 inside TD9.

### 3.2 · Capacity vs. backlog

At ≈136 h/mo of REF capacity from the three primary developers (§2.4):

| Total effort | Calendar at 100% REF allocation | Calendar at 50% REF allocation |
|---|---:|---:|
| 358 h (median) | 2.6 mo | 5.3 mo |
| 816 h (p80)    | 6.0 mo | 12.0 mo |

These are *focused-effort* months: they do **not** include the process
tax described in §4 (stories need reviews, reviews need multiple PR rounds,
and merged work waits on a release cycle of ≈2–3 weeks per cut, per Glass on Day 3).
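
The table above is plain division of effort by allocated capacity. A small sketch, with illustrative names, that reproduces it from the rounded 358 h / 816 h totals:

```python
TEAM_CAPACITY_H_PER_MONTH = 136.0  # coverage-corrected REF capacity from §2.4

def calendar_months(effort_h, allocation=1.0, capacity=TEAM_CAPACITY_H_PER_MONTH):
    """Focused-effort months needed at a given fraction of REF capacity."""
    if not 0 < allocation <= 1:
        raise ValueError("allocation must be in (0, 1]")
    return effort_h / (capacity * allocation)

for label, effort in (("median", 358), ("p80", 816)):
    full = calendar_months(effort)
    half = calendar_months(effort, allocation=0.5)
    print(f"{label:6s} {effort} h: {full:.1f} mo at 100%, {half:.1f} mo at 50%")
```

Halving the allocation and doubling the process tax have the same effect on the calendar, which is why the 50% column here coincides with the 2× row in §4.3.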

---

## 4 · Phase 3 — git cross-validation

Branches in `quicknxsv2` are named `ewm<NNNN>_<descr>` so the EWM ID is
recoverable from the merge commit subject. Scanning all merge commits in
both repos found **14 EWM↔git records** (all in `quicknxsv2`;
`mr_reduction` does not use the `ewm<NNNN>` branch convention as
consistently). Of those, 0 had `Time Spent` filled in on the parent EWM
record (the parent items are usually Stories; `Time Spent` lives on the
Tasks underneath, which don't carry the `ewm<NNNN>` branch tag).
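
For readers who want to repeat the scan, the sketch below shows one way to pull an EWM ID out of a merge subject. It assumes the subjects were collected beforehand (for example with `git log --merges --format=%s`); the regular expression and the sample subjects are illustrative and are not taken from `/tmp/phase3_git_v2.py`.

```python
import re

# Matches branch names of the form ewm<NNNN>_<descr> anywhere in a subject line.
EWM_BRANCH = re.compile(r"\bewm(\d+)_(\w+)", re.IGNORECASE)

def ewm_id_from_subject(subject):
    """Return the EWM ID as an int, or None if the subject has no ewm tag."""
    m = EWM_BRANCH.search(subject)
    return int(m.group(1)) if m else None

# Hypothetical merge subjects, for illustration only.
subjects = [
    "Merge branch 'ewm12806_direct_beam_fix' into 'next'",
    "Merge pull request #42 from org/dependabot/pip/numpy-2.0",
]
print([ewm_id_from_subject(s) for s in subjects])  # [12806, None]
```

Subjects without the tag simply return `None`, which is how a repository dominated by other merge styles ends up contributing no records.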

### 4.1 · Per-record table

| Repo | EWM | EWM summary | Lag (d) | Work (d) | Active days | Commits | Diff |
|---|---:|---|---:|---:|---:|---:|---|
| quicknxs | 6004 | `[QuickNXS] Functionality to save ORSO`  | 301.7 | 1.1 | 2 | 8 | 6 files, +1*  |
| quicknxs | 9367 | `[QuickNXS] Reflectivity plot for direc...` | 97.7 | 0.0 | 1 | 3 | 2 files, +1* |
| quicknxs | 11653 | `[quicknxs] Reflectivity plot not updat...` | 82.9 | 6.0 | 1 | 2 | 8 files, +1* |
| quicknxs | 12806 | `[quicknxs] Direct beam not being corre...` | 55.1 | 1.7 | 2 | 8 | 15 files |
| quicknxs | 12788 | `[QuickNXS/mr_reduction] Port deadtime ...` | 84.2 | 0.7 | 2 | 3 | 8 files, +5* |
| quicknxs | 13661 | `[quicknxs] reflectivity calculation ca...` | 0.8 | 0.1 | 1 | 1 | 5 files, +2* |
| quicknxs | 14138 | `[quicknxs] Stitching overwrites reduct...` | 0.0 | 1.0 | 1 | 1 | 3 files |
| quicknxs | 14846 | `[QuickNXS] Mismatch in active run betw...` (×2) | 54.8, 68.8 | 0.9, 0.0 | 1, 1 | 3, 8 | — |
| quicknxs | 15204 | `[CIS] [QuickNXS] Fix functionality to ...` (×4) | -6.1 to 46.5 | 0.0–31.4 | 1–9 | 1–20 | up to 38 files |
| quicknxs | 15832 | `[QuickNXS] Diagnostic widget not openi...` | 0.9 | 1.0 | 2 | 3 | 8 files, +6* |

*sizes truncated where visible in raw output*

### 4.2 · Calibration constants

From the 14 records:

| Quantity | Median | p80 | Range |
|---|---:|---:|---|
| Attention lag (EWM Creation → first commit), days | **50.6** | 84.2 | -6.1 .. 301.7 |
| Focused calendar (first commit → merge), days | **1.0** | 3.1 | 0.0 .. 31.4 |
| Active commit-days per branch | 1.0 | 2.0 | 1 .. 9 |

**Interpretation.** EWM-creation date is *not* an honest start signal —
it precedes actual work by ≈7 weeks (median). Once a developer starts a
branch, work is concentrated: the median branch is closed in 1 calendar
day with 1–2 active commit-days. The few outliers (EWM 6004 at 301-day
lag, EWM 15204 at 31-day work-window) are the long-tail work that you
should plan for explicitly, not statistically.
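
The two calibration quantities are differences between three timestamps per branch. A minimal sketch, using hypothetical dates rather than the 14 real records:

```python
from datetime import datetime
from statistics import median

def lag_and_work_days(created, first_commit, merged):
    """Return (attention lag, work window) in fractional days."""
    day = 86400.0
    lag = (first_commit - created).total_seconds() / day
    work = (merged - first_commit).total_seconds() / day
    return lag, work

# Hypothetical (EWM created, first commit, merge) timestamps, for illustration only.
records = [
    (datetime(2025, 1, 6, 9), datetime(2025, 3, 3, 9), datetime(2025, 3, 4, 9)),
    (datetime(2025, 2, 10, 9), datetime(2025, 2, 11, 9), datetime(2025, 2, 11, 15)),
    # First commit predates the ticket, so the lag comes out negative.
    (datetime(2025, 4, 7, 9), datetime(2025, 4, 1, 9), datetime(2025, 4, 2, 9)),
]
pairs = [lag_and_work_days(*r) for r in records]
lags = [lag for lag, _ in pairs]
works = [work for _, work in pairs]
print(f"median lag {median(lags):.1f} d, median work window {median(works):.1f} d")
```

A negative lag, like the -6.1 d at the low end of the range above, means the first commit on the branch came before the EWM item was created.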

This calibration **does not change** the focused-effort estimates in
§3.1 — those come from `Time Spent` which is already a focused-hours
measure. What it tells us is the **process tax** between *deciding to
do something* and *getting it merged* on this team's current cadence.

### 4.3 · Process tax — the multiplier from focused effort to calendar months

Three scenarios for the calendar projection:

| Process-tax | Effective effort (median) | Effective effort (p80) | Calendar median | Calendar p80 |
|---|---:|---:|---:|---:|
| **2×** *(low — single bundled PR per item)*  | 716 h | 1,633 h | **5.3 mo** | **12.0 mo** |
| **3×** *(mid — central estimate)*            | 1,073 h | 2,450 h | **7.9 mo** | **18.0 mo** |
| **5×** *(high — Glass's "real-world" anecdote)* | 1,789 h | 4,082 h | **13.1 mo** | **30.0 mo** |

The 3× central estimate corresponds to: each refactor item becomes 1
story → ~2 PRs after splitting per reviewer feedback → 1–2 review
rounds each → each PR waits ~2 weeks for the next release cut.
Empirically, this matches the 50.6-day attention-lag plus a working
period that is much shorter than the lag, integrated over many tasks
running concurrently.

The 5× "Glass refl1d anecdote" scenario is the case where every PR
needs Brian-and-Paul-class sign-off. It is **not** the working hypothesis
for this team — Glass's developers self-review faster than refl1d's
upstream — but it is the right number to use if scope expands to include
*non-REF* package compatibility (pinned Mantid version migration,
upstream dependency conflicts, etc.).
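
The scenario table reduces to one multiplication and one division per cell. The sketch below recomputes it from the rounded totals (358 h, 816 h, 136 h/mo), so individual cells can differ from the table by a few hours or 0.1 mo; the names are illustrative.

```python
CAPACITY = 136.0  # h/mo, REF-only team capacity from §2.4
EFFORT = {"median": 358.0, "p80": 816.0}  # focused hours from §3.1
TAX = {"low": 2, "mid": 3, "high": 5}

def calendar_band(effort_h, tax, capacity=CAPACITY):
    """Return (effective effort in hours, calendar months)."""
    effective = effort_h * tax
    return effective, effective / capacity

for name, tax in TAX.items():
    cells = []
    for stat, hours in EFFORT.items():
        eff, months = calendar_band(hours, tax)
        cells.append(f"{stat}: {eff:,.0f} h -> {months:.1f} mo")
    print(f"{tax}x ({name})  " + "  ".join(cells))
```

Because the tax is a pure multiplier, re-casting the projection against a different assumption only requires adding an entry to `TAX`.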

---

## 5 · Synthesis — what to plan for

### 5.1 · Bands for the back-end refactor (39 line items)

| Scenario | Effort | Calendar (REF-only capacity) |
|---|---:|---:|
| Best plausible (median effort, low process-tax) | 716 h equiv. | **5.3 mo** |
| Central (median effort, mid process-tax) | 1,073 h equiv. | **7.9 mo** |
| Realistic upper bound (p80 effort, mid process-tax) | 2,450 h equiv. | **18 mo** |

The team's empirical estimation honesty (`Spent / Estimate` median = 1.00)
makes me confident that the effort-band is tight; the calendar-band
spread is dominated by the process-tax assumption and by how much of
each developer's month actually lands on REF.

### 5.2 · Sequencing recommendation (data-driven)

Order items so each one removes more code than it adds, and each one
unblocks something downstream. Concretely:

1. **MG6 first** *(29 h / 62 h, large)*: `build_mrr_kwargs()` central
   parameter-builder. This is the Day-1 stretch goal and the
   single highest-leverage move. After MG6 lands:
   - **TD8** collapses from 50/140 h to ~5 h (one PR per default that
     adjusts a single dict entry now).
   - The "GUI vs autoreduce produce different R(Q)" Day-1 bullet is
     directly retired.
2. **TD13 + MG7** *(8/16 h + 8/16 h)*: deprecate `MRInspectData` and
   consolidate `DataInfo`. Both remove cross-repo coupling that every
   later migration rests on.
3. **MG9** *(18/38 h)*: lift `interfaces/data_handling/` wholesale into
   `mr_reduction`. After this, the Qt-import test (zero today) becomes
   trivially preserved and most of MG3–MG14 become *internal* to
   `mr_reduction` and lose their cross-repo overhead.
4. **TD9 + DI3** *(8/16 h + 6/12 h, with overlap)*: split `Configuration`
   into `Global / Run / Plot`. This unblocks the per-run/global
   parameter discrimination that Becky's xlsx (Day-2 deliverable) needs.
5. **TD12** *(6/16 h)*: thread off file-I/O. Independent of the back-end
   move; can run in parallel.
6. **The remaining MG\* items** can run concurrently in any order once
   MG9 lands, because each is now an *internal-to-`mr_reduction`*
   reorganization.
7. **IN1** *(30/66 h, spec/off-spec/4xs)*: regression tests, run as a
   gating check for everything above. Build these first if you want
   the safety net to bite during MG6.
8. **Trivial cleanups (DI1, DI4, TD3, TD6, TD14, TD15)**: 12 h total
   median. Bundle into a single PR-of-cleanups during a slow week.
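
The ordering constraints in this list can be checked mechanically with a topological sort. The sketch below encodes only a subset of them by hand, and treating "DI3 inside TD9" as a plain dependency is a simplification; the `waits_for` mapping is illustrative and not part of the estimation scripts.

```python
from graphlib import TopologicalSorter

# item -> set of items it should wait for (a partial, hand-written encoding).
waits_for = {
    "TD8": {"MG6"},          # silent-default PRs shrink once MG6 lands
    "MG9": {"TD13", "MG7"},  # later migrations rest on these two
    "MG10": {"MG9"},         # remaining MG* items become internal after MG9
    "MG12": {"MG9"},
    "DI3": {"TD9"},          # DI3 is sequenced with the Configuration split
}

sorter = TopologicalSorter(waits_for)
sorter.add("TD12")  # independent of the back-end move, so no predecessors

order = list(sorter.static_order())
print(order)
```

Any order the sorter returns respects the constraints, and a contradictory constraint would surface as a `graphlib.CycleError` instead of a silently wrong plan.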

### 5.3 · How this maps to the Day-2 outage commitment

The team committed on Day 2 to having the **back-end *plan*** ready
before the SNS beam outage (2026-06-25 → 2026-08-04, a ≈6-week window that
opens ≈9 weeks after this baseline was prepared). The plan is exactly what
this document plus Glass's Google Doc constitute, and it is *already* ready.

The **back-end *refactor*** is a different deliverable. At median
effort (358 h) and 50% REF allocation (≈68 h/mo), even the optimistic
case does not finish before 2026-08-04 (358 h / 68 h-per-mo = 5.3 mo).
The realistic case is roughly **end of 2026 → first half of 2027**
under the 3× process-tax assumption.

This matches Glass's verbal "≈3 mo optimistic / ≈1 yr realistic"
estimate from Day 3 closely: *3 months ≈ median / no process tax /
single-developer-pair / no other commitments*; *1 year ≈ p80 /
2× tax / current allocation* (12.0 mo in §4.3).

### 5.4 · "Could Claude estimate?" — Valeria's question, answered

Yes — for this kind of work, on this team, with this much historical
data, the per-item median is reliable to within ~30%. The realistic
calendar bound is reliable to within a factor of ≈2× and is dominated
by *how much of each developer's month is allocated to REF*, not by
how long any individual item takes.

The single highest-impact thing the team can do for predictability
is **fill in `Time Spent` on every Resolved item** (currently only 46% have
it populated). Doing that for one calendar quarter would tighten the
calendar bands by ~30%.

---

## 6 · Caveats

- **The Phase-1 refactor sample is small** (n=24 items with both
  Estimate and Time Spent filled in). The median ratio of 1.00 is
  reassuring but the underlying class is heterogeneous. If you want
  tighter bounds on a specific Day-3 item, look up 2–3 EWM items
  whose Summary closely resembles it and use their actual Time Spent.
- **Phase-3 git data does not include `mr_reduction` records** (0 of
  the 14 cross-validated branches were in that repo). The
  `mr_reduction` history uses dependabot-flavored merge subjects more
  often than `ewm<NNNN>_<descr>`, so the pattern recovery is biased
  toward `quicknxs`. The calibration constants in §4.2 should be
  treated as upper-bound estimates of the lag and lower-bound
  estimates of the work-window for `mr_reduction` work.
- **The size multipliers (trivial/small/medium/large) are subjective.**
  Each item can be re-graded; the script `/tmp/phase2_ledger.py`
  re-rolls the totals automatically. Suggested grading rule of thumb:
  *small* if it touches one file and changes < 50 lines; *medium* if
  it touches up to 5 files; *large* if it crosses a package boundary
  or rewrites a class.
- **Capacity is REF-only.** The 136 h/mo is the 62.6 h/mo the three developers
  *actually log to REF tickets*, scaled up for ≈46% logging coverage; the same
  developers are also working other projects. If non-REF priorities
  spike, REF capacity will drop, not the other way around.
- **Process-tax is empirical for this team** (Story → Review → PR
  rounds → Release cycle 2–3 weeks per cut). It will be *higher* if
  the refactor needs sign-off from outside the immediate team
  (Mantid maintainers, Brian/Paul on refl1d, the NDIP/NOVA platform
  group). Day-3 already flagged this risk.
- **Concurrency assumed = 3 developers.** If Marie, Glass, and Kevin
  are not all assigned to REF for the duration, stretch the calendar
  bands in proportion to the lost capacity (bands scale as 136 h/mo ÷ remaining h/mo).

---

## 7 · Reproducibility

All calculations live in three scripts under `/tmp/` (rerun anytime to
get the same numbers):

| Script | What it does | Output |
|---|---|---|
| `/tmp/phase1_extract.py` | Reads the xlsx, classifies, stats per class | `/tmp/phase1_out.json` |
| `/tmp/phase2_ledger.py`  | Per-item ledger with class+size, totals & calendar bands | `/tmp/phase2_ledger.json` |
| `/tmp/phase3_git_v2.py`  | Mines EWM↔git mappings, computes lag and work-window | `/tmp/phase3_records.json` |

To redo with a different baseline window (e.g. trailing 12 months instead
of trailing 6):

1. In `phase1_extract.py`, filter `R` by `_resolved >= datetime(2025,5,1)` etc.
2. In `phase2_ledger.py`, edit `BASE` (the per-class median/p80 dict)
   and / or any item's class / size assignment.
3. Re-run all three scripts.
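
Step 1 might look like the sketch below. `R` and `_resolved` are the names mentioned in the step, but the list-of-dicts layout, the sample rows, and the `trailing_window()` helper are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical stand-in for the `R` list in phase1_extract.py.
R = [
    {"id": 1, "_resolved": datetime(2024, 11, 3)},
    {"id": 2, "_resolved": datetime(2025, 5, 1)},
    {"id": 3, "_resolved": datetime(2026, 4, 24)},
    {"id": 4, "_resolved": None},  # rows without a resolution date are dropped
]

def trailing_window(rows, start):
    """Keep rows resolved on or after `start`."""
    return [r for r in rows if r["_resolved"] is not None and r["_resolved"] >= start]

R_12mo = trailing_window(R, datetime(2025, 5, 1))
print([r["id"] for r in R_12mo])  # [2, 3]
```

When the window changes, the month divisor behind the h/mo columns in §2.4 has to change with it (12 instead of 6 for a trailing-12-month baseline).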

---

*End of baseline. If on Monday you want this re-cast against a
different process-tax assumption, or with a different sequencing
order, the scripts above are the place to start. — Claude*