Commit 2c9761b7 authored by Vacaliuc, Bogdan

plan: finalize v2 redesign — incorporate reviewer resolutions



Records the seven §11 resolutions from the 2026-05-02 review pass.
Adds three new sections that the design needed once the questions
were closed:

§13 — resolution table (4th-clone Administrator; update-config
      skill confirmed; read-only on workers; ls-remote default;
      defer phase 1 to a clean session; full 5-slug exercise;
      reviewer applies F1 Option A manually)

§14 — abbreviated dry-run cycle via opt-in `test-dry-run` pixi
      task and a new {integrator-test-cmd} config knob. Default
      stays test-reduction in production; test-dry-run is dry-run-
      only and additive — never replaces the production task. Cuts
      phase-4 wall clock from ~80-90 min to ~5-7 min while
      preserving the full failure-mode coverage matrix.

§15 — pixi syntax for pytest-timeout install (reviewer's manual
      action on lr_reduction:new_workflow_ui_plan) plus the
      Administrator phase 1 verification block that probes for
      the plugin and fails fast if missing — closing the F1 leak.

§16 — Phase 1 implementation prompt for the next session.
      Sets /model claude-opus-4-7, /effort xhigh; lists the 10
      ordered steps spanning protocol doc + four prompts +
      Administrator-prompt.md + four poll wrappers + statusline
      script + Stop hook via update-config; explicit guard rails
      against editing lr_reduction or running phase 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 8eb49e8d
+427 −0
@@ -1243,3 +1243,430 @@ before phase 1 starts.
  persistence`, `[WHEN: long-running task]` (TaskStop discipline)
- `setup/patterns/agent-discipline.md` (parent, `main`) — target of
  "don't mask git push exit codes"

---

## 13. Plan finalization (2026-05-02)

The human reviewer answered the §11 open questions. Resolutions are
recorded below; the implementation-pass agent (next session) treats
these as authoritative on top of the §1-§10 design.

| §11 # | Question | Resolution |
|---|---|---|
| 1 | Administrator clone | **4th dedicated clone** — `<path-to-session-4>` (proposal accepted) |
| 2 | Recap-suppression mechanism | Use the `update-config` skill against `<clone>/.claude/settings.local.json`. Confirmed: the reviewer's understanding of the skill is **correct**. It handles both `settings.json` and `settings.local.json`, including hooks like the Stop-restore in §10.4 |
| 3 | Administrator autonomy | **Read-only on workers** in v2 (proposal accepted). Future v3 can revisit if pattern data justifies it |
| 4 | Discovery method default | **`git ls-remote` (portable)** — GitHub REST + ETag stays as documented opt-in alternative for high-frequency polling |
| 5 | Apply in this session? | **No — defer to a clean session.** Phase 1 prompt is in §16 below |
| 6 | Phase 4 dry-run slug coverage | **All five slugs** (alpha, beta, gamma, delta, epsilon). Cycle-time concern addressed by §14's `test-dry-run` pixi task |
| 7 | F1 pytest-timeout fix | **Option A** — reviewer applies it manually to `new_workflow_ui_plan` (lr_reduction submodule) before phase 4. See §15 for pixi syntax. Administrator phase 1 verifies presence — see §15.2 |

These resolutions do not overturn the §1-§10 design; they sharpen
it. Where a resolution narrows a previously open choice, the
implementation pass treats the resolution as the canonical default.
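
Resolution 4 makes `git ls-remote` the default discovery method, and the Phase 1 prompt in §16.2 pairs it with explicit (SHA, name) dedup. The following is a minimal sketch of that pairing, not the wrappers' actual implementation. It assumes the standard `git ls-remote` output format (one tab-separated `<sha> <refname>` pair per line); the function names `parse_ls_remote` and `new_refs` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Ref = Tuple[str, str]  # (sha, ref name)


def parse_ls_remote(output: str) -> List[Ref]:
    """Parse `git ls-remote` output: one tab-separated sha/refname pair per line."""
    refs: List[Ref] = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, name = line.split("\t", 1)
        refs.append((sha, name))
    return refs


def new_refs(seen: Set[Ref], current: Iterable[Ref]) -> List[Ref]:
    """Return refs not seen before, keyed on (sha, name), and record them in `seen`."""
    fresh = [ref for ref in current if ref not in seen]
    seen.update(fresh)
    return fresh
```

Keying on the pair rather than the name alone means a ref that is re-pointed at a new SHA counts as new work, while an unchanged ref is skipped on the next poll.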

---

## 14. §11.6 refinement — abbreviated dry-run tests

**Problem.** The dry-run currently invokes `pixi run test-reduction`
for every Integrator cycle (~9.5 min wall clock per cycle, dominated
by 281 production tests reading from network mounts). On the full
five-slug exercise (alpha + beta + gamma + delta + epsilon, with
gamma cycling 3 times), wall clock runs ~80-90 min, as verified by
the dry-run-Integrator-findings.md §2 cycle log.

**Reviewer's preference (§11.6 resolution).** Run all five slugs;
abbreviate cycle time if achievable without compromising production
instructions.

### 14.1 Design — `test-dry-run` pixi task with config knob

Add a new pixi task to `lr_reduction/pyproject.toml` that runs only
the synthesized dry-run tests (~5-10 s per cycle):

```toml
[tool.pixi.tasks]
test-reduction = { cmd = "cd tests/ && python -m pytest -vv --cov=../src --cov-report=xml --cov-report=term"}
test-dry-run   = { cmd = "cd tests/dry_run/ && python -m pytest -vv" }
```

The task targets only the `tests/dry_run/` directory the Developer
synthesizes per dry-run.md §5.3, so production tests don't run.
This is **safe for production** because:

- The original `test-reduction` task is unchanged in command,
  arguments, or behavior.
- The new task is dry-run-only by name; Integrator picks one based
  on a config knob.
- If a future commit accidentally moves a production test under
  `tests/dry_run/`, the pre-existing `test-reduction` still
  exercises it — `test-dry-run` is purely additive.

### 14.2 New configuration knob (§7 of orchestration.md)

| Symbol | Meaning | Default | Change by |
|---|---|---|---|
| `{integrator-test-cmd}` | Pixi task name the Integrator runs | **`test-reduction`** in production; **`test-dry-run`** when `DRY_RUN=1` | Opening prompt; Integrator's prompt picks based on `DRY_RUN` |

Defaults preserve the original design intent: unless the opening
prompt sets `DRY_RUN=1`, the Integrator runs `test-reduction` and
validates the full test pipeline. The `test-dry-run` shortcut is
**opt-in** through the dry-run opening prompt, so protocol changes
cannot accidentally bypass production-pipeline coverage.
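
The selection rule in the table can be sketched as a small resolver. This is an illustration under stated assumptions, not the Integrator's real mechanism (the Integrator is prompt-driven): the function name `integrator_test_cmd` and the `override` parameter are hypothetical, and `DRY_RUN` is read as an environment-style variable.

```python
import os
from typing import Mapping, Optional

PRODUCTION_TASK = "test-reduction"
DRY_RUN_TASK = "test-dry-run"


def integrator_test_cmd(env: Mapping[str, str] = os.environ,
                        override: Optional[str] = None) -> str:
    """Resolve {integrator-test-cmd} to a pixi task name.

    An explicit override (hypothetical: set by the opening prompt) wins;
    otherwise DRY_RUN=1 selects the dry-run task and anything else
    falls back to the production task.
    """
    if override is not None:
        if override not in (PRODUCTION_TASK, DRY_RUN_TASK):
            raise ValueError(f"unknown pixi task: {override!r}")
        return override
    return DRY_RUN_TASK if env.get("DRY_RUN") == "1" else PRODUCTION_TASK
```

The fallback branch is the production task, so an unset or malformed `DRY_RUN` can never select the abbreviated suite.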

### 14.3 Integrator prompt change (§8.3 row update)

Append to the Integrator's loop-iteration block:

```
3. Run the test command:
     pixi run {integrator-test-cmd}
   In production this is `test-reduction`. In dry-run mode the
   knob defaults to `test-dry-run` (only the synthesized
   tests/dry_run/ tests; ~5-10 s per cycle vs. ~9.5 min for the
   full suite). The infrastructure-failure path (delta with
   --timeout=2) works against either task; pytest-timeout is
   required in either case (see §15).
```

For the delta slug specifically: Integrator continues to pass
`-- --timeout=2` to whichever task the knob selects. Both
`test-reduction` and `test-dry-run` accept the `--` passthrough.
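
As a rough sketch of how the delta-slug command line is assembled, the helper below builds the argument vector for either task. The helper name `build_test_argv` is hypothetical; the `--` passthrough form is taken from the text above rather than verified against pixi here.

```python
from typing import List, Optional


def build_test_argv(task: str, timeout_s: Optional[int] = None) -> List[str]:
    """Build the `pixi run` argv for a test task.

    When timeout_s is given (the delta slug uses 2), append the
    `-- --timeout=N` passthrough described in the plan.
    """
    argv = ["pixi", "run", task]
    if timeout_s is not None:
        argv += ["--", f"--timeout={timeout_s}"]
    return argv


# Delta slug against the dry-run task:
print(build_test_argv("test-dry-run", timeout_s=2))
```

The resulting list can be handed to `subprocess.run(argv, cwd="lr_reduction")`; building a list rather than a shell string avoids quoting issues.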

### 14.4 Phase-4 wall-clock estimate (after change)

7 protocol cycles (per Integrator findings §2) × ~10 s test runtime
= ~70 s of test wall-clock. Plus polling latency (TA=TD=TI=30s,
~30 s × 7 = ~210 s = 3.5 min). Plus PR-creation roundtrips and
ref-state propagation (~1 min). **Total dry-run wall clock: ~5-7
min** — comfortably inside the §8 "30-60 min typical" budget and
fast enough to iterate.
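
The estimate can be reproduced with a few lines of arithmetic. All inputs come from the paragraph above; the 60 s overhead is the "~1 min" figure for PR-creation roundtrips and ref-state propagation, and the variable names are illustrative only.

```python
CYCLES = 7              # protocol cycles per Integrator findings §2
TEST_RUNTIME_S = 10     # upper end of the ~5-10 s dry-run test runtime
POLL_INTERVAL_S = 30    # TA = TD = TI = 30 s
OVERHEAD_S = 60         # PR-creation roundtrips + ref-state propagation

test_s = CYCLES * TEST_RUNTIME_S            # 70 s of test wall clock
polling_s = CYCLES * POLL_INTERVAL_S        # 210 s of polling latency
total_s = test_s + polling_s + OVERHEAD_S   # 340 s
total_min = total_s / 60

print(f"{total_s} s = {total_min:.1f} min")  # 340 s = 5.7 min
```

The result sits inside the quoted ~5-7 min range, with polling latency rather than test runtime as the dominant term.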

### 14.5 Risk register for the change

| Risk | Mitigation |
|---|---|
| `tests/dry_run/` directory doesn't exist on a fresh `{base-branch}` | Developer creates it on the first feature branch (already does this per dry-run.md §5.3); pixi task fails cleanly with "no tests collected" if the dir is empty, which is itself a useful Initialization signal |
| User runs `test-dry-run` accidentally in a production session | Knob defaults to `test-reduction`; only set in the dry-run opening prompt |
| Test-suite drift (production test depends on a fixture that lives in `tests/dry_run/`) | Phase 4 dry run has both tasks runnable; if pre-flight in Administrator phase 1 sees `test-dry-run` collect 0 tests, that's a flag |

---

## 15. §11.7 — pytest-timeout install (reviewer-side action) and verification

### 15.1 Pixi syntax for adding pytest-timeout

The reviewer's preferred path is editing `pyproject.toml` directly,
which keeps the change auditable in git:

```toml
[tool.pixi.feature.developer.dependencies]
pytest-timeout = ">=2.1"
```

(Adjust the feature name if `lr_reduction/pyproject.toml` uses a
different one — the dry-run-Initialization-findings.md §2 noted the
fork's pixi config has feature groups; `developer` was the existing
group for dev-time dependencies. Verify with
`grep -nE '\[tool\.pixi\.feature\.' lr_reduction/pyproject.toml`
before editing.)

After the edit, regenerate the lock file:

```bash
cd lr_reduction
pixi install      # updates pixi.lock; re-resolves the env
pixi run python -c "import pytest_timeout; from importlib.metadata import version; print(version('pytest-timeout'))"
```

The CLI alternative (faster but produces the same edit):

```bash
cd lr_reduction
pixi add --feature developer pytest-timeout
```

Commit on `new_workflow_ui_plan` with a message like:

```
deps: add pytest-timeout for orchestration dry-run delta path

The dry-run delta slug exercises the infrastructure-failure path
via `pixi run test-reduction -- --timeout=2`. pytest-timeout is the
upstream pytest plugin that implements `--timeout`. Without it,
pytest exits 4 at argument parsing before any tests run, and the
dry-run can't validate the Integrator's infra-failure handling.

See: tasking-branch/plan/dry-run-Analyst-findings.md F1.
```

Standard merge / push to `agentic` per the existing v1 push
allowlist for `lr_reduction`.

### 15.2 Administrator phase 1 verification step

The Administrator agent's phase 1 checklist (in
`Administrator-prompt.md`, §5.8 of this redesign) gains a
"Target-branch dependency verification" step that runs **after**
the existing `initialization.md` §8 ("Test environment") check and
before declaring "Phase 1 complete." The implementation-pass agent
adds this to both `Administrator-prompt.md` and the
`initialization.md` reference content (the latter so a re-init is
also covered).

**Verification block (paste into Administrator-prompt.md phase 1):**

```
Target-branch dependency verification:

1. Confirm the active pixi env corresponds to the target branch
   ({base-branch} = new_workflow_ui_plan):
     cd lr_reduction
     git rev-parse --abbrev-ref HEAD          # expect {base-branch}
     pixi info | grep -E '^(Project|Environments)'

2. Probe each test-runtime plugin the protocol requires:
     pixi run python -c "import pytest_timeout; from importlib.metadata import version; print('pytest-timeout', version('pytest-timeout'))" \
       || echo "MISSING: pytest-timeout — needed for dry-run-delta. \
                See plan/orchestration-v2-redesign.md §15.1 for install."

3. Confirm the canonical test commands are accepted:
     pixi run test-reduction -- --collect-only --timeout=1 -k 'test_does_not_exist' \
       2>&1 | tail -5
   Expect a clean "no tests collected" summary (pytest exit 5), not an
   arg-parse error (exit 4) and not an environment failure. The
   --timeout=1 flag transitively verifies pytest-timeout is loaded.

4. (Dry-run only, if {integrator-test-cmd} = test-dry-run):
   Confirm the dry-run pixi task exists:
     pixi run test-dry-run -- --collect-only 2>&1 | tail -5
   Expect "no tests collected" cleanly (the synthesized test files
   don't exist yet on a fresh {base-branch}; the task itself must
   exist and be invokable). If the task is missing, that's a
   pyproject.toml gap — see plan/orchestration-v2-redesign.md §14.

5. Read pyproject.toml's [tool.ruff.lint] ignore list and confirm
   it is non-empty (per dry-run-Initialization-findings.md §2;
   missing ignore list ⇒ 780 ruff violations on commit, breaking
   pre-commit hooks).

If any check fails, STOP and hand the user a specific action item.
Do not enter phase 2 (active monitoring) until every check passes.
```

This makes the missing-plugin failure mode (Analyst F1) impossible
to leak into a dry run — Administrator phase 1 fails fast on it.
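
Steps 2 and 3 of the verification block can also be expressed as a small Python probe. This is a sketch only: `probe_plugins` and `classify_pytest_exit` are hypothetical names, and the script would have to run through `pixi run python` so that it inspects the target environment. It looks the version up through `importlib.metadata` rather than relying on a module-level `__version__` attribute, which not every plugin defines. The exit-code mapping follows pytest's documented exit codes (4 = usage error, 5 = no tests collected).

```python
import importlib.util
from importlib import metadata
from typing import Dict, Mapping, Optional

# import name -> distribution name; extend as the protocol adds plugins
REQUIRED_PLUGINS = {"pytest_timeout": "pytest-timeout"}


def probe_plugins(required: Mapping[str, str] = REQUIRED_PLUGINS) -> Dict[str, Optional[str]]:
    """Map each distribution to its installed version, or None if not importable."""
    found: Dict[str, Optional[str]] = {}
    for module, dist in required.items():
        if importlib.util.find_spec(module) is None:
            found[dist] = None  # module missing: phase 1 must stop
            continue
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = "unknown"  # importable, but no distribution metadata
    return found


def classify_pytest_exit(code: int) -> str:
    """Interpret the exit code of a `--collect-only` probe run."""
    if code == 0:
        return "ok"
    if code == 5:
        return "no-tests-collected"  # acceptable for the probe
    if code == 4:
        return "usage-error"         # e.g. --timeout rejected: plugin not loaded
    return "failure"
```

A caller would treat any `None` value from `probe_plugins()` or a `"usage-error"` classification as a STOP condition, mirroring the block's final instruction.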

---

## 16. Phase 1 implementation prompt (for the next session)

Per §11.5 resolution: the implementation pass runs in a fresh
session. The reviewer pastes the prompt below into the new session
after setting model and effort.

### 16.1 Pre-prompt setup

```
/model claude-opus-4-7
/effort xhigh
```

**Why this combo:** Phase 1 is implementation under a clear spec
(§7-§10 of this document plus the resolutions in §13). Per
`orchestration.md` §9.5's "Developer / Integrator robust default"
line: Opus 4.7 / `default` is the spec-driven implementation
target. We bump to `xhigh` because the work spans seven files
(orchestration.md, three worker prompts, initialization.md,
dry-run.md, new Administrator-prompt.md) plus five new scripts (four
poll wrappers and the statusline script). Careful cross-document
consistency matters, and `xhigh` reduces the chance of forgotten
cross-references.

If `xhigh` is unavailable in the current Claude Code version,
fall back to `/effort max`. Do not downgrade to `default`: lower
effort raises the risk of missed cross-references across the
multi-file edit set.

### 16.2 Prompt body (paste into new session)

```
You are continuing the lr_reduction-new_workflow-repairs effort.
Your task is to execute Phases 1, 2, and 3 of the orchestration v2
redesign.

Configuration:
  {tasking-branch} = lr_reduction-new_workflow-repairs
  {tasking-prefix} = /media/ssd2/Projects/Claude/1/tasking/

Read in this order before starting:
  1. {tasking-prefix}/plan/orchestration-v2-redesign.md
     — your spec. Especially §7.1, §8.1-§8.5, §9.1-§9.2, §10.2-§10.4,
       §13 (resolutions), §14 (test-dry-run knob), §15.2 (Admin
       phase 1 verification block).
  2. {tasking-prefix}/plan/orchestration.md
     — current v1 doc; you will edit it.
  3. {tasking-prefix}/plan/Analyst-prompt.md,
     {tasking-prefix}/plan/Developer-prompt.md,
     {tasking-prefix}/plan/Integrator-prompt.md
     — current v1 prompts; you will edit each.
  4. {tasking-prefix}/plan/initialization.md
     — content becomes Administrator phase 1 subroutine; add
       deprecation banner + F1-fix probes (§15.2).
  5. {tasking-prefix}/plan/dry-run.md
     — receives F2/F4/F5 fixes per Analyst findings.
  6. {tasking-prefix}/plan/dry-run-Analyst-findings.md,
     {tasking-prefix}/plan/dry-run-Developer-findings.md,
     {tasking-prefix}/plan/dry-run-Integrator-findings.md
     — the source of F1-F5 / §3.1-§3.7 / §4.1-§4.3 fixes.
  7. {tasking-prefix}/CLAUDE.md
  8. {tasking-prefix}/../CLAUDE.md
  9. ~/.claude/CLAUDE.md (especially [ALWAYS] sections)

Apply changes in this order; commit each as its own commit on the
lr_reduction-new_workflow-repairs branch of the tasking submodule
(commit-as-you-go per [ALWAYS]):

  Phase 1 — protocol doc + prompts (about 6 commits):

  1. Edit orchestration.md per §7.1 of the redesign plan.
     - §3 ref namespace: add init-check-* and admin-status-*
     - §5 naming conventions: add admin-status-<date>-<host> row
     - §6 state machine: explicit (SHA, name) dedup; ls-remote
       discovery; targeted fetch only when picking up work
     - §6.4 NEW: Administrator state machine (per §5.1 of the
       redesign plan)
     - §7 knobs: add TM, {admin-state-dir}, {admin-clone},
       {discovery-method}, {integrator-test-cmd} (§14.2);
       tighten cadence guidance: 60s default, 30s min in dry-run
     - §8 push allowlist: add Administrator's narrow allowlist;
       explicit deny on Administrator pushing protocol refs
     - §9.1 Analyst prompt: add session-setup preamble + loop-
       iteration instructions (§8.1, §8.2, §8.3 Analyst row)
     - §9.2 Developer prompt: same; plus v{N>1} feature-source
       fix (Developer findings §3.3); empty-commit convention
       (§3.4); skipped-triage.txt (§3.5)
     - §9.3 Integrator prompt: same; plus --force on tag fetch
       (Integrator findings §4.2); test-runtime budget note
     - §9.4 Initialization: REMOVE; replace with deprecation
       banner pointing to §9.6
     - §9.5 model/effort table: add Administrator row
       (Sonnet 4.6 / default; Haiku 4.5 cost-efficient)
     - §9.6 NEW Administrator: full prompt per §5.8 + §15.2
     - §9.7 NEW: statusline install/restore protocol (common
       preamble snippet; cross-referenced from §9.1-§9.3, §9.6)
     - §12 failure modes: add 4 new entries (Monitor exits / silent
       dedup / discovery rate-limited / state-file stale)
     - §13.1 runbook: Administrator-first sequence
     - §16 NEW: polling-cost analysis (per §6 of the redesign)

  2. Edit Analyst-prompt.md, Developer-prompt.md, Integrator-prompt.md
     to match orchestration.md §9.1-§9.3 (one commit per file is fine
     or one commit covering all three; pick whichever reads better
     in git log).

  3. Write {tasking-prefix}/plan/Administrator-prompt.md per
     orchestration.md §9.6 + §15.2 of the redesign plan.

  4. Edit dry-run.md with F2 (^{} peel), F4 (refspec form), F5
     (no-pipe-on-push) per Analyst findings; §3 cadence note
     (10s warning, 30s minimum in dry-run, 60s production).

  5. Edit initialization.md with §15.2 verification block + §0
     deprecation banner pointing to Administrator-prompt.md.

  6. Edit Initialization-prompt.md (existing file) to add the §0
     deprecation banner; do not delete (it remains canonical for
     the checklist content).

  Phase 2 — supporting scripts:

  7. Write ~/.claude/state/agent-statusline.sh per §4.2 of the
     redesign plan. Mode 700. mkdir -p ~/.claude/state if needed.

  8. Write {tasking-prefix}/plan/scripts/poll-wrapper-Analyst.sh,
     poll-wrapper-Developer.sh, poll-wrapper-Integrator.sh,
     poll-wrapper-Administrator.sh per §3.1 of the redesign plan.
     Mode 700. Each emits typed STALLED events on failure,
     continues polling, updates the role's state file.

  Phase 3 — Stop hook for statusline restore (use update-config skill):

  9. Use the update-config skill to add a Stop hook to
     ~/.claude/settings.json that restores the statusline backup
     from ~/.claude/state/statusline-backup-<sessionId>.json on
     session end. Reference: §10.4 of the redesign plan.

  10. While in update-config: discover and document the exact
      settings key for "disable inline recap" (per §11.2
      resolution; the reviewer wants to know the exact name for
      future opening prompts). Record it in
      orchestration-v2-redesign.md as a one-line addendum to §4.4.

[ALWAYS] reminders that apply throughout:
  - Regular merges only; no force-push, no --no-verify, no --amend
    after hook failure (Git discipline).
  - Standard merge commits; do not push without explicit user
    approval.
  - Commit-as-you-go on the tasking submodule. Match the recent
    commit-message style (see `git log --oneline -10`).
  - Robust over simple. When ambiguity arises, the redesign plan's
    text is the spec — do not improvise.
  - [TIME-BOXED] Aggressive context persistence: after each phase,
    update the redesign plan's §10 with done-vs-remaining or commit
    a small "phase X done" marker in plan/.

Do NOT in this session:
  - Edit lr_reduction (the reviewer is handling pytest-timeout).
  - Run the dry-run (that's Phase 4, a separate session).
  - Push to origin/agentic (push stays user-initiated).
  - Bump the parent repo's gitlinks (the reviewer handles those
    manually between efforts).

You may exit when phase 1, 2, and 3 are complete and committed,
with a one-paragraph end-of-session summary listing the commits
produced and any deviations from the spec.

Ask any clarification questions before starting; if the spec is
unambiguous in the redesign plan, proceed.
```

### 16.3 Estimated wall-clock for the new session

- Phase 1 (protocol doc + prompts): 30-60 min — substantial editing
  spread across seven files with cross-references.
- Phase 2 (scripts): 15-30 min — three Bash wrappers (Analyst,
  Developer, Integrator are near-identical with refspec
  differences) plus the statusline command and the Administrator
  wrapper.
- Phase 3 (Stop hook + recap-key discovery via update-config):
  10-20 min — small but new-to-this-session tooling.

**Total: ~55-110 min wall clock.** The session may /clear between
phases if context utilization gets high; the redesign plan and
intermediate commits provide enough context for a fresh resume.
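
Summing the per-phase ranges listed above gives the session bounds directly. The dictionary below simply restates those three estimates; the names are illustrative.

```python
# (low, high) estimates in minutes, as listed per phase
PHASE_ESTIMATES_MIN = {
    "phase 1": (30, 60),
    "phase 2": (15, 30),
    "phase 3": (10, 20),
}

low = sum(lo for lo, _ in PHASE_ESTIMATES_MIN.values())
high = sum(hi for _, hi in PHASE_ESTIMATES_MIN.values())

print(f"{low}-{high} min")  # 55-110 min
```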

### 16.4 What success looks like

By end of the new session:

- 6+ commits on `lr_reduction-new_workflow-repairs` (tasking) covering
  the seven-file edit set per phase 1 of §10.2.
- 5 new files committed on tasking branch:
  `plan/Administrator-prompt.md`,
  `plan/scripts/poll-wrapper-Analyst.sh`,
  `plan/scripts/poll-wrapper-Developer.sh`,
  `plan/scripts/poll-wrapper-Integrator.sh`,
  `plan/scripts/poll-wrapper-Administrator.sh`.
- 1 new file in `~/.claude/state/`: `agent-statusline.sh` (not
  committed; user-scoped).
- 1 Stop hook entry in `~/.claude/settings.json` (not committed;
  user-scoped) — added via update-config skill.
- Recap-suppression settings key documented in §4.4 of this
  redesign plan as a one-line addendum.
- End-of-session summary in the conversation listing what was done
  and any open items for phase 4.

If anything in §10.2 (commits 1-6) is incomplete or deferred, the
new session reports it explicitly so phase 4 can plan around it.