plan: finalize v2 redesign — incorporate reviewer resolutions (2c9761b7) · Commits · Vacaliuc, Bogdan / tasking

plan/orchestration-v2-redesign.md

+427 −0

		@@ -1243,3 +1243,430 @@ before phase 1 starts.
		persistence`, `[WHEN: long-running task]` (TaskStop discipline)
		- `setup/patterns/agent-discipline.md` (parent, `main`) — target of
		"don't mask git push exit codes"

		---

		## 13. Plan finalization (2026-05-02)

		The human reviewer answered the §11 open questions. Resolutions are
		recorded below; the implementation-pass agent (next session) treats
		these as authoritative on top of the §1-§10 design.

		| §11 # | Question | Resolution |
		|---|---|---|
		| 1 | Administrator clone | 4th dedicated clone: `<path-to-session-4>` (proposal accepted) |
		| 2 | Recap-suppression mechanism | Use the `update-config` skill against `<clone>/.claude/settings.local.json`. Reviewer's understanding of the skill is correct: it handles both `settings.json` and `settings.local.json` (including hooks like the Stop-restore in §10.4); confirmed |
		| 3 | Administrator autonomy | Read-only on workers in v2 (proposal accepted). Future v3 can revisit if pattern data justifies it |
		| 4 | Discovery method default | `git ls-remote` (portable); GitHub REST + ETag stays as a documented opt-in alternative for high-frequency polling |
		| 5 | Apply in this session? | No; defer to a clean session. Phase 1 prompt is in §16 below |
		| 6 | Phase 4 dry-run slug coverage | All five slugs (alpha, beta, gamma, delta, epsilon). Cycle-time concern addressed by §14's `test-dry-run` pixi task |
		| 7 | F1 pytest-timeout fix | Option A: reviewer applies it manually to `new_workflow_ui_plan` (lr_reduction submodule) before phase 4. See §15 for pixi syntax. Administrator phase 1 verifies presence (see §15.2) |

		These resolutions do not override the §1-§10 design; they sharpen
		it. Where a resolution narrows a previously open choice, the
		implementation pass treats the resolution as the canonical default.

		---

		## 14. §11.6 refinement — abbreviated dry-run tests

		Problem. The dry-run currently invokes `pixi run test-reduction`
		for every Integrator cycle (~9.5 min wall clock per cycle, dominated
		by 281 production tests reading from network mounts). On the full
		five-slug exercise (alpha + beta + gamma + delta + epsilon, with
		gamma cycling 3 times, i.e. 7 cycles), total wall-clock time is at
		least 80-90 min, as verified by the
		dry-run-Integrator-findings.md §2 cycle log.

		Reviewer's preference (§11.6 resolution). Run all five slugs;
		abbreviate cycle time if achievable without compromising production
		instructions.

		### 14.1 Design — `test-dry-run` pixi task with config knob

		Add a new pixi task to `lr_reduction/pyproject.toml` that runs only
		the synthesized dry-run tests (~5-10 s per cycle):

		```toml
		[tool.pixi.tasks]
		test-reduction = { cmd = "cd tests/ && python -m pytest -vv --cov=../src --cov-report=xml --cov-report=term" }
		test-dry-run = { cmd = "cd tests/dry_run/ && python -m pytest -vv" }
		```

		The task targets only the `tests/dry_run/` directory the Developer
		synthesizes per dry-run.md §5.3, so production tests don't run.
		This is safe for production because:

		- The original `test-reduction` task is unchanged in command,
		arguments, or behavior.
		- The new task is dry-run-only by name; Integrator picks one based
		on a config knob.
		- If a future commit accidentally moves a production test under
		`tests/dry_run/`, the pre-existing `test-reduction` still
		exercises it — `test-dry-run` is purely additive.

		### 14.2 New configuration knob (§7 of orchestration.md)

		| Symbol | Meaning | Default | Change by |
		|---|---|---|---|
		| `{integrator-test-cmd}` | Pixi task name the Integrator runs | `test-reduction` in production; `test-dry-run` when `DRY_RUN=1` | Opening prompt; Integrator's prompt picks based on `DRY_RUN` |

		Defaults preserve the original design intent: unless `DRY_RUN=1`
		is set, the knob resolves to `test-reduction`, so a production
		session still validates the full test pipeline. The `test-dry-run`
		shortcut is opt-in via the knob, so protocol changes cannot
		accidentally bypass production-pipeline coverage.
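
The knob resolution described above can be sketched in a few lines of shell. This is only an illustration: the function name `select_test_cmd` is hypothetical, and it assumes `DRY_RUN` reaches the Integrator as an ordinary shell variable, which the plan does not specify.

```shell
# Hypothetical helper: resolve {integrator-test-cmd} from DRY_RUN.
# The two task names come from the plan; everything else is an assumption.
select_test_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "test-dry-run"      # dry-run shortcut, opt-in only
  else
    echo "test-reduction"    # production default, full pipeline
  fi
}
```

An Integrator loop could then call `pixi run "$(select_test_cmd)"`, so an unset or zero `DRY_RUN` always falls through to the full suite.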

		### 14.3 Integrator prompt change (§8.3 row update)

		Append to the Integrator's loop-iteration block:

		```
		3. Run the test command:
		pixi run {integrator-test-cmd}
		In production this is `test-reduction`. In dry-run mode the
		knob defaults to `test-dry-run` (only the synthesized
		tests/dry_run/ tests; ~5-10 s per cycle vs. ~9.5 min for the
		full suite). The infrastructure-failure path (delta with
		--timeout=2) works against either task; pytest-timeout is
		required in either case (see §15).
		```

		For the delta slug specifically: Integrator continues to pass
		`-- --timeout=2` to whichever task the knob selects. Both
		`test-reduction` and `test-dry-run` accept the `--` passthrough.
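
As a rough sketch of the delta rule, the helper below prints the invocation for a given task and slug without running it. The function name `integrator_test_invocation` is hypothetical; the task names, slug names, and the `-- --timeout=2` passthrough are taken from the text above.

```shell
# Hypothetical helper: print (do not run) the Integrator's test command.
# Only the delta slug receives the `-- --timeout=2` passthrough.
integrator_test_invocation() {
  task=$1   # test-reduction or test-dry-run
  slug=$2   # alpha, beta, gamma, delta, or epsilon
  if [ "$slug" = "delta" ]; then
    echo "pixi run $task -- --timeout=2"
  else
    echo "pixi run $task"
  fi
}
```

Printing the command first makes it easy to log exactly what each cycle ran before handing it to the shell.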

		### 14.4 Phase-4 wall-clock estimate (after change)

		7 protocol cycles (per Integrator findings §2) × ~10 s test runtime
		= ~70 s of test wall-clock. Plus polling latency (TA=TD=TI=30s,
		~30 s × 7 = ~210 s = 3.5 min). Plus PR-creation roundtrips and
		ref-state propagation (~1 min). **Total dry-run wall clock: ~5-7
		min** — comfortably inside the §8 "30-60 min typical" budget and
		fast enough to iterate.
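
The estimate above can be reproduced with plain shell arithmetic. All four inputs are the plan's own figures (using the upper 10 s bound for test runtime); only the variable names are made up for the example.

```shell
# Worked version of the §14.4 estimate.
cycles=7        # protocol cycles (Integrator findings §2)
test_s=10       # upper bound on test-dry-run runtime per cycle
poll_s=30       # TA = TD = TI = 30 s polling latency per cycle
overhead_s=60   # PR-creation round trips and ref-state propagation

total_s=$(( cycles * test_s + cycles * poll_s + overhead_s ))
echo "tests:   $(( cycles * test_s )) s"    # 70 s
echo "polling: $(( cycles * poll_s )) s"    # 210 s
echo "total:   ${total_s} s"                # 340 s, about 5.7 min
```

The 340 s result sits inside the quoted ~5-7 min range; polling latency, not test runtime, is now the dominant term.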

		### 14.5 Risk register for the change

		| Risk | Mitigation |
		|---|---|
		| `tests/dry_run/` directory doesn't exist on a fresh `{base-branch}` | Developer creates it on the first feature branch (already does this per dry-run.md §5.3); pixi task fails cleanly with "no tests collected" if the dir is empty, which is itself a useful Initialization signal |
		| User runs `test-dry-run` accidentally in a production session | Knob defaults to `test-reduction`; only set in the dry-run opening prompt |
		| Test-suite drift (production test depends on a fixture that lives in `tests/dry_run/`) | Phase 4 dry run has both tasks runnable; if pre-flight in Administrator phase 1 sees `test-dry-run` collect 0 tests, that's a flag |
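
The first and third rows both depend on telling a missing `tests/dry_run/` apart from an empty one. A minimal pre-flight sketch, assuming pytest's default discovery patterns (`test_*.py` and `*_test.py`) are in effect; the function name `dry_run_dir_status` is hypothetical.

```shell
# Hypothetical pre-flight helper: classify the dry-run test directory.
# Prints "missing", "empty", or "ok".
dry_run_dir_status() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing"
    return 0
  fi
  # Count files matching pytest's default discovery patterns.
  count=$(find "$dir" -type f \( -name 'test_*.py' -o -name '*_test.py' \) | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "empty"
  else
    echo "ok"
  fi
}
```

Called as `dry_run_dir_status lr_reduction/tests/dry_run`, a result of `missing` or `empty` on a fresh `{base-branch}` is expected; the same result after the Developer's first feature branch would be the flag the table describes.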

		---

		## 15. §11.7 — pytest-timeout install (reviewer-side action) and verification

		### 15.1 Pixi syntax for adding pytest-timeout

		The reviewer's preferred path is editing `pyproject.toml` directly,
		which keeps the change auditable in git:

		```toml
		[tool.pixi.feature.developer.dependencies]
		pytest-timeout = ">=2.1"
		```

		(Adjust the feature name if `lr_reduction/pyproject.toml` uses a
		different one — the dry-run-Initialization-findings.md §2 noted the
		fork's pixi config has feature groups; `developer` was the existing
		group for dev-time dependencies. Verify with
		`grep -nE '\[tool\.pixi\.feature\.' lr_reduction/pyproject.toml`
		before editing.)
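
To see just the feature names rather than the raw matching lines, a small helper along these lines may be convenient. The name `list_pixi_features` is hypothetical, and the pattern assumes unquoted feature names made of letters, digits, `_`, and `-`.

```shell
# Hypothetical helper: print each distinct <name> that appears in a
# [tool.pixi.feature.<name>...] table header of the given pyproject.toml.
list_pixi_features() {
  sed -nE 's/^\[tool\.pixi\.feature\.([A-Za-z0-9_-]+)[].].*/\1/p' "$1" | sort -u
}
```

Running `list_pixi_features lr_reduction/pyproject.toml` should list `developer` if that group exists; if it does not, pick the group that already holds the dev-time dependencies.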

		After the edit, regenerate the lock file:

		```bash
		cd lr_reduction
		pixi install # updates pixi.lock; re-resolves the env
		pixi run python -c "import pytest_timeout; from importlib.metadata import version; print(version('pytest-timeout'))"
		```

		The CLI alternative (faster but produces the same edit):

		```bash
		cd lr_reduction
		pixi add --feature developer pytest-timeout
		```

		Commit on `new_workflow_ui_plan` with a message like:

		```
		deps: add pytest-timeout for orchestration dry-run delta path

		The dry-run delta slug exercises the infrastructure-failure path
		via `pixi run test-reduction -- --timeout=2`. pytest-timeout is the
		upstream pytest plugin that implements `--timeout`. Without it,
		pytest exits 4 at argument parsing before any tests run, and the
		dry-run can't validate the Integrator's infra-failure handling.

		See: tasking-branch/plan/dry-run-Analyst-findings.md F1.
		```

		Standard merge / push to `agentic` per the existing v1 push
		allowlist for `lr_reduction`.

		### 15.2 Administrator phase 1 verification step

		The Administrator agent's phase 1 checklist (in
		`Administrator-prompt.md`, §5.8 of this redesign) gains a
		"Target-branch dependency verification" step that runs after
		the existing `initialization.md` §8 ("Test environment") check and
		before declaring "Phase 1 complete." The implementation-pass agent
		adds this to both `Administrator-prompt.md` and the
		`initialization.md` reference content (the latter so a re-init is
		also covered).

		Verification block (paste into Administrator-prompt.md phase 1):

		```
		Target-branch dependency verification:

		1. Confirm the active pixi env corresponds to the target branch
		({base-branch} = new_workflow_ui_plan):
		cd lr_reduction
		git rev-parse --abbrev-ref HEAD # expect {base-branch}
		pixi info | grep -E '^(Project|Environments)'

		2. Probe each test-runtime plugin the protocol requires:
		pixi run python -c "import pytest_timeout; from importlib.metadata import version; print('pytest-timeout', version('pytest-timeout'))" \
		|| echo "MISSING: pytest-timeout (needed for dry-run-delta). \
		See plan/orchestration-v2-redesign.md §15.1 for install."

		3. Confirm the canonical test commands are accepted:
		pixi run test-reduction -- --collect-only --timeout=1 -k 'test_does_not_exist' \
		2>&1 | tail -5
		Expect a clean "no tests collected" summary line, not an arg-parse
		error (exit 4) and not an environment failure. The --timeout=1 flag
		transitively verifies pytest-timeout is loaded.

		4. (Dry-run only, if {integrator-test-cmd} = test-dry-run):
		Confirm the dry-run pixi task exists:
		pixi run test-dry-run -- --collect-only 2>&1 | tail -5
		Expect "no tests collected" (the synthesized test files
		don't exist yet on a fresh {base-branch}; the task itself must
		exist and be invokable). If tests/dry_run/ itself is absent, the
		task's `cd` fails instead; that still shows the task is defined.
		If the task is missing, that's a pyproject.toml gap: see
		plan/orchestration-v2-redesign.md §14.

		5. Read pyproject.toml's [tool.ruff.lint] ignore list and confirm
		it is non-empty (per dry-run-Initialization-findings.md §2;
		missing ignore list ⇒ 780 ruff violations on commit, breaking
		pre-commit hooks).

		If any check fails, STOP and hand the user a specific action item.
		Do not enter phase 2 (active monitoring) until every check passes.
		```

		This makes the missing-plugin failure mode (Analyst F1) impossible
		to leak into a dry run — Administrator phase 1 fails fast on it.
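
The "if any check fails, STOP" rule in the block above maps naturally onto a fail-fast chain. The sketch below is one possible shape, not the Administrator's actual implementation: `run_check` and `phase1_dependency_checks` are hypothetical names, and only two of the five checks are wired in.

```shell
# Hypothetical fail-fast runner.
# Usage: run_check "<label>" <command> [args...]
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
    return 0
  fi
  echo "FAIL: $label" >&2
  return 1
}

# Example chain (defined, not executed here). Each check runs only if the
# previous one passed, so the first failure stops the sequence.
# $1 is the expected branch name, e.g. new_workflow_ui_plan.
phase1_dependency_checks() {
  run_check "on target branch" \
    sh -c '[ "$(git rev-parse --abbrev-ref HEAD)" = "$1" ]' _ "$1" &&
  run_check "pytest-timeout importable" \
    pixi run python -c "import pytest_timeout"
}
```

The collect-only probes in steps 3 and 4 would need their own handling, because pytest exits non-zero when nothing is collected; they are left out of the chain for that reason.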

		---

		## 16. Phase 1 implementation prompt (for the next session)

		Per §11.5 resolution: the implementation pass runs in a fresh
		session. The reviewer pastes the prompt below into the new session
		after setting model and effort.

		### 16.1 Pre-prompt setup

		```
		/model claude-opus-4-7
		/effort xhigh
		```

		Why this combo: Phase 1 is implementation under a clear spec
		(§7-§10 of this document plus the resolutions in §13). Per
		`orchestration.md` §9.5's "Developer / Integrator robust default"
		line: Opus 4.7 / `default` is the spec-driven implementation
		target. We bump to `xhigh` because the work spans seven files
		(orchestration.md, three worker prompts, initialization.md,
		dry-run.md, new Administrator-prompt.md) plus five new scripts
		(the statusline command and four poll wrappers);
		careful cross-document consistency matters, and `xhigh` reduces
		the chance of forgotten cross-references.

		If `xhigh` is unavailable in the current Claude Code version,
		fall back to `/effort max`. Do not downgrade to `default`: at
		lower effort, missed cross-references across the seven files
		become more likely.

		### 16.2 Prompt body (paste into new session)

		```
		You are continuing the lr_reduction-new_workflow-repairs effort.
		Your task is to execute Phases 1, 2, and 3 of the orchestration v2
		redesign.

		Configuration:
		{tasking-branch} = lr_reduction-new_workflow-repairs
		{tasking-prefix} = /media/ssd2/Projects/Claude/1/tasking/

		Read in this order before starting:
		1. {tasking-prefix}/plan/orchestration-v2-redesign.md
		— your spec. Especially §7.1, §8.1-§8.5, §9.1-§9.2, §10.2-§10.4,
		§13 (resolutions), §14 (test-dry-run knob), §15.2 (Admin
		phase 1 verification block).
		2. {tasking-prefix}/plan/orchestration.md
		— current v1 doc; you will edit it.
		3. {tasking-prefix}/plan/Analyst-prompt.md,
		{tasking-prefix}/plan/Developer-prompt.md,
		{tasking-prefix}/plan/Integrator-prompt.md
		— current v1 prompts; you will edit each.
		4. {tasking-prefix}/plan/initialization.md
		— content becomes Administrator phase 1 subroutine; add
		deprecation banner + F1-fix probes (§15.2).
		5. {tasking-prefix}/plan/dry-run.md
		— receives F2/F4/F5 fixes per Analyst findings.
		6. {tasking-prefix}/plan/dry-run-Analyst-findings.md,
		{tasking-prefix}/plan/dry-run-Developer-findings.md,
		{tasking-prefix}/plan/dry-run-Integrator-findings.md
		— the source of F1-F5 / §3.1-§3.7 / §4.1-§4.3 fixes.
		7. {tasking-prefix}/CLAUDE.md
		8. {tasking-prefix}/../CLAUDE.md
		9. ~/.claude/CLAUDE.md (especially [ALWAYS] sections)

		Apply changes in this order; commit each as its own commit on the
		lr_reduction-new_workflow-repairs branch of the tasking submodule
		(commit-as-you-go per [ALWAYS]):

		Phase 1 — protocol doc + prompts (about 6 commits):

		1. Edit orchestration.md per §7.1 of the redesign plan.
		- §3 ref namespace: add init-check-* and admin-status-*
		- §5 naming conventions: add admin-status-<date>-<host> row
		- §6 state machine: explicit (SHA, name) dedup; ls-remote
		discovery; targeted fetch only when picking up work
		- §6.4 NEW: Administrator state machine (per §5.1 of the
		redesign plan)
		- §7 knobs: add TM, {admin-state-dir}, {admin-clone},
		{discovery-method}, {integrator-test-cmd} (§14.2);
		tighten cadence guidance: 60s default, 30s min in dry-run
		- §8 push allowlist: add Administrator's narrow allowlist;
		explicit deny on Administrator pushing protocol refs
		- §9.1 Analyst prompt: add session-setup preamble + loop-
		iteration instructions (§8.1, §8.2, §8.3 Analyst row)
		- §9.2 Developer prompt: same; plus v{N>1} feature-source
		fix (Developer findings §3.3); empty-commit convention
		(§3.4); skipped-triage.txt (§3.5)
		- §9.3 Integrator prompt: same; plus --force on tag fetch
		(Integrator findings §4.2); test-runtime budget note
		- §9.4 Initialization: REMOVE; replace with deprecation
		banner pointing to §9.6
		- §9.5 model/effort table: add Administrator row
		(Sonnet 4.6 / default; Haiku 4.5 cost-efficient)
		- §9.6 NEW Administrator: full prompt per §5.8 + §15.2
		- §9.7 NEW: statusline install/restore protocol (common
		preamble snippet; cross-referenced from §9.1-§9.3, §9.6)
		- §12 failure modes: add 4 new entries (Monitor exits / silent
		dedup / discovery rate-limited / state-file stale)
		- §13.1 runbook: Administrator-first sequence
		- §16 NEW: polling-cost analysis (per §6 of the redesign)

		2. Edit Analyst-prompt.md, Developer-prompt.md, Integrator-prompt.md
		to match orchestration.md §9.1-§9.3 (one commit per file is fine
		or one commit covering all three; pick whichever reads better
		in git log).

		3. Write {tasking-prefix}/plan/Administrator-prompt.md per
		orchestration.md §9.6 + §15.2 of the redesign plan.

		4. Edit dry-run.md with F2 (^{} peel), F4 (refspec form), F5
		(no-pipe-on-push) per Analyst findings; §3 cadence note
		(10s warning, 30s minimum in dry-run, 60s production).

		5. Edit initialization.md with §15.2 verification block + §0
		deprecation banner pointing to Administrator-prompt.md.

		6. Edit Initialization-prompt.md (existing file) to add the §0
		deprecation banner; do not delete (it remains canonical for
		the checklist content).

		Phase 2 — supporting scripts:

		7. Write ~/.claude/state/agent-statusline.sh per §4.2 of the
		redesign plan. Mode 700. mkdir -p ~/.claude/state if needed.

		8. Write {tasking-prefix}/plan/scripts/poll-wrapper-Analyst.sh,
		poll-wrapper-Developer.sh, poll-wrapper-Integrator.sh,
		poll-wrapper-Administrator.sh per §3.1 of the redesign plan.
		Mode 700. Each emits typed STALLED events on failure,
		continues polling, updates the role's state file.

		Phase 3 — Stop hook for statusline restore (use update-config skill):

		9. Use the update-config skill to add a Stop hook to
		~/.claude/settings.json that restores the statusline backup
		from ~/.claude/state/statusline-backup-<sessionId>.json on
		session end. Reference: §10.4 of the redesign plan.

		10. While in update-config: discover and document the exact
		settings key for "disable inline recap" (per §11.2
		resolution; the reviewer wants to know the exact name for
		future opening prompts). Record it in
		orchestration-v2-redesign.md as a one-line addendum to §4.4.

		[ALWAYS] reminders that apply throughout:
		- Regular merges only; no force-push, no --no-verify, no --amend
		after hook failure (Git discipline).
		- Standard merge commits; do not push without explicit user
		approval.
		- Commit-as-you-go on the tasking submodule. Match the recent
		commit-message style (see `git log --oneline -10`).
		- Robust over simple. When ambiguity arises, the redesign plan's
		text is the spec — do not improvise.
		- [TIME-BOXED] Aggressive context persistence: after each phase,
		update the redesign plan's §10 with done-vs-remaining or commit
		a small "phase X done" marker in plan/.

		Do NOT in this session:
		- Edit lr_reduction (the reviewer is handling pytest-timeout).
		- Run the dry-run (that's Phase 4, a separate session).
		- Push to origin/agentic (push stays user-initiated).
		- Bump the parent repo's gitlinks (the reviewer handles those
		manually between efforts).

		You may exit when phases 1, 2, and 3 are complete and committed,
		with a one-paragraph end-of-session summary listing the commits
		produced and any deviations from the spec.

		Standby for any clarification questions before starting; if the
		spec is unambiguous in the redesign plan, proceed.
		```
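
The §6 item in the prompt above asks for explicit (SHA, name) dedup on top of `ls-remote` discovery. One way that could look, as a sketch only: the function name `new_refs` and the seen-file layout (one `SHA name` pair per line) are assumptions, not taken from the redesign plan.

```shell
# Hypothetical dedup filter. Reads "SHA<TAB>refname" lines on stdin (the
# format `git ls-remote` prints) and echoes only the pairs not already
# recorded in the seen-file, recording each new pair as it goes.
new_refs() {
  seen_file=$1
  touch "$seen_file"
  while IFS="$(printf '\t')" read -r sha name; do
    [ -n "$sha" ] || continue
    key="$sha $name"
    if ! grep -qxF -- "$key" "$seen_file"; then
      printf '%s\n' "$key" >> "$seen_file"
      printf '%s\t%s\n' "$sha" "$name"
    fi
  done
}
```

A poll iteration might run `git ls-remote origin 'refs/tags/*' | new_refs "$state_dir/seen.txt"` (with `$state_dir` standing in for wherever the role keeps its state). Because the key is the pair rather than the name alone, a ref that is re-pointed at a new SHA shows up again as new work.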

		### 16.3 Estimated wall-clock for the new session

		- Phase 1 (protocol doc + prompts): 30-60 min — substantial editing
		spread across seven files with cross-references.
		- Phase 2 (scripts): 15-30 min — three Bash wrappers (Analyst,
		Developer, Integrator are near-identical with refspec
		differences) plus the statusline command and the Administrator
		wrapper.
		- Phase 3 (Stop hook + recap-key discovery via update-config):
		10-20 min — small but new-to-this-session tooling.

		Total: ~55-110 min wall clock (the sum of the three ranges above).
		The session may /clear between phases if context utilization gets
		high; the redesign plan and intermediate commits provide enough
		context for a fresh resume.
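
For the Phase 2 poll wrappers, the behavior the prompt asks for (emit a typed STALLED event on failure, keep polling, update the role's state file) might look roughly like the single iteration below. The event text, the state-file format, and the name `poll_once` are all placeholders; §3.1 of the redesign plan is the authority on the real shapes.

```shell
# Hypothetical single iteration of a role's poll wrapper.
# Usage: poll_once <role> <state-file> <command> [args...]
poll_once() {
  role=$1
  state_file=$2
  shift 2
  stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  if output=$("$@" 2>&1); then
    printf 'role=%s status=ok at=%s\n' "$role" "$stamp" > "$state_file"
    printf '%s\n' "$output"
  else
    rc=$?
    printf 'role=%s status=stalled rc=%s at=%s\n' "$role" "$rc" "$stamp" > "$state_file"
    # Typed event goes to stderr; the failure is reported, not propagated.
    printf 'STALLED role=%s kind=poll-failed rc=%s\n' "$role" "$rc" >&2
  fi
  return 0   # always succeed so the surrounding loop keeps polling
}

# The surrounding loop (not run here) could be as simple as:
#   while :; do poll_once Analyst "$state" git ls-remote origin; sleep 60; done
```

Returning 0 on both paths is the design point: a failed poll becomes an event and a state-file update rather than a reason for the wrapper to exit.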

		### 16.4 What success looks like

		By end of the new session:

		- 6+ commits on `lr_reduction-new_workflow-repairs` (tasking) covering
		the seven-file edit set per phase 1 of §10.2.
		- 5 new files committed on tasking branch:
		`plan/Administrator-prompt.md`,
		`plan/scripts/poll-wrapper-Analyst.sh`,
		`plan/scripts/poll-wrapper-Developer.sh`,
		`plan/scripts/poll-wrapper-Integrator.sh`,
		`plan/scripts/poll-wrapper-Administrator.sh`.
		- 1 new file in `~/.claude/state/`: `agent-statusline.sh` (not
		committed; user-scoped).
		- 1 Stop hook entry in `~/.claude/settings.json` (not committed;
		user-scoped) — added via update-config skill.
		- Recap-suppression settings key documented in §4.4 of this
		redesign plan as a one-line addendum.
		- End-of-session summary in the conversation listing what was done
		and any open items for phase 4.

		If anything in §10.2 (commits 1-6) is incomplete or deferred, the
		new session reports it explicitly so phase 4 can plan around it.
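
The file-level part of this checklist is easy to verify mechanically. The sketch below checks only the five tasking-branch paths listed above; the function name `missing_deliverables` is hypothetical, and the user-scoped items under `~/.claude/` are deliberately left out.

```shell
# Hypothetical end-of-session check. Prints one line per absent file and
# returns the number of missing files (0 means all five are present).
missing_deliverables() {
  root=$1   # path to the tasking checkout
  missing=0
  for rel in \
    plan/Administrator-prompt.md \
    plan/scripts/poll-wrapper-Analyst.sh \
    plan/scripts/poll-wrapper-Developer.sh \
    plan/scripts/poll-wrapper-Integrator.sh \
    plan/scripts/poll-wrapper-Administrator.sh
  do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

It only confirms the files exist on disk; whether they were committed still needs a look at `git status` and `git log`.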
