plan: orchestration.md v2.1 — Administrator read-only + triage cleanup + /effort fix (26e61737) · Commits · Vacaliuc, Bogdan / tasking

plan/orchestration.md

+139 −68

Original line number	Diff line number	Diff line
		@@ -86,20 +86,32 @@ lr_reduction (github.com/neutrons/LiquidsReflectometer.git)
		feature/overplot-refresh
		feature/cd-dialog-resize
		(tags)
		qa/{slug} ← Developer pushes; Integrator consumes
		review/{slug} ← Integrator pushes on test failure; Developer deletes after re-tag
		review/{slug}-escalate ← annotated; Analyst pushes at retry cap
		init-check-<YYYYMMDD-HHMMSS>-{branch,tag} ← Administrator-managed scratch refs (phase 1)
		admin-status-<ISO-date>-<host> ← Administrator pushes once per machine at phase 2 start (informational)
		qa/{slug} ← Developer pushes; Integrator consumes (deleted after consumption)
		review/{slug} ← Integrator pushes on test failure; Developer/Analyst delete after re-tag (deleted at retry)
		review/{slug}-escalate ← annotated; Analyst pushes at retry cap (terminal — never auto-deleted)
		```

		Administrator-managed scratch namespace. `init-check-*` refs are
		created and deleted by the Administrator agent during phase 1 (the
		former Initialization agent's work). They never persist past phase 1
		end. The `admin-status-<ISO-date>-<host>` lightweight tag is an
		optional Administrator push at phase 2 start, used by multi-machine
		deployments to discover peer Administrators; the 3-agent system does
		not read it.
		End-state contract (v2.1 — minimize ref detritus). After a
		fully successful run, for every slug in the issue set, the remote
		contains exactly one `feature/{slug}` branch plus either an open
		PR/MR pointing to it or a `review/{slug}-escalate` annotated tag.
		No transient refs remain: every `qa/{slug}` is consumed and
		deleted by the Integrator; every non-escalate `review/{slug}` is
		consumed and deleted by the Developer (on retry) or the Analyst (at
		retry cap, replaced by the escalate tag); every `triage/{slug}*`
		branch is deleted by the Developer after merging into the analysis
		branch. The `analysis/{effort-name}` workspace branch is preserved
		as the historical record of plans + learnings; it is never merged
		to `{base-branch}` and may be deleted post-effort at the user's
		discretion.

		Administrator is read-only on the remote (v2.1). The
		Administrator agent does not push, fetch, delete, tag, or otherwise
		modify any ref on `{remote}`. Phase 1 verification of write
		capability (initialization.md §2.5) is performed either by the user
		manually or by the v1 standalone `Initialization-prompt.md`
		fallback session, which retains its narrow `init-check-*` scratch
		namespace.

		Base branch for feature branches: `{base-branch}` — see §7 for
		the knob. For this effort, `{base-branch}` is `new_workflow_ui_plan`
		@@ -160,8 +172,7 @@ branch for the `*-learning.md` pile.
		\| `qa/{slug}` \| Developer pushes tag \| "Ready for QA" signal \|
		\| `review/{slug}` \| Integrator pushes tag \| "Tests failed; read todo.md on feature/{slug}" \|
		\| `review/{slug}-escalate` \| Analyst pushes (annotated) tag \| Human attention required; retry cap hit \|
		\| `init-check-<YYYYMMDD-HHMMSS>-*` \| Administrator creates (phase 1) \| Scratch refs for the Initialization checklist; deleted at phase-1 cleanup \|
		\| `admin-status-<ISO-date>-<host>` \| Administrator pushes (lightweight tag) \| Optional informational marker that machine `<host>` has an Administrator running on `<ISO-date>`; one push at phase 2 start \|
		\| `init-check-<YYYYMMDD-HHMMSS>-` \| v1 fallback only: standalone Initialization session creates (and deletes) \| Scratch refs for the §2.5 write-capability check; NOT used by Administrator* in v2.1 (Administrator is read-only — see §3) \|

		`{effort-name}` for this effort is `new_workflow-repairs-2026-04`.

		@@ -259,6 +270,8 @@ that the v1 protocol used is no longer the default — see §7's
		│ if cross-project learnings found: │
		│ write plans/{slug}-learning.md │
		│ push analysis/{effort} │
		│ git push --delete {remote} │
		│ triage/{slug}[-v{N}] (v2.1 cleanup)│
		│ context lifecycle: CLEAR │
		└───────────────────────────────────────────┘

		@@ -302,11 +315,17 @@ poll for its liveness, and do not depend on its push allowlist).
		```
		┌──────────────── Administrator ────────────┐
		│ │
		user starts ─────┼── PHASE 1 — Initialization │
		│ Walk initialization.md §1-§10 in order. │
		│ Use a timestamped scratch ref namespace │
		│ init-check-<YYYYMMDD-HHMMSS> │
		│ for any push tests. Clean up at end. │
		user starts ─────┼── PHASE 1 — Initialization (READ-ONLY v2.1) │
		│ Walk initialization.md §1, §2.1-§2.4, │
		│ §3-§11 — read-only checks only. │
		│ §2.5 (write-capability verification) is │
		│ NOT done here in v2.1 — Administrator │
		│ does not push. The user runs §2.5 │
		│ manually or launches the v1 standalone│
		│ Initialization-prompt.md session for │
		│ it. Phase 1 asks the user to confirm │
		│ write capability has been verified │
		│ (last 24 h or since cred rotation). │
		│ If DRY_RUN=1 also walk dry-run.md §5.1. │
		│ If any section fails, STOP — hand the │
		│ user a specific action item; do NOT │
		@@ -315,8 +334,8 @@ poll for its liveness, and do not depend on its push allowlist).
		│ Ready to launch worker agents." │
		│ │
		user confirms ───┼── PHASE 2 — Active monitoring │
		worker startup │ (Optional: push admin-status-<ISO>-<host>│
		│ lightweight tag once at phase 2 start.)│
		worker startup │ (No push at phase 2 start — v2.1. │
		│ Administrator is read-only.) │
		│ │
		poll (TM secs) ──┼── Read {admin-state-dir}/agent-state- │
		│ {Analyst,Developer,Integrator}.json │
		@@ -342,13 +361,17 @@ poll for its liveness, and do not depend on its push allowlist).
		│ large reads) unless user asks. │
		│ Resume the poll loop after answering. │
		│ │
		│ Administrator NEVER: │
		│ Administrator NEVER (v2.1 — read-only): │
		│ - restarts a stalled worker │
		│ - rewrites worker state files │
		│ - opens PRs/MRs │
		│ - pushes any protocol ref │
		│ (analysis/, triage/, feature/, qa/, │
		│ review/, {base-branch}) │
		│ - pushes ANY ref to {remote} │
		│ (no protocol refs, no init-check-*, │
		│ no admin-status-*, no informational │
		│ tags — Administrator is read-only on │
		│ the repository per §3 v2.1 contract) │
		│ - deletes any ref on {remote} │
		│ - fetches objects (cheap ls-remote only)│
		└───────────────────────────────────────────┘
		```

		@@ -467,45 +490,71 @@ user approval.
		Allowlisted ref patterns (auto-push, no prompt):

		- `analysis/new_workflow-repairs-2026-04` (Analyst, Developer)
		- `triage/{slug}` and `triage/{slug}-v{N}` (Analyst)
		- `feature/{slug}` (Developer)
		- `triage/{slug}` and `triage/{slug}-v{N}` (Analyst — push)
		- Branch deletion of `triage/{slug}` and `triage/{slug}-v{N}`
		(Developer — after merging into the analysis branch; v2.1 ref
		cleanup so triage signals don't accumulate post-success)
		- `feature/{slug}` (Developer; Integrator may push a `todo.md`
		commit on test failure)
		- `qa/{slug}` tag (Developer)
		- Tag deletion of `qa/{slug}` (Integrator after consume)
		- `review/{slug}` tag (Integrator)
		- `review/{slug}-escalate` annotated tag (Analyst)
		- Tag deletions matching any of the above (Developer deletes `review/`;
		Integrator deletes `qa/`; Analyst deletes `review/` before re-plan)
		- `init-check-<YYYYMMDD-HHMMSS>-*` branches and tags
		(Administrator only, phase 1) — including their deletions at
		phase-1 cleanup
		- `admin-status-<ISO-date>-<host>` lightweight tag
		(Administrator only, optional one-shot push at phase 2 start)

		Explicit Administrator denials. The Administrator never pushes
		any of the protocol refs above (`analysis/`, `triage/`, `feature/`,
		`qa/`, `review/`, `{base-branch}`). Its allowlist is narrowly scoped
		to scratch-namespace and informational refs only. If a runbook step
		appears to require the Administrator to push a protocol ref, that
		- Tag deletion of `review/{slug}` (Developer after re-tag,
		Analyst before re-plan or before pushing the escalate tag)
		- `review/{slug}-escalate` annotated tag (Analyst — terminal,
		never auto-deleted)

		**v1-fallback allowlist (Initialization-prompt.md only, NOT
		Administrator):**

		- `init-check-<YYYYMMDD-HHMMSS>-*` branches and tags, plus their
		deletions, are pushed by the v1 standalone Initialization session
		for the §2.5 write-capability check. The Administrator agent in
		v2.1 does not use this namespace.

		Administrator (v2.1): no push allowlist. The Administrator
		agent is read-only on `{remote}`. It never pushes, deletes, fetches
		objects, or otherwise modifies any ref. Its only network operation
		on `{remote}` is `git ls-remote` (cheap, read-only). If a runbook
		step appears to require Administrator to write to the remote, that
		is a bug in the runbook — escalate to the user.

		Still requires explicit user confirmation:

		- Creating a PR (GitHub) or MR (GitLab) — the one moment work
		becomes visible upstream. The exact command depends on the
		platform and the CLIs installed; see
		`plan/initialization.md` §7 for
		the `gh` / `glab` / REST-API selector. First PR/MR creation in a
		session prompts; subsequent ones in the same session may reuse
		that approval.
		- Any push to `{base-branch}`, `main`, `next`, `qa` (the upstream
		branch of that name), or any other non-protocol ref.
		- Creating a PR (GitHub) or MR (GitLab) — see initialization.md §7.
		First PR/MR creation in an Integrator session prompts; subsequent
		ones reuse that approval for the same session.
		- Any push to `{base-branch}`, `main`, `next`, or any other
		non-protocol ref.
		- Force-pushes of any kind (the protocol never uses them).
		- Parent-repo (`Claude`) gitlink bumps — the user handles those
		manually between efforts.
		- Parent-repo (`Claude`) gitlink bumps — manual, by the user.

		End-state contract (v2.1 — restated for §8 readers). A
		fully-successful run leaves the remote in this state, *for every
		slug*:

		1. `feature/{slug}` branch present (PR basis).
		2. Either an open PR/MR pointing to `feature/{slug}`,
		or a `review/{slug}-escalate` annotated tag.
		3. No `qa/{slug}` tag (consumed and deleted).
		4. No non-escalate `review/{slug}` tag (consumed and deleted).
		5. No `triage/{slug}*` branch (deleted by Developer post-merge).
		6. `analysis/{effort-name}` branch retained (durable plan +
		learnings record; never merged to `{base-branch}`).

		A `cleanup-dry-run-refs.sh` helper (in `plan/scripts/`) lists every
		ref matching `{dry-run-prefix}` on `{remote}` and, with `--apply`,
		deletes them — for human-driven cleanup after dry runs (the
		agents' `--delete` pushes during normal operation also keep state
		clean, but the helper covers abnormal exits and the analysis
		branch).

		How a role cites authorization in its tool use: whenever a role
		does `git push`, it notes in its text update the rule being relied
		upon, e.g. "pushing `qa/overplot-axes` per §8 allowlist." This makes
		the standing order auditable in the transcript.
		does `git push` (including `--delete`), it notes in its text update
		the rule being relied upon, e.g. "pushing `qa/overplot-axes` per §8
		allowlist" or "deleting `triage/overplot-axes-v2` per §8 v2.1
		post-merge cleanup." This makes the standing order auditable in the
		transcript.

		---

		@@ -678,7 +727,7 @@ You may exit your poll loop only on explicit ESC by the user.
		Before pasting this prompt, set the session's model and effort:
		```
		/model claude-opus-4-7
		/effort default
		/effort xhigh
		```
		(Robust default per §9.5. Cost-efficient alternative: Sonnet 4.6
		at /effort max — accept a small uptick in test-failure retries.
		@@ -815,7 +864,15 @@ Every TD seconds:
		plans/{slug}-learning.md
		on the analysis branch. Structure: rule → Why → How to apply.
		Commit it. Push the analysis branch to {remote}.
		6. Update agent-state-Developer.json with lastAction,
		6. Delete the triage branch from {remote} (v2.1 cleanup —
		minimize ref detritus per §8 end-state contract):
		if git push --delete {remote} triage/{slug}[-v{N}]; then : ; else
		# If the delete fails (already gone, network blip),
		# don't lose progress. Log it; the cleanup-dry-run-refs.sh
		# helper picks up stragglers later.
		fi
		git branch -D triage/{slug}[-v{N}] 2>/dev/null \|\| true
		7. Update agent-state-Developer.json with lastAction,
		lastActionTs. Context lifecycle: /clear, then ScheduleWakeup
		(prompt="/poll", delaySeconds=5*TD) and return to the poll
		loop.
		@@ -825,6 +882,7 @@ without asking, per §8 of the meta-plan:
		feature/{slug}
		qa/{slug} tag
		analysis/new_workflow-repairs-2026-04
		--delete on triage/{slug}[-v{N}] (post-merge cleanup)
		Any other push still requires explicit user approval.

		For dry-run mode (DRY_RUN=1 with {dry-run-prefix}, {dry-run-remote},
		@@ -839,7 +897,7 @@ Exit the loop only on explicit ESC by the user.
		Before pasting this prompt, set the session's model and effort:
		```
		/model claude-opus-4-7
		/effort default
		/effort xhigh
		```
		(Robust default per §9.5. Bump to /effort max if Analyst keeps
		re-deriving failure modes — that's a sign the todo.md hypothesis
		@@ -1033,11 +1091,23 @@ role's prompt into its session.

		\| Role \| Robust default \| Cost-efficient alternative \| Why \|
		\|---\|---\|---\|---\|
		\| Initialization (deprecated; see Administrator phase 1) \| Sonnet 4.6 / `default` \| Haiku 4.5 / `default` \| Procedural checklist with occasional diagnostic suggestions when a section fails. Sonnet reads tool stderr well and offers actionable fixes. Haiku is fine for the happy path but weaker on diagnosing why something failed. \|
		\| Administrator \| Sonnet 4.6 / `default` \| Haiku 4.5 / `default` \| Phase 1 is the procedural Initialization checklist (Sonnet's strength). Phase 2 is small-summary aggregation across four state files plus a single ls-remote — Sonnet handles the "is anything stalled?" synthesis cheaply. Haiku is viable when phase-1 diagnostics aren't expected to fire (stable env). Do NOT bump to Opus: this role is read-only on workers; reasoning depth is not the bottleneck. \|
		\| Initialization (deprecated; v1 fallback only — Administrator phase 1 covers most checks read-only) \| Sonnet 4.6 / `medium` \| Haiku 4.5 / `medium` \| Procedural checklist with occasional diagnostic suggestions when a section fails. Sonnet reads tool stderr well and offers actionable fixes. Haiku is fine for the happy path but weaker on diagnosing why something failed. \|
		\| Administrator \| Sonnet 4.6 / `medium` \| Haiku 4.5 / `medium` \| Phase 1 is the procedural Initialization checklist (Sonnet's strength). Phase 2 is small-summary aggregation across four state files plus a single ls-remote — Sonnet handles the "is anything stalled?" synthesis cheaply. Haiku is viable when phase-1 diagnostics aren't expected to fire (stable env). Do NOT bump to Opus: this role is read-only on the repo; reasoning depth is not the bottleneck. \|
		\| Analyst \| Opus 4.7 / `max` \| — do not downgrade — \| Plan-quality is the highest-leverage variable in the protocol. Every Analyst error propagates to Developer + Integrator cycles; the retry math (see below) makes max-effort Opus the cheap option in expectation. \|
		\| Developer \| Opus 4.7 / `default` \| Sonnet 4.6 / `max` \| Implementation under a clear spec. Opus reduces test-failure retries; Sonnet at max is competent for spec-driven implementation but produces a small uptick in retry rate. \|
		\| Integrator \| Opus 4.7 / `default` \| Sonnet 4.6 / `default` \| Test-failure-diagnosis quality (the todo.md hypothesis ranking) determines how easy the Analyst's retry will be. Opus produces sharper hypotheses; Sonnet is fine for happy-path PR/MR creation and weaker on failure ranking. \|
		\| Developer \| Opus 4.7 / `xhigh` \| Sonnet 4.6 / `max` \| Implementation under a clear spec. Opus reduces test-failure retries; Sonnet at max is competent for spec-driven implementation but produces a small uptick in retry rate. \|
		\| Integrator \| Opus 4.7 / `xhigh` \| Sonnet 4.6 / `high` \| Test-failure-diagnosis quality (the todo.md hypothesis ranking) determines how easy the Analyst's retry will be. Opus produces sharper hypotheses; Sonnet is fine for happy-path PR/MR creation and weaker on failure ranking. \|

		Effort levels. Valid CLI options for `/effort` are
		`low \| medium \| high \| xhigh \| max \| auto`. The v1 documentation
		used `default` colloquially for "the natural balanced setting" —
		that is not a valid CLI option and is replaced above with
		explicit values. `max` is reserved for the role where reasoning
		depth has the highest leverage (Analyst). `xhigh` is the
		production-quality setting for spec-driven implementation
		(Developer, Integrator). `medium` is the procedural-checklist
		setting (Administrator, deprecated Initialization). `auto` lets the
		runtime pick — useful when context length varies significantly
		across cycles.

		Why max-effort on the Analyst is the cheap option. The protocol
		has a retry cap `N=3`. Each retry costs:
		@@ -1051,7 +1121,7 @@ cycles** of compute across the three roles, plus user-visible
		latency over multiple poll periods. A better-reasoned plan that
		takes zero retries costs 2 cycles. Spending more on the
		Analyst's reasoning has a multiplier effect on total cost — the
		expected cost of `max` effort is lower than `default` whenever it
		expected cost of `max` effort is lower than `xhigh` whenever it
		prevents even one retry.

		Tuning tip — `/fast` mode. Fast mode runs Opus 4.6 with faster
		@@ -1071,23 +1141,24 @@ of these for several cycles:
		(Analyst keeps having to re-derive the failure mode) → bump
		Integrator to `/effort max`.
		- Analyst takes > 5 min per slug to draft an initial plan → check
		if `max` is producing analysis paralysis; consider `default`
		if `max` is producing analysis paralysis; consider `xhigh`
		effort with a tighter prompt.
		- Initialization checklist takes > 2 min on a clean environment →
		consider Haiku 4.5; the work is procedural enough.
		- Initialization checklist (Administrator phase 1 or v1 fallback)
		takes > 2 min on a clean environment → consider Haiku 4.5; the
		work is procedural enough.

		### 9.6 Administrator — paste into <path-to-session-4>

		Before pasting this prompt, set the session's model and effort:
		```
		/model claude-sonnet-4-6
		/effort default
		/effort medium
		```
		(Robust default per §9.5. Phase 1 is procedural verification with
		diagnostic suggestions on failure; phase 2 is small-summary
		aggregation. Haiku 4.5 is a viable cost-efficient alternative if
		the env is stable. Do NOT bump to Opus — Administrator is
		read-only on workers; reasoning depth is not the bottleneck.)
		read-only on the repo; reasoning depth is not the bottleneck.)

		The full Administrator prompt body (configuration, phase 1
		checklist invocation including the §15.2 verification block, phase