Commit 26e61737 authored by Vacaliuc, Bogdan's avatar Vacaliuc, Bogdan
Browse files

plan: orchestration.md v2.1 — Administrator read-only + triage cleanup + /effort fix



Per reviewer feedback after the v2 implementation pass:

§3 (ref namespace, end-state contract, Administrator scope):
  - Drop init-check-* and admin-status-* from the canonical namespace
    diagram (Administrator no longer pushes them).
  - Add explicit "End-state contract": after success, every issue
    has exactly one feature/{slug} + (PR or escalate tag); no
    transient qa/, review/, or triage/ refs remain; analysis branch
    preserved as historical record.
  - Add explicit "Administrator is read-only on the remote" clause.
    Write-capability check (initialization.md §2.5) is user-manual
    or via the v1 Initialization-prompt.md fallback.

§5 (naming conventions): drop admin-status row; mark init-check-*
as v1-fallback only.

§6 (Developer state machine): add `git push --delete {remote}
triage/{slug}[-v{N}]` after merging into analysis (v2.1 cleanup).

§6.4 (Administrator state machine): phase 1 explicit READ-ONLY;
remove the optional admin-status push at phase 2 start; expand the
NEVER block to include "no push, no delete, no fetch (cheap
ls-remote only)".

§8 (push allowlist): rewritten for v2.1.
  - Add "Branch deletion of triage/{slug}*" to Developer's allowlist.
  - Move init-check-* into a separate "v1-fallback allowlist" section
    explicitly NOT used by Administrator.
  - Replace "Administrator allowlist" with "no push allowlist —
    Administrator is read-only" denial block.
  - Restate the end-state contract for §8 readers (six bullets).
  - Reference cleanup-dry-run-refs.sh for human-driven cleanup of
    abnormal-exit residue.

§9.1 / §9.2 / §9.3 / §9.6 pre-prompts: replace `/effort default`
(invalid CLI option) with valid options:
  - Developer: xhigh (matches existing Developer-prompt.md).
  - Integrator: xhigh (matches existing Integrator-prompt.md).
  - Administrator: medium.

§9.5 model/effort table: replace every `default` cell with valid
CLI options (medium for Sonnet-procedural, xhigh for Opus-spec-
driven). Add "Effort levels" paragraph documenting the valid CLI
set (low | medium | high | xhigh | max | auto) and mapping to
v1's colloquial `default`.

§9.5 explanatory text: update `default`-vs-`max` examples to use
valid options (`xhigh`).

§9.2 prompt body: add step 6 (push --delete triage post-merge);
extend allowlist to include the delete.

Removed a duplicate "Still requires explicit user confirmation" /
"How a role cites authorization" block that was leftover from the
v1 §8 (now consolidated into the v2.1 §8 rewrite).

Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 55b0f71f
Loading
Loading
Loading
Loading
+139 −68
Original line number Diff line number Diff line
@@ -86,20 +86,32 @@ lr_reduction (github.com/neutrons/LiquidsReflectometer.git)
  feature/overplot-refresh
  feature/cd-dialog-resize
  (tags)
  qa/{slug}                            ← Developer pushes; Integrator consumes
  review/{slug}                        ← Integrator pushes on test failure; Developer deletes after re-tag
  review/{slug}-escalate               ← annotated; Analyst pushes at retry cap
  init-check-<YYYYMMDD-HHMMSS>-{branch,tag} ← Administrator-managed scratch refs (phase 1)
  admin-status-<ISO-date>-<host>       ← Administrator pushes once per machine at phase 2 start (informational)
  qa/{slug}                            ← Developer pushes; Integrator consumes (deleted after consumption)
  review/{slug}                        ← Integrator pushes on test failure; Developer/Analyst delete after re-tag (deleted at retry)
  review/{slug}-escalate               ← annotated; Analyst pushes at retry cap (terminal — never auto-deleted)
```

**Administrator-managed scratch namespace.** `init-check-*` refs are
created and deleted by the Administrator agent during phase 1 (the
former Initialization agent's work). They never persist past phase 1
end. The `admin-status-<ISO-date>-<host>` lightweight tag is an
optional Administrator push at phase 2 start, used by multi-machine
deployments to discover peer Administrators; the 3-agent system does
not read it.
**End-state contract (v2.1 — minimize ref detritus).** After a
fully successful run, for *every* slug in the issue set, the remote
contains exactly one `feature/{slug}` branch plus *either* an open
PR/MR pointing to it *or* a `review/{slug}-escalate` annotated tag.
**No transient refs remain**: every `qa/{slug}` is consumed and
deleted by the Integrator; every non-escalate `review/{slug}` is
consumed and deleted by the Developer (on retry) or the Analyst (at
retry cap, replaced by the escalate tag); every `triage/{slug}*`
branch is deleted by the Developer after merging into the analysis
branch. The `analysis/{effort-name}` workspace branch is preserved
as the historical record of plans + learnings; it is never merged
to `{base-branch}` and may be deleted post-effort at the user's
discretion.

**Administrator is read-only on the remote (v2.1).** The
Administrator agent does not push, fetch, delete, tag, or otherwise
modify any ref on `{remote}`. Phase 1 verification of write
capability (initialization.md §2.5) is performed either by the user
manually or by the v1 standalone `Initialization-prompt.md`
fallback session, which retains its narrow `init-check-*` scratch
namespace.

**Base branch for feature branches:** `{base-branch}` — see §7 for
the knob. For **this** effort, `{base-branch}` is `new_workflow_ui_plan`
@@ -160,8 +172,7 @@ branch for the `*-learning.md` pile.
| `qa/{slug}` | Developer pushes tag | "Ready for QA" signal |
| `review/{slug}` | Integrator pushes tag | "Tests failed; read todo.md on feature/{slug}" |
| `review/{slug}-escalate` | Analyst pushes (**annotated**) tag | Human attention required; retry cap hit |
| `init-check-<YYYYMMDD-HHMMSS>-*` | Administrator creates (phase 1) | Scratch refs for the Initialization checklist; deleted at phase-1 cleanup |
| `admin-status-<ISO-date>-<host>` | Administrator pushes (lightweight tag) | Optional informational marker that machine `<host>` has an Administrator running on `<ISO-date>`; one push at phase 2 start |
| `init-check-<YYYYMMDD-HHMMSS>-*` | **v1 fallback only**: standalone Initialization session creates (and deletes) | Scratch refs for the §2.5 write-capability check; **NOT used by Administrator** in v2.1 (Administrator is read-only — see §3) |

`{effort-name}` for this effort is `new_workflow-repairs-2026-04`.

@@ -259,6 +270,8 @@ that the v1 protocol used is no longer the default — see §7's
                  │     if cross-project learnings found:     │
                  │       write plans/{slug}-learning.md      │
                  │     push analysis/{effort}                │
                  │     git push --delete {remote}            │
                  │       triage/{slug}[-v{N}]  (v2.1 cleanup)│
                  │     context lifecycle: CLEAR              │
                  └───────────────────────────────────────────┘

@@ -302,11 +315,17 @@ poll for its liveness, and do not depend on its push allowlist).
```
                  ┌──────────────── Administrator ────────────┐
                  │                                           │
 user starts ─────┼── PHASE 1 — Initialization                │
                  │   Walk initialization.md §1-§10 in order. │
                  │   Use a timestamped scratch ref namespace │
                  │     init-check-<YYYYMMDD-HHMMSS>          │
                  │     for any push tests. Clean up at end.  │
 user starts ─────┼── PHASE 1 — Initialization (READ-ONLY v2.1) │
                  │   Walk initialization.md §1, §2.1-§2.4,   │
                  │     §3-§11 — read-only checks only.       │
                  │   §2.5 (write-capability verification) is │
                  │     NOT done here in v2.1 — Administrator │
                  │     does not push. The user runs §2.5     │
                  │     manually or launches the v1 standalone│
                  │     Initialization-prompt.md session for  │
                  │     it. Phase 1 asks the user to confirm  │
                  │     write capability has been verified    │
                  │     (last 24 h or since cred rotation).   │
                  │   If DRY_RUN=1 also walk dry-run.md §5.1. │
                  │   If any section fails, STOP — hand the   │
                  │     user a specific action item; do NOT   │
@@ -315,8 +334,8 @@ poll for its liveness, and do not depend on its push allowlist).
                  │     Ready to launch worker agents."       │
                  │                                           │
 user confirms ───┼── PHASE 2 — Active monitoring             │
 worker startup   │   (Optional: push admin-status-<ISO>-<host>
lightweight tag once at phase 2 start.)
 worker startup   │   (No push at phase 2 start — v2.1.       
Administrator is read-only.)           
                  │                                           │
 poll (TM secs) ──┼── Read {admin-state-dir}/agent-state-     │
                  │     {Analyst,Developer,Integrator}.json   │
@@ -342,13 +361,17 @@ poll for its liveness, and do not depend on its push allowlist).
                  │   large reads) unless user asks.          │
                  │   Resume the poll loop after answering.   │
                  │                                           │
                  │ Administrator NEVER:                   
                  │ Administrator NEVER (v2.1 — read-only):
                  │   - restarts a stalled worker             │
                  │   - rewrites worker state files           │
                  │   - opens PRs/MRs                         │
                  │   - pushes any protocol ref               │
                  │     (analysis/, triage/, feature/, qa/,   │
                  │      review/, {base-branch})              │
                  │   - pushes ANY ref to {remote}            │
                  │     (no protocol refs, no init-check-*,   │
                  │      no admin-status-*, no informational  │
                  │      tags — Administrator is read-only on │
                  │      the repository per §3 v2.1 contract) │
                  │   - deletes any ref on {remote}           │
                  │   - fetches objects (cheap ls-remote only)│
                  └───────────────────────────────────────────┘
```

@@ -467,45 +490,71 @@ user approval.
**Allowlisted ref patterns (auto-push, no prompt):**

- `analysis/new_workflow-repairs-2026-04` (Analyst, Developer)
- `triage/{slug}` and `triage/{slug}-v{N}` (Analyst)
- `feature/{slug}` (Developer)
- `triage/{slug}` and `triage/{slug}-v{N}` (Analyst — push)
- **Branch deletion of `triage/{slug}` and `triage/{slug}-v{N}`**
  (Developer — after merging into the analysis branch; v2.1 ref
  cleanup so triage signals don't accumulate post-success)
- `feature/{slug}` (Developer; Integrator may push a `todo.md`
  commit on test failure)
- `qa/{slug}` tag (Developer)
- **Tag deletion of `qa/{slug}`** (Integrator after consume)
- `review/{slug}` tag (Integrator)
- `review/{slug}-escalate` annotated tag (Analyst)
- Tag deletions matching any of the above (Developer deletes `review/`;
  Integrator deletes `qa/`; Analyst deletes `review/` before re-plan)
- `init-check-<YYYYMMDD-HHMMSS>-*` branches and tags
  (**Administrator only**, phase 1) — including their deletions at
  phase-1 cleanup
- `admin-status-<ISO-date>-<host>` lightweight tag
  (**Administrator only**, optional one-shot push at phase 2 start)

**Explicit Administrator denials.** The Administrator never pushes
any of the protocol refs above (`analysis/`, `triage/`, `feature/`,
`qa/`, `review/`, `{base-branch}`). Its allowlist is narrowly scoped
to scratch-namespace and informational refs only. If a runbook step
appears to require the Administrator to push a protocol ref, that
- **Tag deletion of `review/{slug}`** (Developer after re-tag,
  Analyst before re-plan or before pushing the escalate tag)
- `review/{slug}-escalate` annotated tag (Analyst — terminal,
  never auto-deleted)

**v1-fallback allowlist (Initialization-prompt.md only, NOT
Administrator):**

- `init-check-<YYYYMMDD-HHMMSS>-*` branches and tags, plus their
  deletions, are pushed by the v1 standalone Initialization session
  for the §2.5 write-capability check. The Administrator agent in
  v2.1 does **not** use this namespace.

**Administrator (v2.1): no push allowlist.** The Administrator
agent is read-only on `{remote}`. It never pushes, deletes, fetches
objects, or otherwise modifies any ref. Its only network operation
on `{remote}` is `git ls-remote` (cheap, read-only). If a runbook
step appears to require Administrator to write to the remote, that
is a bug in the runbook — escalate to the user.

**Still requires explicit user confirmation:**

- Creating a PR (GitHub) or MR (GitLab) — the one moment work
  becomes visible upstream. The exact command depends on the
  platform and the CLIs installed; see
  `plan/initialization.md` §7 for
  the `gh` / `glab` / REST-API selector. First PR/MR creation in a
  session prompts; subsequent ones in the same session may reuse
  that approval.
- Any push to `{base-branch}`, `main`, `next`, `qa` (the upstream
  branch of that name), or any other non-protocol ref.
- Creating a PR (GitHub) or MR (GitLab) — see initialization.md §7.
  First PR/MR creation in an Integrator session prompts; subsequent
  ones reuse that approval for the same session.
- Any push to `{base-branch}`, `main`, `next`, or any other
  non-protocol ref.
- Force-pushes of any kind (the protocol never uses them).
- Parent-repo (`Claude`) gitlink bumps — the user handles those
  manually between efforts.
- Parent-repo (`Claude`) gitlink bumps — manual, by the user.

**End-state contract (v2.1 — restated for §8 readers).** A
fully-successful run leaves the remote in this state, *for every
slug*:

1. `feature/{slug}` branch present (PR basis).
2. *Either* an open PR/MR pointing to `feature/{slug}`,
   *or* a `review/{slug}-escalate` annotated tag.
3. No `qa/{slug}` tag (consumed and deleted).
4. No non-escalate `review/{slug}` tag (consumed and deleted).
5. No `triage/{slug}*` branch (deleted by Developer post-merge).
6. `analysis/{effort-name}` branch retained (durable plan +
   learnings record; never merged to `{base-branch}`).

A `cleanup-dry-run-refs.sh` helper (in `plan/scripts/`) lists every
ref matching `{dry-run-prefix}` on `{remote}` and, with `--apply`,
deletes them — for human-driven cleanup after dry runs (the
agents' `--delete` pushes during normal operation also keep state
clean, but the helper covers abnormal exits and the analysis
branch).

**How a role cites authorization in its tool use:** whenever a role
does `git push`, it notes in its text update the rule being relied
upon, e.g. "pushing `qa/overplot-axes` per §8 allowlist." This makes
the standing order auditable in the transcript.
does `git push` (including `--delete`), it notes in its text update
the rule being relied upon, e.g. "pushing `qa/overplot-axes` per §8
allowlist" or "deleting `triage/overplot-axes-v2` per §8 v2.1
post-merge cleanup." This makes the standing order auditable in the
transcript.

---

@@ -678,7 +727,7 @@ You may exit your poll loop only on explicit ESC by the user.
Before pasting this prompt, set the session's model and effort:
```
  /model claude-opus-4-7
  /effort default
  /effort xhigh
```
  (Robust default per §9.5. Cost-efficient alternative: Sonnet 4.6
   at /effort max — accept a small uptick in test-failure retries.
@@ -815,7 +864,15 @@ Every TD seconds:
         plans/{slug}-learning.md
       on the analysis branch. Structure: rule → Why → How to apply.
       Commit it. Push the analysis branch to {remote}.
    6. Update agent-state-Developer.json with lastAction,
    6. Delete the triage branch from {remote} (v2.1 cleanup —
       minimize ref detritus per §8 end-state contract):
         if git push --delete {remote} triage/{slug}[-v{N}]; then : ; else
           # If the delete fails (already gone, network blip),
           # don't lose progress. Log it; the cleanup-dry-run-refs.sh
           # helper picks up stragglers later.
         fi
         git branch -D triage/{slug}[-v{N}] 2>/dev/null || true
    7. Update agent-state-Developer.json with lastAction,
       lastActionTs. Context lifecycle: /clear, then ScheduleWakeup
       (prompt="/poll", delaySeconds=5*TD) and return to the poll
       loop.
@@ -825,6 +882,7 @@ without asking, per §8 of the meta-plan:
  feature/{slug}
  qa/{slug} tag
  analysis/new_workflow-repairs-2026-04
  --delete on triage/{slug}[-v{N}] (post-merge cleanup)
Any other push still requires explicit user approval.

For dry-run mode (DRY_RUN=1 with {dry-run-prefix}, {dry-run-remote},
@@ -839,7 +897,7 @@ Exit the loop only on explicit ESC by the user.
Before pasting this prompt, set the session's model and effort:
```
  /model claude-opus-4-7
  /effort default
  /effort xhigh
```
  (Robust default per §9.5. Bump to /effort max if Analyst keeps
   re-deriving failure modes — that's a sign the todo.md hypothesis
@@ -1033,11 +1091,23 @@ role's prompt into its session.

| Role | Robust default | Cost-efficient alternative | Why |
|---|---|---|---|
| Initialization (deprecated; see Administrator phase 1) | Sonnet 4.6 / `default` | Haiku 4.5 / `default` | Procedural checklist with occasional diagnostic suggestions when a section fails. Sonnet reads tool stderr well and offers actionable fixes. Haiku is fine for the happy path but weaker on diagnosing why something failed. |
| **Administrator** | **Sonnet 4.6 / `default`** | **Haiku 4.5 / `default`** | Phase 1 is the procedural Initialization checklist (Sonnet's strength). Phase 2 is small-summary aggregation across four state files plus a single ls-remote — Sonnet handles the "is anything stalled?" synthesis cheaply. Haiku is viable when phase-1 diagnostics aren't expected to fire (stable env). Do NOT bump to Opus: this role is read-only on workers; reasoning depth is not the bottleneck. |
| Initialization (deprecated; v1 fallback only — Administrator phase 1 covers most checks read-only) | Sonnet 4.6 / `medium` | Haiku 4.5 / `medium` | Procedural checklist with occasional diagnostic suggestions when a section fails. Sonnet reads tool stderr well and offers actionable fixes. Haiku is fine for the happy path but weaker on diagnosing why something failed. |
| **Administrator** | **Sonnet 4.6 / `medium`** | **Haiku 4.5 / `medium`** | Phase 1 is the procedural Initialization checklist (Sonnet's strength). Phase 2 is small-summary aggregation across four state files plus a single ls-remote — Sonnet handles the "is anything stalled?" synthesis cheaply. Haiku is viable when phase-1 diagnostics aren't expected to fire (stable env). Do NOT bump to Opus: this role is read-only on the repo; reasoning depth is not the bottleneck. |
| **Analyst** | **Opus 4.7 / `max`** | **— do not downgrade —** | Plan-quality is the highest-leverage variable in the protocol. Every Analyst error propagates to Developer + Integrator cycles; the retry math (see below) makes max-effort Opus the cheap option in expectation. |
| Developer | Opus 4.7 / `default` | Sonnet 4.6 / `max` | Implementation under a clear spec. Opus reduces test-failure retries; Sonnet at max is competent for spec-driven implementation but produces a small uptick in retry rate. |
| Integrator | Opus 4.7 / `default` | Sonnet 4.6 / `default` | Test-failure-diagnosis quality (the todo.md hypothesis ranking) determines how easy the Analyst's retry will be. Opus produces sharper hypotheses; Sonnet is fine for happy-path PR/MR creation and weaker on failure ranking. |
| Developer | Opus 4.7 / `xhigh` | Sonnet 4.6 / `max` | Implementation under a clear spec. Opus reduces test-failure retries; Sonnet at max is competent for spec-driven implementation but produces a small uptick in retry rate. |
| Integrator | Opus 4.7 / `xhigh` | Sonnet 4.6 / `high` | Test-failure-diagnosis quality (the todo.md hypothesis ranking) determines how easy the Analyst's retry will be. Opus produces sharper hypotheses; Sonnet is fine for happy-path PR/MR creation and weaker on failure ranking. |

**Effort levels.** Valid CLI options for `/effort` are
`low | medium | high | xhigh | max | auto`. The v1 documentation
used `default` colloquially for "the natural balanced setting" —
that is **not a valid CLI option** and is replaced above with
explicit values. `max` is reserved for the role where reasoning
depth has the highest leverage (Analyst). `xhigh` is the
production-quality setting for spec-driven implementation
(Developer, Integrator). `medium` is the procedural-checklist
setting (Administrator, deprecated Initialization). `auto` lets the
runtime pick — useful when context length varies significantly
across cycles.

**Why max-effort on the Analyst is the cheap option.** The protocol
has a retry cap `N=3`. Each retry costs:
@@ -1051,7 +1121,7 @@ cycles** of compute across the three roles, plus user-visible
latency over multiple poll periods. A better-reasoned plan that
takes zero retries costs **2 cycles**. Spending more on the
Analyst's reasoning has a multiplier effect on total cost — the
expected cost of `max` effort is *lower* than `default` whenever it
expected cost of `max` effort is *lower* than `xhigh` whenever it
prevents even one retry.

**Tuning tip — `/fast` mode.** Fast mode runs Opus 4.6 with faster
@@ -1071,23 +1141,24 @@ of these for several cycles:
  (Analyst keeps having to re-derive the failure mode) → bump
  Integrator to `/effort max`.
- Analyst takes > 5 min per slug to draft an initial plan → check
  if `max` is producing analysis paralysis; consider `default`
  if `max` is producing analysis paralysis; consider `xhigh`
  effort with a tighter prompt.
- Initialization checklist takes > 2 min on a clean environment →
  consider Haiku 4.5; the work is procedural enough.
- Initialization checklist (Administrator phase 1 or v1 fallback)
  takes > 2 min on a clean environment → consider Haiku 4.5; the
  work is procedural enough.

### 9.6 Administrator — paste into <path-to-session-4>

Before pasting this prompt, set the session's model and effort:
```
  /model claude-sonnet-4-6
  /effort default
  /effort medium
```
  (Robust default per §9.5. Phase 1 is procedural verification with
   diagnostic suggestions on failure; phase 2 is small-summary
   aggregation. Haiku 4.5 is a viable cost-efficient alternative if
   the env is stable. Do NOT bump to Opus — Administrator is
   read-only on workers; reasoning depth is not the bottleneck.)
   read-only on the repo; reasoning depth is not the bottleneck.)

The full Administrator prompt body (configuration, phase 1
checklist invocation including the §15.2 verification block, phase