Commit d6803184 authored by Brewer, Wes

Add federation skill doc

parent 7994b0eb
+49 −0
---
name: federation
description: Run and debug the RAPS federation demo (TUI + JSONL output), including demo presets and JSONL inspection.
---

# Federation Demo Skill

Use this skill when the user asks how to run, demo, or debug the federation dashboard or its multi-process metrics flow.

## Quick Run (TUI)
1. Activate env:
   - `source /opt/venvs/exadigit/bin/activate`
2. Run demo preset (balanced queue + round-robin):
   - `python3 scripts/run_federation.py --demo`

## JSONL Debugging
Emit structured JSONL from the main process (avoids multi-process interleaving):
```
python3 scripts/run_federation.py --demo --output-json /tmp/fed.jsonl --output-mode both
```

Inspect:
```
python3 scripts/inspect_federation_json.py /tmp/fed.jsonl
```

JSONL record types:
- `start`, `end`
- `event` (raw site worker status messages)
- `local_enqueue` (jobs queued by dashboard loop)
- `snapshot` (full federation state)
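
As a quick sanity check on a capture, the record types listed above can be tallied with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes each JSONL record is a JSON object whose type is stored under a `type` key, which the skill doc does not confirm. Adjust `type_key` to match the real output.

```python
import json
from collections import Counter


def count_record_types(lines, type_key="type"):
    """Tally JSONL records by type.

    `type_key` is an assumed field name; the real key may differ.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get(type_key, "<missing>")] += 1
    return counts


# Hypothetical sample records, shaped only for illustration.
sample = [
    '{"type": "start"}',
    '{"type": "event", "status": "ENQUEUED"}',
    '{"type": "snapshot"}',
    '{"type": "end"}',
]
print(dict(count_record_types(sample)))
```

For a real capture, pass an open file handle instead of `sample`, for example `count_record_types(open("/tmp/fed.jsonl"))`. A capture with a `start` but no `end` record would suggest the run was interrupted.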

## Demo Controls (when tuning)
- `--submit-interval` (job submission rate)
- `--dispatch-interval` (dispatch cadence)
- `--max-dispatch` (cap jobs per dispatch attempt)
- `--dispatch-policy max_free|round_robin|random`
- `--max-waiting` (cap WAITING jobs)
- `--node-choices 8,16,32,...` (allowed job sizes in nodes; use smaller values to bias toward smaller jobs)
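
To make the three `--dispatch-policy` names concrete, the sketch below shows what such policies conventionally mean. It is an illustration only and is not taken from `scripts/run_federation.py`: the function `make_site_picker`, the `free_nodes` mapping, and the site names are all hypothetical.

```python
import random
from itertools import cycle


def make_site_picker(policy, free_nodes, rng=None):
    """Return a zero-argument function that picks a site name.

    `free_nodes` maps a (hypothetical) site name to its free node count.
    """
    rng = rng or random.Random()
    names = sorted(free_nodes)
    if policy == "max_free":
        # Prefer whichever site currently has the most free nodes.
        return lambda: max(names, key=lambda n: free_nodes[n])
    if policy == "round_robin":
        # Visit sites in a fixed rotation, ignoring load.
        ring = cycle(names)
        return lambda: next(ring)
    if policy == "random":
        # Pick uniformly at random.
        return lambda: rng.choice(names)
    raise ValueError(f"unknown policy: {policy!r}")


free = {"site_a": 4, "site_b": 16}
pick = make_site_picker("round_robin", free)
print([pick() for _ in range(3)])
```

Under this reading, `max_free` can keep sending work to the largest site while others sit idle, whereas `round_robin` gives every site a turn regardless of free capacity.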

## Debug Checklist
1. If jobs appear enqueued but never run, enable JSONL and check:
   - `event=ENQUEUED` per site
   - `snapshot.meta.waiting_jobs` and `snapshot.sites[site].queued_jobs`
2. If the queue grows without bound, increase `--max-waiting` or slow the submission rate with `--submit-interval`.
3. If a site starves, switch to `--dispatch-policy round_robin` and restrict `--node-choices` to smaller values.
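
The first checklist item can be partly automated by pulling queue depths out of each snapshot. The following is a rough sketch under several assumptions: records carry a `type` key, and a snapshot record exposes `meta.waiting_jobs` and `sites[<site>].queued_jobs` at its top level. Only the two field paths come from the checklist; the nesting, the `type` key, and the site names in the sample are guesses.

```python
import json


def queue_depths(lines, type_key="type"):
    """Yield (waiting_jobs, {site: queued_jobs}) for each snapshot record.

    Field layout is assumed from the paths named in the checklist.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get(type_key) != "snapshot":
            continue  # only snapshots carry full federation state
        waiting = record.get("meta", {}).get("waiting_jobs")
        sites = record.get("sites", {})
        per_site = {name: info.get("queued_jobs") for name, info in sites.items()}
        yield waiting, per_site


# Hypothetical records for illustration only.
sample = [
    '{"type": "event", "status": "ENQUEUED"}',
    '{"type": "snapshot", "meta": {"waiting_jobs": 3},'
    ' "sites": {"site_a": {"queued_jobs": 2}, "site_b": {"queued_jobs": 0}}}',
]
for waiting, per_site in queue_depths(sample):
    print(waiting, per_site)
```

If `waiting_jobs` keeps rising while every site's `queued_jobs` stays at zero, the jobs are likely stuck before dispatch rather than at the sites.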

## Known Env Constraint
Multiprocessing semaphores can fail inside a sandbox. If a `PermissionError` appears when creating semaphores, rerun outside the sandbox (escalated execution).
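
A small probe can confirm whether the environment allows semaphore creation before launching the demo. This sketch uses only the standard-library `multiprocessing` module; the helper name `semaphores_available` is made up for illustration and does not exist in the repository.

```python
import multiprocessing


def semaphores_available():
    """Return True if a multiprocessing semaphore can be created here."""
    try:
        multiprocessing.Semaphore(1)
    except (OSError, ImportError) as exc:
        # PermissionError is a subclass of OSError; ImportError covers
        # platforms that lack a working semaphore implementation.
        print(f"semaphores unavailable: {exc!r}")
        return False
    return True


if __name__ == "__main__":
    print("semaphores available:", semaphores_available())
```

If the probe reports `False`, the demo is likely to hit the same error, so rerunning outside the sandbox is the simpler path.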