Commit faa609a9 authored by Vacaliuc, Bogdan

plan: chi-scan integration re-assessment 2026-04-18



Audit of the instrument (/home/controls/{bl4b,share,common/scantools})
and the bl4b-scripts submodule against the 2026-04-08 analysis doc and
the 2026-04-15 flying_chi plan.

Findings promoted to the plans:

- Only one flying_chi run has executed (2026-04-15 08:56 EDT); sweep
  was mechanically clean but no profile samples were captured because
  beam was off and the pre-flight cadence check gave a false positive
  from the initial CA connect callback. Fix path outlined as P1.
- Two commits on flying-chi are orphaned in the local object DB
  (10d3bae, 38db2fc); deployed instrument-side flying_chi.py carries
  their content. Re-apply path outlined as P2 (cherry-pick).
- Seven of the eight analysis-doc §6 known-issues are closed on
  scantools:issue/4850 since 2026-04-13 (Poisson errors, chifit,
  figsize, unified st.cmd via symlink, new AlignScan_Chis.bob OPI,
  CHI Scan button on User screen). Remaining: static fit-mode flags
  (§6.4) and fit-mode-vs-Data:Y semantics (§6.7).
- Option C (scan-server Poisson error propagation) is ready to land
  offline and queued as P5.
- refactor-chi-scan plan: added §12 "Status re-assessment - 2026-04-18"
  covering T1 partial, orphan branch divergence, IOC-side progress,
  and priorities for the next dev-window opportunity.
- New plan: chi-integration-test-readiness.md lists P1-P9 with
  acceptance criteria, archiver-gap caveats, and single-page per-test
  operator protocol scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 7de725cf
# Chi-integration test readiness — work we can do without the instrument

**Date:** 2026-04-18
**Branch:** bl4b-alignment-integration (tasking) — this plan; targeted
output branches: `bl4b-scripts/flying-chi`, `scantools/issue/4850`,
`bl4b/issue/4850`, and the scan-server shared `share/scan/` repo.
**Context:** The instrument is in user-experiment mode for a couple of
days. We expect a short dev window afterwards and want to walk in with
every offline-preparable artefact done, every test pre-scripted, and
every archiver query template pre-written.

This plan is a companion to
`plan/refactor-chi-scan-with-flying-callback-technique.md` (see §12 for
the re-assessment that drove this plan) and to
`plan/option-c-error-propagation.md` (the scan-server-side Poisson
error-propagation design that's ready to ship).

## 1. Priority list (what unblocks what)

| # | Task | Where | Blocks | Beam needed? |
|---|---|---|---|---|
| P1 | Harden `flying_chi.py` pre-flight cadence check | `bl4b-scripts` | next T1 | no |
| P2 | Re-apply orphan commits (CSV path, `.gitignore`) | `bl4b-scripts` | clean state | no |
| P3 | Pre-scripted T1–T5 operator protocols + pre/post snapshot commands | this repo | next dev window | no |
| P4 | Pre-canned archiver query templates per test | this repo | post-window analysis | no |
| P5 | Land Option C (scan-server Poisson error propagation) | `scantools`, `bl4b`, `share/scan` | tighter final-fit errors | no (but needs IOC+scan-server restart to activate) |
| P6 | Offline dry-run harness for `chi_scan.extract_*` | this repo | IOC-integrated smoke test | no |
| P7 | OPI sanity audit of `AlignScan_Chis.bob` | `scantools` | button-click smoke test | no |
| P8 | Fit-function equivalence check (`Gauss_ConstantBkg` vs `fittings['gauss+const']`) | this repo | IOC-integrated smoke | no |
| P9 | Draft `theta_scan.py` module skeleton | new file in `bl4b-scripts` | Solid_theta integration | no |

P1, P2, P3 are the bare minimum to walk into the next dev window
prepared. P5 is the highest-impact offline deliverable — a small patch
set that wraps up Option C and is ready to deploy at the next IOC
restart. P6–P8 are confidence-building; P9 is the next frontier
(Redmine #4850's second script).

## 2. P1: Harden the pre-flight cadence check

**Problem.** On 2026-04-15, flying_chi's pre-flight check passed but the
subsequent sweep collected zero profile samples — because the initial
connection value counted as an "update" in the 1-second window but no
subsequent updates fired during the sweep (beam off → no events → ADnED
Stats plugin posts nothing new). See refactor-chi-scan-*.md §12.1 for
the full trace.

**Fix.** Modify `_preflight_profile_cadence_check` in `flying_chi.py`:

```python
import time

from epics import camonitor, camonitor_clear


def _preflight_profile_cadence_check(pvs, window_s=1.0, settle_s=0.3,
                                     min_hz=1.0):
    """Subscribe, wait for the initial CA connect callback to settle,
    then count only *fresh* updates in the window_s measurement.

    Raises RuntimeError if any PV's fresh update rate is < min_hz.
    """
    samples = {p: [] for p in pvs}

    def _make_cb(p):
        def _cb(*a, **kw):
            samples[p].append(kw['timestamp'])
        return _cb

    for p in pvs:
        camonitor(p, callback=_make_cb(p))
    try:
        time.sleep(settle_s)        # let initial callback(s) land
        # Record the high-water timestamp per PV, then measure fresh
        # updates past that point.
        hwm = {p: (max(samples[p]) if samples[p] else 0.0) for p in pvs}
        time.sleep(window_s)
        for p in pvs:
            fresh = sum(1 for t in samples[p] if t > hwm[p])
            rate = fresh / window_s
            if rate < min_hz:
                raise RuntimeError(
                    'Pre-flight cadence check failed for %s: %.2f fresh '
                    'updates/s (min %.2f). Beam may be off, '
                    'EventUpdatePeriod may be too fast, or StartDiag '
                    'did not enable the Stats plugin.'
                    % (p, rate, min_hz))
    finally:
        for p in pvs:
            camonitor_clear(p)
    return {p: len(samples[p]) for p in pvs}
```

Net change: one 300 ms settle before measurement, and high-water-mark
book-keeping so the initial-value callback is excluded. No new
parameters required on the caller side.
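
The high-water-mark book-keeping can be exercised without EPICS. The following is a minimal stand-alone model of the counting step only, assuming callback timestamps arrive as plain floats; `fresh_rate` and its `settle_count` argument are hypothetical names for illustration and are not part of `flying_chi.py`.

```python
def fresh_rate(timestamps, settle_count, window_s):
    """Rate of updates newer than the high-water mark.

    timestamps   -- every callback timestamp for one PV, in arrival order
    settle_count -- how many of them arrived during the settle period
    window_s     -- length of the measurement window in seconds
    """
    settled = timestamps[:settle_count]
    # High-water mark: newest timestamp seen before the window opened.
    hwm = max(settled) if settled else 0.0
    fresh = sum(1 for t in timestamps if t > hwm)
    return fresh / window_s


# Beam off: only the initial connect callback ever arrives.
beam_off = [1000.0]
# Beam on: connect callback, then five updates inside the 1 s window.
beam_on = [1000.0, 1000.4, 1000.6, 1000.8, 1001.0, 1001.2]

print(fresh_rate(beam_off, settle_count=1, window_s=1.0))  # 0.0
print(fresh_rate(beam_on, settle_count=1, window_s=1.0))   # 5.0
```

The beam-off case is the 2026-04-15 failure mode: one callback, zero fresh updates, so the check fails as intended.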

**Also**: expose a module-level parameter `skip_profile_callbacks`
(default `False`). When `True`, neither the profile camonitor
subscriptions nor the pre-flight check run; only `chis.RBV` and PCharge
are sampled, and post-processing skips the profile path. This is the
**T1-beam-off operational mode**: the sweep and the restore run
mechanically, the CSV is written with 4 columns (`timestamp, chi_rbv,
pcharge_cum, dPC_dt`), and no fitting is attempted.

**Test plan (offline):** unit-test `sigma_of_profile` on a synthetic
profile. Integration-testing the full script's mainline is hard without
EPICS, so defer that to the T1 re-run.
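
A possible shape for the synthetic-profile unit test is sketched below. The real `sigma_of_profile` signature is not shown in this plan, so the sketch uses a hypothetical stand-in (`sigma_of_profile_standin`, an intensity-weighted RMS width); swap in the real function and adjust the arguments once the test is written against `flying_chi.py`.

```python
import math


def sigma_of_profile_standin(x, counts):
    """Hypothetical stand-in: intensity-weighted RMS width of a profile."""
    total = sum(counts)
    mean = sum(xi * ci for xi, ci in zip(x, counts)) / total
    var = sum(ci * (xi - mean) ** 2 for xi, ci in zip(x, counts)) / total
    return math.sqrt(var)


def synthetic_profile(centre, sigma, n_pix=256, amplitude=1000.0):
    """Noise-free Gaussian sampled on integer pixel positions."""
    x = list(range(n_pix))
    counts = [amplitude * math.exp(-(xi - centre) ** 2 / (2 * sigma ** 2))
              for xi in x]
    return x, counts


def test_sigma_recovers_known_width():
    x, counts = synthetic_profile(centre=128.0, sigma=12.0)
    measured = sigma_of_profile_standin(x, counts)
    assert abs(measured - 12.0) < 0.05


test_sigma_recovers_known_width()
```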

**Deliverable:** one commit to `bl4b-scripts/flying-chi`, adjacent to
the existing 206fbea, with the cadence-check fix and the new parameter.
Estimated size: ~30 lines.

## 3. P2: Re-apply orphan commits on `flying-chi`

Two commits exist in the local objects DB but no branch references
them: `10d3bae` (CSV path → `/home/controls/var/tmp/ScriptScan/csv/`)
and `38db2fc` (`.gitignore` adds `__pycache__`). See
refactor-chi-scan-*.md §12.3 for the diagnosis.

**Action:**

```bash
cd bl4b-scripts
git checkout flying-chi          # the branch that carries 206fbea
git cherry-pick 10d3bae 38db2fc
# expect no conflicts; these are both trivial 2-line edits on the
# post-206fbea tree
```

If the cherry-pick succeeds cleanly, commit messages on the fresh
commits should annotate the provenance — e.g.:

```
Re-apply 10d3bae: put .csv files into /home/controls

Original 10d3bae was force-pushed off origin/flying-chi; orphan object
still in local clone. Re-applied as a fresh commit to restore the CSV
path change the instrument-deployed flying_chi.py already uses.
```

**Confirm with user before pushing.** The CSV path question
(`/SNS/users/6ov/BL4B/` vs `/home/controls/var/tmp/ScriptScan/csv/`) is
a policy choice; the instrument has been using the `/home/controls`
variant since the orphaned commit. Going with `/home/controls` is the
right call if we want CSV output to survive a `/SNS` FUSE dropout and
to be visible to the scientist on the local disk.

## 4. P3: Pre-scripted operator protocols for T1–T5

The original §6.2 protocols are correct but assume the operator reads
the plan inline. For the next dev-window opportunity we want a
**single-page copy-pasteable script per test** that the operator can
run without looking up references.

Deliverable: `plan/chi-test-protocols.md` with five numbered sections,
one per test, each containing:

1. What the test proves (one sentence).
2. `flying_chi.py` parameter block to paste at the top of the file (or
   the full path to the configured variant, e.g.,
   `/home/controls/var/tmp/ScriptScan/2026A/flying_chi-T1.py`).
3. **Pre-run caget script** to paste into an IPython shell — records
   the config fields the archiver doesn't cover:

   ```python
   from epics import caget
   for pv in ['BL4B:Mot:chis.VELO', 'BL4B:Mot:chis.VMAX',
              'BL4B:Mot:chis.BDST', 'BL4B:Det:N1:EventUpdatePeriod',
              'BL4B:Mot:si:Y:Gap:Readback',
              'BL4B:Mot:s1:Y:Gap:Readback',
              'BL4B:Mot:s2:Y:Gap:Readback']:
       print(f'{pv} = {caget(pv)}')
   ```

4. **Run command** (`%run /home/controls/var/tmp/ScriptScan/2026A/flying_chi-TN.py`)
   plus the start/stop timestamps to note.
5. **Post-run caget** — identical to pre-run. Paste output to
   `/home/controls/var/tmp/ScriptScan/csv/<date>/flying_chi-TN-pv.txt`.
6. **What to send to the agent** — the pre/post caget dumps plus the
   run window timestamps. Agent runs the archiver query (§5 below) and
   produces a pass/fail report.
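
The pre/post comparison in steps 3 and 5 can be automated on the agent side. The sketch below assumes the dumps are the plain `PV = value` lines printed by the caget loop above; `parse_dump` and `diff_dumps` are hypothetical helper names, and the sample values are illustrative rather than real instrument readings.

```python
def parse_dump(text):
    """Parse 'PV = value' lines (the format printed by the caget loop)."""
    values = {}
    for line in text.splitlines():
        if ' = ' not in line:
            continue
        pv, value = line.split(' = ', 1)
        values[pv.strip()] = value.strip()
    return values


def diff_dumps(pre_text, post_text):
    """Return {pv: (pre, post)} for every PV whose value changed."""
    pre, post = parse_dump(pre_text), parse_dump(post_text)
    return {pv: (pre.get(pv), post.get(pv))
            for pv in sorted(set(pre) | set(post))
            if pre.get(pv) != post.get(pv)}


# Illustrative values only, not real instrument readings.
pre = "BL4B:Mot:chis.VELO = 0.5\nBL4B:Det:N1:EventUpdatePeriod = 500.0\n"
post = "BL4B:Mot:chis.VELO = 0.5\nBL4B:Det:N1:EventUpdatePeriod = 100.0\n"
print(diff_dumps(pre, post))
```

The comparison is a strict string match, which suits configuration fields such as `.VELO`; readbacks such as the slit gaps may jitter in the last digits and would need a numeric tolerance instead.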

### 4.1 Concrete T1 (re-do) protocol

Config:

```python
# flying_chi-T1.py (copy of flying_chi.py with these overrides)
startCHI = -0.2                # narrow range for mechanical validation
endCHI   = +0.2
flight_velocity = 0.1          # °/s — slow for safety
skip_profile_callbacks = True  # added by P1; suppresses profile path
# FitXdist / FitYdist / FitIdist are ignored when skip_profile_callbacks=True
MoveToXFit = False             # do not try to move after a mechanical-only sweep
```

Pass criteria:
- (a) `chis.RBV` goes monotonically from −0.2 to +0.2 across the
  archiver window (no stalls, no backward spikes > .RDBD).
- (b) Pre/post caget shows `chis.VELO, chis.VMAX, chis.BDST,
  EventUpdatePeriod, si/s1/s2 Y:Gap` identical.
- (c) CSV has ≥ 20 chi samples. The 0.4° range at 0.1 °/s gives a
  sweep of at least 4 s: that is ≥ 8 samples at the 500 ms default
  cadence, and 20–40 at the ~100 ms cadence the archiver typically sees.
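
Criterion (a) can be checked mechanically once the archiver CSV is in hand. This is a minimal sketch that assumes the `chis.RBV` samples are already loaded as a time-ordered list of floats and that the sweep runs from low to high chi; `check_monotonic` is a hypothetical helper, and stall detection is left out because it needs the sample timestamps as well.

```python
def check_monotonic(rbv, start, end, rdbd):
    """Return (ok, reason) for a chis.RBV trace sampled in time order."""
    if not rbv:
        return False, 'no samples'
    for prev, cur in zip(rbv, rbv[1:]):
        # A backward step larger than the deadband counts as a spike.
        if prev - cur > rdbd:
            return False, 'backward step %.4f > RDBD' % (prev - cur)
    if abs(rbv[0] - start) > rdbd or abs(rbv[-1] - end) > rdbd:
        return False, 'endpoints outside RDBD of start/end'
    return True, 'ok'


good = [-0.2, -0.1, 0.0, 0.1, 0.2]
bad = [-0.2, -0.1, -0.15, 0.1, 0.2]   # one backward spike of 0.05
print(check_monotonic(good, -0.2, 0.2, rdbd=0.01))
print(check_monotonic(bad, -0.2, 0.2, rdbd=0.01))
```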

### 4.2 T2 (pre-flight failure) protocol

Same as T1 but with `event_update_period_ms = 10`. Expect the script to
abort at pre-flight (RuntimeError printed) with no `chis` motion in the
archiver. This validates the §5.3 defensive code path.

### 4.3 T3 (back-to-back with CHI_scan.py, beam on)

Run `CHI_scan.py` to completion as-is from
`/home/controls/var/tmp/PYTHON/2026-A/`. Immediately afterwards run
`flying_chi.py` with the operator's production geometry (startCHI=-1.8,
endCHI=+1.8, flight_velocity=None, skip_profile_callbacks=False,
FitXdist=True, MoveToXFit=False). Compare the fit centres. Pass if
|Δchi_fit| < the larger of the two reported fit errors.

### 4.4 T4 (MoveToXFit)

Same as T3 but MoveToXFit=True. Archiver must show a final
`chis.put(fit_centre)` and `chis.RBV` converging to that value within
`.RDBD`.

### 4.5 T5 (Ctrl-C mid-flight)

Same as T1 but the operator hits Ctrl-C at roughly the halfway mark of
`chis.RBV`. Verify that the restore block fired (pre/post caget of the
config fields match) and that `EventUpdatePeriod` and the slit Y:Gaps
in the archiver returned to their initial values.

### 4.6 Operator-facing single-page

Deliverable filename: `plan/chi-test-protocols.md`. Target length:
one page per test, five pages total. All copy-pasteable, no cross-refs.

## 5. P4: Pre-canned archiver query templates

For each test, a ready-to-run shell snippet that takes `START` and
`END` timestamps and produces a CSV suitable for agent analysis.

```bash
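# T1 / T2 / T5 (mechanical sweep + restore)
# Note: chis.VELO, .VMAX and .BDST are not archived (see the gaps listed
# below), so expect no rows for them in the output.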
./setup/archiver-query.sh \
    --pv 'BL4B:Mot:chis.RBV,BL4B:Mot:chis.DMOV,BL4B:Mot:chis.VELO,BL4B:Mot:chis.VMAX,BL4B:Mot:chis.BDST,BL4B:Det:N1:EventUpdatePeriod,BL4B:Det:N1:Det1:EventRate_RBV,BL4B:Det:N1:PChargeIntegrated_RBV,BL4B:Mot:si:Y:Gap:Readback,BL4B:Mot:s1:Y:Gap:Readback,BL4B:Mot:s2:Y:Gap:Readback' \
    --start "$START" --end "$END" \
    --format csv -o /tmp/flying_chi-TN.csv
```

**Known archiver gaps (from today's probe):**

- `BL4B:Mot:chis.VELO`: **not archived**; rely on pre/post caget.
- `BL4B:Mot:chis.VMAX`: **not archived**; rely on pre/post caget.
- `BL4B:Mot:chis.BDST`: **not archived**; rely on pre/post caget.
- `BL4B:Det:PCharge`: **not archived**; use
  `BL4B:Det:N1:PChargeIntegrated_RBV` (the engineering-stored
  integrated pcharge) for the intensity-on-vs-off check.
- Slit `Y:Gap:Readback`: archived.
- `chis.RBV`, `.DMOV`: archived with ~100 ms resolution during motion.

Put the archiver-gap table in the first section of
`plan/chi-test-protocols.md` so the operator knows what caget
invocations are load-bearing.

## 6. P5: Land Option C (scan-server Poisson error propagation)

Plan already exists at `plan/option-c-error-propagation.md`. Three
files, three commits, one new PV asyn flag. Estimated half-day of work
plus a Jython unit test.

**Rollout order** (per §5 of the Option-C plan):

1. `scantools/issue/4850/ioc/fit/FittingIOC.py` — add `Data:Err`
   asyn + external-write branch, bump VERSION to `"1.4"`. Needs
   `bl4b-Fit` IOC restart.
2. `share/scan/writedatatopv.py` — add `err_pv` kwarg with
   back-compat default. Needs scan-server restart (or just script-path
   refresh if the scan server picks up changes on next `scan_client.submit`).
3. `scantools/issue/4850/python/ScanTools/align/__init__.py` — pass
   `err_pv` in the scalar detector path only. No restart.

**Offline work completable now:**

- Write the three patches. Keep them as three commits, one per file,
  each with a reference to `plan/option-c-error-propagation.md` §N.
- Write the Jython unit test
  (`share/scan/tests/test_writedatatopv.py`) per §7 of the plan. Run
  it against a CPython stub that mimics `numjy.array`, `numjy.sqrt`,
  `numjy.abs` — enough to validate the math path.
- Write a pcaspy SIM test for the new `Data:Err` asyn branch; this can
  run on the dev machine without the instrument.

**Why do this offline now?** The IOC restart is quick (~30 seconds)
and can piggyback on any already-scheduled bl4b-Fit restart. Option C
is what moves the scalar-detector (`roi1`, etc.) path from
"sqrt(value)-approximate" errors to propagated Poisson errors, which
is a prerequisite for the `fittings.py` Tier B rewrite that consumes
`Data:Err` with `absolute_sigma=True`. We can't do Tier B until Option
C lands; we can land Option C without beam.
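
The gap between the two error models is easy to see on a scalar that has been normalised by proton charge. The sketch below is an illustration under the assumption that the scalar is `counts / pcharge` with `pcharge` treated as exact; the actual Option C math is defined in `plan/option-c-error-propagation.md` and may differ, and both function names are hypothetical.

```python
import math


def sqrt_value_error(counts, pcharge):
    """Approximate model: square root of the normalised value."""
    return math.sqrt(counts / pcharge)


def propagated_error(counts, pcharge):
    """Poisson model: sigma(counts) = sqrt(counts), then divide by
    pcharge (pcharge treated as exact)."""
    return math.sqrt(counts) / pcharge


counts, pcharge = 400.0, 4.0
print(sqrt_value_error(counts, pcharge))   # 10.0
print(propagated_error(counts, pcharge))   # 5.0
```

Under this assumption the two models agree only when `pcharge` equals 1. That matters for a fit run with `absolute_sigma=True`, because the supplied errors are then taken at face value rather than rescaled.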

## 7. P6: Offline dry-run harness for `chi_scan.extract_*`

A small test harness that feeds a synthetic Gaussian beam through
`extract_XProfile_pos`, `extract_YProfile_pos`, and `extract_XY_pos`
and validates:

- 5-tuple return shape
- `DiagXY` is the expected shape `(480, 640)` after reshape
- `ErrSigma / Sigma` matches theory for a Gaussian Poisson profile
  (`ErrSigma = Sigma / sqrt(2 M0)`) within 10%
- `M0` consistency between the three entry points (extract_XY_pos
  should produce the same ErrSigma as extract_XProfile_pos with
  profile_scale=1 applied to the axis-summed 2D image)
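
The theory line in the third bullet can be sanity-checked by Monte Carlo before the harness exists. The sketch below does not call `chi_scan.py`; it assumes a profile built from `M0` independent events drawn from a Gaussian and measures how much the recovered width scatters between repeats. `scatter_of_sigma` and `std` are hypothetical helper names.

```python
import math
import random


def std(values):
    """Population standard deviation."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / len(values))


def scatter_of_sigma(sigma, m0, trials, seed=0):
    """Std-dev of the measured width over repeated M0-event profiles."""
    rng = random.Random(seed)
    widths = []
    for _ in range(trials):
        events = [rng.gauss(0.0, sigma) for _ in range(m0)]
        widths.append(std(events))
    return std(widths)


sigma, m0 = 10.0, 500
measured = scatter_of_sigma(sigma, m0, trials=600)
theory = sigma / math.sqrt(2 * m0)      # ErrSigma = Sigma / sqrt(2 M0)
print('measured %.4f, theory %.4f, ratio %.3f'
      % (measured, theory, measured / theory))
```

With 600 trials the Monte Carlo estimate itself carries a few percent of statistical noise, so the 10% tolerance in the bullet list leaves reasonable headroom.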

**Deliverable:** `tests/test_chi_scan_offline.py` in this repo (not in
scantools — we don't own pytest config there yet). Imports the IOC's
`chi_scan.py` via a sys.path insertion pointing at
`/home/controls/common/scantools/issue/4850/ioc/fit/`. Skipped
gracefully if that path doesn't exist (so the test also works on
machines that don't have the instrument mount).

Estimated size: ~100 lines.

## 8. P7: OPI sanity audit of `AlignScan_Chis.bob`

Read `AlignScan_Chis.bob` (1319 lines, committed as `b56389e` on
scantools:issue/4850). Build a checklist:

- Which `BL4B:CS:Align:*` and `BL4B:CS:Fitting:*` PVs does the panel
  bind to?
- Are all of those PVs actually served by the corresponding IOCs
  (`AlignmentIOC.py` v1.6, `FittingIOC.py` v1.3)?
- Does the embedded Jython that forces `Detector` to `'xprofile'` run
  on panel open or on button click? (line 227, per the grep.)
- Is the "Run" button wired to `BL4B:CS:Align:Run` as put-with-completion?
- Any `BL4B:CS:Align:Diag:XY` image widget that expects 640×480?
- Any stale references to `Data:X/Y` instead of `Data:XPROFILE`?

Deliverable: a section in `plan/chi-test-protocols.md` titled
"Pre-click PV checklist" that the operator can run once before the
first button-click smoke test (§12.6 item 5 of the sibling plan).

## 9. P8: Fit-function equivalence check

The Spyder `CHI_scan.py` fits `Gauss_ConstantBkg(x, a, b, c, d) = d +
a*exp(-(x-b)^2/(2c^2))`. The `fittings.py` method `"gauss+const"` should
produce the same peak centre `b` for the same input. We can verify
offline with synthetic data.

**Deliverable:** a one-off Python script under `tests/` (not checked
in as part of a suite — just a script that prints pass/fail and
plots):

```python
import sys

import numpy as np
from scipy.optimize import curve_fit
# Path into the IOC's fittings module
sys.path.insert(0, '/home/controls/common/scantools/issue/4850/ioc/fit/')
from fittings import fits, fit_methods

x = np.linspace(-1.8, 1.8, 13)
true = dict(a=5.0, b=0.3, c=0.4, d=1.0)
rng = np.random.default_rng(0)
y = true['d'] + true['a']*np.exp(-(x-true['b'])**2/(2*true['c']**2))
y_noisy = y + rng.normal(0, 0.1*y.max(), size=y.shape)

# Spyder path
def G(x,a,b,c,d): return d + a*np.exp(-(x-b)**2/(2*c**2))
p_spy, _ = curve_fit(G, x, y_noisy, p0=[y_noisy.max()-y_noisy.min(), x[y_noisy.argmax()], 0.5, y_noisy.min()])
print('Spyder b =', p_spy[1])

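# fittings path (case-insensitive lookup: the method name is written
# both 'gauss+const' and 'Gauss+const' in this plan)
names = [m.lower() for m in fit_methods]
gc = fits[names.index('gauss+const')]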
base, amp, pos, wid, fx, fy = gc['method'](x, y_noisy, restrict='None', fwhm=0)
print('fittings pos =', pos)

delta_b = abs(p_spy[1] - pos)
print('Δb =', delta_b)
print('PASS' if delta_b < 0.01 else 'FAIL')
```

Pass criterion: |Δb| < 0.01°.

## 10. P9: Draft `theta_scan.py` skeleton

The analysis doc §9 and §12 list this as medium-term. It is a natural
follow-up to chi-scan integration and is almost entirely an offline
exercise until the dual-peak fit is validated.

**Deliverable:** a skeleton `theta_scan.py` in
`/home/controls/common/scantools/issue/4850/ioc/fit/theta_scan.py`
with:

- `extract_DB_RB_peaks(XYarray)` — returns
  `(DB_pos, RB_pos, delta_pixel, DiagXY)`.
- Module-level constants mirroring the Spyder script (ROI, ThetaPixDeg,
  etc.).
- TODO comments noting what the per-step scan sequence will need
  (multi-motor Script scan, second-count per-step, DBpos carryover
  between steps).

Do NOT wire it into `FittingIOC.py` yet. Keep it as an isolated file
so the chi work can continue to ship without regression risk.

## 11. Status of each priority (to be kept current)

| # | Priority | Status | Owner | Completed commit(s) |
|---|---|---|---|---|
| P1 | Cadence-check hardening | not started | | |
| P2 | Orphan commit re-apply | not started | | |
| P3 | Test protocols | not started | | |
| P4 | Archiver query templates | partly (in this file) | | |
| P5 | Option C landing | not started | | |
| P6 | Offline extractor harness | not started | | |
| P7 | OPI audit | not started | | |
| P8 | Fit-function equivalence | not started | | |
| P9 | theta_scan skeleton | not started | | |

Update this table as work lands. When all of P1–P5 are at
`completed`, the dev-window opportunity is fully prepared.

## 12. Dependencies and ordering constraints

- **P1 before P3**: the new `skip_profile_callbacks` parameter changes
  the T1 protocol.
- **P2 any time**: safe, no dependencies.
- **P3 and P4**: produce one file (`plan/chi-test-protocols.md`) jointly.
- **P8 relative to P5**: P8 checks that the IOC fit path matches the
  Spyder path. Option C doesn't change the fit function itself, so P8 is
  valid before P5. The Tier B fit-side rewrite (`absolute_sigma=True`)
  that Option C enables will change the numerics slightly, so re-run P8
  after Tier B.
- **P6 any time**: self-contained.
- **P7 before the beam-on button-click smoke test** (#5 in the sibling
  plan's §12.6).
- **P9 any time**: decoupled from all above.

## 13. What explicitly does NOT happen offline

- Any `caput` to the live instrument. No exceptions.
- Any IOC restart, scan-server restart, or `.bob` reload.
- Any test that depends on EPICS connectivity to BL4B (we have
  archiver-only access from the dev machine).
- Any speculative change to `flying_z.py` (plan §10's back-port is
  still gated on the flying-chi try/finally pattern being validated by
  a successful T5).