Commit dfe3521c authored by Vacaliuc, Bogdan's avatar Vacaliuc, Bogdan
Browse files

plan for definitive error propagation improvement in scan server

parent 173beabe
Loading
Loading
Loading
Loading
+311 −0
Original line number Diff line number Diff line
# Option C: Carry Data:Err through the scan server

**Status:** Proposal. Not started.
**Depends on:** Tier A (`chi_scan.py` weighted Poisson errors) landed on
`scantools:issue/4850`; Option B (`FittingIOC.py` Data:Err overwrite guard)
landed on the same branch.
**Motivation:** The current `FittingIOC.Data:Y` write handler synthesises
`Data:Err = sqrt(Data:Y)` which is correct only when `Data:Y` holds raw
Poisson counts. For monitor-normalised scans (`Data:Y = N/Q` via
`WriteDataToPV` with `norm_device='pcharge'`), this underestimates the per-point
error by a factor of `sqrt(Q)`. The fix is to compute the error where `N`
and `Q` are actually known — in the scan server's Jython `WriteDataToPV`
script — and ship it to the IOC as a companion array.

## 1. Scope

In scope:
- `writedatatopv.py` Jython-side compute-and-write of `Data:Err` from raw
  counts (and monitor counts, when normalising).
- `ScanTools.align.__init__` wiring for scalar detectors so the scan sequence
  targets the new error PV.
- `FittingIOC.py` small change so external `Data:Err` writes are accepted
  cleanly and don't race with the `Data:Y` handler's guarded fallback.
- Back-compat path so scans that predate this change (and third-party Jython
  calling `Script('WriteDataToPV', …)` with the old arg list) keep working.

Out of scope:
- The actual use of `Data:Err` inside the final curve fit. That is the
  "Tier B" fit-side work (modify `gauss_base_fit_py.py` / `gauss_line_fit_py.py`
  to accept `sigma=` and use `scipy.optimize.curve_fit(..., absolute_sigma=True)`;
  expose `Fit:{Pos,Amp,Wid,Base}Err` PVs). Option C makes that possible; it
  does not implement it.

## 2. Current data flow (pre-Option C)

At each scan step, with detector = scalar (e.g. `roi1`):

```
scan server Log(all) → Derby DB

Script('WriteDataToPV', [roi1, Data:Y, pcharge, 1.0])
  ↓ Jython:
    data = getData(roi1, pcharge)      # shape (2, N_accumulated)
    normed = data[0] * 1.0 / data[1]   # N/Q per step
    context.write(Data:Y, normed, False)
  ↓ CA put lands in FittingIOC
FittingIOC.write('Data:Y', value)

  setParam('Data:Y', value)
  setParam('Data:Err', sqrt(value))   ← WRONG for normalised scans
  trigger_fit()
  callbackPV('Data:Y')
```

The IOC sees `N/Q` and computes `sqrt(N/Q)`, which is NOT the error on
`N/Q`. The correct error on the normalised ratio (treating `Q` as
effectively noise-free) is `sqrt(N)/Q = sqrt(y/Q) = sqrt(y)/sqrt(Q)`.

For 2D / 1D profile detectors (`xy`, `xprofile`, `yprofile`) this is
already handled correctly by `trigger_data_to_y_and_err` (Tier A), which
writes `Data:Y` AND `Data:Err` via `setParam`, bypassing the `write()`
handler. Option C's scope is the scalar-detector path.

## 3. Proposed data flow (post-Option C)

```
Script('WriteDataToPV', [roi1, Data:Y, pcharge, 1.0, Data:Err])
  ↓ Jython:
    data = getData(roi1, pcharge)      # shape (2, N_accumulated)
    N = data[0]; Q = data[1]; Kn = float(norm_value)
    y = N * Kn / Q
    e = sqrt(N) * Kn / Q                # first-order Poisson error
    context.write(Data:Err, e, False)   # write error FIRST
    context.write(Data:Y,   y, False)   # then value — triggers the fit
```

Why errors first: the `Data:Y` handler in FittingIOC calls `trigger_fit()`
synchronously. If we let the fit see an out-of-date `Data:Err`, then even
Tier B would use stale errors on the Nth step. Writing errors first
guarantees `trigger_fit()` sees the matching error array.

The raw-counts regime is the same with no norm_device:
```
    y = N
    e = sqrt(N)
```

## 4. Changes by file

### 4.1 `/home/controls/share/scan/writedatatopv.py`  (shared repo, Jython 2.7)

Extend `WriteDataToPV.__init__` with an optional `err_pv` keyword. Compute
the error in `run()` and write it ahead of the value. Keep the existing
positional signature working.

```python
class WriteDataToPV(ScanScript):
    def __init__(self, device, pv, norm_device=None, norm_value=1.0, err_pv=None):
        self.device = device
        self.pv = pv
        self.norm_device = None if norm_device in (None, "-") else norm_device
        self.norm_value = norm_value
        self.err_pv = None if err_pv in (None, "-") else err_pv

    def getDeviceNames(self):
        names = [self.device, self.pv]
        if self.err_pv:
            names.append(self.err_pv)
        return names

    def run(self, context):
        kn = float(self.norm_value)
        if self.norm_device:
            data = np.array(context.getData(self.device, self.norm_device))
            N = data[0]; Q = data[1]
            try:
                y = N * kn / Q
                e = np.sqrt(np.abs(N)) * kn / Q   # sqrt(N)/Q * kn
            except:
                # Zero division / bad Q — fall back to the un-normalised value
                y = context.getData(self.device)[0]
                e = np.sqrt(np.abs(y))
        else:
            y = context.getData(self.device)[0]
            e = np.sqrt(np.abs(y))
        # Write error first so the fit sees a consistent (y, e) pair
        if self.err_pv:
            try:
                context.write(self.err_pv, array(e, 'd'), False)
            except Exception, ex:
                print "WriteDataToPV(%s) err_pv write exception: %s" % (self.err_pv, ex)
        try:
            context.write(self.pv, array(y, 'd'), False)
        except Exception, ex:
            print "WriteDataToPV(%s) exception: %s" % (self.pv, ex)
```

Notes:
- Jython 2.7 requires the `except Exception, ex:` form, not `as ex`.
- `numjy` supports `np.sqrt` and `np.abs` on array-ish objects; confirm on
  the target scan server before rollout by running the unit test in §7.
- `err_pv=None` (or `"-"`) keeps the legacy call-site behaviour: no error
  write. The IOC's Data:Y handler (with the Option B guard) still
  synthesises the `sqrt(value)` fallback for those legacy scans.

### 4.2 `/home/controls/common/scantools/issue/4850/python/ScanTools/align/__init__.py`

Only the scalar path needs the wiring; the `xy`/`xprofile`/`yprofile`
paths compute their own errors inside FittingIOC. At line 178–186:

```python
# Scalar (Data:Y) path — pass err_pv so the scan server writes Data:Err.
# 2D / 1D profile paths don't get an err_pv because trigger_data_to_y_and_err
# populates Data:Err internally (Tier A).
y_pv = f'{self.fitting_pv}Data:Y'
e_pv = f'{self.fitting_pv}Data:Err'
...
Script('WriteDataToPV', [ self.detector, y_pv, norm_device, norm_value, e_pv ]),
```

Keep the 2D/1D branches calling `WriteDataToPV` with the old 4-arg form
(err_pv defaults to None). Verify the call sites in the two other places
that use `WriteDataToPV` in the scan config (`table.py`, any bespoke
scripts in `/home/controls/bl4b/scan/`) are not impacted — the new
positional 5th argument is only read when the caller provides it.

### 4.3 `/home/controls/common/scantools/issue/4850/ioc/fit/FittingIOC.py`

Small additive changes:

- Add a `Data:Err` write branch so external puts are accepted and the
  fit is retriggered. This makes the scan-server's `Data:Err` write
  authoritative and keeps it synchronised with `Data:Y`:

  ```python
  elif reason == 'Data:Err':
      try:
          self.setParam(reason, np.atleast_1d(value))
          # Do NOT call trigger_fit() here — the subsequent Data:Y write
          # will trigger it (Option C ordering: Data:Err first, Data:Y
          # second). Retriggering here would double-fit every step.
      except Exception as e:
          logger.error(str(e))
          logger.error(traceback.format_exc())
      finally:
          self.callbackPV(reason)
  ```

- Make `Data:Err` asynchronous so `context.write(..., False)` reliably
  dispatches the write before the next command:

  ```python
  'Data:Err'   : { 'count' : MAX_ARRAY, 'asyn': True },
  ```

- Option B's guard in the `Data:Y` branch is already in place (landed
  with Tier A). No additional change required; when `err_pv` has been
  written, the guard naturally preserves it.

### 4.4 Version bump and ringlog message

- `FittingIOC.VERSION`: bump to `"1.4"`.
- Add an `info` log line in the `Data:Err` branch so the ringlog shows
  that propagated errors are being consumed (helps operators confirm
  the new path is live).

## 5. Back-compat strategy

- `writedatatopv.py` accepts the old 4-positional-arg signature unchanged.
  Scans that were saved to disk before the rollout keep working.
- Old scans that do not pass `err_pv` still land on the FittingIOC
  `Data:Y` handler, whose Option B guard fills `Data:Err = sqrt(value)`
  exactly as today. No regression.
- The new PV asyn flag on `Data:Err` is backward-compatible: existing
  callers that PUT without completion don't wait; the only behavioural
  change is that completion callbacks are emitted. Verify nothing in
  the existing OPI panels polls `Data:Err` synchronously.
- Rollout order matters: deploy `FittingIOC.py` first (IOC restart),
  then `writedatatopv.py` (scan server restart / script-path refresh),
  then `ScanTools/align/__init__.py` (no restart needed; re-run scans).
  Deploying `writedatatopv.py` first would still be safe because
  `FittingIOC` accepts `Data:Err` writes even before the new handler
  (pcaspy Driver stores the value; no fit retrigger matters for scans
  that write `Data:Y` after `Data:Err`).

## 6. Configuration / invariants to document

- Assumed monitor is noise-free (`Var(Q) ≈ 0`). Valid for pcharge in
  mC-scale integrations; documented in a comment at the `sqrt(N)*kn/Q`
  line.
- Raw-counts regime (no `norm_device`) treats `y` itself as a Poisson
  variate. Valid when `y` is an integer count PV. Not valid if the
  detector PV is pre-normalised upstream (e.g., some ROI PVs report
  counts/sec). Any detector that pre-normalises should be flagged in
  `devices.py` so alignment scans steer around it; add a `noraw` tag
  if needed and have Alignment.createScan refuse the combination.
- The `sqrt(abs(N))` guard handles the pathological case where Derby
  returns a small negative (e.g., a background-subtracted value) so the
  Jython doesn't throw. It does not cover genuine background-subtracted
  signal — that needs full propagation through `Var(S) = Var(raw) +
  Var(bkg)` and is a follow-up.

## 7. Test plan

**Jython unit test** (`/home/controls/share/scan/tests/test_writedatatopv.py`,
run on scan server):

1. Mock a `context` object with `getData(dev)` returning
   `[[10, 20, 30]]` and `getData(dev, norm)` returning
   `[[10, 20, 30], [1.0, 2.0, 3.0]]`.
2. Run `WriteDataToPV(dev, 'TEST:Y', 'pcharge', 1.0, 'TEST:Err').run(ctx)`.
   Assert the captured writes are
   `TEST:Err = [sqrt(10), sqrt(20)/2, sqrt(30)/3]` and
   `TEST:Y = [10, 10, 10]`.
3. Run the legacy 4-arg form. Assert only `TEST:Y` is written.
4. Verify `TEST:Err` write happens before `TEST:Y`.

**FittingIOC unit test** (pcaspy in SIM mode, or integration against a
soft IOC):

1. Init driver, put to `Data:Err` (length 3), put to `Data:Y` (length 3).
   Verify `Data:Err` is unchanged by the `Data:Y` handler.
2. Init driver, put only to `Data:Y` (length 3). Verify `Data:Err` equals
   `sqrt(abs(Data:Y))` (Option B fallback).
3. Init driver, put to `Data:Err` (length 2), then put to `Data:Y`
   (length 3). Verify the guard detects the mismatch and overwrites
   `Data:Err` with `sqrt(abs(value))` — this regresses to the
   synthesised Poisson default rather than carrying a stale length.

**Beamline smoke test**:

1. Fire a `caxis` alignment scan via the AlignmentIOC OPI with a scalar
   detector (e.g., roi1) and `norm_device=pcharge`.
2. Watch `Data:Err` during the scan — it should show values
   `sqrt(counts_i)/Q_i` scaled by `norm_value`, not `sqrt(counts_i/Q_i)`.
3. Compare against a manual computation using
   `setup/archiver-query.sh --pv roi1,pcharge` over the scan window.
4. Repeat with an `xprofile` detector — behaviour should be identical to
   post-Tier A (IOC-side errors), since Option C's wiring is scoped to
   the scalar path.

## 8. Known open items

- **Monitor statistical error.** Pcharge's statistical error is tiny
  for alignment-scale integrations but could matter for very short
  dwells (sub-mC). Future extension: `e = sqrt(N/Q^2 + (N/Q^2)^2 *
  Var(Q)/Q^2)` with `Var(Q)` provided by a site convention (e.g. 0.1%).
- **Background-subtracted detectors.** If a detector PV is
  `raw - dark`, the Jython has no visibility into `dark`. Either: (a)
  log the dark as a companion device and propagate, or (b) log the
  un-subtracted raw and let the IOC subtract+propagate. Option (b) is
  cleaner and scans better with Tier B.
- **Dead-time / per-pixel gain on 2D detectors.** Falls outside Option C
  (xy/xprofile/yprofile already handled in the IOC). Noted in
  `PoissonStatistics.mhtml` §6 (collaborator's caveat).

## 9. Estimated effort

- `writedatatopv.py` edit + unit test: ~1 hour
- `ScanTools/align` wiring: 10 minutes
- `FittingIOC.py` handler + asyn PV: 20 minutes
- Smoke test against beam: one scientist-hour of beamtime
- **Total:** half a day, distributed across three repos.

## 10. Redmine / commit references

- Tier A: commit on `scantools:issue/4850``weighted_moment_stats_poisson`
  helper, `extract_*_pos` rewire, `profile_scale` parameter.
- Option B: same commit as Tier A — `FittingIOC.Data:Y` overwrite guard.
- Option C: new Redmine ticket recommended (scope spans Jython
  scan-server repo `beamlines/share` and `scantools/issue/4850`).
+0 −0

File moved.