plan for definitive error propagation improvement in scan server (dfe3521c) · Commits · Vacaliuc, Bogdan / tasking

plan/option-c-error-propagation.md

0 → 100644

+311 −0

Original line number	Diff line number	Diff line
		# Option C: Carry Data:Err through the scan server

		Status: Proposal. Not started.
		Depends on: Tier A (`chi_scan.py` weighted Poisson errors) landed on
		`scantools:issue/4850`; Option B (`FittingIOC.py` Data:Err overwrite guard)
		landed on the same branch.
		Motivation: The current `FittingIOC.Data:Y` write handler synthesises
		`Data:Err = sqrt(Data:Y)` which is correct only when `Data:Y` holds raw
		Poisson counts. For monitor-normalised scans (`Data:Y = N/Q` via
		`WriteDataToPV` with `norm_device='pcharge'`), this underestimates the per-point
		error by a factor of `sqrt(Q)`. The fix is to compute the error where `N`
		and `Q` are actually known — in the scan server's Jython `WriteDataToPV`
		script — and ship it to the IOC as a companion array.

		## 1. Scope

		In scope:
		- `writedatatopv.py` Jython-side compute-and-write of `Data:Err` from raw
		counts (and monitor counts, when normalising).
		- `ScanTools.align.__init__` wiring for scalar detectors so the scan sequence
		targets the new error PV.
		- `FittingIOC.py` small change so external `Data:Err` writes are accepted
		cleanly and don't race with the `Data:Y` handler's guarded fallback.
		- Back-compat path so scans that predate this change (and third-party Jython
		calling `Script('WriteDataToPV', …)` with the old arg list) keep working.

		Out of scope:
		- The actual use of `Data:Err` inside the final curve fit. That is the
		"Tier B" fit-side work (modify `gauss_base_fit_py.py` / `gauss_line_fit_py.py`
		to accept `sigma=` and use `scipy.optimize.curve_fit(..., absolute_sigma=True)`;
		expose `Fit:{Pos,Amp,Wid,Base}Err` PVs). Option C makes that possible; it
		does not implement it.

		## 2. Current data flow (pre-Option C)

		At each scan step, with detector = scalar (e.g. `roi1`):

		```
		scan server Log(all) → Derby DB
		↓
		Script('WriteDataToPV', [roi1, Data:Y, pcharge, 1.0])
		↓ Jython:
		data = getData(roi1, pcharge) # shape (2, N_accumulated)
		normed = data[0] * 1.0 / data[1] # N/Q per step
		context.write(Data:Y, normed, False)
		↓ CA put lands in FittingIOC
		FittingIOC.write('Data:Y', value)
		↓
		setParam('Data:Y', value)
		setParam('Data:Err', sqrt(value)) ← WRONG for normalised scans
		trigger_fit()
		callbackPV('Data:Y')
		```

		The IOC sees `N/Q` and computes `sqrt(N/Q)`, which is NOT the error on
		`N/Q`. The correct error on the normalised ratio (treating `Q` as
		effectively noise-free) is `sqrt(N)/Q = sqrt(y/Q) = sqrt(y)/sqrt(Q)`.

		For 2D / 1D profile detectors (`xy`, `xprofile`, `yprofile`) this is
		already handled correctly by `trigger_data_to_y_and_err` (Tier A), which
		writes `Data:Y` AND `Data:Err` via `setParam`, bypassing the `write()`
		handler. Option C's scope is the scalar-detector path.

		## 3. Proposed data flow (post-Option C)

		```
		Script('WriteDataToPV', [roi1, Data:Y, pcharge, 1.0, Data:Err])
		↓ Jython:
		data = getData(roi1, pcharge) # shape (2, N_accumulated)
		N = data[0]; Q = data[1]; Kn = float(norm_value)
		y = N * Kn / Q
		e = sqrt(N) * Kn / Q # first-order Poisson error
		context.write(Data:Err, e, False) # write error FIRST
		context.write(Data:Y, y, False) # then value — triggers the fit
		```

		Why errors first: the `Data:Y` handler in FittingIOC calls `trigger_fit()`
		synchronously. If we let the fit see an out-of-date `Data:Err`, then even
		Tier B would use stale errors on the Nth step. Writing errors first
		guarantees `trigger_fit()` sees the matching error array.

		The raw-counts regime is the same with no norm_device:
		```
		y = N
		e = sqrt(N)
		```

		## 4. Changes by file

		### 4.1 `/home/controls/share/scan/writedatatopv.py` (shared repo, Jython 2.7)

		Extend `WriteDataToPV.__init__` with an optional `err_pv` keyword. Compute
		the error in `run()` and write it ahead of the value. Keep the existing
		positional signature working.

		```python
		class WriteDataToPV(ScanScript):
		def __init__(self, device, pv, norm_device=None, norm_value=1.0, err_pv=None):
		self.device = device
		self.pv = pv
		self.norm_device = None if norm_device in (None, "-") else norm_device
		self.norm_value = norm_value
		self.err_pv = None if err_pv in (None, "-") else err_pv

		def getDeviceNames(self):
		names = [self.device, self.pv]
		if self.err_pv:
		names.append(self.err_pv)
		return names

		def run(self, context):
		kn = float(self.norm_value)
		if self.norm_device:
		data = np.array(context.getData(self.device, self.norm_device))
		N = data[0]; Q = data[1]
		try:
		y = N * kn / Q
		e = np.sqrt(np.abs(N)) * kn / Q # sqrt(N)/Q * kn
		except:
		# Zero division / bad Q — fall back to the un-normalised value
		y = context.getData(self.device)[0]
		e = np.sqrt(np.abs(y))
		else:
		y = context.getData(self.device)[0]
		e = np.sqrt(np.abs(y))
		# Write error first so the fit sees a consistent (y, e) pair
		if self.err_pv:
		try:
		context.write(self.err_pv, array(e, 'd'), False)
		except Exception, ex:
		print "WriteDataToPV(%s) err_pv write exception: %s" % (self.err_pv, ex)
		try:
		context.write(self.pv, array(y, 'd'), False)
		except Exception, ex:
		print "WriteDataToPV(%s) exception: %s" % (self.pv, ex)
		```

		Notes:
		- Jython 2.7 requires the `except Exception, ex:` form, not `as ex`.
		- `numjy` supports `np.sqrt` and `np.abs` on array-ish objects; confirm on
		the target scan server before rollout by running the unit test in §7.
		- `err_pv=None` (or `"-"`) keeps the legacy call-site behaviour: no error
		write. The IOC's Data:Y handler (with the Option B guard) still
		synthesises the `sqrt(value)` fallback for those legacy scans.

		### 4.2 `/home/controls/common/scantools/issue/4850/python/ScanTools/align/__init__.py`

		Only the scalar path needs the wiring; the `xy`/`xprofile`/`yprofile`
		paths compute their own errors inside FittingIOC. At line 178–186:

		```python
		# Scalar (Data:Y) path — pass err_pv so the scan server writes Data:Err.
		# 2D / 1D profile paths don't get an err_pv because trigger_data_to_y_and_err
		# populates Data:Err internally (Tier A).
		y_pv = f'{self.fitting_pv}Data:Y'
		e_pv = f'{self.fitting_pv}Data:Err'
		...
		Script('WriteDataToPV', [ self.detector, y_pv, norm_device, norm_value, e_pv ]),
		```

		Keep the 2D/1D branches calling `WriteDataToPV` with the old 4-arg form
		(err_pv defaults to None). Verify the call sites in the two other places
		that use `WriteDataToPV` in the scan config (`table.py`, any bespoke
		scripts in `/home/controls/bl4b/scan/`) are not impacted — the new
		positional 5th argument is only read when the caller provides it.

		### 4.3 `/home/controls/common/scantools/issue/4850/ioc/fit/FittingIOC.py`

		Small additive changes:

		- Add a `Data:Err` write branch so external puts are accepted and the
		fit is retriggered. This makes the scan-server's `Data:Err` write
		authoritative and keeps it synchronised with `Data:Y`:

		```python
		elif reason == 'Data:Err':
		try:
		self.setParam(reason, np.atleast_1d(value))
		# Do NOT call trigger_fit() here — the subsequent Data:Y write
		# will trigger it (Option C ordering: Data:Err first, Data:Y
		# second). Retriggering here would double-fit every step.
		except Exception as e:
		logger.error(str(e))
		logger.error(traceback.format_exc())
		finally:
		self.callbackPV(reason)
		```

		- Make `Data:Err` asynchronous so `context.write(..., False)` reliably
		dispatches the write before the next command:

		```python
		'Data:Err' : { 'count' : MAX_ARRAY, 'asyn': True },
		```

		- Option B's guard in the `Data:Y` branch is already in place (landed
		with Tier A). No additional change required; when `err_pv` has been
		written, the guard naturally preserves it.

		### 4.4 Version bump and ringlog message

		- `FittingIOC.VERSION`: bump to `"1.4"`.
		- Add an `info` log line in the `Data:Err` branch so the ringlog shows
		that propagated errors are being consumed (helps operators confirm
		the new path is live).

		## 5. Back-compat strategy

		- `writedatatopv.py` accepts the old 4-positional-arg signature unchanged.
		Scans that were saved to disk before the rollout keep working.
		- Old scans that do not pass `err_pv` still land on the FittingIOC
		`Data:Y` handler, whose Option B guard fills `Data:Err = sqrt(value)`
		exactly as today. No regression.
		- The new PV asyn flag on `Data:Err` is backward-compatible: existing
		callers that PUT without completion don't wait; the only behavioural
		change is that completion callbacks are emitted. Verify nothing in
		the existing OPI panels polls `Data:Err` synchronously.
		- Rollout order matters: deploy `FittingIOC.py` first (IOC restart),
		then `writedatatopv.py` (scan server restart / script-path refresh),
		then `ScanTools/align/__init__.py` (no restart needed; re-run scans).
		Deploying `writedatatopv.py` first would still be safe because
		`FittingIOC` accepts `Data:Err` writes even before the new handler
		(pcaspy Driver stores the value; no fit retrigger matters for scans
		that write `Data:Y` after `Data:Err`).

		## 6. Configuration / invariants to document

		- Assumed monitor is noise-free (`Var(Q) ≈ 0`). Valid for pcharge in
		mC-scale integrations; documented in a comment at the `sqrt(N)*kn/Q`
		line.
		- Raw-counts regime (no `norm_device`) treats `y` itself as a Poisson
		variate. Valid when `y` is an integer count PV. Not valid if the
		detector PV is pre-normalised upstream (e.g., some ROI PVs report
		counts/sec). Any detector that pre-normalises should be flagged in
		`devices.py` so alignment scans steer around it; add a `noraw` tag
		if needed and have Alignment.createScan refuse the combination.
		- The `sqrt(abs(N))` guard handles the pathological case where Derby
		returns a small negative (e.g., a background-subtracted value) so the
		Jython doesn't throw. It does not cover genuine background-subtracted
		signal — that needs full propagation through `Var(S) = Var(raw) +
		Var(bkg)` and is a follow-up.

		## 7. Test plan

		Jython unit test (`/home/controls/share/scan/tests/test_writedatatopv.py`,
		run on scan server):

		1. Mock a `context` object with `getData(dev)` returning
		`[[10, 20, 30]]` and `getData(dev, norm)` returning
		`[[10, 20, 30], [1.0, 2.0, 3.0]]`.
		2. Run `WriteDataToPV(dev, 'TEST:Y', 'pcharge', 1.0, 'TEST:Err').run(ctx)`.
		Assert the captured writes are
		`TEST:Err = [sqrt(10), sqrt(20)/2, sqrt(30)/3]` and
		`TEST:Y = [10, 10, 10]`.
		3. Run the legacy 4-arg form. Assert only `TEST:Y` is written.
		4. Verify `TEST:Err` write happens before `TEST:Y`.

		FittingIOC unit test (pcaspy in SIM mode, or integration against a
		soft IOC):

		1. Init driver, put to `Data:Err` (length 3), put to `Data:Y` (length 3).
		Verify `Data:Err` is unchanged by the `Data:Y` handler.
		2. Init driver, put only to `Data:Y` (length 3). Verify `Data:Err` equals
		`sqrt(abs(Data:Y))` (Option B fallback).
		3. Init driver, put to `Data:Err` (length 2), then put to `Data:Y`
		(length 3). Verify the guard detects the mismatch and overwrites
		`Data:Err` with `sqrt(abs(value))` — this regresses to the
		synthesised Poisson default rather than carrying a stale length.

		Beamline smoke test:

		1. Fire a `caxis` alignment scan via the AlignmentIOC OPI with a scalar
		detector (e.g., roi1) and `norm_device=pcharge`.
		2. Watch `Data:Err` during the scan — it should show values
		`sqrt(counts_i)/Q_i` scaled by `norm_value`, not `sqrt(counts_i/Q_i)`.
		3. Compare against a manual computation using
		`setup/archiver-query.sh --pv roi1,pcharge` over the scan window.
		4. Repeat with an `xprofile` detector — behaviour should be identical to
		post-Tier A (IOC-side errors), since Option C's wiring is scoped to
		the scalar path.

		## 8. Known open items

		- Monitor statistical error. Pcharge's statistical error is tiny
		for alignment-scale integrations but could matter for very short
		dwells (sub-mC). Future extension: `e = sqrt(N/Q^2 + (N/Q^2)^2 *
		Var(Q)/Q^2)` with `Var(Q)` provided by a site convention (e.g. 0.1%).
		- Background-subtracted detectors. If a detector PV is
		`raw - dark`, the Jython has no visibility into `dark`. Either: (a)
		log the dark as a companion device and propagate, or (b) log the
		un-subtracted raw and let the IOC subtract+propagate. Option (b) is
		cleaner and scans better with Tier B.
		- Dead-time / per-pixel gain on 2D detectors. Falls outside Option C
		(xy/xprofile/yprofile already handled in the IOC). Noted in
		`PoissonStatistics.mhtml` §6 (collaborator's caveat).

		## 9. Estimated effort

		- `writedatatopv.py` edit + unit test: ~1 hour
		- `ScanTools/align` wiring: 10 minutes
		- `FittingIOC.py` handler + asyn PV: 20 minutes
		- Smoke test against beam: one scientist-hour of beamtime
		- Total: half a day, distributed across three repos.

		## 10. Redmine / commit references

		- Tier A: commit on `scantools:issue/4850` — `weighted_moment_stats_poisson`
		helper, `extract_*_pos` rewire, `profile_scale` parameter.
		- Option B: same commit as Tier A — `FittingIOC.Data:Y` overwrite guard.
		- Option C: new Redmine ticket recommended (scope spans Jython
		scan-server repo `beamlines/share` and `scantools/issue/4850`).

transcript.md→plan/transcript.md

+0 −0

File moved.

View file