Commit 62fa7145 authored by Vacaliuc, Bogdan

configure for bl4b tthd investigation

parent 1a48721e
+27 −132
# bl4a-DANGLE-investigation instructions
# bl4b-tthd-investigation instructions

This project contains correspondence, screenshots and log files developed during the course
of an investigation into motion control problems of a newly upgraded instrument control component.
of an investigation into motion control problems of an instrument with chronic concerns.
You may read any and all files in this folder and those provided under /home/controls/**.
There are also files that may be referenced from the read-only experiment filesystem mounts at /SNS/REF_M/**.
There are also files that may be referenced from the read-only experiment filesystem mounts at /SNS/REF_L/**.
You may use the internet freely to research specifications and documentation.

## Capabilities and Role
@@ -15,166 +15,61 @@ You are able to direct agent teams who are expert system programmers and softwar

There are files that have been collected during the investigation of the fault. These are noted in the following files and folders:

* /SNS/users/6ov/BL4A/2026/04/09/
* /SNS/users/6ov/BL4B/2026/04/12/

## Direct archiver query (use this instead of asking for CSV exports)

The `setup/archiver-query` tool on `main` (committed 2026-04-11) reads the
SNS CSS archiver Oracle database directly — no operator CSV export needed.
For any DANGLE/mDANGLE motion follow-up that wants archived RBV / VAL / DMOV
/ AirPadStatus traces, the assistant can fetch them autonomously:
For any motor motion follow-up that wants archived RBV / VAL / DMOV traces,
the assistant can fetch them autonomously:

```bash
# Reproduce the operator-exported DANGLE incident CSV byte-for-byte
./setup/archiver-query.sh \
    --pv 'BL4A:Mot:DANGLE.RBV,BL4A:Mot:DANGLE,BL4A:Mot:DANGLE.DMOV,BL4A:Mot:AirPadStatus' \
    --start '2026-04-08 13:35:00' --end '2026-04-08 13:37:00' \
    --format csv -o /tmp/dangle.csv
    --pv 'BL4B:Mot:tthd,BL4B:Mot:tthd.RBV' \
    --start '2026-04-10 23:55:00' --end '2026-04-10 23:57:00' \
    --format csv -o /tmp/tthd.csv

# Or get the same data as JSONL for direct programmatic consumption
./setup/archiver-query.sh \
    --pv 'BL4A:Mot:DANGLE.RBV' \
    --start '2026-04-08 13:35:00' --end '2026-04-08 13:37:00'
    --pv 'BL4B:Mot:tthd,BL4B:Mot:tthd.RBV' \
    --start '2026-04-10 23:55:00' --end '2026-04-10 23:57:00'
```

Verified end-to-end against `bl4a-DANGLE-operation-fault-2026-04-08_1335.csv`:
all 128 data rows for all 4 PVs match byte-for-byte. The tool also surfaces
Verified end-to-end in other sessions. The tool also surfaces
**`is_marker=true`** for `Disconnected`/`Archive_Off`/`Write_Error` rows and
**`out_of_range=true`** for samples outside `num_metadata.[low_disp_rng,
high_disp_rng]` — both directly relevant for catching motion-failure context
that previously took manual log-archaeology to find.

The 3390-deg DANGLE recovery glitch from the 2026-03-16 Disconnected window
was found via this exact mechanism (`out_of_range=true` against the motor's
[-1000, 1000] display range). Use the same probe technique on any
multi-day DANGLE/mDANGLE failure recurrence — query a 30-day window and
filter by `is_marker || out_of_range` to find anomalies.
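
A minimal sketch of the `is_marker || out_of_range` filter, assuming each JSONL row is a JSON object carrying those two flags as booleans. The `pv` and `value` keys and the sample rows are made up for illustration and are not real archiver output; check the actual field names against `setup/docs/sns-archiver-query.md`.

```python
import json


def find_anomalies(lines):
    """Yield rows flagged as archiver markers or out-of-range samples."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL stream
        row = json.loads(line)
        # Assumption: both flags are JSON booleans; a missing key counts as False.
        if row.get("is_marker") or row.get("out_of_range"):
            yield row


# Illustrative rows only (hypothetical keys and values).
sample = [
    '{"pv": "EXAMPLE:PV.RBV", "value": 12.5, "is_marker": false, "out_of_range": false}',
    '{"pv": "EXAMPLE:PV.RBV", "value": null, "is_marker": true, "out_of_range": false}',
    '{"pv": "EXAMPLE:PV.RBV", "value": 3390.0, "is_marker": false, "out_of_range": true}',
]

anomalies = list(find_anomalies(sample))
for row in anomalies:
    print(row)
```

In practice the same function could be fed an open file handle holding the saved JSONL output of a 30-day query, since it only needs an iterable of lines.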

**Other useful verbs:**
- `--describe-channel BL4A:Mot:AirPadStatus` shows that AirPadStatus has
  *no* `enum_metadata` rows in the archiver — values render as raw 0/1, NOT
  as `"Off"`/`"On"`. Don't string-match; integer-compare.
- `--search 'BL4A:Mot:DANGLE*'` lists 24 channels related to DANGLE
- `--describe-schema` dumps the live severity/status integer maps. Recovery
  samples after `Disconnected` typically show `LINK_ALARM` (status_id=39).
Additional information can be found on the 'bl4a-DANGLE-investigation' branch
as needed; review the git log for the tasking repos.

Full docs: `setup/docs/sns-archiver-query.md` on `main`.
Plan: `plan/archiver-query-tool.md` on `main`.

## BL4A DANGLE/mDANGLE — established facts for future work

These are durable facts discovered during the 2026-04 investigation that closed with
`DANGLE-Motion-Failure-Analysis.md`. Always treat them as the starting point and verify
only if the beamline has been reconfigured since 2026-04-10.

### Runtime parameter values differ from the substitutions file

On 2026-02-19/20 `mDANGLE` was recalibrated via live `caput` and the substitutions file
`bl4a-Galil1.substitutions` line 55 was **not** updated. Since then the IOC has been running
with:

| Field | Substitutions default (line 55) | Runtime (authoritative) |
|---|---|---|
| `MRES` | `1.663148032e-04` | **`7.93e-05` deg/step** |
| `SREV` | `51200` | **`25600`** |
| `VELO` | `6.0906 deg/s` | **`1.45202 deg/s`** |
| `BVEL` | `6.0906` | `1.45202` |
| `BDST` | `0.5 deg` | `0.5 deg` *(unchanged, but now means 6305 steps not 3006 — see trap below)* |
| `RDBD` | `0.011` | has bounced between `0.001`, `0.005`, `0.01` — check latest `.sav` |
| `URIP` | *(not set in subs)* | `Yes` *(set by `profibus.template`)* |

**Always verify the live values with `caget` before computing any step-to-deg conversions.**
The substitutions file is stale; do not trust it for math.

### The BDST = 0.5 deg backlash trap (root cause of the 2026-04-08 scan failures)

With `BDST = 0.5 deg` and runtime `MRES = 7.93e-05`, a backlash pre-position move is
`0.5 / 7.93e-05 = 6305 motor steps` (= the "-6320…-6339 step" retries that appear in the
Galil IOC log on every failing DANGLE move since 2026-02-24).
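
The step arithmetic can be checked with a short sketch. It uses only the `BDST` and `MRES` values quoted in this section; the function name is hypothetical, and rounding to the nearest whole step is an assumption about how to present the result, not a statement about the motor record's internal rounding.

```python
def backlash_steps(bdst_deg, mres_deg_per_step):
    """Convert a backlash distance in degrees to whole motor steps."""
    return round(bdst_deg / mres_deg_per_step)


BDST_DEG = 0.5                   # backlash distance, unchanged in both cases
MRES_RUNTIME = 7.93e-05          # deg/step, live value after recalibration
MRES_SUBS = 1.663148032e-04      # deg/step, stale substitutions-file value

runtime_steps = backlash_steps(BDST_DEG, MRES_RUNTIME)
stale_steps = backlash_steps(BDST_DEG, MRES_SUBS)

print(f"runtime MRES: {runtime_steps} steps")        # 6305
print(f"substitutions MRES: {stale_steps} steps")    # 3006
print(f"ratio: {runtime_steps / stale_steps:.2f}x")
```

The same 0.5 deg backlash distance therefore costs roughly twice as many steps at the runtime `MRES` as the substitutions file would suggest, which is why the live values must be used for any conversion.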

The trap fires whenever the retry phase makes `diff` flip sign at the target boundary:
`preferred_dir` flips from true to false inside one `do_work()` call, Case 3 of the motor
record's move-selection logic fires, and the motor executes a `-6305`-step reverse move
followed by a `+6305`-step takeup — both with step loss on this air-padded heavy arm.

The fix is `BDST=0` (air-padded rotary stages have negligible mechanical backlash).
See `DANGLE-Motion-Failure-Analysis.md` for the full derivation and all the evidence.

### DANGLE/RotationAxis air-pad sequencing

`BL4A:Mot:DANGLE` is a **virtual soft motor** that orchestrates a full air-pad lifecycle
around the physical `mDANGLE` move. Wiring (see `bl4a_airpad_signals.db` and
`bl4a_airpad_signals_motor.db`):

1. `DANGLE:Seq` (sseq): block RotationAxis via DISP → clear Done.VAL → RunCheck (abort if
   RotationAxis busy) → clear SeqError → `AirPadControl=1` → 3 s delay → `AirPadOnCheck`
   (abort if `AirPadStatus≠1`) → `Setpoint.VAL → mDANGLE CA` → FLNK `SeqFinish`
2. `DANGLE:SeqFinish` (sseq): 2 s delay → `AbortOnError` → `AirPadControl=0` → 7 s delay →
   `AirPadOffCheck` → `SetSeqDone` → `SetSeqDone2` → `SetDone.PROC` → `Done.VAL=1` → virtual
   motor's DINP sees 1 → `DANGLE.DMOV=1` → scan server `completion=True` put-callback returns.

`RotationAxis` has an analogous `RotationAxis:Seq` / `RotationAxis:SeqFinish` chain.
They share the same air pad and are mutually exclusive via DISP locking and the
`:RunCheck` calcouts.

### mDANGLE readback chain (URIP=Yes via profibus.template)

The physical motor's `DRBV` does **not** come from the Galil step counter. Lines 45–49 of
`bl4a-Galil1App/Db/profibus.template` unconditionally override the readback fields of every motor
with a Profibus encoder:

```epics
record(motor, "$(S):Mot:$(M)") {
  field(RDBL, "$(S):Mot:$(M):EncPos")
  field(RRES, "1")
  field(URIP, "Yes")
}
```

For `mDANGLE`:
- Raw Profibus counts: `BL4A:Mot:mDANGLE:Enc` (modbus port `m1` = `10.111.8.46:502`, addr 44,
  100 ms poll interval)
- Converted to degrees: `BL4A:Mot:mDANGLE:EncPos` (calc, `A*B+C` where `A=:Enc`,
  `B=.ERES=-0.000466906`, `C=EOFF=3390`)
- The motor record samples this CP-linked value on every retry decision
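
The `A*B+C` conversion can be sketched as follows, using the `ERES` and `EOFF` values listed above. The function names and the display-range check are illustrative assumptions; the [-1000, 1000] range is the motor display range quoted in the archiver section.

```python
ERES = -0.000466906   # deg per raw encoder count (B in the calc record)
EOFF = 3390.0         # deg offset (C in the calc record)
DISPLAY_RANGE = (-1000.0, 1000.0)  # display range quoted in the archiver section


def enc_to_deg(raw_counts, eres=ERES, eoff=EOFF):
    """Mirror the calc record: EncPos = Enc * ERES + EOFF."""
    return raw_counts * eres + eoff


def in_display_range(deg, rng=DISPLAY_RANGE):
    """True if a converted position lies inside the display range."""
    low, high = rng
    return low <= deg <= high


zero_count_pos = enc_to_deg(0)
print(zero_count_pos, in_display_range(zero_count_pos))
```

A raw count of 0 maps to exactly `EOFF` = 3390 deg, which lies outside the display range. Whether a momentary zero read explains the 3390-deg recovery glitch is not established here and would need to be confirmed against the raw `:Enc` archive.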

Any noise/latency/transient in the Profibus encoder path is promoted to physical motion
by the retry logic. This is a secondary reliability concern; consider making the
`profibus.template` URIP override opt-in per axis.

### Authoritative file locations on bl4a-dassrv1
### Authoritative file locations on bl4b-dassrv1

| Purpose | Path |
|---|---|
| Motor record autosave (VELO, BVEL, BDST, RDBD, RTRY…) | `/home/controls/var/bl4a-Galil1/bl4a-Galil1.sav` + `.sav0/.sav1/.sav2` (rotating) |
| Motor record dated snapshots | `/home/controls/var/bl4a-Galil1/bl4a-Galil1.sav_YYMMDD-hhmmss` (on restart) |
| Motor record pass0 autosave (MRES, ERES, DVAL, OFF) | `/home/controls/var/bl4a-Galil1/bl4a-Galil1_pass0.sav*` |
| Galil command log (every PR/BG/MG/SH per controller) | `/home/controls/var/log/bl4a-Galil1.log` |
| Galil IOC stdout/stderr | `/home/controls/var/log/ioc_bl4a-Galil1.log` |
| Motor record autosave (VELO, BVEL, BDST, RDBD, RTRY…) | `/home/controls/var/bl4b-Galil1/bl4b-Galil1.sav` + `.sav0/.sav1/.sav2` (rotating) |
| Motor record dated snapshots | `/home/controls/var/bl4b-Galil1/bl4b-Galil1.sav_YYMMDD-hhmmss` (on restart) |
| Motor record pass0 autosave (MRES, ERES, DVAL, OFF) | `/home/controls/var/bl4b-Galil1/bl4b-Galil1_pass0.sav*` |
| Galil command log (every PR/BG/MG/SH per controller) | `/home/controls/var/log/dassrv1/ioc_bl4b-Galil1.log` |
| Galil IOC stdout/stderr | `/home/controls/var/log/dassrv1/ioc_bl4b-Galil1.log` |
| Scan server stdout (has real RBV values in `TimeoutException`) | `/home/controls/var/scan/console.log` |
| Scan device definitions (tolerances, timeouts) | `/home/controls/bl4a/python/scantools/devices.py` |
| Air pad virtual motor wiring | `/home/controls/bl4a/applications/bl4a-Galil1/bl4a-Galil1App/Db/bl4a_airpad_signals*.db` |
| Motor substitutions (incl. mDANGLE at line 55) | `/home/controls/bl4a/applications/bl4a-Galil1/bl4a-Galil1App/Db/bl4a-Galil1.substitutions` |
| Profibus URIP override template | `/home/controls/bl4a/applications/bl4a-Galil1/bl4a-Galil1App/Db/profibus.template` |
| Scan device definitions (tolerances, timeouts) | `/home/controls/bl4b/python/scantools/devices.py` |
| Motor substitutions | `/home/controls/bl4b/applications/bl4b-Galil1/bl4b-Galil1App/Db/bl4b-Galil1.substitutions` |

TODO: The above table needs to be expanded by incorporating the 2nd Galil IOC at '/home/controls/bl4b/applications/bl4b-Galil2/'

### Galil controller-to-axis mapping

| Controller | IP | Notable axes |
|---|---|---|
| `DMC1-1` | `10.112.8.41` | F=SANGLE, G=PSC6, H=Slit3Trans |
| `DMC1-2` | `10.112.8.42` | D=mRotationAxis, G=LSlit3, H=RSlit3 |
| `DMC2-1` | `10.112.8.44` | FOMRot, PolLift, SMPol* |
| `DMC2-2` | `10.112.8.45` | Slit1*, Slit2* |
| `DMC3-1` | `10.112.8.50` | Slit0, SampleSlit, SF1Translation |
| `DMC3-2` | `10.112.8.51` | A=LSlit4, B=RSlit4, C=DTrans, **D=mDANGLE**, E=AnalyzerTrans, F=He3AnalyzerTrans, G=AnalyzerRot, H=AnalyzerLift |

### Open mechanical question

The stepper only achieves **~30–50 % of commanded steps** on small correction moves
(measured from the 04-08 archiver CSV vs Galil log). This persists independent of the
BDST trap and will limit achievable scan tolerance even after `BDST=0`. Candidate causes:
air-pad supply pressure or valve timing, motor shaft coupling slip, encoder-to-arm
coupling, drivetrain backlash/compliance. Needs hands-on diagnosis.

TODO: This table needs to be constructed by inspection of /home/controls/bl4b/applications/bl4b-Galil1/iocBoot/iocbl4b-Galil1/st.cmd
and /home/controls/bl4b/applications/bl4b-Galil2/iocBoot/iocbl4b-Galil2/st.cmd

## Secure Temporary Files

+4 −173

File changed.

Preview size limit exceeded, changes collapsed.