Commit 83469746 authored by Vacaliuc, Bogdan

results: Step 2/3/4 implementation captured

Documents the as-built deltas from plan/archiver-query-tool.md:

- The one bug found in implementation: Python datetime binds as Oracle DATE
  (second precision) by default in oracledb thin mode. Without explicit
  setinputsizes(DB_TYPE_TIMESTAMP), the carry-forward result is silently
  truncated and BETWEEN pulls in extra rows from the same wall-clock second.
- CSS CSV format value-propagation rules — why the database has 121 rows
  for DANGLE.RBV but the operator CSV has 127 (the 6 extras are propagated
  from other PVs' events in the multi-PV export).
- Float precision: CSS uses num_metadata.prec for fixed-precision rendering,
  trailing zeros kept, not stripped. f"{v:.{prec}f}" matches byte-for-byte.
- status_id=39 from the Step 1 live test was LINK_ALARM — EPICS standard
  "input link broken" alarm seen on first samples after Disconnected.
- The full T6/T7/T8/T10 verification gates from the plan all pass.

Tool is committed to main as setup/archiver-query and setup/archiver-query.sh.
This branch's role in the development is now complete; the per-tasking-branch
knowledge capture (Step 6 of the plan) follows in the next commits to other
tasking branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Step 2 / Step 3 / Step 4 — Implementation Results

**Date:** 2026-04-11
**Branch:** `tasking/css-archiver-query-tool-development`
**Outcome:** Tool committed to `main` as `setup/archiver-query` and
`setup/archiver-query.sh` with byte-for-byte parity to the operator-exported
DANGLE incident CSV.

This document captures the as-built deltas from `plan/archiver-query-tool.md`
that any future maintainer (or follow-up investigation) needs to know.

---

## Summary

| Plan step | Status | Notes |
|-----------|--------|-------|
| Step 1 (live read feasibility) | ✅ already done | See `step1-live-test-results.md` |
| Step 2 (MVP) | ✅ done | One bug fixed live (TIMESTAMP bind precision) |
| Step 3 (CSV correctness gate) | ✅ done | T6, T7, T8, T10 all pass; byte-for-byte CSV match |
| Step 4 (CLI flesh-out) | ✅ done | Multi-PV, csv/tsv, search, describe-* implemented |
| Step 5 (wrapper + docs + commit) | ✅ done | Tool on `main`, merged into `uvdl3` |
| Step 6 (knowledge capture) | in progress | Per-branch tasking/CLAUDE.md updates pending |

The tool source is `setup/archiver-query` (~1090 lines of Python) plus
`setup/archiver-query.sh` (a uv-managed bash wrapper). The per-archive
credentials file lives at `~/.config/sns-archiver/credentials` (mode 600) and
is bootstrapped from `~/opt/css/product-sns-4.7.4-SNAPSHOT/settings.ini` on
first run.
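A rough sketch of what that first-run bootstrap amounts to. The key names and
the credentials-file layout below are assumptions for illustration, not the
tool's actual format; only the two paths and the 600 mode come from this
document:

```python
from pathlib import Path

CRED = Path.home() / ".config/sns-archiver/credentials"
SETTINGS = Path.home() / "opt/css/product-sns-4.7.4-SNAPSHOT/settings.ini"

def bootstrap_credentials() -> Path:
    """First run only: seed the credentials file from the CSS settings."""
    if not CRED.exists():
        # Hypothetical flat key=value parse; the real keys in settings.ini
        # may differ.
        kv = dict(
            line.split("=", 1)
            for line in SETTINGS.read_text().splitlines()
            if "=" in line and not line.startswith("#")
        )
        CRED.parent.mkdir(parents=True, exist_ok=True)
        CRED.write_text(f"user={kv.get('user', '')}\npassword={kv.get('password', '')}\n")
        CRED.chmod(0o600)  # per-archive credentials must stay private
    return CRED
```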

---

## The one live bug found in implementation

The Step 1 live test caught the inline-SQL carry-forward hang and the
mandatory `ALTER SESSION SET TIME_ZONE` requirement. **Step 2 found one
additional issue** that the source review missed:

### Python `datetime` binds as Oracle `DATE`, not `TIMESTAMP`

By default, `oracledb` thin-mode binds Python `datetime.datetime` as Oracle
`DATE` (second precision) rather than `TIMESTAMP` (sub-second precision).
When the tool first ran, `get_actual_start_time()` correctly returned
`2026-04-08 13:20:24.921311` from the stored procedure — but passing that
datetime back to the `BETWEEN` clause as a bind variable truncated it to
`2026-04-08 13:20:24` (microseconds → 0). The 1-second-wider window pulled in
3 extra rows from the same second (4.3931, 4.396, 4.3988) before the
carry-forward sample, producing 124 rows instead of the correct 121.

**Fix:** an explicit `setinputsizes(... oracledb.DB_TYPE_TIMESTAMP)` on every
cursor that binds a datetime, in both `get_actual_start_time` and
`fetch_samples`. Without it, the bind goes through Oracle's implicit DATE
conversion and the microseconds are silently dropped.

```python
import oracledb

# Without this, thin-mode oracledb binds the datetimes as DATE and the
# microseconds are truncated before the BETWEEN comparison ever runs.
cur.setinputsizes(
    cid=None,                        # channel_id: default bind type is fine
    sts=oracledb.DB_TYPE_TIMESTAMP,  # window start, sub-second precision kept
    ets=oracledb.DB_TYPE_TIMESTAMP,  # window end
)
cur.execute(sql, cid=channel_id, sts=start, ets=end)
```
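A no-database illustration of what the implicit `DATE` bind does to the
carry-forward start time, using the values from the failure above:

```python
from datetime import datetime

actual = datetime(2026, 4, 8, 13, 20, 24, 921311)  # from get_actual_start_time()
as_date = actual.replace(microsecond=0)            # what a DATE bind keeps
assert as_date == datetime(2026, 4, 8, 13, 20, 24)
# BETWEEN with the truncated start also matches the ~0.92 s before the
# carry-forward sample: the 3 extra same-second rows observed in Step 2.
```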

**Why the source analysis didn't catch this:** the Phoebus Java code uses
`PreparedStatement.setTimestamp(int, Timestamp)` which binds as `TIMESTAMP`
unconditionally. Python `oracledb` uses `DATE` as the default Python-datetime
mapping for backwards compatibility with `cx_Oracle`. This is a Python-side
gotcha, not a schema issue.

**Generalization:** any future Python tool that talks to Oracle TIMESTAMP
columns via `oracledb` bind variables needs `DB_TYPE_TIMESTAMP` setinputsizes
or it will silently lose sub-second precision. Worth promoting to the parent
`CLAUDE.md` Cross-Project Patterns once enough Python-Oracle work has been
done to confirm the pattern applies broadly. **For now, captured here.**

---

## CSS Data Browser CSV format — what we learned matching it byte-for-byte

CS-Studio's `--format csv` export wraps the deltas-only sample stream in a
sparse table with one column per PV and one row for every unique timestamp
from any PV's events. The single-PV view has more rows than the database has
samples because CSS propagates last-known values forward into rows triggered
by *other* PVs' events.

Concrete numbers from the DANGLE window:
- Database has **121** `BL4A:Mot:DANGLE.RBV` samples in the 13:35–13:37 window
- Operator CSV has **127** rows where the DANGLE.RBV column is non-`#N/A`
- The 6 "extra" rows are CSS-propagated values triggered by changes to one
  of the other 3 PVs in the same export

Our `write_css_csv()` does the same; a minimal sketch follows the list:
1. Build per-PV `{ts_ms: value}` maps from the actual database samples
2. Compute the sorted union of all timestamps across all PVs
3. At each timestamp, walk forward: if a PV has a sample at that timestamp,
   update its "last-known" value; otherwise keep the previous last-known
4. Emit `#N/A` for any PV that has no last-known yet (no carry-forward and
   no sample-at-this-time)
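The names below are illustrative rather than the tool's actual internals; only
the four steps themselves come from the implementation:

```python
# Carry-forward sparse table, exactly the four steps above.
# `samples` maps pv -> {timestamp_ms: rendered value string}.
def build_sparse_table(samples: dict[str, dict[int, str]]) -> list[list[str]]:
    pvs = list(samples)
    # Step 2: sorted union of every timestamp seen by any PV.
    all_ts = sorted({ts for per_pv in samples.values() for ts in per_pv})
    last: dict[str, str | None] = {pv: None for pv in pvs}
    rows = []
    for ts in all_ts:
        row = [str(ts)]
        for pv in pvs:
            if ts in samples[pv]:           # step 3: update last-known
                last[pv] = samples[pv][ts]
            row.append(last[pv] if last[pv] is not None else "#N/A")  # step 4
        rows.append(row)
    return rows
```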

Float rendering also matters for byte-for-byte parity:
- CSS uses `num_metadata.prec` (4 for DANGLE.RBV) and emits fixed-precision
  decimals — `4.5710` not `4.571`, and `4.4277` not `4.427700000000001`
- Trailing zeros are **kept**, not stripped
- `_stringify_value(v, prec=4)` does `f"{v:.4f}"` which matches exactly
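The fixed-precision rule is easy to verify with the exact values above:

```python
prec = 4                                    # num_metadata.prec for DANGLE.RBV
assert f"{4.427700000000001:.{prec}f}" == "4.4277"
assert f"{4.571:.{prec}f}" == "4.5710"      # trailing zero kept, not stripped
```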

JSONL output is unaffected — it still emits raw IEEE 754 floats so the agent
has full precision when doing arithmetic on values.
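For instance, Python's `json` module serializes floats via their `repr`, so
nothing is rounded on the way out:

```python
import json
# repr-based float serialization: the stored double survives untouched
print(json.dumps({"value": 4.427700000000001}))  # {"value": 4.427700000000001}
```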

---

## Recovery samples after `Disconnected` are `LINK_ALARM (39)`

The Step 1 live test surfaced `status_id=39` on recovery samples for
`BL4A:Mot:DANGLE.RBV` immediately after `Disconnected` markers and noted it
was unknown. **Step 4's `--describe-schema` revealed it: `LINK_ALARM`**.

This is the EPICS standard "input link broken" alarm — exactly what you'd
expect when an IOC is just coming back online from a network glitch and the
record's input link hasn't re-validated yet. The agent investigating a
Disconnected window can now expect to see `LINK_ALARM` recovery samples and
treat them as the "first sample after the IOC came back" rather than a
mysterious code.

The tool's hard-coded `STATUS_NAMES` dict was extended to include `LINK_ALARM`
during Step 4 work. The full live status table (47 entries) is reachable via
`./setup/archiver-query.sh --describe-schema`.
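The shape of that lookup, showing only the entry this document confirms (the
real dict carries far more of the 47 live statuses):

```python
# Only status 39 is confirmed here; --describe-schema dumps all 47 live entries.
STATUS_NAMES: dict[int, str] = {
    39: "LINK_ALARM",  # EPICS "input link broken", expected right after Disconnected
}

def status_name(status_id: int) -> str:
    # Fall back to the raw id rather than guessing at unknown statuses.
    return STATUS_NAMES.get(status_id, f"UNKNOWN({status_id})")
```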

---

## What's NOT in the v1 tool (deliberate)

1. **No waveform / `array_val` BLOB support.** The BLOB format is documented
   in the analysis doc and would be straightforward to add, but no v1 use
   case needs it.
2. **No OPTIMIZED stored-procedure path** (`get_browser_data` with
   min/max/avg buckets). Raw is fine for minute-to-hour windows.
3. **No PV-name wildcards in `--pv`.** `--search GLOB` is the discovery
   path; `--pv` takes explicit names.
4. **No `--decimate N` for histogram-style compression.** Future v2 if a
   multi-day window investigation needs it.
5. **No write access.** Read-only by design; the credentials file
   authenticates as a read-only Oracle account anyway.

---

## How to use the tool from this branch's investigation context

Even though this branch is for the *development* of the tool, the tool is
already usable from any clone after `git checkout main && git pull`. The
tool's wrapper handles its own venv, so you don't need to be on this branch
to use it.

```bash
# Smoke test
./setup/archiver-query.sh \
    --pv 'BL4A:Mot:DANGLE.RBV' \
    --start '2026-04-08 13:35:00' \
    --end   '2026-04-08 13:37:00'

# Reproduce the operator CSV exactly
./setup/archiver-query.sh \
    --pv 'BL4A:Mot:DANGLE.RBV,BL4A:Mot:DANGLE,BL4A:Mot:DANGLE.DMOV,BL4A:Mot:AirPadStatus' \
    --start '2026-04-08 13:35:00' --end '2026-04-08 13:37:00' \
    --format csv -o /tmp/dangle.csv

# Diff against the original
grep -v '^#' /tmp/dangle.csv | grep -v '^$' > /tmp/ours-data.tsv
grep -v '^#' ~/analysis/BL4A/2026/04/09/bl4a-DANGLE-operation-fault-2026-04-08_1335.csv | grep -v '^$' > /tmp/csv-data.tsv
diff /tmp/ours-data.tsv /tmp/csv-data.tsv && echo "✅ exact match"
```

---

## What this unblocks for future investigations

Any branch on the tasking submodule that needs archive-time-series data can
now:

1. Run `./setup/archiver-query.sh --pv ... --start ... --end ...` directly
   from any session, on any machine that can reach `snsoroda-scan:1521`
2. Get JSONL output the agent can parse without operator help
3. Get sub-millisecond timestamp precision and per-PV anomaly flags
4. Discover unfamiliar PVs via `--search` and inspect their metadata via
   `--describe-channel` before assuming semantics
5. Reproduce a CS-Studio-equivalent CSV for human consumption when needed
   (`--format csv`)

The 3-day operator-CSV-export round-trip that dominated the DANGLE
investigation collapses to a one-shot bash invocation. Each tasking branch
that touches archiver data should be told about this; that's what Step 6
("Knowledge capture") of the plan does.

---

## File inventory

| File | Branch | Purpose |
|------|--------|---------|
| `setup/archiver-query` | `main` | Python entrypoint (~1090 lines) |
| `setup/archiver-query.sh` | `main` | bash wrapper (uv venv mgmt) |
| `setup/docs/sns-archiver-query.md` | `main` | user-facing docs |
| `CLAUDE.md` (Cross-Project Patterns) | `main` | quick-reference summary |
| `plan/archiver-query-tool.md` | `main` | original plan + refinements |
| `tasking/css-rdb-reader-analysis.md` | this branch | Phoebus source review |
| `tasking/step1-live-test-results.md` | this branch | live feasibility test |
| `tasking/step2-3-implementation-results.md` | this branch | this file |

---

## Bottom line

The plan was **mostly** implementable as written — the source review and
Step 1 live test had already eliminated the major uncertainties. The single
implementation surprise was the Python `datetime` → Oracle `DATE` bind
truncation, which would have been invisible in a side-by-side diff against
CSS Java's JDBC binds. Caught and fixed in ~10 minutes by walking the row
count down to the expected 121 and noticing the discrepancy was off-by-3, not
the off-by-one it first looked like: the signature of a window widened by up
to a second.

Tool is committed to `main`, merged into `uvdl3`, ready for Step 6 knowledge
capture across the other tasking branches.