Commit bf4d7a5c authored by Vacaliuc, Bogdan

plan: complete pdf-tools fitness review — recommend Sphinx+MyST+Mermaid



Primary recommendation: plug the hackathon knowledge-base into the
existing quicknxsv2/docs/ Sphinx tree via myst-parser and
sphinxcontrib-mermaid, and build the stakeholder PDF with Sphinx's
LaTeX (or rinohtype) backend. Reuses the project's own toolchain;
renders Mermaid and math natively; produces both HTML and PDF from
one source.

Fallback: adopt pdf-tools/md2pdf.py from instrument-motion-investigations
with three targeted patches (neutralize the motion-control header,
pre-render Mermaid to SVG via mmdc or a one-time manual render,
optional Pygments).
Markdown sources remain valid MyST, so the fallback is a strict
subset with a zero-edit upgrade path to the primary.

Prototyped md2pdf.py against a representative sample-arch.md with
Mermaid + math + tables + code blocks: confirmed Mermaid passes
through as raw text (dealbreaker for architecture docs) and LaTeX
math is not typeset. Other md2pdf features (tables, TOC with page
numbers, code wrap, callouts) render correctly.

Closes plan/pdf-tools-fitness-review.md (status -> completed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent a8663a82
+253 −0
# PDF-tools fitness recommendation — quicknxsv2 modularization docs

**Completed:** 2026-04-18
**Supersedes open status of:** `pdf-tools-fitness-review.md`
**Audience:** the session that writes the knowledge-base and the user who
will ask for the PDF.

## TL;DR

**Primary (robust) recommendation:** plug the hackathon knowledge-base
into the existing `quicknxsv2/docs/` Sphinx tree via `myst-parser` +
`sphinxcontrib-mermaid`, and build the stakeholder PDF with Sphinx's
LaTeX/PDF builder (or `rinohtype` if LaTeX is undesirable).  This reuses
the project's own documentation toolchain, renders Mermaid and math
natively, and produces **both** HTML (for developers) and PDF (for the
asked-for deliverable) from one source.

**Simple fallback (subset, additive upgrade path):** if the 5-15
knowledge-base documents are written as stand-alone Markdown files and
must ship a PDF within the hackathon window, adopt `pdf-tools/md2pdf.py`
from the `instrument-motion-investigations` sister branch with three
targeted patches (see "Fallback path" below).  The Markdown sources are
already valid MyST, so the fallback is a strict subset of the robust
path — not a rewrite.

## Why not just copy md2pdf.py verbatim

`md2pdf.py` was built for **short single-file motion-investigation
reports** (DANGLE, S3-Gap, tthd, hs-HLS).  It is verified fit for that purpose.
Against the quicknxsv2 modularization knowledge-base it has these gaps:

| Gap | Impact for this use case |
|---|---|
| Mermaid blocks pass through as raw fenced code | **Module-topology graphs render as plaintext, not diagrams.** Dealbreaker for an architecture doc. |
| LaTeX math passes through as literal `$$…$$` text | Reflectivity formulas render un-typeset.  Minor but visible. |
| Hard-coded header `"${BL} — Motion Control"` | Wrong framing for a software-architecture doc. Easy patch, but the template is not neutral. |
| Single-file at a time, no cross-references | 5-15 documents can't link to each other with page numbers; each is its own PDF. |
| No syntax highlighting in code blocks | Mixed-audience readability suffers for long Python snippets. |
| No API-from-docstrings generation | Module reference sections have to be hand-written. |

The first two alone force either a pre-processing step or a different
tool.  Pre-processing Mermaid specifically needs `mermaid-cli`
(node/npm) or a headless browser — neither is already in the workspace
toolchain, both add a new ecosystem.
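
A quick way to size the first two gaps before choosing a tool is to count
Mermaid fences and `$$` display math in the Markdown sources.  The sketch
below is illustrative only: the names `audit_markdown` and `audit_tree` and
both regexes are assumptions, not part of `md2pdf.py`.

```python
import re
from pathlib import Path

# Opening fence of a Mermaid block: three or more backticks, then "mermaid".
MERMAID_FENCE = re.compile(r"^[ \t]*```+[ \t]*mermaid\b", re.MULTILINE)
# Display math delimited by $$ ... $$, possibly spanning several lines.
DISPLAY_MATH = re.compile(r"\$\$.+?\$\$", re.DOTALL)


def audit_markdown(text: str) -> dict:
    """Count the constructs that would pass through un-rendered."""
    return {
        "mermaid_blocks": len(MERMAID_FENCE.findall(text)),
        "display_math": len(DISPLAY_MATH.findall(text)),
    }


def audit_tree(root: Path) -> dict:
    """Map each .md file under `root` to its counts (hypothetical helper)."""
    return {
        str(path): audit_markdown(path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("*.md"))
    }
```

Running `audit_tree(Path("docs"))` on a draft knowledge-base gives a rough
count of how many documents depend on diagram or math rendering.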

## Decision matrix

Weights reflect the problem as described: mixed scientist/developer
audience, ~5-15 markdown docs with Mermaid + code blocks + possibly
tables, and the user will "undoubtedly ask for a PDF".

| Criterion (weight) | md2pdf.py | Sphinx+LaTeX | Sphinx+rinohtype | MkDocs+PDF plugin | Quarto | Pandoc+LaTeX | pdoc |
|---|---|---|---|---|---|---|---|
| Fit to mixed audience (3) | Good | Excellent | Good | Excellent | Excellent | Good | Dev-only |
| Mermaid rendering (3) | **Fail** (passthrough) | Native via sphinxcontrib-mermaid | Same | Native (material theme) | Native | Filter required | No |
| Math/equations (2) | Passthrough | MathJax (HTML) / LaTeX (PDF) | Limited | MathJax | MathJax / LaTeX | MathJax / LaTeX | No |
| Code highlighting (2) | None | Pygments | Pygments | Pygments | Pygments | Pygments | Pygments |
| Auto-API from docstrings (2) | No | `autodoc` + `autosummary` (already configured) | Same | `mkdocstrings` | `quartodoc` | No | Yes (dev-oriented) |
| Cross-document links / unified TOC (2) | No | Yes | Yes | Yes | Yes | Partial | Per-module |
| Matches existing quicknxsv2 `docs/` toolchain (3) | No | **Yes (already configured)** | Drop-in backend | New tool | New tool | New tool | New tool |
| Pulls cleanly into pixi/uv (2) | uv OK | **In `[tool.pixi.feature.docs]` already** | `rinohtype` on conda-forge | New feature | Not on pixi-build by default (binary) | `pandoc` + `texlive` | uv OK |
| Maintenance burden beyond hackathon (3) | Low for 1 doc, high for 10 | Low (standard Python docs ecosystem) | Low | Low | Medium (new tool) | Medium | Low |
| Output flexibility (HTML + PDF) (1) | PDF only | Both | Both | Both, HTML-first | Both | Both | HTML-first |
| Time-to-first-PDF (1) | Minutes | Hours (first time) | Hours | Hours | Day (install) | Hours | Minutes |

**Scoring summary:** Sphinx+LaTeX wins on every "fit the existing
project" and "render the actual content" dimension.  md2pdf.py wins
only on time-to-first-PDF and simplicity — and only if Mermaid is
sacrificed.

## Primary path — Sphinx with MyST + Mermaid + LaTeX PDF

### What to do (one-pager)

1. **Add two dependencies to `quicknxsv2/pyproject.toml` under the
   existing `[tool.pixi.feature.docs]`:**

   ```toml
   [tool.pixi.feature.docs.dependencies]
   sphinx = ">=8"                      # already present
   sphinx_rtd_theme = ">=3.0.1"        # already present
   myst-parser = ">=4"
   sphinxcontrib-mermaid = ">=2"

   [tool.pixi.feature.docs.pypi-dependencies]
   sphinx-qt-documentation = "*"       # already present
   ```

   Both new packages are on conda-forge/noarch.  **LaTeX is only needed
   for the PDF build step** — see alternative below for a LaTeX-free
   option.

2. **Enable MyST + Mermaid in `docs/conf.py`:**

   ```python
   extensions = [
       # ... existing ...
       "myst_parser",
       "sphinxcontrib.mermaid",
   ]
   source_suffix = [".rst", ".md"]  # already present
   mermaid_output_format = "svg"    # HTML output format; any value other than "raw" needs mmdc
   myst_enable_extensions = ["deflist", "fieldlist", "colon_fence"]
   ```

3. **Drop the hackathon knowledge-base `.md` files into
   `quicknxsv2/docs/developer/modularization/`** (new subdirectory).
   Add an `index.rst` or `index.md` that lists them in a toctree, and
   reference it from `docs/developer/index.rst`.

4. **Build HTML** (already a pixi task):
   ```bash
   pixi run build-docs
   ```

5. **Build PDF** via LaTeX:
   ```bash
   pixi run sphinx-build -b latex docs docs/_build/latex
   make -C docs/_build/latex all-pdf
   ```
   Add a pixi task `build-pdf` to encapsulate this.

   LaTeX is heavy (~2 GB with full `texlive`).  For a lighter setup,
   install only `texlive-xetex texlive-fonts-recommended
   texlive-latex-extra latexmk` (~400 MB).
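
The two commands in step 5 could be wrapped in a small script for a
`build-pdf` pixi task to call.  This is a sketch under assumptions: the
function names are hypothetical, and the `docs` and `docs/_build/latex`
defaults simply mirror the commands above.

```python
import subprocess
from pathlib import Path


def pdf_build_commands(source: str = "docs", out: str = "docs/_build/latex") -> list:
    """Return the two commands from step 5 as argument lists."""
    return [
        ["sphinx-build", "-b", "latex", source, out],
        ["make", "-C", out, "all-pdf"],
    ]


def build_pdf(source: str = "docs", out: str = "docs/_build/latex") -> list:
    """Run the LaTeX build and return the PDFs found in the output directory."""
    for cmd in pdf_build_commands(source, out):
        # check=True stops at the first failing step, so make never runs
        # against a LaTeX tree that sphinx-build failed to produce.
        subprocess.run(cmd, check=True)
    return sorted(Path(out).glob("*.pdf"))
```

Keeping the command list separate from the runner makes it easy to print
or log the exact commands without invoking LaTeX.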

### LaTeX-free alternative — rinohtype

If the LaTeX install is unacceptable, use `rinohtype` as the PDF
backend:

```toml
[tool.pixi.feature.docs.dependencies]
rinohtype = ">=0.5.5"        # on conda-forge
```

```bash
pixi run sphinx-build -b rinoh docs docs/_build/rinoh
```

rinohtype renders directly without LaTeX.  Output is less polished than
LaTeX/XeLaTeX but passes for internal stakeholder docs.  Mermaid
diagrams render as embedded SVG via `sphinxcontrib-mermaid`'s
image-output mode (requires `mermaid-cli`/`mmdc` on `$PATH` **or**
rendering Mermaid blocks to PNG/SVG as a pre-build step).

### Risks / limitations of the primary path

- **First-time setup cost**: a few hours to land the config, a LaTeX
  install, and the first successful PDF.  This is a one-time cost: once
  the configuration is committed, future sessions reuse it.
- **`mermaid-cli` dependency** for SVG output.  Workarounds:
  - `sphinxcontrib-mermaid` can emit raw `<div class="mermaid">` for
    HTML and fall back to in-browser Mermaid.js (no external tool).
    For PDF, it needs `mmdc` to pre-render to SVG/PNG.
  - `mmdc` is not on conda-forge; install via `pixi add --feature docs
    nodejs` then `npm install -g @mermaid-js/mermaid-cli` inside the
    pixi env, or use a container.
  - **Simplest fallback**: render Mermaid to SVG once by hand with an
    online renderer or `mmdc` on any machine, commit the SVGs, and
    reference them from the Markdown.  One-time cost per diagram; zero
    toolchain growth.
- **Docs build time** goes from ~5 s (plain Sphinx) to ~30-60 s with
  Mermaid + LaTeX on first run.  Acceptable.
- **Existing `docs/developer/*.rst`** stays RST.  Hackathon docs are
  MyST Markdown.  Sphinx consumes both transparently; no rewrite
  required.
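
One way to keep the HTML build working on machines without `mmdc` is to
choose the Mermaid output mode in `docs/conf.py` based on what is installed.
A minimal sketch, assuming `sphinxcontrib-mermaid`'s `mermaid_output_format`
option (`"raw"` for in-browser rendering, `"svg"` or `"png"` for pre-rendered
images); the helper name `pick_mermaid_format` is hypothetical.

```python
import shutil


def pick_mermaid_format(which=shutil.which) -> str:
    """Return "svg" when mermaid-cli is on PATH, else "raw".

    "raw" leaves rendering to Mermaid.js in the browser, so HTML builds
    need no external tool; "svg" asks the extension to call mmdc.
    """
    return "svg" if which("mmdc") else "raw"


# In docs/conf.py (assumption: placed after the extensions list).
mermaid_output_format = pick_mermaid_format()
```

This only protects the HTML build; the PDF build still needs `mmdc` or
committed SVGs, as described in the workarounds above.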

## Fallback path — md2pdf.py with targeted patches

Scoped for the scenario where the knowledge-base must ship a PDF
**this week** and the Sphinx integration can happen afterward.  The
Markdown sources written for this fallback are valid MyST, so
migrating to the primary path later requires **zero source edits**:
only moving files into `quicknxsv2/docs/` and adding the toctree.

### What to do (one-pager)

1. **Bring in** `pdf-tools/` from the `instrument-motion-investigations`
   sister branch via merge or cherry-pick rather than a plain file
   copy.

2. **Patch `md2pdf.py`** for use outside motion-investigation reports:

   - Neutralize the hard-coded header: replace `"${BL} — Motion
     Control"` with a CSS `string(doc-subtitle)` sourced from an
     optional `**Subtitle:** ...` metadata line, defaulting to empty.
   - Add a Mermaid pre-processing hook: before `markdown2.markdown(…)`
     runs, detect ```` ```mermaid ```` fenced blocks and:
     - **Option A (preferred)**: pipe them through `mmdc` (mermaid-cli)
       to render SVG, then replace the block with an
       `<img src="…">` tag referencing the SVG file.
     - **Option B (no node dep)**: render to SVG once by hand, reference
       the resulting file from the Markdown via a standard image link,
       and remove the Mermaid fence.  The trade-off is lost source
       provenance in the Markdown.
   - (Optional, recommended) wire Pygments for code-block syntax
     highlighting.  One ~40-line change: wrap `markdown2`'s
     `fenced-code-blocks` output in `<pre class="highlight">…</pre>`
     and import the Pygments CSS into the existing `CSS` string.

3. **Add a per-document batch driver** (a ~30-line `make_pdfs.py` that
   iterates the knowledge-base directory, calling `md2pdf.convert()`
   for each).  One PDF per source file; no cross-references between
   them.  For stakeholders, optionally concatenate the PDFs with
   `pymupdf` (already pinned by the pdf-tools `pyproject.toml`).
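
The Mermaid pre-processing hook (Option A in step 2) might look roughly
like the sketch below.  It is not the actual `md2pdf.py` patch: the function
names, the `diagram-N.svg` naming scheme, and the pluggable `render`
argument are assumptions made for illustration.  The `mmdc -i … -o …`
invocation is the standard mermaid-cli form.

```python
import re
import subprocess
from pathlib import Path

# A fenced block whose info string is "mermaid"; group 1 is the diagram source.
MERMAID_BLOCK = re.compile(
    r"^```mermaid[ \t]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL
)


def render_with_mmdc(source: str, svg_path: Path) -> None:
    """Render one diagram with mermaid-cli (requires `mmdc` on PATH)."""
    mmd_path = svg_path.with_suffix(".mmd")
    mmd_path.write_text(source, encoding="utf-8")
    subprocess.run(["mmdc", "-i", str(mmd_path), "-o", str(svg_path)], check=True)


def replace_mermaid_blocks(markdown_text: str, out_dir: Path, render=render_with_mmdc) -> str:
    """Swap each Mermaid fence for an <img> tag pointing at a rendered SVG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0

    def _swap(match: re.Match) -> str:
        nonlocal count
        count += 1
        svg_path = out_dir / f"diagram-{count}.svg"
        render(match.group(1), svg_path)
        return f'<img src="{svg_path.as_posix()}" alt="Mermaid diagram {count}">'

    # Other fenced blocks (python, toml, ...) are left untouched.
    return MERMAID_BLOCK.sub(_swap, markdown_text)
```

The result would be passed to `markdown2.markdown(…)` in place of the raw
source.  Passing a different `render` callable lets the same hook reuse
hand-rendered SVGs (Option B) without calling `mmdc`.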

### Risks / limitations of the fallback path

- **No cross-document links with live page numbers.**  The TOC is
  per-file.  For a 5-15-document knowledge-base, this is a real
  usability loss that the primary path fixes for free.
- **Mermaid SVGs are managed out-of-band.**  Either the build depends
  on `mmdc` (new dependency) or the repo carries committed SVG/PNG
  artefacts next to the Markdown.
- **Template drift risk.**  Every CSS tweak you make for this doc-set
  (e.g., neutralizing the header) will diverge from the
  motion-investigation CSS on the sister branch.  Reconciliation is
  manual.  The primary path lives inside the project's own
  `docs/conf.py` and is co-owned with the rest of the team.
- **Maintenance burden grows with document count.**  For 1-3 documents
  the fallback is fine.  At ~10 documents, the missing cross-doc TOC
  and the per-file render loop start to feel like infrastructure.
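
For reference, the per-file render loop from step 3 could be as small as
the sketch below.  The `(markdown_path, pdf_path)` call shape for `convert`
is an assumption, since the real `md2pdf.convert()` signature is not shown
here; wrap it in a lambda if it differs.  `concatenate` uses PyMuPDF's
`insert_pdf`, imported lazily so the driver runs without it.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_all(src_dir: Path, out_dir: Path, convert: Callable[[Path, Path], None]) -> list:
    """Call `convert(markdown_path, pdf_path)` for every .md file, in name order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = []
    for md_path in sorted(src_dir.glob("*.md")):
        pdf_path = out_dir / md_path.with_suffix(".pdf").name
        convert(md_path, pdf_path)
        pdfs.append(pdf_path)
    return pdfs


def concatenate(pdfs: Iterable[Path], combined: Path) -> None:
    """Merge per-document PDFs into one file using PyMuPDF."""
    import pymupdf  # older PyMuPDF releases expose the same API as `fitz`

    merged = pymupdf.open()  # new, empty PDF
    for pdf in pdfs:
        with pymupdf.open(str(pdf)) as part:
            merged.insert_pdf(part)
    merged.save(str(combined))
    merged.close()
```

Page numbers in each document's TOC still restart per file after
concatenation, which is the cross-reference limitation noted above.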

## What "no tool" would look like (and why it's still a fallback)

If the hackathon stakeholders accept HTML or Confluence output, skip
the PDF build entirely — `pixi run build-docs` already emits HTML
under `docs/_build/html/` that can be served or uploaded.  This is
the default outcome of the primary path **before** the PDF build step;
adopting the PDF backend is additive.

This option is included for completeness: the user said they would
"undoubtedly" ask for a PDF, so the zero-tool option is offered only
as the minimum-viable baseline, not a recommendation.

## Suggested commit sequence

1. **This commit** — the recommendation document itself, updating
   `plan/pdf-tools-fitness-review.md` status.
2. **Next session** (if primary path chosen) — add `myst-parser` and
   `sphinxcontrib-mermaid` to `quicknxsv2/pyproject.toml`, wire
   `docs/conf.py`, commit.  Separate repo: do *not* batch with this
   `tasking/` commit.
3. **Subsequent session** — write/land the knowledge-base `.md`
   files under `quicknxsv2/docs/developer/modularization/`, verify
   HTML build, then the PDF build.

## One-line summary

Sphinx is the robust path because the project already owns it; md2pdf
is the correct tool for exactly the use case it was built for
(single-file motion-investigation reports) and nothing else.
+1 −2
Original line number Diff line number Diff line
# Review pdf-generation method fitness-for-purpose for quicknxsv2 docs

**Status**: open — pre-emptive review for documentation/reporting needs
that may arise during the modularization effort.
**Status**: completed — see pdf-tools-fitness-recommendation.md (2026-04-18)

## Note on this branch's current state