Commit 1244261f authored by Vacaliuc, Bogdan

pdf-tools: upgrade md2pdf.py for distribution-quality investigation reports



Re-renders tasking/DANGLE-Motion-Failure-Analysis.pdf from the markdown
source using WeasyPrint with a fully-revised print stylesheet and light
HTML post-processing. The previous PDF (generated by Chromium headless
print) had a filename header, a file:// footer, a run-on metadata
paragraph, no table of contents, and code blocks that were clipped at
the right margin in the appendices. This commit replaces it with a
professional-looking 15-page report suitable for handing to motion
control engineers.

Rendering features added to md2pdf.py:

- Title page with the H1 title and a parsed metadata key/value grid
  (pulled from the `**Label:** value` lines that follow the H1).
- Table of contents auto-built from H2 headings, using WeasyPrint's
  leader(dotted) and target-counter(attr(href url), page) to produce
  dotted leaders and live page numbers. All 13 H2s appear in the TOC,
  including the appendices and the bottom-line section (fixed a regex
  bug where the earlier version's H2 match required `id="..."` to be
  the first attribute, dropping any H2 we'd tagged with a class).
- Running page header (document title on the left, "BL4A -- Motion
  Control" on the right) and "Page X of Y" footer. No filename or
  file:// URL.
- Executive Summary section wrapped in an amber callout box.
- Bottom line section wrapped in a green callout box, forced onto its
  own final page.
- Appendix H2s ("Appendix A:" through "Appendix D:") styled with a
  tinted purple band and each forced onto its own page for clean
  back-matter separation.
- Recommended Fixes, Open Questions, and Appendix A get CSS
  `break-before: page` so they always start on a fresh page.
- Code blocks dropped from 8 pt to 7.25 pt and switched from
  `white-space: pre` to `white-space: pre-wrap`, so the long lines in
  the substitutions comparison and the Appendix A parameter history
  wrap at word boundaries instead of silently clipping past the right
  margin. Appendix A's trailing notes and the Appendix D timeline now
  render in full.
- DejaVu Sans / DejaVu Sans Mono throughout (available on most Linux
  distributions; no exotic fonts needed).
- Each H2 has `break-after: avoid` to prevent orphaned headings.
- Debug HTML is now opt-in via `MD2PDF_DEBUG=1` so it doesn't clutter
  the tasking directory by default.

Auxiliary additions:

- pdf-tools/README.md — quickstart, option reference, the
  section-level styling hooks the tool looks for in the source
  markdown, and the trade-offs (ASCII-art wrap vs clipping, no
  syntax highlighting yet, no section numbering yet, wide-table
  landscape as a future option).
- pdf-tools/.gitignore — excludes .venv/, __pycache__/, *.pyc, *.debug.html.

File-size comparison:

- Chromium headless print: 398,830 bytes, 7 pages
- WeasyPrint via md2pdf.py: 111,204 bytes, 15 pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 00e439e1
−281 KiB (109 KiB)

File changed.

No diff preview for this file type.

pdf-tools/.gitignore

0 → 100644
+4 −0
.venv/
__pycache__/
*.pyc
*.debug.html
+125 −0
# pdf-tools — render investigation reports to print-quality PDF

Markdown → PDF pipeline tuned for BL4A motion-control failure-analysis
reports like `tasking/DANGLE-Motion-Failure-Analysis.md`. Built on
WeasyPrint with a print-oriented CSS stylesheet and light HTML
post-processing.

Produces distinct **title page + table of contents + body + appendices +
bottom line** with:

- Title block with a metadata key/value grid parsed from the
  `**Label:** value` lines immediately after the H1
- Table of contents with dotted leaders and live page numbers via CSS
  `leader(dotted)` + `target-counter(attr(href url), page)`
- Running page header (document title on left, "BL4A — Motion Control"
  on right) and `Page X of Y` footer
- Executive Summary callout box (amber)
- Bottom line callout box (green) on its own page
- Appendix H2s styled with a tinted purple band
- Each appendix starts on its own page for clean back-matter separation
- Forced page breaks before Recommended Fixes, Open Questions, and each
  Appendix
- Code blocks that wrap long lines rather than clipping at the right
  margin (7.25 pt monospace)
- DejaVu Sans / DejaVu Sans Mono typography
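
The table-of-contents item above can be sketched roughly as follows. This is a minimal illustration, not the code in `md2pdf.py`: the names `H2_RE`, `TOC_CSS`, and `build_toc`, and the `nav.toc` markup, are hypothetical. The pattern finds the `id` attribute wherever it sits inside the `<h2>` tag, so headings that also carry a `class` are not dropped.

```python
import html
import re

# Hypothetical pattern: match any <h2> that has an id attribute, regardless of
# where the id appears among the other attributes.
H2_RE = re.compile(
    r'<h2\b(?=[^>]*\sid="(?P<id>[^"]+)")[^>]*>(?P<title>.*?)</h2>',
    re.IGNORECASE | re.DOTALL,
)

# WeasyPrint resolves target-counter() to the page number of the link target
# and leader(dotted) to a row of dots filling the remaining width.
TOC_CSS = """
nav.toc a::after {
    content: leader(dotted) target-counter(attr(href url), page);
}
"""


def build_toc(body_html: str) -> str:
    """Return a <nav> element with one link per H2 found in body_html."""
    items = []
    for match in H2_RE.finditer(body_html):
        # Drop inline tags such as <em> from the heading text.
        text = re.sub(r"<[^>]+>", "", match.group("title")).strip()
        # Normalize entities so already-escaped text is not escaped twice.
        label = html.escape(html.unescape(text))
        items.append(f'<li><a href="#{match.group("id")}">{label}</a></li>')
    return '<nav class="toc"><ol>\n' + "\n".join(items) + "\n</ol></nav>"
```

The returned markup would be placed ahead of the body, with `TOC_CSS` added to the print stylesheet.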

## Quick start

```bash
cd tasking/pdf-tools
uv sync                             # one-time: install weasyprint, markdown2
uv run python md2pdf.py ../DANGLE-Motion-Failure-Analysis.md ../DANGLE-Motion-Failure-Analysis.pdf
```

The resulting PDF is ~15 pages for a ~30 KB markdown file, ~110 KB on
disk (vs ~400 KB for the same document rendered via Chromium headless
print).

## Options

- `MD2PDF_DEBUG=1` — also write `<output>.debug.html` alongside the PDF
  so you can inspect the intermediate HTML or tweak the CSS
  interactively.
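
One way such an opt-in switch could look is sketched below. The helper name `debug_html_path` is hypothetical, and the sketch assumes that `<output>.debug.html` means the PDF path with its `.pdf` suffix replaced; the tool may name the file differently.

```python
import os
from pathlib import Path


def debug_html_path(pdf_path, environ=os.environ):
    """Return where to write the debug HTML, or None when debugging is off."""
    if environ.get("MD2PDF_DEBUG") != "1":
        return None
    # Assumption: report.pdf -> report.debug.html, next to the PDF.
    return Path(pdf_path).with_suffix(".debug.html")
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.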

## What the tool expects in the source markdown

The front-matter parser expects the document to start with an H1 title
followed by one or more `**Label:** value` lines, then a `---`
separator, then the body:

```markdown
# Report Title

**Date:** 2026-04-11
**Prepared by:** Somebody
**Instrument:** BL4A

---

## First Section
...
```

If the metadata block is absent or formatted differently, the title
block will still render but the metadata grid will be empty.
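
A parser for this layout might look like the sketch below. It is an illustration under stated assumptions, not the parser in `md2pdf.py`: the names `LABEL_RE` and `parse_front_matter` are hypothetical, and the sketch treats a missing `---` separator as "no metadata", which is one reading of the fallback described above.

```python
import re

# Matches lines such as "**Prepared by:** Somebody".
LABEL_RE = re.compile(r"^\*\*(?P<label>[^*:]+):\*\*\s*(?P<value>.*)$")


def parse_front_matter(markdown_text):
    """Split a report into (title, metadata dict, body markdown)."""
    lines = markdown_text.splitlines()
    title, metadata = "", {}
    i = 0
    # Skip leading blank lines, then take the H1 as the title.
    while i < len(lines) and not lines[i].strip():
        i += 1
    if i < len(lines) and lines[i].startswith("# "):
        title = lines[i][2:].strip()
        i += 1
    start = i
    while i < len(lines):
        line = lines[i].strip()
        if line == "---":
            # Separator found: everything after it is the body.
            return title, metadata, "\n".join(lines[i + 1:]).lstrip("\n")
        match = LABEL_RE.match(line)
        if match:
            metadata[match.group("label").strip()] = match.group("value").strip()
        elif line:
            break  # Unexpected content before the separator.
        i += 1
    # No well-formed metadata block: keep the title, leave the grid empty.
    return title, {}, "\n".join(lines[start:]).lstrip("\n")
```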

## Section-level styling hooks

The tool adds CSS classes to certain H2s based on their text:

- `Appendix <letter>` — gets `class="appendix-heading"` (purple band,
  `break-before: page`)
- `Executive Summary` — the following content up to the next H2 is
  wrapped in `<section class="callout callout-summary">` (amber)
- `Bottom line` — same treatment with `callout-bottom` (green,
  `break-before: page`)
- `Recommended Fixes`, `Open Questions Still Unresolved`, `Appendix A`,
  `Bottom line` — get `class="new-page"` so they force a page break
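
The mapping from heading text to classes could be expressed as in the sketch below. The function names are hypothetical, and matching by case-sensitive prefix is an assumption; the tool may compare heading text differently.

```python
import re

# Headings that force a page break, as listed above.
NEW_PAGE_TITLES = (
    "Recommended Fixes",
    "Open Questions Still Unresolved",
    "Appendix A",
    "Bottom line",
)


def classes_for_h2(text):
    """Return the CSS classes to put on an H2 with the given text."""
    classes = []
    if re.match(r"Appendix [A-Z]\b", text):
        classes.append("appendix-heading")
    if text.startswith(NEW_PAGE_TITLES):
        classes.append("new-page")
    return classes


def callout_class(text):
    """Return the class for the <section> wrapper, or None for plain sections."""
    if text.startswith("Executive Summary"):
        return "callout callout-summary"
    if text.startswith("Bottom line"):
        return "callout callout-bottom"
    return None
```

For example, a heading titled `Appendix A: Parameter History` (a made-up title) would receive both `appendix-heading` and `new-page`, while `Appendix C: Logs` would receive only `appendix-heading`.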

## Implementation notes

- **Why markdown2 and not python-markdown?** markdown2 is a
  single-import pure-Python converter with all the extras we need
  (`tables`, `fenced-code-blocks`, `header-ids`, `cuddled-lists`,
  `strike`). python-markdown would also work, but it spreads the same
  features across separate extensions, and strikethrough needs a
  third-party one.
- **Why WeasyPrint?** It supports the `leader()` CSS function and
  `target-counter(attr(href url), page)` which are essential for a
  proper table-of-contents with page numbers. Chromium headless does
  not implement `leader()`, and its `@page` support is limited.
- **Why 7.25 pt code blocks?** At 8 pt, the longest code lines in the
  DANGLE analysis (~130 characters, including the substitutions-file
  dumps and the Appendix A parameter history table) overflow the page
  content width. 7.25 pt + `white-space: pre-wrap` preserves all
  content and wraps only the lines that would otherwise clip.
- **ASCII-art trade-off.** With `white-space: pre-wrap`, an ASCII-art
  table whose lines are narrower than the page prints perfectly, but a
  line wider than the page wraps at a word boundary, which misaligns
  the columns on the wrapped portion. This is a preservation-vs-
  fidelity trade-off; the alternative (`pre` without wrap) would
  silently clip content. If a future report needs perfect ASCII
  preservation for a wide table, convert it to a proper HTML/markdown
  table in the source.
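
Put together, the choices above amount to a pipeline along these lines. This is a minimal sketch, not the contents of `md2pdf.py`: `CODE_CSS`, `EXTRAS`, and `render_pdf` are hypothetical names, and the real stylesheet is much larger. The two libraries are imported inside the function so the constants can be inspected without them installed.

```python
# Code-block rules described in the notes: smaller type plus wrapping.
CODE_CSS = """
pre, code {
    font-family: "DejaVu Sans Mono", monospace;
    font-size: 7.25pt;
}
pre {
    white-space: pre-wrap;  /* wrap long lines instead of clipping them */
}
"""

# markdown2 extras named in the notes above.
EXTRAS = ["tables", "fenced-code-blocks", "header-ids", "cuddled-lists", "strike"]


def render_pdf(markdown_text, pdf_path):
    """Convert markdown to HTML with markdown2, then to PDF with WeasyPrint."""
    import markdown2
    from weasyprint import CSS, HTML

    body = markdown2.markdown(markdown_text, extras=EXTRAS)
    document = f"<html><body>{body}</body></html>"
    HTML(string=document).write_pdf(pdf_path, stylesheets=[CSS(string=CODE_CSS)])
```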

## Files

| File | Purpose |
|---|---|
| `md2pdf.py` | The tool. All logic and CSS embedded in one file. |
| `pyproject.toml` | uv project metadata + dependencies |
| `uv.lock` | Pinned dependency versions |
| `.gitignore` | Excludes `.venv/`, `__pycache__/`, `*.pyc`, `*.debug.html` |

## Not yet covered

- No syntax highlighting (Pygments is not wired in). All code blocks
  render with a single flat color. Adding highlighting would mean
  installing Pygments so that markdown2's `fenced-code-blocks` extra
  can colorize fences that name a language, switching to
  python-markdown with the `codehilite` extension, or invoking
  Pygments manually before the HTML is built.
- No automatic section numbering (e.g., "1.0", "1.1"). Easy to add via
  CSS counters if requested.
- Wide HTML tables that overflow the page are not specially handled.
  Currently they fit the DANGLE report, but a wider table would clip
  on the right. The fix when needed is landscape-orientation pages via
  a named `@page` rule and a CSS class on the table.
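
If either of the last two items is picked up, the stylesheet additions might look like the sketch below. Nothing here exists in the tool yet: `FUTURE_CSS`, `with_future_css`, the `wide` page name, and the `table.wide` class are all hypothetical, and as written the counter would also number appendix headings.

```python
FUTURE_CSS = """
/* Section numbering: count H2s and print the number before each title. */
body { counter-reset: section; }
h2 { counter-increment: section; }
h2::before { content: counter(section) ". "; }

/* Landscape pages for wide tables, selected through a named page. */
@page wide { size: landscape; }
table.wide { page: wide; }
"""


def with_future_css(base_css):
    """Append the experimental rules to an existing stylesheet string."""
    return base_css.rstrip() + "\n" + FUTURE_CSS
```
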
+676 −94

File changed.

Preview size limit exceeded, changes collapsed.