
Parked experiments

Things we built, ran against real content, and decided not to ship. The point of this record is to save a future contributor (or future-us) the cost of re-exploring directions whose limits are already known.

For what's currently shipped, see design.md. For the active fine-blocks v1 docs, see fine-blocks.md.

Fine-blocks v2 (structural redesign)

Status: removed. v2 lived alongside v1 behind SCROLL_IMG_PROTO=fineblocksv2 for one development cycle, then got pulled. v1 (a 32,768-pattern brute-force PUA font) is what ships. The v2 code, vocabulary, and patcher scripts were all deleted; this section records the why so the same paths don't get walked again.

Motivation

Empirical analysis of v1's pattern usage on a 10-image corpus (NASA photographs + synthetic checker, ~67K rendered cells) suggested a structural redesign:

  • Pattern usage distribution: top 100 patterns covered 48.5% of cells; top 1,000 covered 73.8%; top 5,000 covered 98.7%.
  • 8,513 distinct patterns observed out of 32,768 available (26%). The other 74% never fired on this corpus.
  • 98.4% of cells fell into three structural classes: uniform (14%), single-island (73%), or two-island (11%).
  • Effective island count (min of on-components, off-components, accepting fg/bg swap): 0=14%, 1=73%, 2=11%, 3+=1.5%.
  • Top-50 highest-residual patterns were common patterns used 50–600 times each, with mean residual 7,000–12,000 (vs corpus mean ~5,000). These were "fallback" patterns picked when nothing better fit — a 3-colour cell approximated as 2-colour with visible error.
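
The effective-island classification above can be sketched directly. This is an illustrative reconstruction (the original analysis script was deleted with v2), following the stated definition: min of on-component and off-component counts under 4-connectivity, so a pattern and its fg/bg inverse land in the same class.

```python
def components(cells, w=3, h=5):
    """Count 4-connected components among a set of (x, y) cells."""
    seen, count = set(), 0
    for start in cells:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            x, y = stack.pop()
            if (x, y) in seen:
                continue
            seen.add((x, y))
            for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                if (nx, ny) in cells and (nx, ny) not in seen:
                    stack.append((nx, ny))
    return count

def effective_islands(mask, w=3, h=5):
    """Structural class of a w*h binary mask: min of on- and
    off-component counts, accepting fg/bg swap."""
    on = {(x, y) for y in range(h) for x in range(w) if mask >> (y*w + x) & 1}
    off = {(x, y) for y in range(h) for x in range(w)} - on
    return min(components(on), components(off))
```

For example, a uniform cell (all bits clear or all set) is class 0, a single lit sub-pixel is class 1, and two on-rows separated and followed by off-rows is class 2.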

The hope: a curated structural vocabulary (single islands at quantised positions, two-island configurations, soft-boundary variants) would put glyph budget where it was actually useful and leave headroom to add contrast / density variants for the high-residual cells where v1's hard-binary fit was the worst.

What got built

  • Vocab generator (scripts/v2-shape-vocab.py, removed): hybrid corpus + procedural vocabulary. Final shipped count: 2,034 entries (1 class-0 uniform + 1,533 class-1 single-island + 500 class-2 two-island). Soft-boundary variants and shape-interior gradient variants brought the total to ~4,387.
  • Patcher (scripts/v2-patcher.py, removed): read vocab JSON, emitted ScrollV2-family TTF alongside the v1 Scroll font, plus Go codegen (v2_patterns.go) with codepoint↔mask tables.
  • Encoder (ProtocolFineBlocksV2 in internal/imgproto/, removed): same k-means on 3×5 sub-pixels as v1, but the final glyph lookup matched the resulting on/off mask against the v2 vocabulary by Hamming distance, with soft-variant rescoring by perceptual residual.
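
The core of the v2 glyph lookup (nearest vocabulary mask by Hamming distance) can be sketched in a few lines. `nearest_vocab` is a hypothetical name for illustration, not the deleted Go implementation; the soft-variant rescoring step is omitted.

```python
def nearest_vocab(mask, vocab):
    """Pick the vocabulary mask with the smallest Hamming distance to
    the cell's on/off mask (popcount of the XOR). Ties keep the
    earliest entry, so vocabulary ordering acts as a priority."""
    return min(vocab, key=lambda v: bin(mask ^ v).count("1"))
```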

Why we pulled it

The phase-1 visual output looked "quite good" on the photo-sample corpus, but the project never got past three structural problems:

  1. Two complete pipelines to maintain — vocab generator, patcher, codegen, encoder, and a separate font family. Every change to the sub-pixel grid, vocabulary, or variant scheme required a coordinated rebuild of all five. v1 is a ~150-line k-means + a single PUA codepoint emit; v2 was an order of magnitude more code without an equivalent quality gain.
  2. Quality was on par, not better. Quantitative residual comparison vs v1 was never run, but visual A/B on the corpus showed v2 won some cells (cleaner soft boundaries, fewer blotchy edges) and lost others (the curated vocabulary sometimes had no exact match where v1's brute-force enumeration did, so v2 quantised to a structurally similar but less faithful glyph). Net: a wash on photographs, a small win on diagrams, neither big enough to justify the maintenance cost.
  3. Inter-cell coordination was the real problem. The highest-residual cells in v1 weren't in 3+-island content — they were in smooth gradients where adjacent cells' independent k-means chose slightly different colour pairs. None of the within-cell glyph richness v2 added could fix that; a fix needs cross-cell coordination at encoding time.

We removed v2 outright rather than parking it on a branch — keeping a 52K-line v2_patterns.go table around as a museum piece costs more in code review and IDE indexing than it saves in archaeology. The history is in this doc.

Phase-2 experiments (within v2, all parked / rejected)

These were tried inside the v2 pipeline. Their lessons apply to v1 too if anyone tries to extend its quality.

2a.1 — Class-0 gradient glyphs (rejected)

Idea: add 64 cell-wide gradient glyphs (8 directions × 8 spans). Encoder evaluates gradient residual alongside binary and picks the lowest.

Result: gradients won residual too often, even on fine-detail cells. Visual effect: details smeared, edges lost.

Why it failed: a gradient "average-fits" bimodal sub-pixels with a fractional coverage, producing a moderate residual even when the cell has a real hard edge. Binary residual for the same cell is similar; gradient wins marginally, but the perceived result is worse because the edge is blurred.

Adding a gradientBiasFactor of 0.65 (gradients had to beat binary by 35% or more) meant gradients never won in practice — no visual change, so the feature was removed entirely.
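
The bias rule is simple enough to state as code. A minimal sketch, with the 0.65 value taken from the experiment:

```python
GRADIENT_BIAS_FACTOR = 0.65  # gradient must beat binary by 35%+ to win

def pick_glyph(binary_residual, gradient_residual, bias=GRADIENT_BIAS_FACTOR):
    """Prefer the binary fit unless the gradient's residual is a clear
    (not marginal) win, i.e. below bias * binary_residual."""
    if gradient_residual < bias * binary_residual:
        return "gradient"
    return "binary"
```

With this rule, a gradient whose residual is only marginally lower than the binary fit (the "average-fit" failure mode described above) loses.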

2a.2 — Soft-boundary shape variants (shipped in v2, removed with v2)

Idea: for each class-1 / class-2 binary mask, generate a "soft" variant where non-mask sub-pixels adjacent to the mask boundary render at partial coverage (0.35 initially, 0.5 after tuning). Encoder evaluates the matched binary shape AND its soft variant; picks whichever has lower residual.

Result: shipped within v2. Soft variants won ~10–45% of cells per image depending on content. Cell-interior transitions visibly softer without blurring structural edges. Lost when v2 was removed — soft variants share v2's vocabulary infrastructure and don't apply directly to v1's brute-force enumeration. A v1 port would need a parallel soft-variant codepoint range plus encoder rescoring; not done.

Why it worked: soft variants share the shape of their binary counterpart — they only modify the boundary sub-pixels. Per-cell residual comparison correctly identifies which cells benefit and which don't, without the "average-fit wins" failure mode of pure gradients.

Note: soft variants did NOT address inter-cell colour steps (the aurora-halo artefact), because the step is between-cell not within-cell.
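
The soft-variant construction can be sketched from the description above. An illustrative reconstruction with assumed names, not the deleted generator: mask sub-pixels stay fully on, off sub-pixels 4-adjacent to the mask boundary get partial coverage, everything else stays off.

```python
def soft_variant(mask, w=3, h=5, coverage=0.5):
    """Per-sub-pixel coverage map for a mask's soft variant. The 0.5
    coverage value is the post-tuning figure from the notes above."""
    on = lambda x, y: 0 <= x < w and 0 <= y < h and mask >> (y*w + x) & 1
    cov = []
    for y in range(h):
        for x in range(w):
            if on(x, y):
                cov.append(1.0)
            elif any(on(nx, ny) for nx, ny in
                     ((x+1, y), (x-1, y), (x, y+1), (x, y-1))):
                cov.append(coverage)  # boundary-adjacent: partial coverage
            else:
                cov.append(0.0)
    return cov
```

Because the soft variant only touches boundary-adjacent sub-pixels, comparing its residual against the binary shape's is a fair per-cell decision, which is why it avoided the gradient glyphs' failure mode.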

2a.5 — F4: post-encoding cross-cell colour nudge (rejected)

Idea: after all cells are encoded, walk 4-neighbour pairs and nudge fg↔fg and bg↔bg together when perceptually close. Threshold + nudge strength configurable via SCROLL_V2_F4.

Result: blotchy, textured noise where the input was smooth gradient. Visibly worse than baseline at any setting.

Why it failed: positional matching (fg↔fg, bg↔bg) assumes neighbouring cells' k-means clusters correspond positionally. They don't. In a gradient, one cell's "fg" may be the next cell's "bg" in absolute luminance terms. Forcing them together destroys the gradient relationship the encoder was trying to preserve.

Correct fix would need to respect luminance-relative roles (brighter↔brighter, darker↔darker) or operate before encoding (shared palette / colour error diffusion).
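
A minimal sketch of the luminance-relative pairing a correct fix would need, assuming two-colour (fg, bg) cells. `role_pairs` is a hypothetical helper; this approach is untried here.

```python
def luminance(rgb):
    """Relative luminance from Rec. 709 weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def role_pairs(cell_a, cell_b):
    """Pair two cells' colours by luminance role (darker with darker,
    brighter with brighter) rather than by fg/bg slot, which is what
    the rejected F4 positional nudge got wrong."""
    a = sorted(cell_a, key=luminance)
    b = sorted(cell_b, key=luminance)
    return list(zip(a, b))
```

In a gradient, cell A's "fg" may be bright while cell B's "fg" is dark; sorting by luminance first means any subsequent nudge compares like with like.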

2a.6 — Shared colour palette via median-cut + snap (rejected)

Idea: before per-cell encoding, build a shared colour palette for the image (median-cut, configurable size 32/64/128/256). After each cell's k-means, snap its (fg, bg) to nearest palette entries so neighbouring cells coordinate their colour choices.

Result: slightly worse than baseline at 128 colours; actively worse at 64. Never better.

Why it failed: palette is a quantising tool, not a smoothing one. Binning colours to a palette sharpens decisions: two adjacent cells whose source colours are barely on opposite sides of a palette-entry boundary snap to different entries → visible step, same as before, possibly sharper.
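
The failure mode is easy to demonstrate with nearest-neighbour snapping on grey levels. A toy sketch, not the deleted median-cut code:

```python
def snap(color, palette):
    """Snap a grey level to its nearest palette entry, the same
    operation the experiment applied to each cell's fg/bg after
    k-means."""
    return min(palette, key=lambda p: abs(p - color))

# Two adjacent cells whose source greys differ by only 2 units straddle
# the midpoint between entries 64 and 96 and end up 32 apart:
palette = [0, 32, 64, 96, 128]
```

Here `snap(79, palette)` lands on 64 while `snap(81, palette)` lands on 96, turning a 2-unit source step into a 32-unit rendered step.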

2a.7 — Regional palettes (rejected)

Idea: divide the image into a tile grid (2×2, 3×3, 4×4), build a separate palette per tile.

Result: higher contrast and more texture than global palette; never produced smoothing.

Why it failed: same fundamental issue as 2a.6 — palette is quantising, not smoothing. Regional palettes quantise each region more finely within its local range, which makes colour boundaries within each region sharper, not softer. Tile boundaries also introduce their own seams.

2c — 4×6 grid (shipped in v2, removed with v2)

Idea: bump sub-pixel grid from 3×5 (15 sub-pixels) to 4×6 (24 sub-pixels). Mask-width 15 bits → 24 bits. Sub-pixel aspect-ratio improves (1.67:1 → 1.5:1 in a 2:1 terminal cell).

Result: shipped within v2. Visually noticeable improvement in detail clarity vs 3×5. Lost when v2 was removed — v1 hard-codes 3×5 because its brute-force enumeration depends on the 15-bit pattern space fitting in PUA-B without compromise. A v1 port to 4×6 would need 16M codepoints; not feasible in a single SFNT font.
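
The codepoint arithmetic behind "not feasible" is worth making explicit:

```python
# Brute-force enumeration stops working at 4x6 because the pattern
# space outgrows the PUA and Unicode itself.
v1_patterns = 2 ** (3 * 5)          # 32,768: fits in PUA-B
v2_patterns = 2 ** (4 * 6)          # 16,777,216: larger than all of Unicode (~1.1M codepoints)
pua_b = 0x10FFFD - 0x100000 + 1     # 65,534 codepoints in plane 16
```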

Failed-fix lessons (apply to v1 too)

The aurora halo / inter-cell colour-step artefact has a clear shape now:

  • It's caused by per-cell k-means independently picking slightly different colour pairs for adjacent cells in smooth regions.
  • Anything that operates by post-hoc smoothing, colour nudging, or palette quantisation of already-encoded cells produces sharpening, not smoothing. Confirmed across F4, global palette, and regional palette.
  • A real fix would need either cross-cell coordination at encoding time (sequential k-means with neighbour-aware initialisation) or colour error diffusion at the encoder level (Floyd-Steinberg, but distributing the snap residual to neighbour-cell colour targets, not glyph patterns). Neither is straightforward, both kill parallelism in the encoder. Untried; worth a fresh attempt only if the artefact becomes the dominant visible flaw.
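
A sketch of what neighbour-aware sequential k-means could look like, on grey levels for brevity. This is untried (as noted above) and all names are hypothetical; note the left-to-right dependency chain is exactly what kills encoder parallelism.

```python
def kmeans2(pixels, seeds, iters=8):
    """Tiny 2-means on grey levels, seeded explicitly so a neighbour's
    converged centroids can bias this cell toward compatible colours."""
    c0, c1 = seeds
    for _ in range(iters):
        g0 = [p for p in pixels if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in pixels if abs(p - c0) > abs(p - c1)]
        if g0: c0 = sum(g0) / len(g0)
        if g1: c1 = sum(g1) / len(g1)
    return min(c0, c1), max(c0, c1)

def encode_row(cells):
    """Sequential left-to-right pass: each cell's k-means starts from
    the previous cell's converged pair, so smooth regions drift
    together instead of re-deriving slightly different pairs
    independently."""
    out, seeds = [], None
    for pixels in cells:
        if seeds is None:
            seeds = (min(pixels), max(pixels))
        pair = kmeans2(pixels, seeds)
        out.append(pair)
        seeds = pair
    return out
```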

Things still on the table

If someone wants to push fine-blocks quality further on v1, the genuinely promising ideas — none of them tried in v2 — are:

  • Sub-cell origin search: for each image, try several fractional-cell offsets, pick the one with lowest summed per-cell residual. Shifts sharp edges into cell interiors where the 2-colour fit is better. Cost: ~16 trials × ~30ms/image = ~500ms per image. See roadmap.md.
  • Cross-cell coordination at encoding time (above).
  • Source-side bilateral filter in low-variance regions only: smooths source noise before it gets amplified by per-cell encoding. Different target — addresses noise in the source, not artefacts of encoding.
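
The sub-cell origin search reduces to an argmin over a handful of offsets. A sketch under stated assumptions: `encode` is a hypothetical stand-in that returns (cells, total_residual) for the image encoded at a given fractional-cell pixel offset.

```python
def best_origin(image, encode, offsets=None):
    """Try several fractional-cell origins and keep the one whose
    encoding has the lowest summed per-cell residual."""
    if offsets is None:
        offsets = [(dx, dy) for dx in range(4) for dy in range(4)]  # ~16 trials
    best = None
    for dx, dy in offsets:
        cells, residual = encode(image, dx, dy)
        if best is None or residual < best[0]:
            best = (residual, (dx, dy), cells)
    return best[1], best[2]
```

Unlike the cross-cell coordination idea, the trials are independent, so this search parallelises cleanly.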

Mermaid via mmdc + tesseract OCR

Status: removed. This was the original mermaid renderer before termaid became the default. Both the mmdc and OCR paths got pulled when termaid proved sufficient.

What it did

For each ```mermaid code block:

  1. mmdc (@mermaid-js/mermaid-cli) shelled out to render the diagram source to a PNG via headless Chromium. Cached per-content-hash under $XDG_CACHE_HOME/scroll/mermaid/. The PNG was then fed through scroll's imgproto pipeline (fineblocks / blocks) for terminal display.
  2. tesseract (opt-in via SCROLL_MERMAID_OCR=1) ran on the rendered PNG, returned word-level bounding boxes, and the image renderer overlaid those characters as literal text on top of the fineblocks-approximated cells so labels stayed sharp.

Mode selection via SCROLL_MERMAID_ENGINE (termaid / mmdc / auto). Default was auto: prefer termaid when installed, fall back to mmdc.

Why we pulled it

  • Chromium dependency. mmdc pulls in Puppeteer + a headless Chromium download, ~300MB on first install. For a tool whose point is "just a binary", that's an absurd transitive cost.
  • Per-render cold start: 1–2s of Chromium spin-up per diagram, even with a warm cache — a cache hit avoids the re-render but not the process invocation. Live reload felt janky.
  • --no-sandbox requirement. Ubuntu 23.10+ and other distros with AppArmor restrictions on unprivileged user namespaces can't launch Chromium's default sandbox; we had to write a Puppeteer config disabling it. Safe enough for our use case (locally-provided mermaid source) but a flag we'd rather not need.
  • OCR accuracy was poor at typical render scales. tesseract with --psm 11 --dpi 300 and a luminance-aware invert-for-dark-mode preprocess still missed too much text to be reliably helpful — the overlay either covered legible fineblocks output with misread characters or added nothing worth its setup cost. It stayed disabled by default the whole time.
  • Termaid does the same job without any of the above. Direct Unicode box-drawing output, text labels stay as real text by construction, no PNG round-trip, no graphics protocol required, no external rendering process.

What got removed

  • internal/render/mermaid.go — mmdc shell-out + content-hash caching.
  • internal/render/ocr.go — tesseract shell-out, dark-mode invert preprocess, TSV parsing.
  • imgproto.OCRBox and imgproto.RenderWithOCR — the API surface that fed OCR boxes into the image renderer.
  • The OCR-overlay code path inside the fine-blocks encoder.
  • SCROLL_MERMAID_ENGINE, SCROLL_MERMAID_THEME, SCROLL_MERMAID_OCR env vars.

When this might come back

A reason to revive mmdc would be diagrams termaid can't render — termaid covers flowcharts, sequence, state, class, gantt; it doesn't cover mindmap, timeline, quadrantChart, or some mermaid 11+ syntax. If those become important enough that shelling out to Chromium for them is worth it, the path back is:

  • Restore mermaid.go from this repo's git history.
  • Add a per-diagram-type dispatch (termaid for what it knows, mmdc for the rest) instead of the old auto fall-through.
  • Skip the OCR overlay — it didn't justify itself the first time and termaid solves the legibility problem for the diagrams it covers.

The text-overlay-on-image idea is more promising as a content-aware renderer for screenshots-with-text without external OCR — see the "in-canvas OCR-style text matching for non-mermaid images" roadmap item.