Short-circuit inline parsing plain text#149

Merged
dillonkearns merged 4 commits into master from shortcircuit-inline-parsing
Feb 6, 2026

Conversation

@dillonkearns
Owner

@dillonkearns dillonkearns commented Feb 5, 2026

Summary

This PR adds an early-exit optimization to the inline parser that skips expensive tokenization when text contains no special markdown characters. This speeds up parsing for plain text content.

Changes

  • hasAnyTokenChar uses String.any to efficiently check if text contains any markdown-relevant characters (`, *, _, ~, [, ], <, >, \n)
  • When no special characters are present, the tokenizer returns [] immediately, avoiding 8+ separate regex/pattern scans
  • For text with special characters, individual String.contains checks gate each tokenizer
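Since Elm's String operations compile down to native JavaScript string methods, the gating logic can be sketched in JavaScript. This is an illustrative analogue only, not the PR's actual Elm code; the names hasAnyTokenChar, tokenize, and the tokenizer stubs are hypothetical stand-ins:

```javascript
// Characters that can begin an inline markdown construct.
const TOKEN_CHARS = ['`', '*', '_', '~', '[', ']', '<', '>', '\n'];

// Analogue of the Elm hasAnyTokenChar helper: Array.some short-circuits
// on the first match, just like String.any.
function hasAnyTokenChar(text) {
  return TOKEN_CHARS.some((c) => text.includes(c));
}

// Hypothetical sketch of the gated tokenizer.
function tokenize(text) {
  // Early exit: plain text yields no inline tokens at all.
  if (!hasAnyTokenChar(text)) {
    return [];
  }
  const tokens = [];
  // Each expensive tokenizer is gated by a cheap native contains check.
  if (text.includes('`')) {
    // ... run the code-span tokenizer and push its tokens ...
  }
  if (text.includes('*') || text.includes('_')) {
    // ... run the emphasis tokenizer and push its tokens ...
  }
  return tokens;
}
```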

Why multiple String.contains calls instead of a single pass?

I explored two alternative "cleaner" approaches and benchmarked them:

Alternative 1: Single-pass String.foldl

Build all character flags in one pass instead of multiple String.contains calls:

detectTokenChars : String -> TokenFlags
detectTokenChars str =
    String.foldl (\c flags -> case c of ...) emptyFlags str

Result: Slightly slower (0.080ms vs 0.078ms for plain text)

Alternative 2: Single-pass recursive tokenizer

Replace all regex-based tokenizers with a single character-by-character scan:

tokenizeLoop : String -> Int -> TokenizeState -> TokenizeState
tokenizeLoop rawText index state = ...

Result: Significantly slower (0.258ms vs 0.078ms for plain text — 3.3x regression)

Why native operations win

Multiple String.contains calls are actually faster than manual single-pass approaches because:

  1. String.contains compiles to JavaScript's native indexOf, which is heavily optimized at the engine level
  2. Regex.find similarly uses the browser's native regex engine
  3. Manual character iteration in Elm incurs function call overhead for each character
  4. Native string operations can short-circuit early when a match is found

The "inelegant" multiple-pass approach leverages these native optimizations, making it faster than conceptually cleaner single-pass alternatives.

Performance Impact

Plain text (no formatting): ~1.5x faster
Long unformatted lines: ~2x faster
Formatted content: No regression

Benchmarking

You can verify the results by running the benchmark script:

cd spec-tests
npx elm make OutputMarkdownHtml.elm --optimize --output elm.js
node benchmark.js

Sample results (on my machine):

| Test Case | Before | After | Speedup |
| --- | --- | --- | --- |
| Plain text, no formatting (1400 chars) | 0.114ms | 0.078ms | 1.5x |
| Plain text with newlines (1410 chars) | 0.132ms | 0.100ms | 1.3x |
| Long line, no formatting (10k chars) | 0.502ms | 0.244ms | 2.1x |
| Typical README (595 chars) | 0.278ms | 0.274ms | ~same |
| README x10 (5950 chars) | 2.376ms | 2.290ms | ~same |
| Large table (250 cells) | 1.708ms | 1.352ms | ~same |

dillonkearns and others added 4 commits February 4, 2026 15:47
…d expensive parsing for stretches of plain text.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dillonkearns dillonkearns merged commit 6b8d7e5 into master Feb 6, 2026
6 checks passed