Roadmap: Reduce rust asciidoc parsers to 1 hybrid parser #336

@AMDphreak

Problem

There are at least three main Rust-based AsciiDoc parsers. Each has a preferred use case, and each comes with tradeoffs. As a community, it would be in our interest to reduce duplicated effort and provide a single hybrid parser that follows the specification while keeping the existing performance optimizations.

Rust AsciiDoc Parsers Compared

As of early 2026, the Rust ecosystem for AsciiDoc is transitioning from experimental projects to spec-compliant implementations. Markdown has a single established Rust parser in pulldown-cmark; AsciiDoc does not yet, in part because its grammar is significantly more complex, which makes high-fidelity Rust parsers difficult to build.

Primary Rust-Based AsciiDoc Parsers

| Feature            | asciidoc-parser                   | acdc (nlopes)            | asciidocr            |
|--------------------|-----------------------------------|--------------------------|----------------------|
| Development Status | Active (v0.14+)                   | Active / Research        | Active (v0.2.0)      |
| Grammar Style      | Spec-driven / Manual              | PEG-based                | Scanner/Parser split |
| Goal               | Compliance with Eclipse Spec      | Speed & Correctness      | TCK Compliance       |
| Completeness       | High (Inlines, Lists, Attributes) | Moderate (Core elements) | Moderate             |
| CLI Included       | Yes                               | Yes (acdc-cli)           | Yes (asciidocr)      |
| Performance        | High (Zero-copy focus)            | High (PEG-optimized)     | Standard             |

1. asciidoc-parser (scouten)

This is currently the most robust effort toward a production-ready, pure-Rust AsciiDoc processor.

  • Design Philosophy: It employs "spec-driven development," mapping code coverage directly against the Eclipse Foundation's AsciiDoc Language Specification.
  • Strengths: Strong support for inline substitutions (bold, italic, macros), document attributes, and complex list structures. It includes a built-in HTML5 backend.
  • Limitations: Does not support UTF-16 (requires UTF-8), ignores compat-mode, and does not support the book doctype yet.

2. acdc / acdc-parser (nlopes)

A high-performance parser designed with a focus on formal correctness using a Parsing Expression Grammar (PEG).

  • Design Philosophy: Uses the peg crate for grammar definition. It utilizes a two-pass inline processing system: first identifying boundaries, then parsing content.
  • Strengths: Extremely fast and "fail-fast" by design. Includes an experimental Language Server Protocol (LSP) for editor support.
  • Limitations: Known gaps in table spanning (row/column spans) and specific nested inline markup (e.g., bold inside links).
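
To make the "boundaries first, content second" idea concrete, here is a minimal sketch of a two-pass inline processor. It is not taken from acdc: the `Inline` enum and both function names are hypothetical, and it only handles single-star bold spans on plain text.

```rust
/// Hypothetical inline node type; not acdc's actual AST.
#[derive(Debug, PartialEq)]
pub enum Inline {
    Text(String),
    Bold(String),
}

/// Pass 1: record the byte offsets of each `*...*` pair without building nodes.
pub fn find_bold_boundaries(src: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut open: Option<usize> = None;
    for (i, ch) in src.char_indices() {
        if ch == '*' {
            match open.take() {
                // A closing star with at least one character in between.
                Some(start) if i > start + 1 => spans.push((start, i)),
                // Two adjacent stars: treat the second one as a new opener.
                Some(_) => open = Some(i),
                None => open = Some(i),
            }
        }
    }
    spans
}

/// Pass 2: build nodes from the recorded boundaries.
pub fn parse_inlines(src: &str) -> Vec<Inline> {
    let mut out = Vec::new();
    let mut cursor = 0;
    for (start, end) in find_bold_boundaries(src) {
        if start > cursor {
            out.push(Inline::Text(src[cursor..start].to_string()));
        }
        // `*` is one byte, so start + 1 and end + 1 are valid char boundaries.
        out.push(Inline::Bold(src[start + 1..end].to_string()));
        cursor = end + 1;
    }
    if cursor < src.len() {
        out.push(Inline::Text(src[cursor..].to_string()));
    }
    out
}
```

Calling `parse_inlines("a *b* c")` yields a text node, a bold node, and a trailing text node. Separating the passes means an unterminated mark is discovered before any node is allocated, which is one plausible reason a fail-fast parser would choose this layout.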

3. asciidocr

A newer implementation focused on passing the official Technology Compatibility Kit (TCK).

  • Design Philosophy: Implements a standard scanner and parser architecture with a focus on creating a compatible Abstract Syntax Tree (AST).
  • Strengths: Provides clear library access to the scanner and AST, making it useful for developers building custom tooling or converters.

Technical Comparison of Parsing Strategies

The complexity of AsciiDoc requires different handling of "Inlines" versus "Blocks."

  • Block Parsing: All three libraries handle block-level elements (headings, paragraphs, delimited blocks) relatively well using line-by-line scanning.
  • Inline Substitution: asciidoc-parser is the most mature here, handling the intricate "constrained vs. unconstrained" regex-like rules of AsciiDoc more reliably than PEG-based approaches, which can struggle with the "lookahead" required for AsciiDoc's non-regular inline syntax.
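
As an illustration of why constrained marks need context on both sides, the sketch below checks whether a single-star pair qualifies as a constrained bold span. It is a simplification of the AsciiDoc rules, not code from any of the three parsers: the function name is hypothetical, "word character" is approximated as alphanumeric or underscore, and the caller is assumed to pass byte offsets that point at `*` characters.

```rust
/// Approximation of a "word character" for boundary checks (an assumption).
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Returns true if the `*` at `open` and the `*` at `close` could form a
/// constrained pair: no word character directly outside the marks, and no
/// whitespace directly inside them.
pub fn is_valid_constrained_pair(src: &str, open: usize, close: usize) -> bool {
    if open >= close {
        return false;
    }
    let inner = &src[open + 1..close];
    let before = src[..open].chars().next_back();
    let after = src[close + 1..].chars().next();
    // Start or end of input counts as a valid outer boundary.
    let outer_ok = before.map_or(true, |c| !is_word(c)) && after.map_or(true, |c| !is_word(c));
    let inner_ok = matches!(
        (inner.chars().next(), inner.chars().next_back()),
        (Some(first), Some(last)) if !first.is_whitespace() && !last.is_whitespace()
    );
    outer_ok && inner_ok
}
```

With this check, `a *bold* b` qualifies, while `a*bold*b` does not and would need the unconstrained form `a**bold**b`. The closing check depends on the character after the mark, which is the kind of lookahead the paragraph above refers to.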

Recommendation for 2026

  • For production/static sites: Use asciidoc-parser. It has the highest feature parity with the Ruby reference implementation (Asciidoctor) and the best documentation coverage.
  • For IDE tooling/LSP: Look at acdc. Its PEG grammar is better suited for the incremental parsing needed in text editors.
  • For custom backends: asciidocr provides the most accessible AST structures if you need to transform AsciiDoc into a proprietary format.

Goal: Consolidate projects

Consolidate asciidoc-parser, acdc (nlopes), and asciidocr by aligning them with the Eclipse Foundation’s AsciiDoc Language Specification. The primary obstacle to a single solution is the divergence in parsing architecture (PEG vs. Manual Recursive Descent).

Consolidation Roadmap

1. Unified Intermediate Representation (AST)

Establish a shared Abstract Syntax Tree (AST) crate. Currently, each project defines its own Node or Block enums.

  • Action: Extract the AST definitions from asciidocr or asciidoc-parser into a standalone asciidoc-ast crate.
  • Goal: Enable different parsing front-ends to target the same data structure, allowing back-ends (HTML, PDF, DocBook) to be shared.
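
A rough sketch of what such a shared crate could expose is shown below. Every name here (`Document`, `Block`, `Inline`, `Parser`, `Backend`, `PlainText`) is hypothetical and far smaller than what the specification requires; the point is only that front-ends and back-ends meet at one set of types.

```rust
/// Hypothetical inline nodes for a shared `asciidoc-ast` crate.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Text(String),
    Strong(Vec<Inline>),
    Emphasis(Vec<Inline>),
}

/// Hypothetical block nodes.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Section { level: u8, title: Vec<Inline>, blocks: Vec<Block> },
    Paragraph(Vec<Inline>),
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Document {
    pub blocks: Vec<Block>,
}

/// Any parsing front-end (PEG, hand-written, scanner-based) implements this.
pub trait Parser {
    fn parse(&self, source: &str) -> Document;
}

/// Any back-end (HTML, PDF, DocBook) implements this.
pub trait Backend {
    fn render(&self, doc: &Document) -> String;
}

/// A trivial back-end that strips all markup, to show the shape of the API.
pub struct PlainText;

impl Backend for PlainText {
    fn render(&self, doc: &Document) -> String {
        fn inlines(items: &[Inline], out: &mut String) {
            for item in items {
                match item {
                    Inline::Text(t) => out.push_str(t),
                    Inline::Strong(c) | Inline::Emphasis(c) => inlines(c, out),
                }
            }
        }
        fn blocks(items: &[Block], out: &mut String) {
            for block in items {
                match block {
                    Block::Section { title, blocks: children, .. } => {
                        inlines(title, out);
                        out.push('\n');
                        blocks(children, out);
                    }
                    Block::Paragraph(content) => {
                        inlines(content, out);
                        out.push('\n');
                    }
                }
            }
        }
        let mut out = String::new();
        blocks(&doc.blocks, &mut out);
        out
    }
}
```

Because `Backend` only sees `Document`, a renderer written once would work with whichever front-end produced the tree.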

2. TCK-Driven Validation

The AsciiDoc Technology Compatibility Kit (TCK) serves as the "source of truth."

  • Action: Create a unified test runner that pulls the TCK JSON/YAML test suite.
  • Goal: Move development from "feature-chasing" to "compliance-filling." A project that passes 100% of the TCK is the de facto winner; merging projects becomes a matter of adopting the logic that passes specific TCK chapters.
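
The per-chapter comparison could be sketched as follows. This is an assumption-heavy outline rather than the TCK's real harness: `TestCase`, `Adapter`, and `compliance_by_chapter` are hypothetical names, test cases are passed in memory instead of being loaded from the suite, and expected output is compared as trimmed text where a real runner would compare parsed structures.

```rust
use std::collections::BTreeMap;

/// One hypothetical test case: an input document and its expected output.
pub struct TestCase {
    pub chapter: String,
    pub input: String,
    pub expected: String,
}

/// Each parser under test is wrapped in an adapter with one entry point.
pub trait Adapter {
    fn convert(&self, input: &str) -> String;
}

/// Returns (passed, total) per chapter, so gaps can be compared across parsers.
pub fn compliance_by_chapter(
    adapter: &dyn Adapter,
    cases: &[TestCase],
) -> BTreeMap<String, (usize, usize)> {
    let mut report = BTreeMap::new();
    for case in cases {
        let entry = report.entry(case.chapter.clone()).or_insert((0usize, 0usize));
        entry.1 += 1;
        // Text comparison is a stand-in for a structural comparison.
        if adapter.convert(&case.input).trim() == case.expected.trim() {
            entry.0 += 1;
        }
    }
    report
}
```

Running the same case list through three adapters gives three reports keyed by chapter, which is the data needed to decide whose logic to adopt for each area.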

3. Hybrid Parsing Architecture

AsciiDoc's grammar is context-sensitive and non-regular, making pure PEG (used by acdc) difficult for complex inlines, while manual parsers (used by asciidoc-parser) are harder to maintain.

  • Action: Adopt a "Lexical Functional" split. Use a fast scanner for block boundaries and a specialized Pratt parser or state machine for inline substitutions.
  • Goal: Combine the performance of acdc with the correctness of asciidoc-parser.
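
The block-scanning half of that split might look like the sketch below. It is illustrative only and does not reproduce acdc or asciidoc-parser: `RawBlock` and `scan_blocks` are hypothetical, only section titles and paragraphs are recognized, and a title is accepted only when no paragraph is open. The scanner borrows slices of the input instead of copying them, and leaves inline markup untouched for a later stage.

```rust
/// Hypothetical block-level output: slices borrowed from the source text.
#[derive(Debug, PartialEq)]
pub enum RawBlock<'a> {
    Heading { level: usize, text: &'a str },
    Paragraph(Vec<&'a str>),
}

/// Line-by-line scan for block boundaries; inline content is not parsed here.
pub fn scan_blocks(src: &str) -> Vec<RawBlock<'_>> {
    let mut blocks = Vec::new();
    let mut para: Vec<&str> = Vec::new();
    for line in src.lines() {
        // Count leading `=` marks; they are ASCII, so slicing at `marks` is safe.
        let marks = line.bytes().take_while(|&b| b == b'=').count();
        if line.trim().is_empty() {
            // A blank line closes the current paragraph, if any.
            if !para.is_empty() {
                blocks.push(RawBlock::Paragraph(std::mem::take(&mut para)));
            }
        } else if para.is_empty() && (1..=6).contains(&marks) && line[marks..].starts_with(' ') {
            // `= Title` is level 0, `== Section` is level 1, and so on.
            blocks.push(RawBlock::Heading { level: marks - 1, text: line[marks..].trim() });
        } else {
            para.push(line);
        }
    }
    if !para.is_empty() {
        blocks.push(RawBlock::Paragraph(para));
    }
    blocks
}
```

An inline engine (a Pratt parser or state machine, as proposed above) would then take each borrowed `&str` and produce inline nodes, so the two stages can be optimized and tested independently.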

Integration Strategy

| Phase   | Task                            | Primary Contributor     |
|---------|---------------------------------|-------------------------|
| Phase A | Common Spec-compliant AST crate | asciidocr architecture  |
| Phase B | High-performance Block Scanner  | acdc logic              |
| Phase C | Inline Substitution Engine      | asciidoc-parser logic   |
| Phase D | Standard Library / Prelude      | Shared                  |

Contribution Path

The most efficient path to a "single solution" is contributing to asciidoc-parser (scouten), as it currently holds the closest alignment with the Eclipse specification.

  1. Audit Gaps: Run the TCK against all three.
  2. Port Logic: Identify specific features (e.g., Table Footnotes) that are present in one parser but missing in the others.
  3. Deprecate: Once a single crate surpasses the others in TCK compliance and performance, provide migration paths for the CLI tools of the smaller projects.
