Description
Problem
There are at least three main Rust-based AsciiDoc parsers. Each has a preferred use case and its own tradeoffs. As a community, it is in our interest to reduce duplicated effort and to provide a single hybrid approach that follows the specification while preserving performance optimizations.
Rust AsciiDoc Parsers Compared
As of early 2026, the Rust ecosystem for AsciiDoc is transitioning from experimental projects to spec-compliant implementations. Unlike Markdown (which has pulldown-cmark), AsciiDoc has a significantly more complex grammar, making high-fidelity Rust parsers difficult to build.
Primary Rust-Based AsciiDoc Parsers
| Feature | asciidoc-parser | acdc (nlopes) | asciidocr |
|---|---|---|---|
| Development Status | Active (v0.14+) | Active / Research | Active (v0.2.0) |
| Grammar Style | Spec-driven / Manual | PEG-based | Scanner/Parser split |
| Goal | Compliance with Eclipse Spec | Speed & Correctness | TCK Compliance |
| Completeness | High (Inlines, Lists, Attributes) | Moderate (Core elements) | Moderate |
| CLI Included | Yes | Yes (acdc-cli) | Yes (asciidocr) |
| Performance | High (Zero-copy focus) | High (PEG-optimized) | Standard |
1. asciidoc-parser (scouten)
This is currently the most robust effort toward a production-ready, pure-Rust AsciiDoc processor.
- Design Philosophy: It employs "spec-driven development," mapping code coverage directly against the Eclipse Foundation's AsciiDoc Language Specification.
- Strengths: Strong support for inline substitutions (bold, italic, macros), document attributes, and complex list structures. It includes a built-in HTML5 backend.
- Limitations: Does not support UTF-16 (requires UTF-8), ignores `compat-mode`, and does not support the `book` doctype yet.
2. acdc / acdc-parser (nlopes)
A high-performance parser designed with a focus on formal correctness using a Parsing Expression Grammar (PEG).
- Design Philosophy: Uses the `peg` crate for grammar definition. It utilizes a two-pass inline processing system: first identifying boundaries, then parsing content.
- Strengths: Extremely fast and "fail-fast" by design. Includes an experimental Language Server Protocol (LSP) for editor support.
- Limitations: Known gaps in table spanning (row/column spans) and specific nested inline markup (e.g., bold inside links).
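The two-pass idea (boundaries first, content second) can be illustrated with a minimal sketch. This is not acdc's code: the `Inline` enum and the `find_boundaries` and `parse_content` functions are hypothetical names, and the matching rule (pairing bare `*` marks) is far simpler than real AsciiDoc inline syntax.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
pub enum Inline {
    Text(String),
    Strong(String),
}

/// Pass 1: record the byte ranges of `*...*` spans (marks included)
/// without building any nodes yet.
pub fn find_boundaries(input: &str) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut open: Option<usize> = None;
    for (i, b) in input.bytes().enumerate() {
        if b == b'*' {
            match open.take() {
                // A non-empty span closes here.
                Some(start) if i > start + 1 => ranges.push(start..i + 1),
                // `**` would be an empty span: restart from this mark.
                Some(_) => open = Some(i),
                None => open = Some(i),
            }
        }
    }
    ranges
}

/// Pass 2: turn the recorded ranges into nodes, treating everything
/// between spans as plain text.
pub fn parse_content(input: &str, ranges: &[Range<usize>]) -> Vec<Inline> {
    let mut nodes = Vec::new();
    let mut cursor = 0;
    for r in ranges {
        if r.start > cursor {
            nodes.push(Inline::Text(input[cursor..r.start].to_string()));
        }
        // Strip the opening and closing marks.
        nodes.push(Inline::Strong(input[r.start + 1..r.end - 1].to_string()));
        cursor = r.end;
    }
    if cursor < input.len() {
        nodes.push(Inline::Text(input[cursor..].to_string()));
    }
    nodes
}
```

Separating the passes means the second pass never has to backtrack: by the time content is parsed, every span is already known to be closed.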
3. asciidocr
A newer implementation focused on passing the official Technology Compatibility Kit (TCK).
- Design Philosophy: Implements a standard scanner and parser architecture with a focus on creating a compatible Abstract Syntax Tree (AST).
- Strengths: Provides clear library access to the scanner and AST, making it useful for developers building custom tooling or converters.
Technical Comparison of Parsing Strategies
The complexity of AsciiDoc requires different handling of "Inlines" versus "Blocks."
- Block Parsing: All three libraries handle block-level elements (headings, paragraphs, delimited blocks) relatively well using line-by-line scanning.
- Inline Substitution: `asciidoc-parser` is the most mature here, handling the intricate "constrained vs. unconstrained" regex-like rules of AsciiDoc more reliably than PEG-based approaches, which can struggle with the "lookahead" required for AsciiDoc's non-regular inline syntax.
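To make the "constrained" rule concrete, here is a rough sketch of why it needs context on both sides of each mark. The function name `constrained_bold_spans` is hypothetical, and the rule set is a simplification (no escapes, no unconstrained `**` handling, no nesting), not the behavior of any of the three parsers.

```rust
/// Returns the text of each constrained bold span (`*text*`) in `line`.
/// Simplified rules, assumed for illustration only:
///   - the opening `*` is at the start or follows a non-word character,
///   - the character after the opening `*` is neither whitespace nor `*`,
///   - the character before the closing `*` is not whitespace,
///   - the closing `*` is at the end or precedes a non-word character.
pub fn constrained_bold_spans(line: &str) -> Vec<String> {
    let chars: Vec<char> = line.chars().collect();
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    let mut spans = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let opens = chars[i] == '*'
            && (i == 0 || !is_word(chars[i - 1]))
            && i + 1 < chars.len()
            && !chars[i + 1].is_whitespace()
            && chars[i + 1] != '*';
        if opens {
            // Look ahead for the nearest mark that satisfies the closing rules.
            let close = (i + 2..chars.len()).find(|&j| {
                chars[j] == '*'
                    && !chars[j - 1].is_whitespace()
                    && (j + 1 == chars.len() || !is_word(chars[j + 1]))
            });
            if let Some(j) = close {
                let text: String = chars[i + 1..j].iter().collect();
                spans.push(text);
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    spans
}
```

Under these rules `a *bold* word` yields one span, while `2*3*4` and `mid*word*bold` yield none, because the marks touch word characters. Bolding inside a word is what the unconstrained `**` form is for.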
Recommendation for 2026
- For production/static sites: Use `asciidoc-parser`. It has the highest feature parity with the Ruby reference implementation (Asciidoctor) and the best documentation coverage.
- For IDE tooling/LSP: Look at `acdc`. Its PEG grammar is better suited for the incremental parsing needed in text editors.
- For custom backends: `asciidocr` provides the most accessible AST structures if you need to transform AsciiDoc into a proprietary format.
Goal: Consolidate projects
Consolidate asciidoc-parser, acdc (nlopes), and asciidocr by aligning them with the Eclipse Foundation’s AsciiDoc Language Specification. The primary obstacle to a single solution is the divergence in parsing architecture (PEG vs. Manual Recursive Descent).
Consolidation Roadmap
1. Unified Intermediate Representation (AST)
Establish a shared Abstract Syntax Tree (AST) crate. Currently, each project defines its own Node or Block enums.
- Action: Extract the AST definitions from `asciidocr` or `asciidoc-parser` into a standalone `asciidoc-ast` crate.
- Goal: Enable different parsing front-ends to target the same data structure, allowing back-ends (HTML, PDF, DocBook) to be shared.
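As a starting point for discussion, a shared AST crate might look roughly like the sketch below. Every name here (`Block`, `Inline`, `Backend`, `Html`) is hypothetical and not taken from any of the three projects. The node set is deliberately tiny, and the mapping of section level N to `<h(N+1)>` is an assumption.

```rust
/// Hypothetical block-level nodes for a shared `asciidoc-ast` crate.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Section { level: u8, title: Vec<Inline>, children: Vec<Block> },
    Paragraph(Vec<Inline>),
    Listing(String),
}

/// Hypothetical inline nodes.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Text(String),
    Strong(Vec<Inline>),
    Emphasis(Vec<Inline>),
}

/// A back-end depends only on the shared types, not on a specific parser.
pub trait Backend {
    fn render(&self, blocks: &[Block]) -> String;
}

pub struct Html;

impl Backend for Html {
    fn render(&self, blocks: &[Block]) -> String {
        render_blocks(blocks)
    }
}

/// Minimal HTML escaping for text content.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

fn render_inlines(inlines: &[Inline]) -> String {
    inlines
        .iter()
        .map(|node| match node {
            Inline::Text(t) => escape(t),
            Inline::Strong(c) => format!("<strong>{}</strong>", render_inlines(c)),
            Inline::Emphasis(c) => format!("<em>{}</em>", render_inlines(c)),
        })
        .collect()
}

fn render_blocks(blocks: &[Block]) -> String {
    blocks
        .iter()
        .map(|block| match block {
            Block::Section { level, title, children } => {
                // Assumption: section level N maps to <h(N+1)>.
                let h = u32::from(*level) + 1;
                format!("<h{h}>{}</h{h}>{}", render_inlines(title), render_blocks(children))
            }
            Block::Paragraph(inlines) => format!("<p>{}</p>", render_inlines(inlines)),
            Block::Listing(text) => format!("<pre>{}</pre>", escape(text)),
        })
        .collect()
}
```

Any front-end (PEG, hand-written, or hybrid) that produces `Vec<Block>` could then reuse the same `Backend` implementations.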
2. TCK-Driven Validation
The AsciiDoc Technology Compatibility Kit (TCK) serves as the "source of truth."
- Action: Create a unified test runner that pulls the TCK JSON/YAML test suite.
- Goal: Move development from "feature-chasing" to "compliance-filling." A project that passes 100% of the TCK is the de facto winner; merging projects becomes a matter of adopting the logic that passes specific TCK chapters.
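A unified runner could report compliance per chapter so that gaps are comparable across projects. The sketch below assumes a simplified, hypothetical case format (`Case` with a chapter label, input, and expected output as strings). It does not reproduce the TCK's actual file layout or comparison logic, and `run_suite` and `Tally` are illustrative names.

```rust
use std::collections::BTreeMap;

/// Hypothetical test case: the chapter label groups related cases.
pub struct Case<'a> {
    pub chapter: &'a str,
    pub input: &'a str,
    pub expected: &'a str,
}

#[derive(Debug, Default, PartialEq)]
pub struct Tally {
    pub passed: usize,
    pub total: usize,
}

/// Runs every case through `parse` and tallies passes per chapter.
/// `parse` stands in for whichever parser front-end is under test.
pub fn run_suite<'a>(
    cases: &[Case<'a>],
    parse: impl Fn(&str) -> String,
) -> BTreeMap<&'a str, Tally> {
    let mut report: BTreeMap<&'a str, Tally> = BTreeMap::new();
    for case in cases {
        let tally = report.entry(case.chapter).or_default();
        tally.total += 1;
        if parse(case.input) == case.expected {
            tally.passed += 1;
        }
    }
    report
}
```

Running the same suite against each parser gives a per-chapter table, which is what makes "adopt the logic that passes chapter X" a concrete decision.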
3. Hybrid Parsing Architecture
AsciiDoc's grammar is context-sensitive and non-regular, making pure PEG (used by acdc) difficult for complex inlines, while manual parsers (used by asciidoc-parser) are harder to maintain.
- Action: Adopt a "Lexical Functional" split. Use a fast scanner for block boundaries and a specialized Pratt parser or state machine for inline substitutions.
- Goal: Combine the performance of `acdc` with the correctness of `asciidoc-parser`.
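The split between a block scanner and an inline engine might be sketched as follows. This is an illustration under strong simplifying assumptions, not a proposal for the final design: it only recognizes section titles, paragraphs, and `----` listing blocks, and all names (`Block`, `scan_blocks`, `inline_inputs`) are hypothetical. The point is the handoff: only heading titles and paragraph text reach the inline stage, while listing content stays verbatim.

```rust
#[derive(Debug, PartialEq)]
pub enum Block {
    Heading { level: usize, title: String },
    Paragraph(String),
    Listing(String),
}

fn is_listing_delimiter(line: &str) -> bool {
    let t = line.trim_end();
    t.len() >= 4 && t.bytes().all(|b| b == b'-')
}

/// Recognizes `= Title` through `====== Title`; level is marks minus one.
fn heading(line: &str) -> Option<(usize, &str)> {
    let marks = line.bytes().take_while(|&b| b == b'=').count();
    if !(1..=6).contains(&marks) {
        return None;
    }
    let title = line[marks..].strip_prefix(' ')?.trim_start();
    if title.is_empty() { None } else { Some((marks - 1, title)) }
}

fn flush_paragraph(para: &mut Vec<&str>, blocks: &mut Vec<Block>) {
    if !para.is_empty() {
        blocks.push(Block::Paragraph(para.join("\n")));
        para.clear();
    }
}

/// Stage 1: line-by-line block scanner. No inline parsing happens here.
pub fn scan_blocks(input: &str) -> Vec<Block> {
    let mut blocks = Vec::new();
    let mut para: Vec<&str> = Vec::new();
    let mut listing: Option<Vec<&str>> = None;
    for line in input.lines() {
        // Inside a listing, everything is verbatim until the closing delimiter.
        if let Some(mut body) = listing.take() {
            if is_listing_delimiter(line) {
                blocks.push(Block::Listing(body.join("\n")));
            } else {
                body.push(line);
                listing = Some(body);
            }
            continue;
        }
        let trimmed = line.trim_end();
        if trimmed.is_empty() {
            flush_paragraph(&mut para, &mut blocks);
        } else if is_listing_delimiter(trimmed) {
            flush_paragraph(&mut para, &mut blocks);
            listing = Some(Vec::new());
        } else {
            match heading(trimmed) {
                // Simplification: a title only counts outside a paragraph.
                Some((level, title)) if para.is_empty() => {
                    blocks.push(Block::Heading { level, title: title.to_string() })
                }
                _ => para.push(trimmed),
            }
        }
    }
    flush_paragraph(&mut para, &mut blocks);
    // Simplification: an unterminated listing is still emitted.
    if let Some(body) = listing {
        blocks.push(Block::Listing(body.join("\n")));
    }
    blocks
}

/// Stage 2 handoff: the text that an inline engine would receive.
pub fn inline_inputs(blocks: &[Block]) -> Vec<&str> {
    blocks
        .iter()
        .filter_map(|b| match b {
            Block::Heading { title, .. } => Some(title.as_str()),
            Block::Paragraph(text) => Some(text.as_str()),
            Block::Listing(_) => None,
        })
        .collect()
}
```

Because the scanner never looks inside a line beyond its leading marks, it stays cheap, and the inline engine (Pratt parser or state machine) can be developed and tested independently against the strings returned by `inline_inputs`.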
Integration Strategy
| Phase | Task | Primary Contributor |
|---|---|---|
| Phase A | Common Spec-compliant AST crate | asciidocr architecture |
| Phase B | High-performance Block Scanner | acdc logic |
| Phase C | Inline Substitution Engine | asciidoc-parser logic |
| Phase D | Standard Library / Prelude | Shared |
Contribution Path
The most efficient path to a "single solution" is contributing to asciidoc-parser (scouten), as it currently holds the closest alignment with the Eclipse specification.
- Audit Gaps: Run the TCK against all three.
- Port Logic: Identify specific features (e.g., Table Footnotes) present in one project but missing from another.
- Deprecate: Once a single crate surpasses the others in TCK compliance and performance, provide migration paths for the CLI tools of the smaller projects.