PLAN: ipynb Export with CommonMark Markdown for mystmd #2

@mmcky


Tracking issue: QuantEcon/meta#292
Branch: myst-to-ipynb on QuantEcon/mystmd (fork of jupyter-book/mystmd)
Date: 2026-02-25


Goal

Add myst build --ipynb support to mystmd that produces notebooks with plain
CommonMark markdown cells, compatible with vanilla Jupyter Notebook, JupyterLab
(without jupyterlab-myst), and Google Colab.

PR jupyter-book#1882 already provides the infrastructure but delegates markdown cell content
to myst-to-md, which outputs MyST directive syntax (:::{note}, :::{figure},
etc.). We need a commonmark serialization mode so the output notebooks are
portable.


Phase 1 — Setup & Assess

  • Fork jupyter-book/mystmd → QuantEcon/mystmd
  • Clone fork, checkout myst-to-ipynb branch
  • Build locally (npm install + npm run build — 36 packages, 21s)
  • Run myst build --ipynb on a sample project (synthetic test cases)
  • Test against lecture-python-programming.myst content (functions.md)
  • Catalog what renders correctly vs. breaks
  • Document findings below

Phase 1 Findings (2026-02-25)

Build & export works

  • myst build --ipynb successfully produces .ipynb files from MyST source
  • Code cells are correctly extracted from {code-cell} blocks
  • Cell splitting at code boundaries works properly
  • 48 cells generated from functions.md (26 code, 22 markdown) — correct ratio

Issues confirmed: MyST syntax in markdown cells

The following MyST-specific syntax appears in output notebook markdown cells and
will not render in vanilla Jupyter or Colab:

| Issue | MyST syntax in output | Needed CommonMark | Severity |
| --- | --- | --- | --- |
| Inline math roles | {math}`E = mc^2` | $E = mc^2$ | HIGH (pervasive) |
| Math blocks | ```{math}\n...\n``` | $$\n...\n$$ | HIGH (pervasive) |
| Admonitions | :::{note} Note\n...\n::: | > **Note**\n>\n> ... | HIGH (common) |
| Figures | :::{figure} path\n:name: ...\n::: | ![Caption](path) | HIGH (common) |
| Tabs | ::::{tab-set}\n:::{tab-item}...\n:::: | Content preserved, wrapper stripped | MEDIUM |
| Code blocks (non-executable) | ```{code-block}\n...\n``` | Plain fenced code | MEDIUM |
| +++ markers | Every markdown cell starts with +++\n | Should be stripped | HIGH (every cell) |
| Exercise/Solution | Unsupported → empty output (silently dropped) | **Exercise N** / configurable | MEDIUM |
| Proof/Theorem | Unsupported → empty output (silently dropped) | **Theorem** (Title)\n... | MEDIUM |
| Raw blocks | Unsupported → silently dropped | Drop or preserve as HTML | LOW |

What works correctly (no changes needed)

  • Headings (#, ##, etc.)
  • Bold, italic, inline code
  • External links [text](url)
  • Blockquotes (>)
  • Bullet and numbered lists
  • Definition lists
  • Cross-references → rendered as [Theorem 1](#label) links (good!)
  • Code cell source preservation — exact match

Metadata issues

Architecture assessment for CommonMark mode

After reading the myst-to-md source:

  • myst-to-md has explicit handlers for every node type: directiveHandlers,
    roleHandlers, referenceHandlers, miscHandlers in separate files
  • Admonition handler calls writeFlowDirective(name) → always emits :::{name}
  • Math handler calls writeStaticDirective('math') → always emits ```{math}
  • Inline math handler calls writeStaticRole('math') → always emits {math}`...`
  • The block handler in misc.ts emits +++ prefix for every block node

Recommended approach: Option B (AST pre-transform in myst-to-ipynb)

Rationale:

  1. myst-to-md is designed specifically to produce roundtrippable MyST — changing
    it risks breaking the MD export path
  2. The ipynb exporter already has the AST available before calling writeMd
  3. A pre-transform can walk the AST and replace directive nodes with their
    CommonMark-equivalent AST nodes (e.g., admonition → blockquote with bold
    title, math directive → plain text $$ block)
  4. After the transform, writeMd will naturally produce CommonMark because the
    AST no longer contains MyST-specific nodes
  5. This keeps myst-to-md unchanged and isolates all CommonMark logic in
    myst-to-ipynb

The transform would live in myst-to-ipynb/src/commonmark.ts and be applied
conditionally when the export config specifies markdown: commonmark.
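
To make Option B concrete, here is a minimal sketch of what rewriting one admonition node into CommonMark-friendly AST nodes could look like. The `Node` type is a simplified stand-in for the real AST types, and `admonitionToBlockquote` is a hypothetical helper name, not the function used on the branch; the `admonitionTitle` child and `kind` field are assumed to follow the usual MyST AST shape.

```typescript
// Simplified node shape (assumption); the real AST types carry more fields.
type Node = {
  type: string;
  value?: string;
  kind?: string;
  children?: Node[];
};

// Hypothetical helper: turn an admonition into a blockquote whose first
// paragraph is the bold title, followed by the original body nodes.
function admonitionToBlockquote(node: Node): Node {
  const children = node.children ?? [];
  const titleNode = children.find((c) => c.type === 'admonitionTitle');
  const body = children.filter((c) => c.type !== 'admonitionTitle');
  // Fall back to the capitalised kind (e.g. "Note") when no title is present.
  const fallback = node.kind
    ? node.kind.charAt(0).toUpperCase() + node.kind.slice(1)
    : 'Note';
  const titleChildren: Node[] = titleNode?.children ?? [
    { type: 'text', value: fallback },
  ];
  return {
    type: 'blockquote',
    children: [
      { type: 'paragraph', children: [{ type: 'strong', children: titleChildren }] },
      ...body,
    ],
  };
}

const admonition: Node = {
  type: 'admonition',
  kind: 'note',
  children: [
    { type: 'admonitionTitle', children: [{ type: 'text', value: 'Note' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Body text.' }] },
  ],
};
const quoted = admonitionToBlockquote(admonition);
```

Because the result contains only blockquote, paragraph, strong, and text nodes, a Markdown serializer has no directive left to emit for it.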

Phase 2 — Bug Fixes (from meta#292 review)

| # | Bug | Location | Status |
| --- | --- | --- | --- |
| 1 | frontmatter parameter accepted but never used; should populate metadata.kernelspec and metadata.language_info | myst-to-ipynb/src/index.ts | ✅ Fixed |
| 2 | Language hardcoded to 'python'; should derive from frontmatter kernelspec | myst-to-ipynb/src/index.ts | ✅ Fixed |
| 3 | Log message says "Exported MD" instead of "Exported IPYNB" | myst-cli/src/build/ipynb/index.ts | ✅ Fixed |
| 4 | Redundant +++ markers leak into markdown cells (stated TODO) | myst-to-ipynb/src/index.ts | ✅ Fixed |
| 5 | package.json homepage URL points to myst-to-md, not myst-to-ipynb | myst-to-ipynb/package.json | ✅ Fixed |

All tests pass (vitest run — 3/3). Verified on real functions.md lecture content.
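
As an illustration of bugs 1 and 2, the sketch below derives notebook metadata from a frontmatter kernelspec instead of hardcoding Python. The `Frontmatter` shape, the `notebookMetadata` name, and the Python 3 fallback values are assumptions for this example; only the target fields (metadata.kernelspec and metadata.language_info) come from the bug list above.

```typescript
// Assumed frontmatter shape; only the fields used here are modelled.
type Kernelspec = { name: string; display_name?: string; language?: string };
type Frontmatter = { kernelspec?: Kernelspec };

type NotebookMetadata = {
  kernelspec: { name: string; display_name: string; language: string };
  language_info: { name: string };
};

// Hypothetical helper: use the frontmatter kernelspec when present and
// fall back to a Python 3 kernel only when none is given.
function notebookMetadata(frontmatter?: Frontmatter): NotebookMetadata {
  const spec = frontmatter?.kernelspec;
  if (!spec) {
    return {
      kernelspec: { name: 'python3', display_name: 'Python 3', language: 'python' },
      language_info: { name: 'python' },
    };
  }
  // Assumption: a kernelspec without an explicit language is treated as Python.
  const language = spec.language ?? 'python';
  return {
    kernelspec: {
      name: spec.name,
      display_name: spec.display_name ?? spec.name,
      language,
    },
    language_info: { name: language },
  };
}

const juliaMeta = notebookMetadata({
  kernelspec: { name: 'julia-1.9', display_name: 'Julia 1.9', language: 'julia' },
});
const defaultMeta = notebookMetadata();
```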

Phase 3 — CommonMark Serialization Mode ✅ COMPLETE

Committed as cb808aec on myst-to-ipynb branch.

What was implemented

Added commonmark.ts — an AST pre-transform (~465 lines) that converts MyST-specific
nodes to CommonMark equivalents before writeMd serialization.

Configuration:

# In page frontmatter or project exports:
exports:
  - format: ipynb
    markdown: commonmark    # default: 'myst' (existing behavior)

Directive → CommonMark mappings (all working)

| MyST Node | CommonMark Output | Verified |
| --- | --- | --- |
| math block | $$..$$ (via html node, no LaTeX escaping) | |
| inlineMath role | $...$ (via html node) | |
| admonition | > **Title** blockquote | |
| exercise | **Exercise N** + content | |
| solution | **Solution** + content (or dropped via option) | |
| proof/theorem/lemma | **Theorem N (Title)** + content | |
| tabSet | Bold tab titles + tab content | |
| container (figure) | ![alt](url) + italic caption | |
| container (table) | Bold caption + GFM table | |
| card | Bold title + content | |
| grid | Unwrapped to child cards | |
| details | Blockquote with bold summary | |
| aside | Blockquote | |
| mystDirective | Unwrapped children or code block | |
| mystRole | Unwrapped children or plain text | |
| code blocks | Stripped MyST options (lang preserved) | |
| mystTarget | Dropped (no CommonMark equivalent) | |
| comment | Dropped (% syntax not valid in CommonMark) | |
| Node identifier/label | Stripped to prevent (id)= prefixes | |

Key design decisions

  1. html-type AST nodes for math: Used { type: 'html', value: '$$..$$' } instead
    of { type: 'text' } to prevent mdast-util-to-markdown from escaping LaTeX
    special characters (_, \, etc.)
  2. Bottom-up tree walk: Transforms process children first, so nested directives
    (e.g., exercise containing math) are handled correctly
  3. Deep clone: The original AST is cloned before CommonMark transform to avoid
    mutating cached data
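
The three decisions above can be sketched together in a few lines. This is not the code in commonmark.ts: the `Node` type is simplified, the function names `rewriteNode`, `walkBottomUp`, and `toCommonMarkTree` are hypothetical, and a JSON round-trip stands in for whatever deep-clone method the branch actually uses.

```typescript
// Simplified node shape (assumption).
type Node = { type: string; value?: string; children?: Node[] };

// Rewrite a single node; returning null drops it from the tree.
function rewriteNode(node: Node): Node | null {
  switch (node.type) {
    case 'math':
      // html nodes are serialized verbatim, so _ and \ are not escaped.
      return { type: 'html', value: '$$\n' + (node.value ?? '') + '\n$$' };
    case 'inlineMath':
      return { type: 'html', value: '$' + (node.value ?? '') + '$' };
    case 'mystTarget':
    case 'comment':
      return null;
    default:
      return node;
  }
}

// Bottom-up walk: children are rewritten before their parent is.
function walkBottomUp(node: Node): Node | null {
  const next: Node = { ...node };
  if (node.children) {
    next.children = node.children
      .map((child) => walkBottomUp(child))
      .filter((child): child is Node => child !== null);
  }
  return rewriteNode(next);
}

// Deep clone first so the caller's (possibly cached) tree is never mutated.
function toCommonMarkTree(tree: Node): Node {
  const copy = JSON.parse(JSON.stringify(tree)) as Node;
  return walkBottomUp(copy) ?? { type: 'root', children: [] };
}

const tree: Node = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Let ' },
        { type: 'inlineMath', value: 'x_1' },
      ],
    },
    { type: 'math', value: 'a_b = \\alpha' },
    { type: 'comment', value: 'internal note' },
  ],
};
const converted = toCommonMarkTree(tree);
```

After the call, `tree` still contains its math and comment nodes, while `converted` holds html nodes for the math and no comment node.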

Files changed

  • packages/myst-to-ipynb/src/commonmark.ts — NEW (465 lines)
  • packages/myst-to-ipynb/src/index.ts — Added IpynbOptions, transform wiring
  • packages/myst-cli/src/build/ipynb/index.ts — Passes markdown option from export config

Phase 4 — Tests & Validation ✅ COMPLETE

Unit tests committed as c1cca05f. Real-world validation fixes committed as 2d70076d.
Both pushed to QuantEcon/mystmd.

Test suite: 35 passing tests across 3 YAML files

| File | Tests | Coverage |
| --- | --- | --- |
| basic.yml | 13 | Core features: styles, headings, thematic break, blockquotes, lists, HTML, fenced code, code cells, mixed cells, block marker stripping, links, images, line breaks |
| frontmatter.yml | 4 | Kernelspec metadata: default Python, Julia kernel, Python3 kernel, R kernel |
| commonmark.yml | 18 | CommonMark mode: inline math ($), math blocks ($$), math with/without labels, underscores not escaped, admonitions→blockquote, admonitions preserved in myst mode, exercises with enumerator, theorems with title, tabSets→bold titles, solutions dropped/kept, frontmatter+CommonMark combined, heading/paragraph identifier stripping, mystTarget drop, comment drop, code block attribute stripping |

Test infrastructure improvements

  • Rewrote run.spec.ts to support frontmatter and options fields in YAML test cases
  • Test runner auto-discovers all .yml files in the tests directory
  • IpynbOptions (including commonmark.dropSolutions) fully testable via YAML

Real-world validation: functions.md from lecture-python-programming.myst

Tested by exporting a real QuantEcon lecture file (48 cells: 26 code, 22 markdown)
using the local dev build of myst build --ipynb in /tmp/test-ipynb-export/.

Issues found and fixed (commit 2d70076d):

| Issue | Root Cause | Fix |
| --- | --- | --- |
| (pos_args)=, (recursive_functions)= etc. in output | myst-to-md's labelWrapper adds (identifier)=\n prefix to headings/paragraphs/blockquotes/lists with identifier/label properties | Strip identifier/label from all children after transformNode in transformToCommonMark |
| (index-vivo0ovzzj)= auto-generated labels | Same root cause: {index} directives produce auto-generated identifiers | Same fix |
| +++ markers mid-cell | stripBlockMarkers regex only matched at start of string | Changed regex to /^\+\+\+[^\n]*\n/gm (global multiline) |
| ```{code-block} python\n:class: no-execute | code nodes with extra MyST attributes rendered as directives | Added code case to transformNode → transformCodeBlock() strips extra attributes |
| Empty cells from dropped nodes | mystTarget / comment / dropped solution nodes leave empty markdown cells | Added .filter() to remove empty markdown cells after transformation |
| mystTarget nodes | Not handled in CommonMark mode | Added case 'mystTarget': return null |
| % comment syntax | Not handled in CommonMark mode | Added case 'comment': return null |
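
Two of the fixes above are easy to show in isolation. The regex is the one quoted in the table; the function bodies and the `Cell` type are a simplified sketch for illustration rather than the code on the branch.

```typescript
// With the g and m flags, ^ matches at the start of every line, so +++
// markers in the middle of a cell are removed as well as the leading one.
const BLOCK_MARKER = /^\+\+\+[^\n]*\n/gm;

function stripBlockMarkers(md: string): string {
  return md.replace(BLOCK_MARKER, '');
}

// Simplified cell shape (assumption); real notebook cells carry more fields.
type Cell = { cell_type: 'markdown' | 'code'; source: string };

// Drop markdown cells that are empty after transformation; code cells are
// kept even when empty.
function dropEmptyMarkdownCells(cells: Cell[]): Cell[] {
  return cells.filter(
    (cell) => cell.cell_type !== 'markdown' || cell.source.trim() !== '',
  );
}

const cleaned = stripBlockMarkers(
  '+++\nFirst paragraph.\n\n+++ {"tags": []}\nSecond paragraph.\n',
);
// cleaned === 'First paragraph.\n\nSecond paragraph.\n'
```

Note that the pattern requires a trailing newline, so a +++ on the very last line of a string with no newline after it would not be matched.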

Result: 48 cells, 0 MyST syntax leaks (verified by automated audit script).
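
The audit script itself is not reproduced in this issue. As a rough idea of what such a check might do, the sketch below scans markdown cells for the MyST constructs listed in the Phase 1 findings. The pattern list, the `Cell` type, and the `findLeaks` name are all assumptions made for this example.

```typescript
// Simplified cell shape (assumption).
type Cell = { cell_type: 'markdown' | 'code'; source: string };

// Hypothetical leak patterns; none use the g flag, so test() is stateless.
const LEAK_PATTERNS: Array<[string, RegExp]> = [
  ['colon-fence directive', /^:{3,}\{[^}]+\}/m],
  ['backtick-fence directive', /^`{3,}\{[^}]+\}/m],
  ['role', /\{[A-Za-z][\w:-]*\}`/],
  ['block marker', /^\+\+\+/m],
  ['target label', /^\([^)\s]+\)=\s*$/m],
];

// Report every markdown cell that still contains MyST-only syntax.
// Code cells are skipped because their source is passed through unchanged.
function findLeaks(cells: Cell[]): string[] {
  const leaks: string[] = [];
  cells.forEach((cell, index) => {
    if (cell.cell_type !== 'markdown') return;
    for (const [label, pattern] of LEAK_PATTERNS) {
      if (pattern.test(cell.source)) {
        leaks.push('cell ' + index + ': ' + label);
      }
    }
  });
  return leaks;
}

const report = findLeaks([
  { cell_type: 'markdown', source: '# Title\n\nPlain $x_1$ math.' },
  { cell_type: 'markdown', source: ':::{note}\nLeaked admonition\n:::' },
  { cell_type: 'code', source: 'print("{math}`x`")' },
  { cell_type: 'markdown', source: '(pos_args)=\n## Positional arguments' },
]);
```

For this input, `report` flags the second and fourth cells only; an empty result corresponds to the "0 MyST syntax leaks" outcome reported above.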

Remaining validation (manual)

  • Validate output notebooks open correctly in:
    • Jupyter Notebook (classic)
    • JupyterLab (no jupyterlab-myst)
    • Google Colab
  • Test with real QuantEcon lecture content (lecture-python-programming.myst) — ✅ functions.md clean
  • Test additional lecture files (more diverse MyST features)
  • Verify cell metadata passthrough (hide-input, remove-cell, etc.)

Phase 5 — Submit Upstream

  • Push commits to QuantEcon/mystmd myst-to-ipynb branch
  • Open PR against jupyter-book/mystmd myst-to-ipynb branch (or push
    directly if given access)
  • Coordinate review with @agoose77 and @rowanc1
  • Update QuantEcon/meta#292

Parallel Work

  • Update QuantEcon theme download button to serve built ipynb files
  • Wire CI in lecture-python-programming.myst to use custom mystmd build
    (until PR merges upstream)
  • Track QuantEcon/quantecon-theme-src#26
    (BinderHub launch support)

Key Files in This Branch

packages/myst-to-ipynb/          # New package — AST → ipynb conversion
  src/index.ts                   # Main export logic + IpynbOptions + empty cell filter
  src/commonmark.ts              # CommonMark AST pre-transform (Phase 3 + Phase 4 fixes)
  tests/run.spec.ts              # Test runner — loads all .yml, supports options
  tests/basic.yml                # 13 basic feature tests (Phase 4)
  tests/frontmatter.yml          # 4 kernelspec/metadata tests (Phase 4)
  tests/commonmark.yml           # 18 CommonMark-mode tests (Phase 4)
  package.json
packages/myst-to-md/             # Existing — AST → Markdown string (unchanged)
  src/index.ts
packages/myst-cli/
  src/build/ipynb/index.ts       # CLI wiring for `myst build --ipynb`
packages/myst-frontmatter/
  src/exports/validators.ts      # Export format validators (ipynb added)

Commit History

| Commit | Description |
| --- | --- |
| 79c7be0b | Phase 2: Bug fixes (kernelspec from frontmatter, language_info, log message, +++ stripping, homepage URL) |
| f67f1188 | Phase 3: CommonMark serialization mode (commonmark.ts, 465 lines; IpynbOptions; CLI wiring) |
| c1cca05f | Phase 4: Expand test suite to 30 cases across 3 YAML files |
| 2d70076d | Phase 4: Real-world validation fixes (identifier/label stripping, mystTarget/comment drop, empty cell filter, code block attribute stripping, global +++ regex) |
