CommonMark ipynb export + image attachment embedding #1

Open
mmcky wants to merge 27 commits into main from myst-to-ipynb
@mmcky mmcky commented Feb 25, 2026

Summary

Adds myst build --ipynb support that produces notebooks with plain CommonMark markdown cells — compatible with vanilla Jupyter Notebook, JupyterLab (without jupyterlab-myst), and Google Colab. Also adds an option to embed images as base64 cell attachments for fully self-contained notebooks.


What's Changed

Bug fixes (built on top of upstream PR jupyter-book#1882)

  • Kernelspec from frontmatter: the frontmatter parameter was accepted but never used; it now populates metadata.kernelspec and metadata.language_info correctly
  • Language detection: no longer hardcoded to "python"; derived from the frontmatter kernelspec
  • Log message: now says "Exported IPYNB" instead of "Exported MD"
  • +++ markers: stripped globally from markdown cells (previously matched only at the start of the string)
  • Package homepage: corrected URL from myst-to-md to myst-to-ipynb
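
A minimal sketch of two of the fixes above, for reviewers who want the intended behaviour at a glance. The interfaces, the function names `notebookMetadata` and `stripBlockMarkers`, and the fallbacks (kernel name as language, "python" when no kernelspec is given) are illustrative assumptions, not the code in this PR.

```typescript
// Hypothetical shapes: field names follow nbformat's metadata.kernelspec,
// but these interfaces are illustrative, not the package's real types.
interface KernelSpec {
  name: string;
  display_name?: string;
  language?: string;
}

interface NotebookMetadata {
  kernelspec?: { name: string; display_name: string; language: string };
  language_info: { name: string };
}

// Build notebook metadata from frontmatter instead of hardcoding "python".
function notebookMetadata(frontmatter: { kernelspec?: KernelSpec }): NotebookMetadata {
  const spec = frontmatter.kernelspec;
  if (!spec) {
    // Assumed fallback when the frontmatter declares no kernelspec.
    return { language_info: { name: 'python' } };
  }
  // Assumption: fall back to the kernel name when no language is given.
  const language = spec.language ?? spec.name;
  return {
    kernelspec: {
      name: spec.name,
      display_name: spec.display_name ?? spec.name,
      language,
    },
    language_info: { name: language },
  };
}

// Remove every line that consists only of a +++ block marker.
// The g and m flags make the pattern match at each line start, not just
// the start of the string; the `$` alternative covers a marker on the last
// line without a trailing newline.
function stripBlockMarkers(markdown: string): string {
  return markdown.replace(/^\+\+\+[ \t]*(\r?\n|$)/gm, '');
}
```

This sketch does not handle markers that carry trailing metadata on the same line.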

Additionally fixes two issues discovered during real-world validation against QuantEcon lectures:

  • Epigraph/pull-quote directives silently dropped during export (#7)
  • Cross-references with empty URLs in resolved nodes (#8)

CommonMark serialization mode

New markdown: commonmark option in export config triggers an AST pre-transform that converts MyST-specific nodes to CommonMark equivalents before writeMd serialization.

exports:
  - format: ipynb
    markdown: commonmark

18 directive/role mappings implemented:

| MyST Node | CommonMark Output |
| --- | --- |
| math block | `$$...$$` |
| inlineMath | `$...$` |
| admonition | `> **Title** ...` blockquote |
| exercise | `**Exercise N** ...` |
| solution | `**Solution** ...` (or dropped via dropSolutions) |
| proof/theorem/lemma | `**Theorem N (Title)** ...` |
| tabSet | Bold tab titles + content |
| container (figure) | `![alt](url)` + italic caption |
| container (table) | Bold caption + GFM table |
| card / grid / details / aside | Appropriate blockquote/bold fallbacks |
| mystDirective / mystRole | Unwrapped children |
| mystTarget / comment | Dropped |
| code blocks | Stripped MyST-specific attributes |
| Node identifier/label | Stripped to prevent `(id)=` prefixes |
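
To illustrate the shape of such a pre-transform, here is a rough sketch covering only two of the mappings (admonition to blockquote, and dropping mystTarget/comment nodes). The `Node` interface, both function names, and the default title "Note" are assumptions made for this example; the real transform in commonmark.ts covers many more node types.

```typescript
// Minimal mdast-like node shape, for illustration only.
interface Node {
  type: string;
  value?: string;
  children?: Node[];
}

// Rewrite an admonition as a blockquote whose first paragraph is the bold title.
function admonitionToBlockquote(node: Node): Node {
  const children = node.children ?? [];
  const titleNode = children.find((c) => c.type === 'admonitionTitle');
  const body = children.filter((c) => c.type !== 'admonitionTitle');
  const title: Node = {
    type: 'paragraph',
    children: [
      {
        type: 'strong',
        // Assumed default when the admonition carries no explicit title.
        children: titleNode?.children ?? [{ type: 'text', value: 'Note' }],
      },
    ],
  };
  return { type: 'blockquote', children: [title, ...body] };
}

// Walk the tree bottom-up, dropping targets and comments and rewriting admonitions.
function toCommonMark(node: Node): Node {
  if (!node.children) return node;
  const children = node.children
    .filter((c) => c.type !== 'mystTarget' && c.type !== 'comment')
    .map(toCommonMark);
  const next: Node = { ...node, children };
  return next.type === 'admonition' ? admonitionToBlockquote(next) : next;
}
```

Because the output consists only of standard mdast node types, a stock markdown serializer can write it without any MyST-specific handlers.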

Key design: uses { type: 'html' } AST nodes for math to prevent mdast-util-to-markdown from escaping LaTeX special characters.
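
The idea can be sketched as follows, assuming a simplified `Node` shape and a hypothetical function name `mathToHtml`. It relies on the serializer writing `html` node values verbatim, so underscores and backslashes in LaTeX are not escaped.

```typescript
// Minimal mdast-like node shape, for illustration only.
interface Node {
  type: string;
  value?: string;
  children?: Node[];
}

// Replace math nodes with html nodes carrying dollar-delimited LaTeX.
// An html node's value is emitted as-is, so characters such as `_` survive.
function mathToHtml(node: Node): Node {
  if (node.type === 'math') {
    return { type: 'html', value: '$$\n' + (node.value ?? '') + '\n$$' };
  }
  if (node.type === 'inlineMath') {
    return { type: 'html', value: '$' + (node.value ?? '') + '$' };
  }
  if (!node.children) return node;
  return { ...node, children: node.children.map(mathToHtml) };
}
```

The trade-off is that the tree no longer records these nodes as math, which is acceptable here because the transform runs immediately before serialization.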

Image attachment embedding

New images: attachment option embeds images as base64 cell attachments for self-contained notebooks.

exports:
  - format: ipynb
    markdown: commonmark
    images: attachment

Two-phase hybrid architecture:

  1. Phase 1 (myst-cli): collectImageData() walks AST image nodes, resolves filesystem paths, reads & base64-encodes into Record<url, ImageData>
  2. Phase 2 (myst-to-ipynb): embedImagesAsAttachments() runs a post-serialization regex that rewrites ![alt](url) to ![alt](attachment:name) and adds the cell attachments field
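
A simplified sketch of Phase 2, assuming Phase 1 has already produced the URL-to-data map. The `ImageData` fields, the `embedImages` and `basename` names, and the regex are illustrative assumptions; in particular, this version ignores escaped brackets and parentheses.

```typescript
// Hypothetical shape; the ImageData interface in this PR may differ.
interface ImageData {
  mime: string; // e.g. "image/png"
  data: string; // base64-encoded file contents
}

// nbformat stores attachments as { filename: { mimeType: base64 } }.
type Attachments = Record<string, Record<string, string>>;

// Last path segment of a URL, without query string or fragment.
function basename(url: string): string {
  const clean = url.split(/[?#]/)[0];
  return clean.substring(clean.lastIndexOf('/') + 1);
}

// Rewrite image references that have collected data; leave the rest alone.
function embedImages(
  markdown: string,
  imageData: Record<string, ImageData>,
): { markdown: string; attachments: Attachments } {
  const attachments: Attachments = {};
  const rewritten = markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)\)/g,
    (match, alt: string, url: string) => {
      const image = imageData[url];
      if (!image) return match; // no data collected for this URL
      const name = basename(url);
      attachments[name] = { [image.mime]: image.data };
      return `![${alt}](attachment:${name})`;
    },
  );
  return { markdown: rewritten, attachments };
}
```

Keying attachments by basename means two different images with the same file name would collide; this sketch does not resolve that case.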

Validated on 24 QuantEcon lectures: 50 images across 12 notebooks embedded, 0 external references remaining.

Epigraph / pull-quote serialization (fixes #7)

The myst-to-md container handler only handled figure, table, and code kinds — quote containers (produced by {epigraph} and {pull-quote} directives) were silently dropped.

Fix: Added kind === 'quote' branch that:

  • Finds the blockquote child and serializes it via the default blockquote handler
  • Appends optional caption as > — Attribution line
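
Roughly, the new branch behaves like the sketch below. The `Node` shape, the helper names, the plain-text flattening, and the empty `>` line before the attribution are simplifying assumptions; the actual handler delegates to the default blockquote handler rather than flattening text.

```typescript
// Minimal mdast-like node shape, for illustration only.
interface Node {
  type: string;
  kind?: string;
  value?: string;
  children?: Node[];
}

// Concatenate the text content of a node (simplification: formatting is lost).
function plainText(node: Node): string {
  if (node.value !== undefined) return node.value;
  return (node.children ?? []).map(plainText).join('');
}

// Serialize a container of kind 'quote'; other kinds are out of scope here
// and yield an empty string in this sketch.
function serializeQuoteContainer(node: Node): string {
  if (node.type !== 'container' || node.kind !== 'quote') return '';
  const children = node.children ?? [];
  const quote = children.find((c) => c.type === 'blockquote');
  if (!quote) return '';
  // One "> " line per paragraph, separated by an empty quoted line.
  let out = (quote.children ?? []).map((p) => `> ${plainText(p)}`).join('\n>\n');
  const caption = children.find((c) => c.type === 'caption');
  if (caption) {
    // \u2014 is the dash that introduces the attribution line.
    out += `\n>\n> \u2014 ${plainText(caption)}`;
  }
  return out;
}
```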

Cross-reference URL fallback (fixes #8)

When MyST resolves same-page cross-references (e.g., {ref}, {eq}), the addChildrenFromTargetNode transform sets html_id on the node but doesn't always propagate identifier or label. The myst-to-md serializer was only checking urlSource → #label → #identifier, resulting in empty [text]() links.

Fix: Extended the URL fallback chain to: urlSource → #label → #identifier → #html_id → url → ''
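
The extended chain amounts to the following, shown as a sketch. The `CrossReference` interface and the `resolveUrl` name are assumptions for illustration; only the field names and their order come from the description above.

```typescript
// Only the fields that take part in the fallback chain; illustrative shape.
interface CrossReference {
  urlSource?: string;
  label?: string;
  identifier?: string;
  html_id?: string;
  url?: string;
}

// urlSource → #label → #identifier → #html_id → url → ''
function resolveUrl(node: CrossReference): string {
  if (node.urlSource) return node.urlSource;
  if (node.label) return `#${node.label}`;
  if (node.identifier) return `#${node.identifier}`;
  if (node.html_id) return `#${node.html_id}`;
  return node.url ?? '';
}

// A same-page reference that only carries html_id no longer yields "".
const samePageUrl = resolveUrl({ html_id: 'section-7' }); // "#section-7"
```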

Note: A third case was identified where {ref} roles with multi-line bodies fail to parse entirely. This is an upstream parser limitation filed as jupyter-book/mystmd#2724.

Exercise / solution code cell lifting

Exercises and solutions can contain code-cell nodes that should become executable notebook cells. These were being serialized as markdown instead of being lifted to top-level code cells.

Fix: liftCodeCellsFromGatedNodes() pre-processes the AST to extract code cells from exercise/solution containers, splitting surrounding markdown content into separate cells. Handles both direct gated nodes and blocks wrapping gated nodes.
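
The core splitting step can be sketched as below. The `Node` and `Cell` types and the name `splitGatedNode` are hypothetical, and the sketch assumes code cells appear as `block` nodes with kind `notebook-code` wrapping a `code` child; it does not cover the case of blocks that wrap gated nodes.

```typescript
// Minimal mdast-like node shape, for illustration only.
interface Node {
  type: string;
  kind?: string;
  value?: string;
  children?: Node[];
}

// Simplified cell model: markdown cells keep their AST nodes for later
// serialization, code cells keep their source text.
type Cell =
  | { cell_type: 'markdown'; nodes: Node[] }
  | { cell_type: 'code'; source: string };

// Assumption: a code cell is a block tagged with kind 'notebook-code'.
function isCodeCell(node: Node): boolean {
  return node.type === 'block' && node.kind === 'notebook-code';
}

// Split an exercise/solution node into alternating markdown and code cells,
// preserving document order.
function splitGatedNode(node: Node): Cell[] {
  const cells: Cell[] = [];
  let pending: Node[] = [];
  const flush = () => {
    if (pending.length > 0) {
      cells.push({ cell_type: 'markdown', nodes: pending });
      pending = [];
    }
  };
  for (const child of node.children ?? []) {
    if (isCodeCell(child)) {
      flush(); // close the markdown run that precedes this code cell
      const code = (child.children ?? []).find((c) => c.type === 'code');
      cells.push({ cell_type: 'code', source: code?.value ?? '' });
    } else {
      pending.push(child);
    }
  }
  flush();
  return cells;
}
```

Flushing only non-empty runs avoids emitting empty markdown cells when two code cells are adjacent.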

Documentation

  • New docs/creating-notebooks.md — comprehensive ipynb export guide
  • Updated docs/documents-exports.md — added ipynb to format table
  • Updated docs/frontmatter.md — added ipynb to format values
  • Updated docs/myst.yml — TOC entry for creating-notebooks
  • Updated packages/myst-to-ipynb/README.md — features and usage

Test suite — 154 tests passing

myst-to-ipynb (55 tests)

| File | Tests | Coverage |
| --- | --- | --- |
| basic.yml | 13 | Core: styles, headings, code, links, images, block markers |
| frontmatter.yml | 4 | Kernelspec: Python, Julia, Python3, R |
| commonmark.yml | 18 | All directive/role mappings, identifier stripping |
| attachments.yml | 5 | Integration: single/multi image, dedup, no-match, no-data |
| attachments.spec.ts | 7 | Unit: basename, embed/skip/dedup logic |
| run.spec.ts | 8 | Exercise/solution code cell lifting |

myst-to-md (99 tests, 6 new)

| File | New Tests | Coverage |
| --- | --- | --- |
| directives.yml | 3 | Epigraph, epigraph with attribution, pull-quote |
| references.yml | 3 | URL fallback for remote refs, html_id heading, html_id equation |

Files Changed (36 files, +3122 / -11)

New files

  • packages/myst-to-ipynb/src/commonmark.ts — AST pre-transform (~500 lines, 18 node types)
  • packages/myst-to-ipynb/src/attachments.ts — Post-serialization image embedding (~100 lines)
  • packages/myst-to-ipynb/src/types.ts — Shared ImageData interface
  • packages/myst-to-ipynb/tests/commonmark.yml — 18 CommonMark mode tests
  • packages/myst-to-ipynb/tests/attachments.yml — 5 attachment integration tests
  • packages/myst-to-ipynb/tests/attachments.spec.ts — 7 attachment unit tests
  • docs/creating-notebooks.md — New documentation page

Modified files

  • packages/myst-to-ipynb/src/index.ts: IpynbOptions, attachment wiring, empty cell filter, exercise lifting
  • packages/myst-to-ipynb/tests/run.spec.ts — Test runner supporting frontmatter + options in YAML
  • packages/myst-to-ipynb/tests/basic.yml — Expanded to 13 tests
  • packages/myst-to-ipynb/tests/frontmatter.yml — 4 kernelspec tests
  • packages/myst-to-ipynb/package.json — Fixed homepage URL
  • packages/myst-to-ipynb/README.md — Updated documentation
  • packages/myst-cli/src/build/ipynb/index.ts: collectImageData(), options passthrough, log fix
  • packages/myst-to-md/src/directives.ts — Epigraph/pull-quote container handler
  • packages/myst-to-md/src/references.ts — Cross-reference URL fallback chain with html_id
  • packages/myst-to-md/tests/directives.yml — 3 epigraph/pull-quote tests
  • packages/myst-to-md/tests/references.yml — 3 cross-reference fallback tests
  • docs/documents-exports.md — ipynb in format table
  • docs/frontmatter.md — ipynb in format values
  • docs/myst.yml — TOC entry

Tracking issue: QuantEcon/mystmd#2 (full PLAN)
Related: QuantEcon/meta#292 · jupyter-book/mystmd#1882
Real-world validation: QuantEcon/lecture-python-programming.myst#363 — 24 lectures, 0 MyST syntax leaks


@mmcky mmcky requested a review from Copilot February 25, 2026 04:28
@mmcky mmcky marked this pull request as ready for review February 25, 2026 04:33
Copilot AI left a comment

Pull request overview

Adds first-class .ipynb export support to the MyST toolchain, including CommonMark-friendly notebook markdown output and optional image embedding via Jupyter cell attachments.

Changes:

  • Introduces myst-to-ipynb package to serialize MyST mdast into Jupyter notebook JSON, with optional CommonMark transforms.
  • Adds image attachment embedding support (collect image bytes in myst-cli, rewrite image references + add attachments in myst-to-ipynb).
  • Wires ipynb into myst-frontmatter export types/validation, myst-cli build/export flows + --ipynb CLI option, and adds docs.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/myst-to-ipynb/tsconfig.json Adds TS build configuration for the new package.
packages/myst-to-ipynb/package.json Defines new package metadata, deps, and build/test scripts.
packages/myst-to-ipynb/README.md Documents notebook export features and frontmatter usage.
packages/myst-to-ipynb/CHANGELOG.md Adds initial changelog stub for the package.
packages/myst-to-ipynb/.eslintrc.cjs Enables linting via shared config.
packages/myst-to-ipynb/src/index.ts Implements mdast → ipynb conversion, CommonMark option, and attachment embedding hook.
packages/myst-to-ipynb/src/commonmark.ts Adds AST transforms to convert MyST constructs into CommonMark-compatible mdast.
packages/myst-to-ipynb/src/attachments.ts Adds markdown post-processing to rewrite images into attachment: references + build attachments dict.
packages/myst-to-ipynb/src/types.ts Defines ImageData shape for attachment embedding.
packages/myst-to-ipynb/tests/run.spec.ts Snapshot-style YAML-driven test runner for mdast→ipynb serialization.
packages/myst-to-ipynb/tests/basic.yml Adds coverage for baseline markdown/cell behaviors.
packages/myst-to-ipynb/tests/commonmark.yml Adds coverage for CommonMark conversion behaviors.
packages/myst-to-ipynb/tests/frontmatter.yml Adds coverage for kernelspec/frontmatter→notebook metadata.
packages/myst-to-ipynb/tests/attachments.yml Adds end-to-end coverage for attachment embedding behavior in produced notebooks.
packages/myst-to-ipynb/tests/attachments.spec.ts Unit tests for embedImagesAsAttachments.
packages/myst-to-ipynb/tests/example.ipynb Adds an example notebook fixture.
packages/myst-frontmatter/src/exports/types.ts Adds ipynb to ExportFormats.
packages/myst-frontmatter/src/exports/validators.ts Recognizes .ipynb extension as ipynb export format.
packages/myst-cli/package.json Adds dependency on myst-to-ipynb.
packages/myst-cli/src/cli/options.ts Adds --ipynb CLI option helper.
packages/myst-cli/src/cli/build.ts Exposes --ipynb on myst build.
packages/myst-cli/src/build/build.ts Includes ipynb in format selection logic.
packages/myst-cli/src/build/build.spec.ts Updates format-selection tests to include ipynb.
packages/myst-cli/src/build/utils/collectExportOptions.ts Allows .ipynb as a valid output extension.
packages/myst-cli/src/build/utils/localArticleExport.ts Routes ExportFormats.ipynb to the new export runner.
packages/myst-cli/src/build/ipynb/index.ts Implements runIpynbExport and image-data collection for attachments.
docs/myst.yml Adds notebooks page to docs navigation.
docs/frontmatter.md Documents ipynb as a supported export format.
docs/documents-exports.md Adds ipynb to export overview + CLI examples.
docs/creating-notebooks.md New documentation page for notebook exporting.
.changeset/witty-tigers-hunt.md Declares patch releases for affected packages.
.changeset/config.json Groups versioning for myst-to-ipynb alongside related packages.

agoose77 and others added 21 commits February 27, 2026 14:36

- Use frontmatter kernelspec to populate notebook metadata (name,
  display_name, language) instead of ignoring the frontmatter parameter
- Derive language_info.name from frontmatter instead of hardcoding 'python'
- Strip leading +++ block markers from markdown cells (MyST-specific
  separators that have no meaning in notebooks)
- Fix log message from 'Exported MD' to 'Exported IPYNB'
- Fix package.json homepage URL to point to myst-to-ipynb (not myst-to-md)

Ref: QuantEcon/meta#292

Add AST pre-transform that converts MyST-specific nodes to CommonMark
equivalents before markdown serialization, producing notebooks compatible
with vanilla Jupyter Notebook and Google Colab.

New option: markdown: 'commonmark' (default: 'myst')

Transforms implemented:
- math block directive to $$ delimiters
- inline math role to $ delimiters
- admonition to blockquote with bold title
- exercise to bold header with content
- solution to bold header with content (or dropped via option)
- proof/theorem/lemma to bold header with content
- tab-set to bold tab titles with tab content
- figure to image + italic caption
- table container to bold caption + table
- card/grid to unwrapped content
- details to blockquote with summary title
- aside/sidebar to blockquote
- mystDirective/mystRole to unwrapped content or plain text

Uses html-type AST nodes for math content to prevent the markdown
serializer from escaping LaTeX special characters (underscores, etc).

CLI wiring: reads 'markdown: commonmark' from export config in myst.yml.

Ref: QuantEcon/meta#292

- Rewrite basic.yml with 13 proper YAML-object test cases (was 2 active)
- Add frontmatter.yml with 4 kernelspec/metadata test cases
- Add commonmark.yml with 13 CommonMark-mode test cases covering:
  inline math, math blocks, admonitions, exercises, theorems,
  tabSets, solutions (kept/dropped), underscore preservation
- Update run.spec.ts to support frontmatter and options fields in
  YAML test cases, enabling CommonMark and metadata tests

Ref: QuantEcon/meta#292

…er empty cells

Real-world validation with QuantEcon lecture content revealed:
- myst-to-md labelWrapper was adding (identifier)= prefixes to headings,
  paragraphs, blockquotes, and lists with identifier/label properties
- mystTarget nodes need to be dropped in CommonMark mode
- comment nodes (% syntax) need to be dropped in CommonMark mode
- code blocks with extra MyST attributes rendered as code-block directives
- +++ block markers appearing mid-cell (not just leading)
- Empty markdown cells from dropped nodes should be filtered out

Changes:
- commonmark.ts: strip identifier/label from all transformed children,
  add mystTarget and comment handlers, add code handler
- index.ts: filter empty markdown cells, fix stripBlockMarkers /gm regex
- commonmark.yml: add 5 new tests, update solution-dropped test

Standalone {image} directives with class/width/align properties were
being serialized as ```{image} directives by myst-to-md. Added
transformImage handler that strips directive-specific properties so
they render as plain ![alt](url) markdown syntax.

Found during full-project validation against lecture-python-programming.myst
(24 lectures, all clean after this fix).

Add 'images: attachment' option that embeds local images as base64
cell attachments in exported notebooks, producing self-contained
.ipynb files that don't depend on external image files.

Architecture (two-phase hybrid):
- Phase 1 (myst-cli): collectImageData() walks AST image nodes,
  resolves filesystem paths, reads files, and base64-encodes them
- Phase 2 (myst-to-ipynb): embedImagesAsAttachments() rewrites
  serialized markdown image refs to attachment: references

Usage in frontmatter:
  exports:
    - format: ipynb
      images: attachment

New files:
- packages/myst-to-ipynb/src/attachments.ts
- packages/myst-to-ipynb/tests/attachments.spec.ts (7 tests)
- packages/myst-to-ipynb/tests/attachments.yml (5 tests)

47/47 tests passing.

- Fix prettier formatting in commonmark.ts, index.ts, and myst-cli ipynb/index.ts
- Add docs/creating-notebooks.md with full ipynb export documentation
  (CommonMark markdown, image attachments, export options table)
- Add ipynb to export format table in docs/documents-exports.md
- Add --ipynb CLI example to docs/documents-exports.md
- Add ipynb to format list in docs/frontmatter.md
- Add creating-notebooks.md to docs/myst.yml TOC
- Update packages/myst-to-ipynb/README.md with features and usage

Move ImageData interface to shared types.ts so attachments.ts and
index.ts no longer import from each other. Fixes madge lint:circular
check.

Address Copilot review comments:
- Strip leading '/' from image URLs before path.join in collectImageData()
  so project-root URLs like '/_static/img/foo.png' resolve correctly.
- Fix misleading 'reverse order' comment in embedImagesAsAttachments().

Include nodes that have been resolved by includeDirectiveTransform
retain type 'include', causing myst-to-md to serialize them back as
```{include} directive syntax. Add an 'include' case to
transformNode() that unwraps resolved children into the parent,
so the included content (e.g. admonitions) is emitted as plain
CommonMark in notebook cells.

…b export

When gated syntax ({exercise-start}/{exercise-end}, {solution-start}/
{solution-end}) is used, joinGatesTransform nests all content between
the gates, including {code-cell} blocks, as children of the
exercise/solution node. During ipynb export these were absorbed into a
single markdown cell, silently dropping executable code cells.

Add liftCodeCellsFromGatedNodes() preprocessing step in writeIpynb that
detects exercise/solution nodes containing code-cell blocks and splits
them into alternating top-level markdown and code cells, preserving
document order. When dropSolutions is true, solution nodes are left
intact for transformToCommonMark to drop entirely.

Also fix stripBlockMarkers regex to handle +++ at end-of-string without
trailing newline, preventing empty markdown cells.

Closes #5

The previous fix (05bdc24) assumed exercise/solution nodes would be the
sole child of a block. In reality, blockNestingTransform groups all
consecutive non-block siblings into a single wrapper block, so the AST is:

  root > block { para, exercise {...}, solution {..., block{code}}, para }

The fix now scans inside each block's children for exercise/solution
nodes containing code-cell blocks, and splits the block accordingly.
Extracted helper functions for clarity:

- isGatedNodeWithCodeCells: identifies target nodes
- liftFromExerciseSolution: splits a single node's children
- splitBlockWithGatedNodes: processes a block with mixed children

Added tests for the shared-block structure (exercise + solution + other
content in the same block) and for dropSolutions with shared blocks.

Refs #5, #6

…k/ipynb export

The container handler in myst-to-md only handled figure, table, and code
kinds. Containers with kind 'quote' (produced by the epigraph, pull-quote,
and blockquote directives) fell through and returned empty string, silently
dropping all content during ipynb export.

Add a 'quote' branch that serializes the blockquote child as a standard
markdown blockquote, with optional attribution rendered as an em-dash line.

Closes #7

…back

Add MYST_DEBUG_XREF env var to dump full AST node details when a
crossReference resolves with an empty URL during CommonMark serialization.
This helps diagnose #8 where some {ref} roles produce
[text]() links in ipynb export.

Also add a defensive fallback to use node.url (set by
MultiPageReferenceResolver for cross-page refs) when urlSource, label,
and identifier are all missing. This prevents empty URLs for resolved
remote references without changing behaviour for any existing case.

…port

The reference resolver (addChildrenFromTargetNode) marks crossReferences as
resolved and sets html_id + kind, but for same-page targets the identifier
and label fields end up undefined. The CommonMark serializer then generates
empty URLs like [Section 7]().

Add html_id to the URL fallback chain:
  urlSource → #label → #identifier → #html_id → url → ''

This fixes all 23 unique empty-URL crossReferences found in the QuantEcon
lectures (headings, equations, exercises, code blocks, paragraphs).

Closes #8

The ad-hoc debug logging served its purpose for diagnosing the html_id
fallback issue and is no longer needed. A system-wide debug infrastructure
should be designed separately.

- Update regex to handle escaped brackets in alt text and escaped
  parentheses in URLs produced by mdast-util-to-markdown
- Unescape URLs before looking up in imageData dictionary
- Refactor to single-pass replacement using md.replace(regex, callback)
- Add tests for escaped parentheses in URLs and escaped brackets in alt text