CommonMark ipynb export + image attachment embedding #1
Pull request overview
Adds first-class .ipynb export support to the MyST toolchain, including CommonMark-friendly notebook markdown output and optional image embedding via Jupyter cell attachments.
Changes:
- Introduces `myst-to-ipynb` package to serialize MyST mdast into Jupyter notebook JSON, with optional CommonMark transforms.
- Adds image attachment embedding support (collect image bytes in `myst-cli`, rewrite image references + add `attachments` in `myst-to-ipynb`).
- Wires `ipynb` into `myst-frontmatter` export types/validation, `myst-cli` build/export flows + `--ipynb` CLI option, and adds docs.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/myst-to-ipynb/tsconfig.json | Adds TS build configuration for the new package. |
| packages/myst-to-ipynb/package.json | Defines new package metadata, deps, and build/test scripts. |
| packages/myst-to-ipynb/README.md | Documents notebook export features and frontmatter usage. |
| packages/myst-to-ipynb/CHANGELOG.md | Adds initial changelog stub for the package. |
| packages/myst-to-ipynb/.eslintrc.cjs | Enables linting via shared config. |
| packages/myst-to-ipynb/src/index.ts | Implements mdast → ipynb conversion, CommonMark option, and attachment embedding hook. |
| packages/myst-to-ipynb/src/commonmark.ts | Adds AST transforms to convert MyST constructs into CommonMark-compatible mdast. |
| packages/myst-to-ipynb/src/attachments.ts | Adds markdown post-processing to rewrite images into attachment: references + build attachments dict. |
| packages/myst-to-ipynb/src/types.ts | Defines ImageData shape for attachment embedding. |
| packages/myst-to-ipynb/tests/run.spec.ts | Snapshot-style YAML-driven test runner for mdast→ipynb serialization. |
| packages/myst-to-ipynb/tests/basic.yml | Adds coverage for baseline markdown/cell behaviors. |
| packages/myst-to-ipynb/tests/commonmark.yml | Adds coverage for CommonMark conversion behaviors. |
| packages/myst-to-ipynb/tests/frontmatter.yml | Adds coverage for kernelspec/frontmatter→notebook metadata. |
| packages/myst-to-ipynb/tests/attachments.yml | Adds end-to-end coverage for attachment embedding behavior in produced notebooks. |
| packages/myst-to-ipynb/tests/attachments.spec.ts | Unit tests for embedImagesAsAttachments. |
| packages/myst-to-ipynb/tests/example.ipynb | Adds an example notebook fixture. |
| packages/myst-frontmatter/src/exports/types.ts | Adds ipynb to ExportFormats. |
| packages/myst-frontmatter/src/exports/validators.ts | Recognizes .ipynb extension as ipynb export format. |
| packages/myst-cli/package.json | Adds dependency on myst-to-ipynb. |
| packages/myst-cli/src/cli/options.ts | Adds --ipynb CLI option helper. |
| packages/myst-cli/src/cli/build.ts | Exposes --ipynb on myst build. |
| packages/myst-cli/src/build/build.ts | Includes ipynb in format selection logic. |
| packages/myst-cli/src/build/build.spec.ts | Updates format-selection tests to include ipynb. |
| packages/myst-cli/src/build/utils/collectExportOptions.ts | Allows .ipynb as a valid output extension. |
| packages/myst-cli/src/build/utils/localArticleExport.ts | Routes ExportFormats.ipynb to the new export runner. |
| packages/myst-cli/src/build/ipynb/index.ts | Implements runIpynbExport and image-data collection for attachments. |
| docs/myst.yml | Adds notebooks page to docs navigation. |
| docs/frontmatter.md | Documents ipynb as a supported export format. |
| docs/documents-exports.md | Adds ipynb to export overview + CLI examples. |
| docs/creating-notebooks.md | New documentation page for notebook exporting. |
| .changeset/witty-tigers-hunt.md | Declares patch releases for affected packages. |
| .changeset/config.json | Groups versioning for myst-to-ipynb alongside related packages. |
This was referenced Feb 25, 2026
- Use frontmatter kernelspec to populate notebook metadata (name, display_name, language) instead of ignoring the frontmatter parameter
- Derive language_info.name from frontmatter instead of hardcoding 'python'
- Strip leading +++ block markers from markdown cells (MyST-specific separators that have no meaning in notebooks)
- Fix log message from 'Exported MD' to 'Exported IPYNB'
- Fix package.json homepage URL to point to myst-to-ipynb (not myst-to-md)
Ref: QuantEcon/meta#292
Add AST pre-transform that converts MyST-specific nodes to CommonMark equivalents before markdown serialization, producing notebooks compatible with vanilla Jupyter Notebook and Google Colab.
New option: markdown: 'commonmark' (default: 'myst')
Transforms implemented:
- math block directive to $$ delimiters
- inline math role to $ delimiters
- admonition to blockquote with bold title
- exercise to bold header with content
- solution to bold header with content (or dropped via option)
- proof/theorem/lemma to bold header with content
- tab-set to bold tab titles with tab content
- figure to image + italic caption
- table container to bold caption + table
- card/grid to unwrapped content
- details to blockquote with summary title
- aside/sidebar to blockquote
- mystDirective/mystRole to unwrapped content or plain text
Uses html-type AST nodes for math content to prevent the markdown serializer from escaping LaTeX special characters (underscores, etc).
CLI wiring: reads 'markdown: commonmark' from export config in myst.yml.
Ref: QuantEcon/meta#292
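To illustrate the kind of pre-transform this commit describes, here is a minimal sketch of the "admonition to blockquote with bold title" mapping. It is not the code in commonmark.ts: the `MdNode` shape, the function names, and the fallback title "Note" are assumptions made for the example, and `admonitionTitle` is assumed to be the child node type that carries the title.

```typescript
// Simplified mdast-like node (hypothetical; the real myst-spec types are richer).
type MdNode = {
  type: string;
  value?: string;
  children?: MdNode[];
};

// Turn one admonition into a blockquote whose first paragraph is the bold title.
function admonitionToBlockquote(node: MdNode): MdNode {
  const children = node.children ?? [];
  // Assumption: the title lives in a child of type 'admonitionTitle'.
  const titleNode = children.find((c) => c.type === 'admonitionTitle');
  const body = children.filter((c) => c.type !== 'admonitionTitle');
  const title: MdNode = {
    type: 'paragraph',
    children: [
      {
        type: 'strong',
        // Fallback title is an arbitrary choice for this sketch.
        children: titleNode?.children ?? [{ type: 'text', value: 'Note' }],
      },
    ],
  };
  return { type: 'blockquote', children: [title, ...body] };
}

// Walk the tree bottom-up, rewriting admonitions and copying every other node.
function toCommonMark(node: MdNode): MdNode {
  const children = node.children?.map(toCommonMark);
  const copy: MdNode = children ? { ...node, children } : { ...node };
  return copy.type === 'admonition' ? admonitionToBlockquote(copy) : copy;
}
```

Because the result contains only standard mdast node types (blockquote, paragraph, strong), a stock markdown serializer can emit it without knowing anything about MyST directives.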
- Rewrite basic.yml with 13 proper YAML-object test cases (was 2 active)
- Add frontmatter.yml with 4 kernelspec/metadata test cases
- Add commonmark.yml with 13 CommonMark-mode test cases covering: inline math, math blocks, admonitions, exercises, theorems, tabSets, solutions (kept/dropped), underscore preservation
- Update run.spec.ts to support frontmatter and options fields in YAML test cases, enabling CommonMark and metadata tests
Ref: QuantEcon/meta#292
…er empty cells
Real-world validation with QuantEcon lecture content revealed:
- myst-to-md labelWrapper was adding (identifier)= prefixes to headings, paragraphs, blockquotes, and lists with identifier/label properties
- mystTarget nodes need to be dropped in CommonMark mode
- comment nodes (% syntax) need to be dropped in CommonMark mode
- code blocks with extra MyST attributes rendered as code-block directives
- +++ block markers appearing mid-cell (not just leading)
- Empty markdown cells from dropped nodes should be filtered out
Changes:
- commonmark.ts: strip identifier/label from all transformed children, add mystTarget and comment handlers, add code handler
- index.ts: filter empty markdown cells, fix stripBlockMarkers /gm regex
- commonmark.yml: add 5 new tests, update solution-dropped test
Standalone {image} directives with class/width/align properties were
being serialized as ```{image} directives by myst-to-md. Added
transformImage handler that strips directive-specific properties so
they render as plain ![alt](url) markdown syntax.
Found during full-project validation against lecture-python-programming.myst
(24 lectures, all clean after this fix).
Add 'images: attachment' option that embeds local images as base64
cell attachments in exported notebooks, producing self-contained
.ipynb files that don't depend on external image files.
Architecture (two-phase hybrid):
- Phase 1 (myst-cli): collectImageData() walks AST image nodes,
resolves filesystem paths, reads files, and base64-encodes them
- Phase 2 (myst-to-ipynb): embedImagesAsAttachments() rewrites
serialized markdown image refs to attachment: references
Usage in frontmatter:
  exports:
    - format: ipynb
      images: attachment
New files:
- packages/myst-to-ipynb/src/attachments.ts
- packages/myst-to-ipynb/tests/attachments.spec.ts (7 tests)
- packages/myst-to-ipynb/tests/attachments.yml (5 tests)
47/47 tests passing.
- Fix prettier formatting in commonmark.ts, index.ts, and myst-cli ipynb/index.ts
- Add docs/creating-notebooks.md with full ipynb export documentation (CommonMark markdown, image attachments, export options table)
- Add ipynb to export format table in docs/documents-exports.md
- Add --ipynb CLI example to docs/documents-exports.md
- Add ipynb to format list in docs/frontmatter.md
- Add creating-notebooks.md to docs/myst.yml TOC
- Update packages/myst-to-ipynb/README.md with features and usage
Move ImageData interface to shared types.ts so attachments.ts and index.ts no longer import from each other. Fixes madge lint:circular check.
Address Copilot review comments:
- Strip leading '/' from image URLs before path.join in collectImageData() so project-root URLs like '/_static/img/foo.png' resolve correctly.
- Fix misleading 'reverse order' comment in embedImagesAsAttachments().
Include nodes that have been resolved by includeDirectiveTransform
retain type 'include', causing myst-to-md to serialize them back as
```{include} directive syntax. Add an 'include' case to
transformNode() that unwraps resolved children into the parent,
so the included content (e.g. admonitions) is emitted as plain
CommonMark in notebook cells.
…b export
When gated syntax ({exercise-start}/{exercise-end}, {solution-start}/
{solution-end}) is used, joinGatesTransform nests all content between
the gates, including {code-cell} blocks, as children of the
exercise/solution node. During ipynb export these were absorbed into a
single markdown cell, silently dropping executable code cells.
Add liftCodeCellsFromGatedNodes() preprocessing step in writeIpynb that
detects exercise/solution nodes containing code-cell blocks and splits
them into alternating top-level markdown and code cells, preserving
document order. When dropSolutions is true, solution nodes are left
intact for transformToCommonMark to drop entirely.
Also fix stripBlockMarkers regex to handle +++ at end-of-string without
trailing newline, preventing empty markdown cells.
Closes #5
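As a rough illustration of the block-marker handling mentioned in this commit, the sketch below removes +++ marker lines wherever they occur, including a final marker with no trailing newline, and then drops cells left empty. The regex and the helper `dropEmptyCells` are assumptions for the example and may differ from the actual stripBlockMarkers in index.ts.

```typescript
// Remove every line that starts with a "+++" block marker (plus any metadata on that line).
// The (\n|$) alternative also matches a marker at end-of-string without a trailing newline.
function stripBlockMarkers(md: string): string {
  return md.replace(/^\+\+\+[^\n]*(\n|$)/gm, '');
}

// Hypothetical helper: discard markdown sources that contain only whitespace after stripping.
function dropEmptyCells(sources: string[]): string[] {
  return sources.map(stripBlockMarkers).filter((src) => src.trim().length > 0);
}
```

With the `m` flag, `^` anchors at every line start, which is what lets the same pattern handle leading, mid-cell, and trailing markers.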
The previous fix (05bdc24) assumed exercise/solution nodes would be the sole child of a block. In reality, blockNestingTransform groups all consecutive non-block siblings into a single wrapper block, so the AST is:
root > block { para, exercise {...}, solution {..., block{code}}, para }
The fix now scans inside each block's children for exercise/solution nodes containing code-cell blocks, and splits the block accordingly.
Extracted helper functions for clarity:
- isGatedNodeWithCodeCells: identifies target nodes
- liftFromExerciseSolution: splits a single node's children
- splitBlockWithGatedNodes: processes a block with mixed children
Added tests for the shared-block structure (exercise + solution + other content in the same block) and for dropSolutions with shared blocks.
Refs #5, #6
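The splitting described in this commit can be sketched as follows. This is a simplified illustration, not the PR's helpers: the `MdNode` and `Cell` shapes and the function `splitBlock` are hypothetical, code cells are assumed to be `block` nodes with `kind: 'notebook-code'`, and the dropSolutions case is not handled.

```typescript
// Simplified node and cell shapes (hypothetical).
type MdNode = { type: string; kind?: string; value?: string; children?: MdNode[] };
type Cell = { cell_type: 'markdown' | 'code'; nodes: MdNode[] };

// Assumption: a code cell is a nested block tagged with kind 'notebook-code'.
const isCodeCell = (n: MdNode): boolean => n.type === 'block' && n.kind === 'notebook-code';
const isGated = (n: MdNode): boolean => n.type === 'exercise' || n.type === 'solution';

// Split one wrapper block's children into alternating markdown and code cells,
// preserving document order.
function splitBlock(children: MdNode[]): Cell[] {
  const cells: Cell[] = [];
  let pending: MdNode[] = [];
  const flush = () => {
    if (pending.length > 0) cells.push({ cell_type: 'markdown', nodes: pending });
    pending = [];
  };
  const visit = (node: MdNode) => {
    if (isCodeCell(node)) {
      flush();
      cells.push({ cell_type: 'code', nodes: [node] });
    } else if (isGated(node) && (node.children ?? []).some(isCodeCell)) {
      const kids = node.children ?? [];
      const firstCode = kids.findIndex(isCodeCell);
      // Keep the exercise/solution wrapper around the leading content so its header survives.
      pending.push({ ...node, children: kids.slice(0, firstCode) });
      // Everything from the first code cell onward is lifted to the top level.
      kids.slice(firstCode).forEach((kid) => visit(kid));
    } else {
      pending.push(node);
    }
  };
  children.forEach((child) => visit(child));
  flush();
  return cells;
}
```

For a block shaped like `{ para, exercise { para, code, para }, para }`, this yields a markdown cell (first paragraph plus the exercise header and intro), a code cell, and a final markdown cell holding the remaining two paragraphs.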
…k/ipynb export
The container handler in myst-to-md only handled figure, table, and code kinds. Containers with kind 'quote' (produced by the epigraph, pull-quote, and blockquote directives) fell through and returned empty string, silently dropping all content during ipynb export.
Add a 'quote' branch that serializes the blockquote child as a standard markdown blockquote, with optional attribution rendered as an em-dash line.
Closes #7
…back
Add MYST_DEBUG_XREF env var to dump full AST node details when a crossReference resolves with an empty URL during CommonMark serialization. This helps diagnose #8 where some {ref} roles produce [text]() links in ipynb export.
Also add a defensive fallback to use node.url (set by MultiPageReferenceResolver for cross-page refs) when urlSource, label, and identifier are all missing. This prevents empty URLs for resolved remote references without changing behaviour for any existing case.
…port
The reference resolver (addChildrenFromTargetNode) marks crossReferences as resolved and sets html_id + kind, but for same-page targets the identifier and label fields end up undefined. The CommonMark serializer then generates empty URLs like [Section 7]().
Add html_id to the URL fallback chain:
urlSource → #label → #identifier → #html_id → url → ''
This fixes all 23 unique empty-URL crossReferences found in the QuantEcon lectures (headings, equations, exercises, code blocks, paragraphs).
Closes #8
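The fallback chain in this commit reads naturally as a sequence of early returns. The sketch below assumes a simplified `CrossReference` shape and a hypothetical function name `resolveXrefUrl`; only the field names and their order come from the commit message.

```typescript
// Only the fields relevant to URL resolution (simplified, hypothetical type).
type CrossReference = {
  urlSource?: string;
  label?: string;
  identifier?: string;
  html_id?: string;
  url?: string;
};

// urlSource → #label → #identifier → #html_id → url → ''
function resolveXrefUrl(node: CrossReference): string {
  if (node.urlSource) return node.urlSource;
  if (node.label) return `#${node.label}`;
  if (node.identifier) return `#${node.identifier}`;
  if (node.html_id) return `#${node.html_id}`;
  return node.url ?? '';
}
```

Placing `html_id` after `label` and `identifier` leaves every previously working case unchanged; it only fills in references that would otherwise serialize as `[text]()`.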
The ad-hoc debug logging served its purpose for diagnosing the html_id fallback issue and is no longer needed. A system-wide debug infrastructure should be designed separately.
- Update regex to handle escaped brackets in alt text and escaped parentheses in URLs produced by mdast-util-to-markdown
- Unescape URLs before looking up in imageData dictionary
- Refactor to single-pass replacement using md.replace(regex, callback)
- Add tests for escaped parentheses in URLs and escaped brackets in alt text
Summary
Adds `myst build --ipynb` support that produces notebooks with plain CommonMark markdown cells, compatible with vanilla Jupyter Notebook, JupyterLab (without `jupyterlab-myst`), and Google Colab. Also adds an option to embed images as base64 cell attachments for fully self-contained notebooks.

What's Changed
Bug fixes (built on top of upstream PR jupyter-book#1882)
- `frontmatter` parameter was accepted but never used; now populates `metadata.kernelspec` and `metadata.language_info` correctly
- `language_info.name` was hardcoded to `"python"`; now derives from frontmatter kernelspec
- `+++` markers: stripped globally from markdown cells (was only matching start of string)
- `package.json` homepage URL: corrected from `myst-to-md` to `myst-to-ipynb`

Additionally fixes two issues discovered during real-world validation against QuantEcon lectures:
CommonMark serialization mode
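New `markdown: commonmark` option in export config triggers an AST pre-transform that converts MyST-specific nodes to CommonMark equivalents before `writeMd` serialization.

18 directive/role mappings implemented:

| MyST node | CommonMark output |
|---|---|
| `math` block | `$$...$$` |
| `inlineMath` | `$...$` |
| `admonition` | `> **Title** ...` blockquote |
| `exercise` | `**Exercise N** ...` |
| `solution` | `**Solution** ...` (or dropped via `dropSolutions`) |
| `proof` / `theorem` / `lemma` | `**Theorem N (Title)** ...` |
| `tabSet` | bold tab titles with tab content |
| `container` (figure) | image + italic caption |
| `container` (table) | bold caption + table |
| `card` / `grid` / `details` / `aside` | unwrapped content or blockquote |
| `mystDirective` / `mystRole` | unwrapped content or plain text |
| `mystTarget` / `comment` | dropped |
| `code` blocks | plain fenced code, extra MyST attributes stripped |
| `identifier` / `label` | stripped, so no `(id)=` prefixes are emitted |

Key design: uses `{ type: 'html' }` AST nodes for math to prevent `mdast-util-to-markdown` from escaping LaTeX special characters.

Image attachment embedding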
New `images: attachment` option embeds images as base64 cell attachments for self-contained notebooks.

Two-phase hybrid architecture:

- Phase 1 (`myst-cli`): `collectImageData()` walks AST image nodes, resolves filesystem paths, reads & base64-encodes into `Record<url, ImageData>`
- Phase 2 (`myst-to-ipynb`): `embedImagesAsAttachments()` post-serialization regex rewrites `![alt](path)` → `![alt](attachment:name)` with cell `attachments` field

Validated on 24 QuantEcon lectures: 50 images across 12 notebooks embedded, 0 external references remaining.
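A minimal sketch of the second phase, under stated assumptions: `ImageEntry` is a hypothetical stand-in for the PR's `ImageData` interface (its real fields are not shown here), `embedImages` is an illustrative name, and the attachment name is simply the URL's basename, so two images with the same filename in different folders would collide in this version. The `attachments` value follows the notebook format's convention of mapping a filename to a MIME bundle.

```typescript
// Hypothetical stand-in for the PR's ImageData: MIME type plus base64 payload.
interface ImageEntry {
  mime: string;
  data: string;
}

// Notebook cell attachments: filename -> { mime type -> base64 data }.
type Attachments = Record<string, Record<string, string>>;

function embedImages(
  md: string,
  images: Record<string, ImageEntry>,
): { md: string; attachments: Attachments } {
  const attachments: Attachments = {};
  // Alt text may contain escaped brackets; URLs may contain escaped parentheses.
  const imageRe = /!\[((?:\\.|[^\]\\])*)\]\(((?:\\.|[^)\\\s])+)\)/g;
  const out = md.replace(imageRe, (match: string, alt: string, rawUrl: string) => {
    // Undo the serializer's escaping before looking the URL up.
    const url = rawUrl.replace(/\\([()])/g, '$1');
    const image = images[url];
    if (!image) return match; // remote or uncollected image: leave the reference untouched
    const name = url.split('/').pop() ?? url;
    attachments[name] = { [image.mime]: image.data };
    return `![${alt}](attachment:${name})`;
  });
  return { md: out, attachments };
}
```

The returned `attachments` object would be stored on the markdown cell next to its rewritten source, which is what makes the notebook self-contained.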
Epigraph / pull-quote serialization (fixes #7)
The `myst-to-md` container handler only handled `figure`, `table`, and `code` kinds; `quote` containers (produced by `{epigraph}` and `{pull-quote}` directives) were silently dropped.

Fix: Added `kind === 'quote'` branch that:

- takes the `blockquote` child and serializes it via the default blockquote handler
- renders an optional `caption` as a `> — Attribution` line

Cross-reference URL fallback (fixes #8)
When MyST resolves same-page cross-references (e.g., `{ref}`, `{eq}`), the `addChildrenFromTargetNode` transform sets `html_id` on the node but doesn't always propagate `identifier` or `label`. The `myst-to-md` serializer was only checking `urlSource → #label → #identifier`, resulting in empty `[text]()` links.

Fix: Extended the URL fallback chain to:

`urlSource → #label → #identifier → #html_id → url → ''`

Note: A third case was identified where `{ref}` roles with multi-line bodies fail to parse entirely. This is an upstream parser limitation filed as jupyter-book/mystmd#2724.

Exercise / solution code cell lifting
Exercises and solutions can contain `code-cell` nodes that should become executable notebook cells. These were being serialized as markdown instead of being lifted to top-level code cells.

Fix: `liftCodeCellsFromGatedNodes()` pre-processes the AST to extract code cells from exercise/solution containers, splitting surrounding markdown content into separate cells. Handles both direct gated nodes and blocks wrapping gated nodes.

Documentation
- `docs/creating-notebooks.md`: comprehensive ipynb export guide
- `docs/documents-exports.md`: added ipynb to format table
- `docs/frontmatter.md`: added ipynb to format values
- `docs/myst.yml`: TOC entry for creating-notebooks
- `packages/myst-to-ipynb/README.md`: features and usage

Test suite: 154 tests passing
myst-to-ipynb (55 tests)
- `basic.yml`
- `frontmatter.yml`
- `commonmark.yml`
- `attachments.yml`
- `attachments.spec.ts`
- `run.spec.ts`

myst-to-md (99 tests, 6 new)
- `directives.yml`
- `references.yml`

Files Changed (36 files, +3122 / -11)
New files
- `packages/myst-to-ipynb/src/commonmark.ts`: AST pre-transform (~500 lines, 18 node types)
- `packages/myst-to-ipynb/src/attachments.ts`: Post-serialization image embedding (~100 lines)
- `packages/myst-to-ipynb/src/types.ts`: Shared `ImageData` interface
- `packages/myst-to-ipynb/tests/commonmark.yml`: 18 CommonMark mode tests
- `packages/myst-to-ipynb/tests/attachments.yml`: 5 attachment integration tests
- `packages/myst-to-ipynb/tests/attachments.spec.ts`: 7 attachment unit tests
- `docs/creating-notebooks.md`: New documentation page

Modified files
- `packages/myst-to-ipynb/src/index.ts`: `IpynbOptions`, attachment wiring, empty cell filter, exercise lifting
- `packages/myst-to-ipynb/tests/run.spec.ts`: Test runner supporting `frontmatter` + `options` in YAML
- `packages/myst-to-ipynb/tests/basic.yml`: Expanded to 13 tests
- `packages/myst-to-ipynb/tests/frontmatter.yml`: 4 kernelspec tests
- `packages/myst-to-ipynb/package.json`: Fixed homepage URL
- `packages/myst-to-ipynb/README.md`: Updated documentation
- `packages/myst-cli/src/build/ipynb/index.ts`: `collectImageData()`, options passthrough, log fix
- `packages/myst-to-md/src/directives.ts`: Epigraph/pull-quote container handler
- `packages/myst-to-md/src/references.ts`: Cross-reference URL fallback chain with `html_id`
- `packages/myst-to-md/tests/directives.yml`: 3 epigraph/pull-quote tests
- `packages/myst-to-md/tests/references.yml`: 3 cross-reference fallback tests
- `docs/documents-exports.md`: ipynb in format table
- `docs/frontmatter.md`: ipynb in format values
- `docs/myst.yml`: TOC entry

Tracking issue: QuantEcon/mystmd#2 (full PLAN)
Related: QuantEcon/meta#292 · jupyter-book/mystmd#1882
Real-world validation: QuantEcon/lecture-python-programming.myst#363 — 24 lectures, 0 MyST syntax leaks