feat: add Portable Text serialization and conversion skills #16
Open: kmelve wants to merge 4 commits into main from feat/portable-text-skills
Commits:
- 6a5156b feat: add Portable Text serialization and conversion skills (kmelve)
- a11c399 fix: update PT skills with accurate ecosystem info (kmelve)
- 434302c Fix API accuracy and add migration rule update (kmelve)
- ea9064a Address review feedback from jonahsnider (kmelve)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| --- | ||
| name: portable-text-conversion | ||
| description: Convert HTML and Markdown content into Portable Text blocks for Sanity. Use when migrating content from legacy CMSs, importing HTML or Markdown into Sanity, building content pipelines that ingest external content, converting rich text between formats, or programmatically creating Portable Text documents. Covers @portabletext/markdown (markdownToPortableText), @portabletext/block-tools (htmlToBlocks), custom deserializers, and the Portable Text specification for manual block construction. | ||
| license: MIT | ||
| metadata: | ||
| author: sanity | ||
| version: "1.0.0" | ||
| --- | ||
|
|
||
| # Portable Text Conversion | ||
|
|
||
| Convert external content (HTML, Markdown) into Portable Text for Sanity. Three main approaches: | ||
|
|
||
| 1. **`markdownToPortableText`** — Convert Markdown directly using `@portabletext/markdown` (recommended for Markdown) | ||
| 2. **`htmlToBlocks`** — Parse HTML into PT blocks using `@portabletext/block-tools` (for HTML migration) | ||
| 3. **Manual construction** — Build PT blocks directly from any source (APIs, databases, etc.) | ||
|
|
||
| ## Portable Text Specification | ||
|
|
||
| Understand the target format before converting. PT is an array of blocks: | ||
|
|
||
| ```json | ||
| [ | ||
| { | ||
| "_type": "block", | ||
| "_key": "abc123", | ||
| "style": "normal", | ||
| "children": [ | ||
| {"_type": "span", "_key": "def456", "text": "Hello ", "marks": []}, | ||
| {"_type": "span", "_key": "ghi789", "text": "world", "marks": ["strong"]} | ||
| ], | ||
| "markDefs": [] | ||
| }, | ||
| { | ||
| "_type": "block", | ||
| "_key": "jkl012", | ||
| "style": "h2", | ||
| "children": [ | ||
| {"_type": "span", "_key": "mno345", "text": "A heading", "marks": []} | ||
| ], | ||
| "markDefs": [] | ||
| }, | ||
| { | ||
| "_type": "image", | ||
| "_key": "pqr678", | ||
| "asset": {"_type": "reference", "_ref": "image-abc-200x200-png"} | ||
| } | ||
| ] | ||
| ``` | ||
|
|
||
| **Key rules:** | ||
| - Every block and span needs `_key` (unique within the array) | ||
| - `_type: "block"` is for text blocks; custom types use their own `_type` | ||
| - `markDefs` holds annotation data; `marks` on spans reference `markDefs[*]._key` or are decorator strings | ||
| - Lists use `listItem` ("bullet" | "number") and `level` (1, 2, 3...) on regular blocks | ||
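
The key rules above can be sketched as a few small builder functions. This is a minimal illustration, not an official API: the type names, `makeKey`, `span`, and `block` are hypothetical helpers introduced here, and any key scheme works as long as keys stay unique within the array.

```ts
// Minimal Portable Text shapes for this sketch (not the official type package).
type Span = {_type: 'span'; _key: string; text: string; marks: string[]}
type MarkDef = {_type: string; _key: string; [field: string]: unknown}
type Block = {
  _type: 'block'
  _key: string
  style: string
  children: Span[]
  markDefs: MarkDef[]
  listItem?: 'bullet' | 'number'
  level?: number
}

// Hypothetical key generator: a counter is enough to keep keys unique here.
let counter = 0
const makeKey = (): string => `k${(counter++).toString(36)}`

const span = (text: string, marks: string[] = []): Span => ({
  _type: 'span',
  _key: makeKey(),
  text,
  marks,
})

const block = (children: Span[], extra: Partial<Block> = {}): Block => ({
  _type: 'block',
  _key: makeKey(),
  style: 'normal',
  markDefs: [],
  ...extra,
  children,
})

// A paragraph mixing a decorator ("strong") with an annotation (a link).
// The span references the annotation through the markDef's _key.
const link: MarkDef = {_type: 'link', _key: makeKey(), href: 'https://www.sanity.io'}
const paragraph = block(
  [span('Read the '), span('docs', ['strong', link._key]), span('.')],
  {markDefs: [link]},
)

// List items are regular blocks that carry `listItem` and `level`.
const listItems = [
  block([span('First')], {listItem: 'bullet', level: 1}),
  block([span('Nested')], {listItem: 'bullet', level: 2}),
]

const body: Block[] = [paragraph, ...listItems]
```

Note that the list has no wrapper object: consecutive blocks with the same `listItem` value form the list, and `level` alone expresses nesting.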
|
|
||
| ## Conversion Rules | ||
|
|
||
| Read the rule file matching your source format: | ||
|
|
||
| - **Markdown → Portable Text**: `rules/markdown-to-pt.md` — `@portabletext/markdown` with `markdownToPortableText` (recommended) | ||
| - **HTML → Portable Text**: `rules/html-to-pt.md` — `@portabletext/block-tools` with `htmlToBlocks` | ||
| - **Manual PT Construction**: `rules/manual-construction.md` — build blocks programmatically from any source | ||
|
|
||
| > **Note:** `@sanity/block-tools` is the legacy package name. Always use `@portabletext/block-tools` for new projects. The API is the same. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,242 @@ | ||
| --- | ||
| title: Convert HTML to Portable Text | ||
| description: Use @portabletext/block-tools with htmlToBlocks to convert HTML content into Portable Text blocks | ||
| tags: [portable-text, html, conversion, migration, import] | ||
| --- | ||
|
|
||
| # Convert HTML to Portable Text | ||
|
|
||
| Use `@portabletext/block-tools` to parse HTML into Portable Text blocks. This is the primary tool for migrating HTML content from legacy CMSs. It has built-in support for content from Google Docs, Microsoft Word, and Notion. | ||
|
|
||
| > **Note:** For Markdown sources, use `@portabletext/markdown` instead — it's simpler and more direct. See `rules/markdown-to-pt.md`. | ||
|
|
||
| > **Note:** `@sanity/block-tools` is the legacy package name. Use `@portabletext/block-tools` for new projects. The API is identical. | ||
|
|
||
| ## Setup | ||
|
|
||
| ```bash | ||
| npm install @portabletext/block-tools jsdom @sanity/schema | ||
| ``` | ||
|
|
||
| In Node.js, you must provide a `parseHtml` function that returns a DOM `Document`. Use JSDOM for this: | ||
|
|
||
| ```ts | ||
| import {htmlToBlocks} from '@portabletext/block-tools' | ||
| import {JSDOM} from 'jsdom' | ||
| import Schema from '@sanity/schema' | ||
|
|
||
| // JSDOM is passed to htmlToBlocks via the parseHtml option: | ||
| // htmlToBlocks(html, blockContentType, { | ||
| // parseHtml: (html) => new JSDOM(html).window.document, | ||
| // }) | ||
| ``` | ||
|
|
||
| ## Define Your Schema | ||
|
|
||
| `htmlToBlocks` needs a compiled Sanity block content type to know which marks, styles, and custom types are valid. Use `@sanity/schema` to compile it: | ||
|
|
||
| ```ts | ||
| const defaultSchema = Schema.compile({ | ||
| name: 'mySchema', | ||
| types: [ | ||
| { | ||
| name: 'post', | ||
| type: 'document', | ||
| fields: [ | ||
| { | ||
| name: 'body', | ||
| type: 'array', | ||
| of: [ | ||
| { | ||
| type: 'block', | ||
| marks: { | ||
| decorators: [ | ||
| {title: 'Strong', value: 'strong'}, | ||
| {title: 'Emphasis', value: 'em'}, | ||
| {title: 'Code', value: 'code'}, | ||
| ], | ||
| annotations: [ | ||
| { | ||
| name: 'link', | ||
| type: 'object', | ||
| fields: [{name: 'href', type: 'url'}], | ||
| }, | ||
| ], | ||
| }, | ||
| styles: [ | ||
| {title: 'Normal', value: 'normal'}, | ||
| {title: 'H2', value: 'h2'}, | ||
| {title: 'H3', value: 'h3'}, | ||
| {title: 'Quote', value: 'blockquote'}, | ||
| ], | ||
| lists: [ | ||
| {title: 'Bullet', value: 'bullet'}, | ||
| {title: 'Number', value: 'number'}, | ||
| ], | ||
| }, | ||
| { | ||
| name: 'image', | ||
| type: 'image', | ||
| fields: [{name: 'alt', type: 'string'}], | ||
| }, | ||
| ], | ||
| }, | ||
| ], | ||
| }, | ||
| ], | ||
| }) | ||
|
|
||
| const blockContentType = defaultSchema | ||
| .get('post') | ||
| .fields.find((f) => f.name === 'body').type | ||
| ``` | ||
|
|
||
| ## Basic Conversion | ||
|
|
||
| ```ts | ||
| const html = '<p>Hello <strong>world</strong></p><h2>Heading</h2>' | ||
|
|
||
| const blocks = htmlToBlocks(html, blockContentType, { | ||
| parseHtml: (html) => new JSDOM(html).window.document, | ||
| }) | ||
| ``` | ||
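
Before writing converted blocks to a dataset, it can help to check them against the key rules of the Portable Text format (every block and span has a `_key`, and every mark is either a decorator or a `markDefs` key). The following is a rough sketch under assumptions: `findPortableTextProblems` and its `knownDecorators` parameter are hypothetical and not part of `@portabletext/block-tools`, and the loose types only model the fields the check reads.

```ts
// Loose structural types: converter output may also contain custom block types.
type PTSpan = {_type: string; _key?: string; marks?: string[]}
type PTBlock = {
  _type: string
  _key?: string
  children?: PTSpan[]
  markDefs?: {_key?: string}[]
}

function findPortableTextProblems(
  blocks: PTBlock[],
  knownDecorators: string[] = ['strong', 'em', 'code'],
): string[] {
  const problems: string[] = []
  const blockKeys = new Set<string>()

  blocks.forEach((block, index) => {
    // Block keys must exist and be unique within the array.
    if (!block._key) problems.push(`block ${index}: missing _key`)
    else if (blockKeys.has(block._key)) problems.push(`block ${index}: duplicate _key "${block._key}"`)
    else blockKeys.add(block._key)

    // Custom types (images, embeds, ...) have no spans to check.
    if (block._type !== 'block') return

    const annotationKeys = new Set((block.markDefs ?? []).map((def) => def._key))
    const spanKeys = new Set<string>()

    for (const child of block.children ?? []) {
      if (!child._key) problems.push(`block ${index}: child missing _key`)
      else if (spanKeys.has(child._key)) problems.push(`block ${index}: duplicate child _key "${child._key}"`)
      else spanKeys.add(child._key)

      // Each mark must be a known decorator or point at a markDefs entry.
      for (const mark of child.marks ?? []) {
        if (!knownDecorators.includes(mark) && !annotationKeys.has(mark)) {
          problems.push(`block ${index}: mark "${mark}" is neither a decorator nor a markDefs key`)
        }
      }
    }
  })

  return problems
}
```

A possible use is to call `findPortableTextProblems(blocks)` right after `htmlToBlocks` and log or fail on a non-empty result. The decorator list should mirror the decorators declared in your own schema.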
|
|
||
| ## Custom Deserializers | ||
|
|
||
| Handle HTML elements that don't map directly to standard PT: | ||
|
|
||
| ```ts | ||
| const blocks = htmlToBlocks(html, blockContentType, { | ||
| parseHtml: (html) => new JSDOM(html).window.document, | ||
| rules: [ | ||
| // Convert <img> to image blocks | ||
| { | ||
| deserialize(el, next, block) { | ||
| if (el.tagName?.toLowerCase() !== 'img') return undefined | ||
|
|
||
| return block({ | ||
| _type: 'image', | ||
| asset: { | ||
| _type: 'reference', | ||
| _ref: '', // Upload image separately, set ref after | ||
| }, | ||
| alt: el.getAttribute('alt') || '', | ||
| _sanityAsset: `image@${el.getAttribute('src')}`, // for migration tooling | ||
| }) | ||
| }, | ||
| }, | ||
| // Convert <a> with custom attributes | ||
| { | ||
| deserialize(el, next, block) { | ||
| if (el.tagName?.toLowerCase() !== 'a') return undefined | ||
|
|
||
| const href = el.getAttribute('href') || '' | ||
| const target = el.getAttribute('target') || '' | ||
|
|
||
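| return { | ||
| _type: '__annotation', | ||
| markDef: { | ||
| // block-tools uses markDef._key as the mark on the child spans, so it must be set. | ||
| // crypto.randomUUID() is global in recent Node.js; otherwise import randomUUID from 'node:crypto'. | ||
| _key: crypto.randomUUID(), | ||
| _type: 'link', | ||
| href, | ||
| ...(target ? {target} : {}), | ||
| }, | ||
| children: next(el.childNodes), | ||
| } | ||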
| }, | ||
| }, | ||
| // Convert <iframe> to embed blocks | ||
| { | ||
| deserialize(el, next, block) { | ||
| if (el.tagName?.toLowerCase() !== 'iframe') return undefined | ||
|
|
||
| return block({ | ||
| _type: 'embed', | ||
| url: el.getAttribute('src') || '', | ||
| }) | ||
| }, | ||
| }, | ||
| ], | ||
| }) | ||
| ``` | ||
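
The rules above all repeat the same shape: check the tag name, return `undefined` to pass, otherwise emit a block. One way to factor that out is sketched below. `blockRuleForTag` and the structural types are hypothetical helpers written for this illustration (block-tools ships its own, richer types), so treat it as a pattern rather than library API.

```ts
// Minimal structural stand-ins for the DOM element and the rule callbacks.
type ElementLike = {tagName?: string; getAttribute(name: string): string | null}
type BlockFn = (props: Record<string, unknown>) => unknown
type Rule = {
  deserialize(el: ElementLike, next: (nodes: unknown) => unknown, block: BlockFn): unknown
}

// Hypothetical helper: build a rule that handles a single tag and emits a custom block.
function blockRuleForTag(
  tag: string,
  toProps: (el: ElementLike) => Record<string, unknown> | undefined,
): Rule {
  return {
    deserialize(el, _next, block) {
      // Returning undefined leaves the element to the remaining rules.
      if (el.tagName?.toLowerCase() !== tag) return undefined
      const props = toProps(el)
      return props ? block(props) : undefined
    },
  }
}

// The iframe rule from above, now skipping iframes that have no src.
const iframeRule = blockRuleForTag('iframe', (el) => {
  const url = el.getAttribute('src')
  return url ? {_type: 'embed', url} : undefined
})
```

The resulting objects have the same `{deserialize(el, next, block)}` shape as the hand-written rules, so they could sit in the same `rules` array.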
|
|
||
| ## Pre-Process HTML Before Conversion | ||
|
|
||
| Strip layout elements and extract metadata: | ||
|
|
||
| ```ts | ||
| function preprocessHtml(rawHtml: string) { | ||
| const dom = new JSDOM(rawHtml) | ||
| const doc = dom.window.document | ||
|
|
||
| // Remove layout elements | ||
| const removeSelectors = ['header', 'footer', 'nav', '.sidebar', '.menu', 'script', 'style'] | ||
| removeSelectors.forEach((sel) => { | ||
| doc.querySelectorAll(sel).forEach((el) => el.remove()) | ||
| }) | ||
|
|
||
| // Extract metadata | ||
| const title = doc.querySelector('h1')?.textContent || doc.title || '' | ||
| const description = doc.querySelector('meta[name="description"]')?.getAttribute('content') || '' | ||
|
|
||
| // Get cleaned body | ||
| const body = doc.querySelector('article')?.innerHTML || doc.body.innerHTML | ||
|
|
||
| return {title, description, body} | ||
| } | ||
| ``` | ||
|
|
||
| ## Upload Images During Migration | ||
|
|
||
| Don't just link external images — upload them to Sanity: | ||
|
|
||
| ```ts | ||
| import type {SanityClient} from '@sanity/client' | ||
|
|
||
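| async function uploadImage(client: SanityClient, url: string) { | ||
| const response = await fetch(url) | ||
| // Fail early instead of uploading an error page as an image | ||
| if (!response.ok) throw new Error(`Failed to fetch ${url}: ${response.status}`) | ||
| const buffer = await response.arrayBuffer() | ||
| const asset = await client.assets.upload('image', Buffer.from(buffer), { | ||
| // Use the URL path so query strings don't end up in the filename | ||
| filename: new URL(url).pathname.split('/').pop() || undefined, | ||
| }) | ||
| return { | ||
| _type: 'image', | ||
| asset: {_type: 'reference', _ref: asset._id}, | ||
| } | ||
| } | ||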
| ``` | ||
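
Legacy content often reuses the same image in many posts, so a migration may download and upload one URL repeatedly. A small cache around the upload function avoids that. This is a sketch under assumptions: `createCachedUploader` is a hypothetical helper, and it takes the upload function as a parameter so it does not depend on any particular client.

```ts
type ImageBlock = {_type: 'image'; asset: {_type: 'reference'; _ref: string}}
type Upload = (url: string) => Promise<ImageBlock>

// Hypothetical wrapper: remember the in-flight or finished upload for each URL.
function createCachedUploader(upload: Upload): Upload {
  const cache = new Map<string, Promise<ImageBlock>>()

  return (url) => {
    const cached = cache.get(url)
    if (cached) return cached

    const pending = upload(url).catch((error: unknown) => {
      cache.delete(url) // forget failures so a later call can retry
      throw error
    })
    cache.set(url, pending)
    return pending
  }
}

// Possible wiring, assuming the uploadImage(client, url) function shown above:
// const upload = createCachedUploader((url) => uploadImage(client, url))
```

Caching the promise rather than the resolved value means two concurrent requests for the same URL share a single upload.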
|
|
||
| ## Full Migration Example | ||
|
|
||
| ```ts | ||
| import {defineMigration, createOrReplace} from 'sanity/migrate' | ||
|
|
||
| export default defineMigration({ | ||
| title: 'Import WordPress posts', | ||
| async *migrate(documents, context) { | ||
| // fetchWordPressPosts() is a placeholder for your own source-specific fetcher | ||
| const posts = await fetchWordPressPosts() | ||
|
|
||
| for (const post of posts) { | ||
| const {title, description, body} = preprocessHtml(post.content) | ||
| const blocks = htmlToBlocks(body, blockContentType, { | ||
| parseHtml: (html) => new JSDOM(html).window.document, | ||
| rules: [/* custom rules */], | ||
| }) | ||
|
|
||
| yield createOrReplace({ | ||
| _id: `post-${post.slug}`, | ||
| _type: 'post', | ||
| title: title || post.title, | ||
| body: blocks, | ||
| }) | ||
| } | ||
| }, | ||
| }) | ||
| ``` | ||
|
|
||
| Run with: `sanity migration run import-wordpress-posts --no-dry-run` | ||
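
The migration example does not yet connect the image deserializer to the upload step. One possible way to bridge them, assuming image blocks were emitted with the `_sanityAsset: 'image@<url>'` placeholder used in the custom deserializer, is a pass over the converted blocks before they are yielded. `resolveImageBlocks` is a hypothetical helper, and the upload function is injected so the sketch stays independent of the client.

```ts
type AnyBlock = {_type: string; [field: string]: unknown}
type UploadedImage = {_type: 'image'; asset: {_type: 'reference'; _ref: string}}

const PLACEHOLDER_PREFIX = 'image@'

// Replace placeholder image blocks with uploaded asset references.
async function resolveImageBlocks(
  blocks: AnyBlock[],
  upload: (url: string) => Promise<UploadedImage>,
): Promise<AnyBlock[]> {
  const resolved: AnyBlock[] = []

  for (const block of blocks) {
    const marker = block._sanityAsset
    const isPlaceholder =
      block._type === 'image' && typeof marker === 'string' && marker.startsWith(PLACEHOLDER_PREFIX)

    if (!isPlaceholder) {
      resolved.push(block) // text blocks and other types pass through untouched
      continue
    }

    // Keep fields such as _key and alt, but drop the placeholder and the empty asset.
    const rest: AnyBlock = {...block}
    delete rest._sanityAsset
    delete rest.asset

    const uploaded = await upload(marker.slice(PLACEHOLDER_PREFIX.length))
    resolved.push({...rest, ...uploaded})
  }

  return resolved
}
```

In the migration loop this could look like `body: await resolveImageBlocks(blocks, (url) => uploadImage(client, url))`, where `client` is assumed to be a configured `SanityClient`. Uploads run sequentially here for simplicity; batching them is a separate design choice.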
|
|
||
| ## Reference | ||
|
|
||
| - [@portabletext/block-tools](https://github.com/portabletext/editor/tree/main/packages/block-tools) — part of the `portabletext/editor` monorepo | ||
| - [Sanity Migration docs](https://www.sanity.io/docs/schema-and-content-migrations) | ||
| - [portabletext.org](https://www.portabletext.org) — Editor docs and serializer list |