6 changes: 4 additions & 2 deletions package.json
@@ -4,11 +4,13 @@
   "description": "AI agent resources for Sanity development",
   "private": true,
   "scripts": {
-    "validate": "skills-ref validate ./skills/sanity-best-practices && skills-ref validate ./skills/content-modeling-best-practices && skills-ref validate ./skills/seo-aeo-best-practices && skills-ref validate ./skills/content-experimentation-best-practices",
+    "validate": "skills-ref validate ./skills/sanity-best-practices && skills-ref validate ./skills/content-modeling-best-practices && skills-ref validate ./skills/seo-aeo-best-practices && skills-ref validate ./skills/content-experimentation-best-practices && skills-ref validate ./skills/portable-text-serialization && skills-ref validate ./skills/portable-text-conversion",
     "validate:sanity": "skills-ref validate ./skills/sanity-best-practices",
     "validate:content-modeling": "skills-ref validate ./skills/content-modeling-best-practices",
     "validate:seo": "skills-ref validate ./skills/seo-aeo-best-practices",
-    "validate:experimentation": "skills-ref validate ./skills/content-experimentation-best-practices"
+    "validate:experimentation": "skills-ref validate ./skills/content-experimentation-best-practices",
+    "validate:pt-serialization": "skills-ref validate ./skills/portable-text-serialization",
+    "validate:pt-conversion": "skills-ref validate ./skills/portable-text-conversion"
   },
   "devDependencies": {
     "skills-ref": "^0.1.5"
22 changes: 14 additions & 8 deletions rules/sanity-migration.mdc
@@ -40,18 +40,24 @@ const blocks = htmlToBlocks(htmlString, blockContentType, {
```

## 2. Markdown Import (Static Sites)
Use `@sanity/block-content-to-markdown` (legacy name, often used in reverse) OR use a dedicated parser like `remark` to convert Markdown to HTML, then use `block-tools`.
Use `@portabletext/markdown` for direct, schema-aware Markdown ↔ Portable Text conversion.

**Recommended Path: Markdown -> HTML -> Portable Text**
This is often more robust than direct Markdown-to-PT parsers because `block-tools` handles schema validation better.
**Recommended: Direct Conversion with `@portabletext/markdown`**
```typescript
import {markdownToPortableText} from '@portabletext/markdown'

const blocks = markdownToPortableText(markdownString)
```

This handles headings, lists, bold, italic, code, links, images, and tables. Use `@portabletext/sanity-bridge` to pass your Sanity schema so only valid types are produced.

**Alternative: Markdown → HTML → Portable Text**
For complex Markdown with non-standard extensions, convert to HTML first, then use `htmlToBlocks` (see above).

1. **Parse:** `marked` or `remark` to convert MD to HTML.
2. **Convert:** Use `htmlToBlocks` (see above).
2. **Convert:** Use `htmlToBlocks` from `@portabletext/block-tools`.

**Alternative: Direct Parsing**
If using a library like `markdown-to-sanity` or writing a custom `remark` serializer:
- Ensure you handle "inline" vs "block" nodes correctly.
- Map images to Sanity asset uploads.
> **Note:** `@sanity/block-content-to-markdown` and `@sanity/block-tools` are deprecated. Use `@portabletext/markdown` and `@portabletext/block-tools` instead.

## 3. Image Handling (Universal)
Don't just link to external images. Download them and upload to Sanity Asset Pipeline.
65 changes: 65 additions & 0 deletions skills/portable-text-conversion/SKILL.md
@@ -0,0 +1,65 @@
---
name: portable-text-conversion
description: Convert HTML and Markdown content into Portable Text blocks for Sanity. Use when migrating content from legacy CMSs, importing HTML or Markdown into Sanity, building content pipelines that ingest external content, converting rich text between formats, or programmatically creating Portable Text documents. Covers @portabletext/markdown (markdownToPortableText), @portabletext/block-tools (htmlToBlocks), custom deserializers, and the Portable Text specification for manual block construction.
license: MIT
metadata:
author: sanity
version: "1.0.0"
---

# Portable Text Conversion

Convert external content (HTML, Markdown) into Portable Text for Sanity. Three main approaches:

1. **`markdownToPortableText`** — Convert Markdown directly using `@portabletext/markdown` (recommended for Markdown)
2. **`htmlToBlocks`** — Parse HTML into PT blocks using `@portabletext/block-tools` (for HTML migration)
3. **Manual construction** — Build PT blocks directly from any source (APIs, databases, etc.)

## Portable Text Specification

Understand the target format before converting. PT is an array of blocks:

```json
[
  {
    "_type": "block",
    "_key": "abc123",
    "style": "normal",
    "children": [
      {"_type": "span", "_key": "def456", "text": "Hello ", "marks": []},
      {"_type": "span", "_key": "ghi789", "text": "world", "marks": ["strong"]}
    ],
    "markDefs": []
  },
  {
    "_type": "block",
    "_key": "jkl012",
    "style": "h2",
    "children": [
      {"_type": "span", "_key": "mno345", "text": "A heading", "marks": []}
    ],
    "markDefs": []
  },
  {
    "_type": "image",
    "_key": "pqr678",
    "asset": {"_type": "reference", "_ref": "image-abc-200x200-png"}
  }
]
```

**Key rules:**
- Every block and span needs `_key` (unique within the array)
- `_type: "block"` is for text blocks; custom types use their own `_type`
- `markDefs` holds annotation data; `marks` on spans reference `markDefs[*]._key` or are decorator strings
- Lists use `listItem` ("bullet" | "number") and `level` (1, 2, 3...) on regular blocks
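
The key rules above can be sketched as a small block builder. This is an illustrative sketch only: the type names, `createKeyGenerator`, `buildLinkedParagraph`, and `buildListItem` are hypothetical helpers, not part of any `@portabletext` package, and the counter-based keys are used only to keep the example deterministic.

```typescript
// Minimal Portable Text types for this sketch (not imported from any package).
interface Span {
  _type: 'span'
  _key: string
  text: string
  marks: string[]
}

interface MarkDef {
  _type: string
  _key: string
  [field: string]: unknown
}

interface TextBlock {
  _type: 'block'
  _key: string
  style: string
  children: Span[]
  markDefs: MarkDef[]
  listItem?: 'bullet' | 'number'
  level?: number
}

// Hypothetical key generator: a counter keeps the example deterministic.
// Real pipelines typically use random keys instead.
function createKeyGenerator(prefix = 'k'): () => string {
  let counter = 0
  return () => `${prefix}${counter++}`
}

const nextKey = createKeyGenerator()

// Build the paragraph "Read the docs" where "docs" is bold and linked.
function buildLinkedParagraph(href: string): TextBlock {
  const linkKey = nextKey()
  return {
    _type: 'block',
    _key: nextKey(),
    style: 'normal',
    markDefs: [{_type: 'link', _key: linkKey, href}],
    children: [
      {_type: 'span', _key: nextKey(), text: 'Read the ', marks: []},
      // 'strong' is a decorator string; linkKey references markDefs[0]._key
      {_type: 'span', _key: nextKey(), text: 'docs', marks: ['strong', linkKey]},
    ],
  }
}

// Build a list item: a regular block with listItem and level set.
function buildListItem(text: string, listItem: 'bullet' | 'number', level = 1): TextBlock {
  return {
    _type: 'block',
    _key: nextKey(),
    style: 'normal',
    listItem,
    level,
    markDefs: [],
    children: [{_type: 'span', _key: nextKey(), text, marks: []}],
  }
}

const body: TextBlock[] = [
  buildLinkedParagraph('https://www.portabletext.org'),
  buildListItem('First point', 'bullet'),
  buildListItem('Nested point', 'bullet', 2),
]
```

Note that a nested list item is not a child of its parent item; it is a sibling block that differs only in `level`.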

## Conversion Rules

Read the rule file matching your source format:

- **Markdown → Portable Text**: `rules/markdown-to-pt.md` — `@portabletext/markdown` with `markdownToPortableText` (recommended)
- **HTML → Portable Text**: `rules/html-to-pt.md` — `@portabletext/block-tools` with `htmlToBlocks`
- **Manual PT Construction**: `rules/manual-construction.md` — build blocks programmatically from any source

> **Note:** `@sanity/block-tools` is the legacy package name. Always use `@portabletext/block-tools` for new projects. The API is the same.
242 changes: 242 additions & 0 deletions skills/portable-text-conversion/rules/html-to-pt.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
---
title: Convert HTML to Portable Text
description: Use @portabletext/block-tools with htmlToBlocks to convert HTML content into Portable Text blocks
tags: [portable-text, html, conversion, migration, import]
---

# Convert HTML to Portable Text

Use `@portabletext/block-tools` to parse HTML into Portable Text blocks. This is the primary tool for migrating HTML content from legacy CMSs. It has built-in support for content from Google Docs, Microsoft Word, and Notion.

> **Note:** For Markdown sources, use `@portabletext/markdown` instead — it's simpler and more direct. See `rules/markdown-to-pt.md`.

> **Note:** `@sanity/block-tools` is the legacy package name. Use `@portabletext/block-tools` for new projects. The API is identical.

## Setup

```bash
npm install @portabletext/block-tools jsdom @sanity/schema
```

In Node.js, you must provide a `parseHtml` function that returns a DOM `Document`. Use JSDOM for this:

```ts
import {htmlToBlocks} from '@portabletext/block-tools'
import {JSDOM} from 'jsdom'
import Schema from '@sanity/schema'

// JSDOM is passed to htmlToBlocks via the parseHtml option:
// htmlToBlocks(html, blockContentType, {
//   parseHtml: (html) => new JSDOM(html).window.document,
// })
```

## Define Your Schema

`htmlToBlocks` needs a compiled Sanity block content type to know which marks, styles, and custom types are valid. Use `@sanity/schema` to compile it:

```ts
const defaultSchema = Schema.compile({
  name: 'mySchema',
  types: [
    {
      name: 'post',
      type: 'document',
      fields: [
        {
          name: 'body',
          type: 'array',
          of: [
            {
              type: 'block',
              marks: {
                decorators: [
                  {title: 'Strong', value: 'strong'},
                  {title: 'Emphasis', value: 'em'},
                  {title: 'Code', value: 'code'},
                ],
                annotations: [
                  {
                    name: 'link',
                    type: 'object',
                    fields: [{name: 'href', type: 'url'}],
                  },
                ],
              },
              styles: [
                {title: 'Normal', value: 'normal'},
                {title: 'H2', value: 'h2'},
                {title: 'H3', value: 'h3'},
                {title: 'Quote', value: 'blockquote'},
              ],
              lists: [
                {title: 'Bullet', value: 'bullet'},
                {title: 'Number', value: 'number'},
              ],
            },
            {
              name: 'image',
              type: 'image',
              fields: [{name: 'alt', type: 'string'}],
            },
          ],
        },
      ],
    },
  ],
})

const blockContentType = defaultSchema
  .get('post')
  .fields.find((f) => f.name === 'body').type
```

## Basic Conversion

```ts
const html = '<p>Hello <strong>world</strong></p><h2>Heading</h2>'

const blocks = htmlToBlocks(html, blockContentType, {
  parseHtml: (html) => new JSDOM(html).window.document,
})
```
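
Before writing converted blocks to a dataset, it can help to check them against the Portable Text key and mark rules. The following is a rough sketch under stated assumptions: `findBlockProblems` and its types are hypothetical helpers written for this example, not part of `@portabletext/block-tools`, and the check only covers `_key` presence, duplicate block keys, and marks that are neither a known decorator nor a `markDefs` key.

```typescript
// Loose shapes for this sketch; real blocks carry more fields.
interface PtSpan {
  _type: string
  _key?: string
  marks?: string[]
}

interface PtBlock {
  _type: string
  _key?: string
  children?: PtSpan[]
  markDefs?: Array<{_key: string}>
}

// Hypothetical helper: returns human-readable problems, empty when none are found.
function findBlockProblems(blocks: PtBlock[], decorators: string[]): string[] {
  const problems: string[] = []
  const seenKeys: Record<string, boolean> = Object.create(null)

  blocks.forEach((block, blockIndex) => {
    if (!block._key) {
      problems.push(`block ${blockIndex}: missing _key`)
    } else if (seenKeys[block._key]) {
      problems.push(`block ${blockIndex}: duplicate _key "${block._key}"`)
    } else {
      seenKeys[block._key] = true
    }

    // Custom types (images, embeds) have no spans to inspect
    if (block._type !== 'block') return

    const markDefKeys = (block.markDefs || []).map((def) => def._key)
    const children = block.children || []
    children.forEach((span, spanIndex) => {
      const where = `block ${blockIndex}, span ${spanIndex}`
      if (!span._key) problems.push(`${where}: missing _key`)
      const marks = span.marks || []
      marks.forEach((mark) => {
        // A mark is valid if it is a decorator or points at a markDef
        const known = decorators.indexOf(mark) !== -1 || markDefKeys.indexOf(mark) !== -1
        if (!known) problems.push(`${where}: unknown mark "${mark}"`)
      })
    })
  })

  return problems
}

// Hand-written sample with three deliberate problems in the second block.
const sampleBlocks: PtBlock[] = [
  {_type: 'block', _key: 'a1', markDefs: [], children: [{_type: 'span', _key: 's1', marks: ['strong']}]},
  {_type: 'block', _key: 'a1', markDefs: [], children: [{_type: 'span', marks: ['missing-link']}]},
]

const problems = findBlockProblems(sampleBlocks, ['strong', 'em', 'code'])
```

The decorator list passed in should mirror the `decorators` declared in the compiled schema, so a mark that the schema does not allow is reported rather than silently stored.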

## Custom Deserializers

Handle HTML elements that don't map directly to standard PT:

```ts
const blocks = htmlToBlocks(html, blockContentType, {
  parseHtml: (html) => new JSDOM(html).window.document,
  rules: [
    // Convert <img> to image blocks
    {
      deserialize(el, next, block) {
        if (el.tagName?.toLowerCase() !== 'img') return undefined

        return block({
          _type: 'image',
          asset: {
            _type: 'reference',
            _ref: '', // Upload image separately, set ref after
          },
          alt: el.getAttribute('alt') || '',
          _sanityAsset: `image@${el.getAttribute('src')}`, // for migration tooling
        })
      },
    },
    // Convert <a> with custom attributes
    {
      deserialize(el, next, block) {
        if (el.tagName?.toLowerCase() !== 'a') return undefined

        const href = el.getAttribute('href') || ''
        const target = el.getAttribute('target') || ''

        return {
          _type: '__annotation',
          markDef: {
            _type: 'link',
            href,
            ...(target ? {target} : {}),
          },
          children: next(el.childNodes),
        }
      },
    },
    // Convert <iframe> to embed blocks
    {
      deserialize(el, next, block) {
        if (el.tagName?.toLowerCase() !== 'iframe') return undefined

        return block({
          _type: 'embed',
          url: el.getAttribute('src') || '',
        })
      },
    },
  ],
})
```

## Pre-Process HTML Before Conversion

Strip layout elements and extract metadata:

```ts
function preprocessHtml(rawHtml: string) {
  const dom = new JSDOM(rawHtml)
  const doc = dom.window.document

  // Remove layout elements
  const removeSelectors = ['header', 'footer', 'nav', '.sidebar', '.menu', 'script', 'style']
  removeSelectors.forEach((sel) => {
    doc.querySelectorAll(sel).forEach((el) => el.remove())
  })

  // Extract metadata
  const title = doc.querySelector('h1')?.textContent || doc.title || ''
  const description = doc.querySelector('meta[name="description"]')?.getAttribute('content') || ''

  // Get cleaned body
  const body = doc.querySelector('article')?.innerHTML || doc.body.innerHTML

  return {title, description, body}
}
```

## Upload Images During Migration

Don't just link external images — upload them to Sanity:

```ts
import type {SanityClient} from '@sanity/client'

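async function uploadImage(client: SanityClient, url: string) {
  const response = await fetch(url)
  // Fail fast so an error page is never uploaded as an image
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`)
  }
  const buffer = await response.arrayBuffer()
  const asset = await client.assets.upload('image', Buffer.from(buffer), {
    filename: url.split('/').pop(),
  })
  // Return an image object that references the uploaded asset
  return {
    _type: 'image',
    asset: {_type: 'reference', _ref: asset._id},
  }
}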
```
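
The `<img>` deserializer earlier leaves `_ref` empty and records the source URL in `_sanityAsset`. One way to fill those references afterwards is sketched below. It is an assumption-laden example, not a documented workflow: `resolveImageRefs` and the `UploadFn` type are hypothetical, and the upload step is injected as a function that resolves to an asset `_id` so the sketch stays independent of `@sanity/client`.

```typescript
type Block = {_type: string; [field: string]: unknown}

// Hypothetical upload callback: takes a source URL, resolves to an asset _id.
type UploadFn = (url: string) => Promise<string>

const ASSET_PREFIX = 'image@'

async function resolveImageRefs(blocks: Block[], upload: UploadFn): Promise<Block[]> {
  // Cache uploads by URL so a repeated image is only uploaded once
  const cache: Record<string, Promise<string>> = Object.create(null)
  const resolved: Block[] = []

  for (const block of blocks) {
    const marker = block._sanityAsset
    const isPlaceholder =
      block._type === 'image' && typeof marker === 'string' && marker.indexOf(ASSET_PREFIX) === 0

    if (!isPlaceholder) {
      resolved.push(block)
      continue
    }

    const url = (marker as string).slice(ASSET_PREFIX.length)
    if (!cache[url]) cache[url] = upload(url)
    const assetId = await cache[url]

    // Copy the block, set the real reference, and drop the placeholder marker
    const copy: Block = {...block, asset: {_type: 'reference', _ref: assetId}}
    delete copy._sanityAsset
    resolved.push(copy)
  }

  return resolved
}
```

With the `uploadImage` function above, the callback could be `async (url) => (await uploadImage(client, url)).asset._ref`. Uploads run one at a time here, which is slower than running them in parallel but keeps the request rate predictable during a large migration.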

## Full Migration Example

```ts
import {defineMigration, createOrReplace} from 'sanity/migrate'

export default defineMigration({
  title: 'Import WordPress posts',
  async *migrate(documents, context) {
    const posts = await fetchWordPressPosts()

    for (const post of posts) {
      const {title, description, body} = preprocessHtml(post.content)
      const blocks = htmlToBlocks(body, blockContentType, {
        parseHtml: (html) => new JSDOM(html).window.document,
        rules: [/* custom rules */],
      })

      yield createOrReplace({
        _id: `post-${post.slug}`,
        _type: 'post',
        title: title || post.title,
        body: blocks,
      })
    }
  },
})
```

Run with: `sanity migration run import-wordpress-posts --no-dry-run`

## Reference

- [@portabletext/block-tools](https://github.com/portabletext/editor/tree/main/packages/block-tools) — part of the `portabletext/editor` monorepo
- [Sanity Migration docs](https://www.sanity.io/docs/schema-and-content-migrations)
- [portabletext.org](https://www.portabletext.org) — Editor docs and serializer list