Description
Problem
Our Notion translation workflow (scripts/notion-translate/index.ts) converts Notion pages to Markdown via n2m.pageToMarkdown() and translates using OpenAI. The resulting translated Markdown often contains expiring Notion/S3 image URLs, which break over time.
Meanwhile, the Notion fetch pipeline already downloads and rewrites images into the canonical Docusaurus location:
- disk: static/images/
- web path: /images/<filename>

(using scripts/notion-fetch/imageProcessing.ts and scripts/notion-fetch/imageReplacer.ts)
But translation does not reuse this image rewrite step, so translated pages keep expiring URLs.
Goal
All translated Markdown should reference the same stable canonical images as English by using /images/... paths (backed by static/images/). Translations must not include expiring Notion/S3 URLs.
Proposed Solution
Integrate the existing image replacement pipeline into the translation flow:
- In scripts/notion-translate/index.ts, after:
  const markdownContent = await convertPageToMarkdown(englishPage.id);
  run processAndReplaceImages(markdownContent, safeFilename) from scripts/notion-fetch/imageReplacer.ts. This will:
  - detect Notion/S3 image URLs in the markdown (and <img> tags),
  - download images into static/images/ (or reuse cache),
  - rewrite URLs to /images/<filename>.
- Translate the image-stabilized markdown (already using /images/...), and ensure translation does not mutate those paths.
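A minimal sketch of the intended ordering, assuming the three steps can be passed in as async functions. The names translatePageWithStableImages, PipelineDeps, replaceImages and translate are hypothetical. The real processAndReplaceImages may return a result object rather than a plain string, so it would likely need a small adapter before being passed in as replaceImages.

```typescript
// Hypothetical signatures for the three steps. The real processAndReplaceImages
// may return a richer object; wrap it so `replaceImages` resolves to a string.
interface PipelineDeps {
  convertPageToMarkdown: (pageId: string) => Promise<string>;
  replaceImages: (markdown: string, safeFilename: string) => Promise<string>;
  translate: (markdown: string, targetLang: string) => Promise<string>;
}

async function translatePageWithStableImages(
  pageId: string,
  safeFilename: string,
  targetLang: string,
  deps: PipelineDeps,
): Promise<string> {
  // 1. Notion page -> Markdown; may still contain expiring Notion/S3 image URLs.
  const markdownContent = await deps.convertPageToMarkdown(pageId);

  // 2. Download/cache images and rewrite them to /images/<filename> before
  //    translation, so the translator only ever sees stable paths.
  const stabilized = await deps.replaceImages(markdownContent, safeFilename);

  // 3. Translate the image-stabilized Markdown.
  return deps.translate(stabilized, targetLang);
}
```

Injecting the steps keeps the ordering testable with fakes: a fake translator can simply fail if it ever receives a notion-static.com or amazonaws.com URL.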
Implementation Details
- Import processAndReplaceImages (and optionally validateAndFixRemainingImages) from scripts/notion-fetch/imageReplacer.ts.
- Use the same “safe filename” slug style that we already compute in saveTranslatedContentToDisk() (title → slug). Pass that as the safeFilename so image filenames remain deterministic.
- Ensure /images/... links remain unchanged through translation:
  - Add a strong instruction in the translation prompt: “Do not modify URLs or paths, especially anything starting with /images/.”
  - Optionally add a post-translation guard: run validateAndFixRemainingImages(translatedMarkdown, safeFilename).
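As a complement to validateAndFixRemainingImages (whose exact behavior is not assumed here), a post-translation check could compare the set of /images/... references in the English source with those in the translation. findAlteredImagePaths and collectImagePaths are hypothetical helpers, not existing repo code, and the regex is a rough heuristic that picks up paths in both Markdown image links and <img src> attributes.

```typescript
// Hypothetical post-translation guard: report /images/... paths that the
// translation dropped or altered. Not part of the existing repo.
function collectImagePaths(markdown: string): Set<string> {
  // Rough heuristic: a path starting with /images/ at the start of the text or
  // after whitespace, '(', a quote, or '=' (covers ![alt](...) and src="...").
  // The path ends at whitespace, ')', a quote, or '>'.
  const re = /(^|[\s("'=])(\/images\/[^\s)"'>]+)/g;
  const paths = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = re.exec(markdown)) !== null) {
    paths.add(match[2]);
  }
  return paths;
}

function findAlteredImagePaths(
  source: string,
  translated: string,
): { missing: string[]; unexpected: string[] } {
  const before = collectImagePaths(source);
  const after = collectImagePaths(translated);
  return {
    // Paths present in the English source but absent from the translation.
    missing: Array.from(before).filter((p) => !after.has(p)),
    // Paths that appear only in the translation (e.g. a mangled filename).
    unexpected: Array.from(after).filter((p) => !before.has(p)),
  };
}
```

A non-empty missing or unexpected list would indicate that the model touched a path despite the prompt instruction.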
Acceptance Criteria
- After running bun scripts/notion-translate, translated markdown contains zero URLs matching any of:
  - secure.notion-static.com
  - notion-static.com
  - amazonaws.com
  - X-Amz-* params
  - www.notion.so/image/
- Images in translated pages reference /images/<filename> and resolve at build time (Docusaurus static).
- Images are not duplicated per language (translations reuse the shared /images assets).
- Works for both:
  - Markdown image syntax
  - Inline HTML: <img src="...">
- Idempotent: re-running translation does not cause churn in image links.
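One way to check the first criterion is to extract every http(s) URL from a translated file and flag any that contain one of the listed markers. findExpiringUrls, EXPIRING_URL_MARKERS and URL_RE are illustrative assumptions, not existing repo code. The URL regex is deliberately rough: it stops at whitespace, quotes, ')' and angle brackets, which is enough for Markdown image links, <img src="..."> and bare links.

```typescript
// Markers taken from the acceptance criteria; any URL containing one is
// treated as expiring.
const EXPIRING_URL_MARKERS: RegExp[] = [
  /secure\.notion-static\.com/i,
  /notion-static\.com/i,
  /amazonaws\.com/i,
  /X-Amz-/i, // presigned S3 query parameters such as X-Amz-Signature
  /www\.notion\.so\/image\//i,
];

// Rough URL extractor covering ![alt](url), <img src="url"> and bare links.
const URL_RE = /https?:\/\/[^\s)"'<>]+/g;

function findExpiringUrls(markdown: string): string[] {
  const urls = markdown.match(URL_RE) ?? [];
  return urls.filter((url) => EXPIRING_URL_MARKERS.some((re) => re.test(url)));
}
```

Running this over each translated Markdown file and failing on any non-empty result would make the criterion checkable in CI.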
Tests
Add tests under scripts/notion-translate/__tests__/ (or extend existing notion-fetch tests) that verify:
- markdown with Notion/S3 image URLs is rewritten to /images/...
- translated output contains no Notion/S3 URLs
- /images/... paths are preserved through translation
Notes
This repo already has detailed handling for expiring Notion URLs and image caching under scripts/notion-fetch/ — translation should reuse that instead of introducing a new mapping system.