Skip to content

docs: add translation image stabilization specification for issue #137#140

Open
luandro wants to merge 4 commits intomainfrom
feat/translation-image-stabilization-spec
Open

docs: add translation image stabilization specification for issue #137#140
luandro wants to merge 4 commits intomainfrom
feat/translation-image-stabilization-spec

Conversation

@luandro
Copy link
Contributor

@luandro luandro commented Feb 13, 2026

Summary

This PR adds a comprehensive technical specification document for addressing issue #137 - replacing expiring Notion/S3 image URLs in translated documentation with stable canonical /images/... paths.

Problem

The Notion translation workflow (scripts/notion-translate/index.ts) produces translated Markdown with expiring Notion/S3 image URLs that break over time (~1 hour expiration). Meanwhile, the English fetch pipeline already handles this correctly by downloading images to static/images/ and rewriting URLs.

Solution Overview

The specification proposes integrating the existing image processing pipeline into the translation flow:

  1. Pre-translation: Call processAndReplaceImages() to stabilize image URLs before sending to OpenAI
  2. Post-translation: Call validateAndFixRemainingImages() as a safety net to catch any remaining S3 URLs

This approach reuses existing, tested code rather than creating new image handling logic.

What's Included

📄 TRANSLATION_IMAGE_STABILIZATION_SPEC.md containing:

Implementation Plan Summary

Phase Description Time
Phase 1 Core Integration - Add image processing to translation pipeline 2-3h
Phase 2 Prompt Enhancement - Update OpenAI prompt to preserve /images/ paths 30m
Phase 3 Testing - Create comprehensive test suite 2-3h
Phase 4 Validation - Lint, format, regression tests 1h
Phase 5 PR Creation 30m

Acceptance Criteria (from issue #137)

After implementation, translated markdown will contain:

  • ✅ Zero URLs matching secure.notion-static.com, notion-static.com, amazonaws.com, X-Amz-*, or www.notion.so/image/
  • ✅ Stable /images/<filename> references that resolve at build time
  • ✅ Shared images (no duplication per language)
  • ✅ Support for both ![alt](...) and <img src="..."> syntax
  • ✅ Idempotent behavior (re-running doesn't cause churn)

Testing

This PR only adds documentation (specification). Testing will be part of the implementation PR.


Closes #137

@luandro can click here to continue refining the PR

Greptile Summary

This PR adds four specification/planning documents for issue #137 — stabilizing expiring Notion/S3 image URLs in translated documentation. No code changes are included; this is purely documentation.

  • TRANSLATION_IMAGE_STABILIZATION_SPEC.md: Core technical specification proposing integration of the existing processAndReplaceImages() pipeline into the translation flow (pre-translation image download + post-translation S3 URL validation). Code references verified as accurate against the current codebase.
  • TRANSLATION_IMAGE_STABILIZATION_PLAN.md: Detailed implementation plan with task-level breakdown, dependency graph, concrete code diffs, and test scaffolding.
  • TRANSLATION_IMAGE_STABILIZATION_SPEC_REVIEW.md: Self-review of the spec identifying issues (all resolved in the updated spec). Adds repo root clutter — consider removing or relocating.
  • TRANSLATION_IMAGE_STABILIZATION_SPEC_SIMPLIFIED.md: Non-technical summary that largely duplicates the main spec. Consider consolidating.
  • Important: The PR description says "Closes Notion translation pipeline: replace expiring Notion/S3 image URLs with canonical /images/... paths (reuse EN images) #137" but this is a docs-only PR — merging will auto-close the issue before the actual implementation is delivered. The Closes keyword should be removed.
  • The testing section includes per-language image override tests despite that feature being explicitly deferred to a future enhancement.

Confidence Score: 4/5

Important Files Changed

Filename Overview
TRANSLATION_IMAGE_STABILIZATION_SPEC.md Well-structured technical specification with accurate code references, clear pipeline diagrams, and thorough risk analysis. Code locations verified against source.
TRANSLATION_IMAGE_STABILIZATION_PLAN.md Detailed implementation plan with task-level breakdown, dependency graph, and concrete code examples. Test strategy is comprehensive.
TRANSLATION_IMAGE_STABILIZATION_SPEC_REVIEW.md Self-review document that identified real issues (fallback behavior, validation mismatch). Useful as audit trail but adds to repo root clutter.
TRANSLATION_IMAGE_STABILIZATION_SPEC_SIMPLIFIED.md Simplified version for non-technical readers. Content is accurate but overlaps significantly with the main spec.

Sequence Diagram

sequenceDiagram
    participant N as Notion API
    participant T as Translation Pipeline
    participant IR as imageReplacer
    participant FS as static/images/
    participant AI as OpenAI API
    participant D as Disk (i18n/)

    Note over T: Current Flow (broken)
    N->>T: Markdown with S3 URLs
    T->>AI: Translate (S3 URLs intact)
    AI->>T: Translated markdown (S3 URLs)
    T->>D: Save (URLs expire in ~1hr)

    Note over T: Proposed Flow (stable)
    N->>T: Markdown with S3 URLs
    T->>IR: processAndReplaceImages()
    IR->>FS: Download images (cached)
    IR->>T: Markdown with /images/... paths
    alt Image download failed
        T->>T: Throw error (fail page)
    end
    T->>AI: Translate (stable paths)
    AI->>T: Translated markdown
    T->>IR: getImageDiagnostics()
    alt S3 URLs found
        T->>T: Throw error (fail page)
    end
    T->>D: Save with stable /images/... paths
Loading

Last reviewed commit: 9f73b74

Context used:

  • Context from dashboard - CLAUDE.md (source)

openhands-agent and others added 4 commits February 13, 2026 01:00
This specification document addresses the problem of expiring Notion/S3 image
URLs in translated documentation. It provides:

- Technical analysis of the current translation pipeline
- Proposed solution integrating existing image processing infrastructure
- Detailed 5-phase implementation plan with time estimates
- Testing requirements and acceptance criteria
- Risk analysis and rollback strategy

The solution reuses the existing processAndReplaceImages() and
validateAndFixRemainingImages() functions from the notion-fetch pipeline
to stabilize image URLs before and after translation.

Closes #137

Co-authored-by: openhands <openhands@all-hands.dev>
Updates to TRANSLATION_IMAGE_STABILIZATION_SPEC.md:

Per-language image overrides:
- Add section explaining default vs override behavior
- Document how translators can use localized screenshots
- Define precedence rules (language page wins, EN fallback)
- Add how-to snippet with before/after example

Pipeline improvements:
- Add language-aware source selection step in pipeline flow
- Update acceptance criteria for localized images

Technical clarifications:
- Add function contracts section with exact input/output types
- Fix line number typo (858-518 → 858-890)
- Add invariants section with grep verification commands
- Make failure policy explicit (fail page, no placeholders)
- Clarify validation catches mutated canonical paths

Testing:
- Add per-language override test case
- Require mocked network/filesystem in tests

Cleanup:
- Remove time estimates from phases
- Reuse existing slug logic (no new implementation)
- Expand out-of-scope section (no migration, no new storage)

Co-authored-by: openhands <openhands@all-hands.dev>
- Add detailed review of original spec with 17 issues identified
- Add simplified spec that's easier to understand (ELI5 style)
- Key changes: removed redundant items, fixed fallback behavior,
  clarified edge cases, added implementation details

🤖 Generated with [Qoder][https://qoder.com]
…ons (#141)

* docs: fix spec accuracy issues found during code review

Address all issues identified in the spec review by verifying against
actual codebase:

- Fix function name to processSinglePageTranslation (not processPageTranslation)
- Fix slug helper to include page ID suffix for deterministic filenames
- Replace validateAndFixRemainingImages (warns only) with getImageDiagnostics + throw
- Add totalFailures > 0 check to fail on broken placeholder images
- Defer per-language image overrides to future enhancement (EN-only initially)
- Downgrade prompt enhancement to P2 (already 3 existing instructions)
- Fix import list (remove unused validateAndFixRemainingImages)
- Remove incorrect metrics.skippedSmallSize reference in logging
- Add Prerequisites section for EN fetch recommendation
- Document frontmatter images as out of scope
- Update review document with resolution status

https://claude.ai/code/session_01RMuRBsneMpCL1cBVLuVcmx

* docs: add implementation plan for translation image stabilization (#137)

Machine-readable implementation plan with atomic tasks, exact code
locations, dependency graph, and complete test specifications. Ready
for autonomous execution.

https://claude.ai/code/session_01RMuRBsneMpCL1cBVLuVcmx

* fix: address review feedback inconsistencies

- Fix __tests__/ path to flat path in simplified spec
- Fix title -> originalTitle variable name consistency
- Add note about invoking processSinglePageTranslation in tests

🤖 Generated with [Qoder][https://qoder.com]

* fix(spec): align SPEC with PLAN for logging and TASK-3.6

- Make diagnostic logging conditional (only log when successfulImages > 0)
- Fix TASK-3.6 label to match PLAN (Test title page bypass)

---------

Co-authored-by: Claude <noreply@anthropic.com>
@luandro luandro marked this pull request as ready for review February 17, 2026 15:26
@kiloconnect
Copy link

kiloconnect bot commented Feb 17, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

This PR adds documentation files only (no code changes):

File Lines Description
TRANSLATION_IMAGE_STABILIZATION_SPEC.md 715 Main specification document
TRANSLATION_IMAGE_STABILIZATION_PLAN.md 729 Implementation plan with tasks
TRANSLATION_IMAGE_STABILIZATION_SPEC_REVIEW.md 380 Review of the specification
TRANSLATION_IMAGE_STABILIZATION_SPEC_SIMPLIFIED.md 250 Simplified explanation

Total: 4 new files, 2074 lines of documentation

The documentation appears well-structured and comprehensive, covering:

  • Problem description and solution approach
  • Implementation tasks with dependencies
  • Testing requirements
  • Acceptance criteria
  • Risks and mitigations
  • Review feedback with resolutions

Since this PR contains only markdown documentation files with no executable code, there are no security vulnerabilities, runtime errors, or logic bugs to review.

Files Reviewed (4 files)
  • TRANSLATION_IMAGE_STABILIZATION_PLAN.md
  • TRANSLATION_IMAGE_STABILIZATION_SPEC.md
  • TRANSLATION_IMAGE_STABILIZATION_SPEC_REVIEW.md
  • TRANSLATION_IMAGE_STABILIZATION_SPEC_SIMPLIFIED.md

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +441 to +452
- [ ] After running `bun scripts/notion-translate`, translated markdown contains **zero** URLs matching:
- `secure.notion-static.com`
- `notion-static.com`
- `amazonaws.com`
- `X-Amz-` params
- `www.notion.so/image/`
- [ ] Images in translated pages reference `/images/<filename>` and resolve at build time
- [ ] Images are not duplicated per language (translations reuse shared `/images` assets)
- [ ] Works for both Markdown image syntax (`![alt](...)`) and inline HTML (`<img src="...">`)
- [ ] Idempotent: re-running translation does not cause churn in image links
- [ ] Localized Notion pages may intentionally use different images; these must also be rewritten to stable `/images/...` paths (no Notion/S3 URLs) — **deferred to future enhancement; initial implementation uses EN images only**
- [ ] If any image fails to download, the page translation fails with a diagnostic error (no broken placeholder images)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR should not close #137

The PR description says "Closes #137" but this PR only adds specification documents — no code changes. When this PR merges, GitHub will auto-close issue #137 even though the actual implementation hasn't been done yet. The Closes #137 keyword should be removed from the PR description and reserved for the implementation PR that actually delivers the code changes described in this spec.

Context Used: Context from dashboard - CLAUDE.md (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: TRANSLATION_IMAGE_STABILIZATION_SPEC.md
Line: 441:452

Comment:
**PR should not close #137**

The PR description says "Closes #137" but this PR only adds specification documents — no code changes. When this PR merges, GitHub will auto-close issue #137 even though the actual implementation hasn't been done yet. The `Closes #137` keyword should be removed from the PR description and reserved for the implementation PR that actually delivers the code changes described in this spec.

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=56ef855b-65f1-4964-8b62-134035692d00))

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1 to +10
# Review: TRANSLATION_IMAGE_STABILIZATION_SPEC.md

**Reviewed:** 2026-02-13
**Spec File:** `TRANSLATION_IMAGE_STABILIZATION_SPEC.md`
**Issue:** #137 - Notion translation pipeline: replace expiring Notion/S3 image URLs with canonical `/images/...` paths

---

## Overall Assessment

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review document adds root-level clutter

This self-review document is useful as an internal working artifact, but committing it to the repository root alongside the spec adds clutter. The review findings have already been incorporated into the updated spec (as noted in the "Resolution Status" section at the bottom). Consider either:

  • Moving this to a context/ subdirectory (e.g., context/development/) consistent with the project structure
  • Removing it from the PR and linking to it as a PR comment instead
  • Keeping only the spec and plan files, since the review's findings are already reflected in them

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: TRANSLATION_IMAGE_STABILIZATION_SPEC_REVIEW.md
Line: 1:10

Comment:
**Review document adds root-level clutter**

This self-review document is useful as an internal working artifact, but committing it to the repository root alongside the spec adds clutter. The review findings have already been incorporated into the updated spec (as noted in the "Resolution Status" section at the bottom). Consider either:
- Moving this to a `context/` subdirectory (e.g., `context/development/`) consistent with the project structure
- Removing it from the PR and linking to it as a PR comment instead
- Keeping only the spec and plan files, since the review's findings are already reflected in them

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=56ef855b-65f1-4964-8b62-134035692d00))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1 to +10
# Translation Image Stabilization - Simplified Specification

**Issue:** #137 - Replace expiring image URLs with stable paths in translated docs

**Goal:** When we translate English docs to other languages, make sure the images don't break over time.

---

## The Problem (Simple Explanation)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider consolidating with the main spec

This simplified version largely duplicates information from TRANSLATION_IMAGE_STABILIZATION_SPEC.md. Adding 4 new root-level markdown files for one feature spec is notable — the repo already has IMAGE_URL_EXPIRATION_SPEC.md and NOTION_FETCH_ARCHITECTURE.md as precedent, but neither has companion review/simplified/plan documents alongside them. Consider consolidating the key "simplified" context as a section within the main spec, or moving this to context/development/ to keep the root directory clean.

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: TRANSLATION_IMAGE_STABILIZATION_SPEC_SIMPLIFIED.md
Line: 1:10

Comment:
**Consider consolidating with the main spec**

This simplified version largely duplicates information from `TRANSLATION_IMAGE_STABILIZATION_SPEC.md`. Adding 4 new root-level markdown files for one feature spec is notable — the repo already has `IMAGE_URL_EXPIRATION_SPEC.md` and `NOTION_FETCH_ARCHITECTURE.md` as precedent, but neither has companion review/simplified/plan documents alongside them. Consider consolidating the key "simplified" context as a section within the main spec, or moving this to `context/development/` to keep the root directory clean.

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=56ef855b-65f1-4964-8b62-134035692d00))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +388 to +397
- Verify both markdown and HTML image syntax work

4. **Idempotency**
- Test re-running doesn't cause churn
- Already-processed content should remain unchanged

5. **Per-Language Image Overrides**
- Test: EN markdown references image A, localized markdown references image B (Notion URL)
- Result: uses image B rewritten to `/images/...`
- Confirm EN fallback remains unchanged when no override exists
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-language override tests contradict scope

The spec explicitly defers per-language image overrides to a "future enhancement" (lines 113-116, 451, 689), yet the testing requirements include tests for per-language image override behavior (tests #5 here). These tests would have no implementation to test against in the initial scope. Either remove these test cases from the initial testing requirements, or add a note that they should be written as skipped/todo tests to document the expected future behavior.

Prompt To Fix With AI
This is a comment left during a code review.
Path: TRANSLATION_IMAGE_STABILIZATION_SPEC.md
Line: 388:397

Comment:
**Per-language override tests contradict scope**

The spec explicitly defers per-language image overrides to a "future enhancement" (lines 113-116, 451, 689), yet the testing requirements include tests for per-language image override behavior (tests #5 here). These tests would have no implementation to test against in the initial scope. Either remove these test cases from the initial testing requirements, or add a note that they should be written as skipped/todo tests to document the expected future behavior.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Notion translation pipeline: replace expiring Notion/S3 image URLs with canonical /images/... paths (reuse EN images)

2 participants