feat: Phase 4 — Personalization MVP #7

Merged
fbmoulin merged 12 commits into main from feat/personalize-mvp on Apr 27, 2026

Conversation

@fbmoulin
Owner

Closes Phase 4 of ROADMAP.md. Implements docs/PERSONALIZATION.md as code.

Summary

  • New personalize/ package: slots, sanitize, openai_client, patcher, pipeline, cli (~750 LoC).
  • 3 Flask routes: GET /personalize + POST /api/personalize/{structure,run} (rate-limited 5/min and 2/min/IP).
  • Hybrid intake template (templates/personalize.html) with browser observability logger reused.
  • 65 new tests (74 → 139 passing in 2.7s) + 2 live OpenAI smoke tests gated by RUN_OPENAI_LIVE=1.
  • Closes audit P2-11 (LLM input/output hardening — text + image + HTML).

Architecture (per docs/PERSONALIZATION.md)

  • Patch-based, not full-regen. LLM emits a JSON patch list with closed-enum slot_id; deterministic Python applies patches via BeautifulSoup. Zero selector hallucination.
  • Responses API with text.format json_schema strict=true for both structure_brief (Step 2) and personalize (Step 5).
  • gpt-image-1 medium for hero/feature illustrations; first call sets style, rest run via asyncio.gather with first as input_images. Avatars are CSS gradient + initials (no synthetic faces — EU AI Act + OpenAI policy).
  • Hard budget cap (default $1.00) on OpenAIBrandClient. Projected cost checked BEFORE every call.
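
The budget cap in the last bullet can be pictured as a small guard that is consulted before each call. This is a minimal sketch, not the code in `OpenAIBrandClient`: the `BudgetGuard` class, its `reserve` method, and the per-call cost figures are assumptions for illustration; only the `BudgetExceededError` name and the $1.00 default come from this PR.

```python
class BudgetExceededError(RuntimeError):
    """Raised when a projected call would push spending past the cap."""


class BudgetGuard:
    """Hypothetical helper; the real client may track cost differently."""

    def __init__(self, cap_usd: float = 1.00) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def reserve(self, projected_usd: float) -> None:
        # Check the projected total BEFORE any API call is made, so a
        # rejected call costs nothing.
        if self.spent_usd + projected_usd > self.cap_usd:
            raise BudgetExceededError(
                f"projected ${projected_usd:.4f} exceeds remaining "
                f"${self.remaining_usd:.4f}"
            )
        self.spent_usd += projected_usd

    @property
    def remaining_usd(self) -> float:
        return self.cap_usd - self.spent_usd
```

A caller would invoke `guard.reserve(estimate)` immediately before each request and let the exception propagate when the cap would be crossed.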

Live validation

```
RUN_OPENAI_LIVE=1 uv run pytest tests/integration -v -s
test_structure_brief_live PASSED ($0.005, 10s)
test_personalize_live_with_vision_no_image_gen PASSED ($0.10, 70s)
spent: $0.1050; remaining: $0.3950
```

Real gpt-5-mini accepts the schemas. Closed-enum slot_id honored — every patch came back with a slot id from the list we passed.
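
For readers unfamiliar with the closed-enum idea, here is a rough sketch of how such a schema could be built per request. The `build_patch_schema` function and the `text` field are hypothetical; only the `patches[].slot_id` shape is taken from this PR, and the real schema in `personalize/openai_client.py` may differ.

```python
def build_patch_schema(slot_ids: list[str]) -> dict:
    """Build a strict JSON schema whose slot_id is an enum of known slots."""
    if not slot_ids:
        raise ValueError("at least one slot id is required")
    patch = {
        "type": "object",
        "properties": {
            # The model can only choose from ids extracted from the page,
            # so it cannot invent a selector.
            "slot_id": {"type": "string", "enum": sorted(set(slot_ids))},
            "text": {"type": "string"},  # hypothetical payload field
        },
        "required": ["slot_id", "text"],
        "additionalProperties": False,
    }
    return {
        "type": "object",
        "properties": {"patches": {"type": "array", "items": patch}},
        "required": ["patches"],
        "additionalProperties": False,
    }
```

Because the enum is rebuilt from the slot inventory on every request, any returned patch maps to a known slot by construction.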

Security (P2-11 closure)

| Threat | Mitigation |
|---|---|
| Prompt injection via brief text | sanitize_brief_text strips C0 controls + bounds length; brief fields go in via json.dumps, never f-string |
| SVG XSS via logo upload | verify_image_bytes allow-lists PNG/JPEG by magic; rejects SVG/GIF/anything else |
| Embedded EXIF in user logos | strip_exif re-encodes via Image.copy() round-trip |
| Future LLM-emitted HTML | strip_dangerous_html removes script/style/iframe/object/embed, drops on*=, neutralizes javascript: URLs |
| Path traversal via html_dir | realpath().startswith(downloads/) confinement in /api/personalize/run |
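
The first mitigation is easy to picture in isolation. The sketch below is an assumption about its shape, not the module's code: the `max_len` default and the `build_user_payload` helper are invented for illustration, while the behaviour (strip C0 controls except tab, newline, and carriage return; bound the length; serialize with `json.dumps`) follows the PR description.

```python
import json
import re

# C0 control characters, excluding tab (0x09), newline (0x0A), and CR (0x0D).
_C0_EXCEPT_WHITESPACE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_brief_text(text: str, max_len: int = 4000) -> str:
    """Drop C0 controls (keeping tab, newline, CR) and bound the length."""
    return _C0_EXCEPT_WHITESPACE.sub("", text)[:max_len]


def build_user_payload(brief: str) -> str:
    """Serialize the brief as JSON data instead of splicing it into a prompt."""
    return json.dumps({"brief": sanitize_brief_text(brief)}, ensure_ascii=False)
```

Passing the brief as a JSON string value keeps quotes and newlines escaped, so user text stays data rather than becoming part of the instruction template.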

Test plan

  • uv run pytest -q → 139 passed, 2 skipped (integration gated)
  • uv run ruff check && ruff format --check → all green
  • python -m personalize --dry-run → extracts 12 slots from sample inventory
  • import app thread-count invariant preserved (factory pattern, P2-7)
  • Live OpenAI E2E smoke run on gpt-5-mini → both tests pass
  • Manual UX walk-through on /personalize after merge

Out of scope

  • Streaming UI / SSE for run progress (long-poll OK for MVP)
  • A/B harness for the +70% claim (P2-9 stays open)
  • Multi-language brief input (English-only first cut)
  • Style reference upload beyond logo

fbmoulin added 11 commits April 27, 2026 14:03
…dit P2-11

- sanitize_brief_text: strip C0 controls (preserve \t \n \r), bound length
- verify_image_bytes: allow-list PNG/JPEG by magic; reject SVG/GIF/other
- strip_exif: copy() round-trip drops exif/xmp/icc, preserves pixels
- strip_dangerous_html: remove script/style/iframe/object/embed,
  drop on*= handlers, neutralize javascript: URLs in href/src/action

Uses html.parser (stdlib) per repo convention, not lxml.
21 new tests; full suite 105 passing in 0.92s.
- Text patches resolve via slot.selector; missing selectors logged + skipped
- word-wrappers structure-aware split for staggered hero h1
- Palette regex covers from/to/via/bg/text/border/shadow/ring × 400/500/600
- Images written as PNG to <out>/assets/gen_<slot_id>.png + src updated
- All LLM-derived values run through strip_dangerous_html (defense-in-depth)
- Uses html.parser (stdlib) per repo convention

7 tests; full suite 112 passing.
- structure_brief: gpt-5-mini Responses API + brand_brief strict schema
- personalize: multimodal (logo + slots) + dynamic closed-enum schema for
  patches[].slot_id and images[].slot_id (deterministic resolution; no
  selector hallucinations)
- generate_images_parallel: AsyncOpenAI gpt-image-1 medium; first call
  sets style, rest run via asyncio.gather with first as input_images
- Hard budget cap (default $1.00) enforced BEFORE every call; raises
  BudgetExceededError without burning the API. Total spent tracked.
- All user inputs sanitized; logo verified + EXIF-stripped before upload
- pytest-asyncio added (strict mode); 12 mocked tests cover all 3 paths

Full suite 124 passing.
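
The image fan-out described in this commit (first call sets the style, the rest run concurrently with the first result as a reference) might be sketched as follows. `fake_generate` is a hypothetical stand-in for the image API call and returns labels rather than image bytes; the real `generate_images_parallel` takes different arguments.

```python
import asyncio
from typing import Optional


async def fake_generate(prompt: str, reference: Optional[str] = None) -> tuple:
    """Hypothetical stand-in for an image API call; returns (image_id, reference)."""
    await asyncio.sleep(0)
    return (f"img:{prompt}", reference)


async def generate_images_parallel(prompts: list[str]) -> list[tuple]:
    if not prompts:
        return []
    # The first image is generated alone so that it can define the style.
    first = await fake_generate(prompts[0])
    # Remaining images run concurrently, each given the first as a reference.
    rest = await asyncio.gather(
        *(fake_generate(p, reference=first[0]) for p in prompts[1:])
    )
    return [first, *rest]
```

`asyncio.gather` returns results in argument order, so the output lines up with the input prompts.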
- run_pipeline() chains slot extraction → structure_brief → personalize →
  async image gen → apply_personalization
- Fail-fast on bad logo bytes BEFORE any API call (SVG/empty rejected)
- Inventory + index.html missing → FileNotFoundError before LLM
- structured_brief_override skips Step 2 (UI lets user edit fields)
- dry_run=True validates inputs and logs slot summary, no API calls
- Each step's failure logged with step= key before re-raising

7 tests; full suite 131 passing.
Routes:
- GET /personalize — render intake form (template added in Task 10)
- POST /api/personalize/structure — JSON brief → structured fields (5/min)
- POST /api/personalize/run — multipart brief+logo+html_dir → personalized.html (2/min)

Hardening:
- Global MAX_CONTENT_LENGTH bumped 1 MiB → 8 MiB to fit 2 MiB logo upload;
  per-route caps stay strict (4 KiB structure, 5 MiB run, 2 MiB logo)
- html_dir confined to downloads/ via realpath check (no traversal)
- Lazy import of personalize.* inside handlers — keeps import app side-effect-free
- Smoke verified: threading.active_count() unchanged on import
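
A confinement check of the kind listed under Hardening could look like the sketch below. Note the swap: it uses `os.path.commonpath` rather than the `realpath().startswith(downloads/)` check this PR describes, because a prefix test without a trailing separator would also accept a sibling such as `downloads-evil`. The `is_confined` name is hypothetical.

```python
import os


def is_confined(candidate: str, root: str) -> bool:
    """Return True when candidate resolves to root itself or a path inside it."""
    root_real = os.path.realpath(root)
    cand_real = os.path.realpath(candidate)
    try:
        # commonpath compares whole path components, not raw string prefixes.
        return os.path.commonpath([root_real, cand_real]) == root_real
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Resolving both paths first means `..` segments and symlinks are collapsed before the comparison.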

8 route tests; full suite 139 passing.
- templates/personalize.html: 3-step UI (brief textarea + logo + html_dir →
  edit-extracted-fields → run + result panel), no separate JS file
- Browser observability logger copied from index.html (same _rawFetch
  capture-before-wrap pattern; PR #1 lesson)
- Smoke verified: GET /personalize returns 200 with all selectors present
- argparse: html_dir positional + --brief --logo --budget --dry-run
- python-dotenv loads OPENAI_API_KEY from .env if present
- --dry-run validates inputs and emits slot summary, zero API calls
- Smoke-tested: dry-run on sample fixture extracts 12 slots in <1s
- test_structure_brief_live: ~$0.005, 10s — validates brand_brief schema
- test_personalize_live_with_vision_no_image_gen: ~$0.10, 70s — validates
  multimodal personalize call + site_personalization closed-enum schema
- Validated locally: gpt-5-mini available, ~$0.105 spent total
- Image generation not exercised live (gpt-image-1 $0.07/call) — covered by
  mocked tests; live path is identical SDK pattern

Default pytest run skips both: 139 passed, 2 skipped.
- docs/AUDIT.md: P2-11 marked RESOLVED with module pointer + test count
- ROADMAP.md: Phase 4 SHIPPED block with landed-in module map
- TODO.md: Phase 4 done entry; Later section now Phase 5-6 only
- CLAUDE.md: new 'Personalization module' section with conventions
  (always inject OpenAI client in tests, budget guard fires before call,
  closed-enum slot IDs, no synthetic faces, Responses not Assistants)
- docs/superpowers/plans/: 13-task TDD plan that drove the implementation

Branch state: 11 commits, 139 tests + 2 live (gated), all green.
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@fbmoulin has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 52 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a33da78-4398-4e0b-888b-57f4fb075000

📥 Commits

Reviewing files that changed from the base of the PR and between 2b48d3a and f20d7d2.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • .github/workflows/ci.yml
  • CLAUDE.md
  • ROADMAP.md
  • TODO.md
  • app.py
  • docs/AUDIT.md
  • docs/superpowers/plans/2026-04-27-personalization-mvp.md
  • personalize/__init__.py
  • personalize/__main__.py
  • personalize/cli.py
  • personalize/openai_client.py
  • personalize/patcher.py
  • personalize/pipeline.py
  • personalize/sanitize.py
  • personalize/slots.py
  • pyproject.toml
  • templates/personalize.html
  • tests/fixtures/sample_captured.html
  • tests/fixtures/sample_inventory.json
  • tests/integration/__init__.py
  • tests/integration/test_personalize_live.py
  • tests/test_personalize_app.py
  • tests/test_personalize_openai_client.py
  • tests/test_personalize_patcher.py
  • tests/test_personalize_pipeline.py
  • tests/test_personalize_sanitize.py
  • tests/test_personalize_slots.py
  • tests/test_personalize_smoke.py

Phase 4 introduced these deps but the CI install lines are explicit (legacy
pattern, not pip install -e .) so each new dep needs to be added. pytest job
also needs pytest-asyncio for the new async tests.

Affects: pytest job (collection-time PIL ImportError), pip-audit job
(installed-package set), smoke job (defensive — kratos_clone doesn't import
PIL today but the env should match).

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements Phase 4 of the project, adding a personalization module that leverages OpenAI's gpt-5-mini and gpt-image-1 to customize captured websites. The changes include a new "personalize" package for slot extraction, security hardening, and HTML patching, alongside new Flask routes and a frontend intake form. The review highlights a potential issue with global regex replacements in the palette application logic, suggesting direct DOM manipulation for better correctness and performance. It also recommends making the pipeline orchestrator asynchronous to avoid event loop conflicts when integrated into async environments.

Comment thread personalize/patcher.py
Comment on lines +134 to +150
    html = str(soup)
    html = re.sub(
        rf"\b({_PREFIX_GROUP})-orange-500\b",
        rf"\1-[#{primary}]",
        html,
    )
    html = re.sub(
        rf"\b({_PREFIX_GROUP})-orange-600\b",
        rf"\1-[#{primary_pressed}]",
        html,
    )
    html = re.sub(
        rf"\b({_PREFIX_GROUP})-orange-400\b",
        rf"\1-[#{primary_hover}]",
        html,
    )
    return BeautifulSoup(html, "html.parser")

medium

The current implementation of _apply_palette performs a global string replacement on the entire HTML document using regex and then re-parses the result. This approach has two significant drawbacks:

  1. Correctness: It may inadvertently modify text content, IDs, or other attributes that happen to match the Tailwind class pattern (e.g., a paragraph explaining Tailwind classes or an element with a matching ID).
  2. Efficiency: Converting the entire BeautifulSoup object to a string and re-parsing it multiple times is computationally expensive, especially for large documents.

Consider iterating over elements that have a class attribute and updating the classes directly in the DOM. For example:

for tag in soup.find_all(class_=True):
    classes = tag.get("class", [])
    new_classes = []
    for cls in classes:
        # Apply replacement logic to individual class names
        new_classes.append(cls.replace("orange-500", f"[#{primary}]"))
    tag["class"] = new_classes
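
The snippet above uses a substring `replace`, which would also rewrite a class that merely contains `orange-500`. A token-level variant that matches whole class names might look like the sketch below. The prefix list and shades are taken from the commit notes; the `swap_token` and `swap_classes` names, the variant-prefix handling (`hover:`, `md:`), and the example hex values are assumptions.

```python
import re

_PREFIXES = ("from", "to", "via", "bg", "text", "border", "shadow", "ring")
_PREFIX_GROUP = "|".join(_PREFIXES)
# Optional variant prefixes such as "hover:" or "md:hover:", then the utility.
_TOKEN = re.compile(
    r"(?P<variant>(?:[^\s:]+:)*)"
    rf"(?P<prefix>{_PREFIX_GROUP})-orange-(?P<shade>400|500|600)"
)


def swap_token(token: str, colors: dict[str, str]) -> str:
    """Rewrite one class token if it matches exactly; otherwise keep it."""
    m = _TOKEN.fullmatch(token)
    if m is None:
        return token
    return f"{m['variant']}{m['prefix']}-[#{colors[m['shade']]}]"


def swap_classes(classes: list[str], colors: dict[str, str]) -> list[str]:
    return [swap_token(t, colors) for t in classes]
```

With BeautifulSoup this would be applied as `tag["class"] = swap_classes(tag["class"], colors)` for each tag returned by `soup.find_all(class_=True)`, leaving text content, IDs, and other attributes untouched.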

Comment thread personalize/pipeline.py

    # Step 6 — async image generation.
    try:
        images = asyncio.run(client.generate_images_parallel(plan, slots))

medium

Using asyncio.run() inside a synchronous function like run_pipeline can lead to a RuntimeError if the code is executed in an environment where an event loop is already running (e.g., within an async web framework or a task runner).

To improve robustness and future-proof the pipeline, consider making run_pipeline an async function and allowing the caller to manage the event loop execution, or providing an asynchronous version of the orchestrator.
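
One way to follow this suggestion is an async entry point plus a thin sync wrapper that refuses to run inside an active loop. This is a sketch under assumptions: `generate_images` is a hypothetical stand-in for the real image step, and the function names mirror the `arun_pipeline` / `run_pipeline` pair mentioned in the follow-up commit, whose actual signatures take more arguments.

```python
import asyncio


async def generate_images() -> list[str]:
    """Hypothetical stand-in for the real image generation step."""
    await asyncio.sleep(0)
    return ["hero.png"]


async def arun_pipeline() -> list[str]:
    # Async callers await this directly inside their own event loop.
    return await generate_images()


def run_pipeline() -> list[str]:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so starting one is safe.
        return asyncio.run(arun_pipeline())
    raise RuntimeError(
        "run_pipeline() was called inside a running event loop; "
        "await arun_pipeline() instead"
    )
```

Synchronous callers such as a CLI keep calling `run_pipeline()`, while async frameworks await `arun_pipeline()`.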

@fbmoulin fbmoulin merged commit 5e4863a into main Apr 27, 2026
6 checks passed
@fbmoulin fbmoulin deleted the feat/personalize-mvp branch April 27, 2026 17:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c17e5afa16


    {"type": "input_text", "text": user_payload},
    {
        "type": "input_image",
        "image_url": f"data:image/png;base64,{logo_b64}",

P1: Set logo data URL MIME from validated image type

OpenAIBrandClient.personalize accepts both PNG and JPEG logos (verify_image_bytes), but this payload is always sent as data:image/png to Responses. For valid JPEG uploads, that MIME/content mismatch can make the vision input fail to decode, so /api/personalize/run may return a 502 for inputs the API explicitly claims to support. Build the data URL from the detected format (image/png vs image/jpeg) to keep accepted JPEG uploads functional.
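
A minimal sketch of that fix, assuming the format is detected from magic bytes. The `detect_mime` and `logo_data_url` helpers are hypothetical; the real `verify_image_bytes` may already return the detected type.

```python
import base64

# Leading magic bytes for the two allow-listed formats.
_MAGIC_TO_MIME = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
)


def detect_mime(data: bytes) -> str:
    for magic, mime in _MAGIC_TO_MIME:
        if data.startswith(magic):
            return mime
    raise ValueError("unsupported image type; expected PNG or JPEG")


def logo_data_url(data: bytes) -> str:
    """Build a data URL whose MIME type matches the actual image bytes."""
    mime = detect_mime(data)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The data URL then carries `image/jpeg` for JPEG uploads instead of a hard-coded `image/png`.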


Comment thread app.py
@limiter.limit(os.getenv("PERSONALIZE_RUN_RATE_LIMIT", "2 per minute"))
def personalize_run():
    """Steps 4–8 — apply personalization to a captured site."""
    if (request.content_length or 0) > _PERSONALIZE_RUN_MAX_BYTES:

P2: Enforce run payload cap using actual body length

This size gate only checks request.content_length, which is unset for chunked uploads; in that case it evaluates as 0 and skips the 5 MiB guard. A client can therefore bypass this per-route limit and still force multipart parsing up to the global 8 MiB cap, increasing memory/CPU exposure on /api/personalize/run. Mirror the /api/client-errors pattern by checking the buffered body length (or equivalent) instead of relying solely on Content-Length.
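
A framework-independent sketch of checking the actual body length: read the stream in chunks and stop as soon as the cap is crossed, regardless of what Content-Length says. The `read_capped` helper and `PayloadTooLarge` exception are hypothetical and do not reproduce the `/api/client-errors` code referenced above.

```python
import io
from typing import BinaryIO


class PayloadTooLarge(Exception):
    """Raised when the body exceeds the per-route cap."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, failing once more than max_bytes have arrived."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        # Enforced on bytes actually received, so chunked uploads without a
        # Content-Length header are covered too.
        if len(buf) > max_bytes:
            raise PayloadTooLarge(f"body exceeds {max_bytes} bytes")
```

In a Flask handler the stream would presumably be `request.stream`, read before multipart parsing begins.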


fbmoulin added a commit that referenced this pull request Apr 27, 2026
(c1) DOM-walk palette swap (was global regex on serialized HTML)
- _apply_palette now iterates every element with a 'class' attribute and
  rewrites individual class tokens via match-or-keep. Previously a regex
  against str(soup) could match 'from-orange-500' inside text content,
  IDs, aria-labels, etc. Test case test_palette_swap_does_not_touch_text_content
  asserts <code>from-orange-500</code>, id=, aria-label= are preserved.
- Class normalization handles both bs4 multi-valued (list) and html.parser
  (str) shapes; explicit type annotations satisfy mypy.

(c2) async pipeline variant (was bare asyncio.run inside sync function)
- arun_pipeline: async public API; awaits client.generate_images_parallel
  directly. Composes inside any caller-managed event loop (FastAPI etc.).
- run_pipeline: sync wrapper. Detects a running loop via asyncio.get_running_loop
  and raises a clear RuntimeError pointing at arun_pipeline instead of dying
  inside asyncio.run with the confusing 'cannot be called from a running
  event loop' trace.
- 2 new tests: arun works inside event loop; run refuses inside event loop.

Tests: 183 → 186 passing. mypy clean on personalize/. bandit HIGH=0.