Replace uReadability with markdown.new for content extraction by grayodesa · Pull Request #156 · radio-t/super-bot

grayodesa · 2026-03-14T11:54:10Z

Summary

Problem: uReadability (ureadability.radio-t.com) often fails to extract content from websites that block server-side requests or require JS rendering, resulting in error messages or stubs being posted as summaries in the chat.
Solution: Replace uReadability with markdown.new — a Cloudflare-powered service with a three-tier fallback pipeline (content negotiation → Workers AI → browser rendering) that handles JS-heavy and bot-protected sites reliably.
Changes:
- New MarkdownNewClient in app/bot/openai/mdnew.go implementing the existing uKeeperGetter interface (drop-in replacement)
- Parses markdown.new response format: Title: header line + YAML frontmatter + markdown body
- Replaced UKeeperClient wiring in main.go; removed --ur-api/--ur-token flags, added --mdnew-api (default: https://markdown.new, env: MDNEW_API)
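
For illustration, the response parsing described in the list above could look roughly like the sketch below. The function names `parseMarkdownNewResponse` and `stripFrontmatter` come from the test plan, but their signatures, the `parsedPage` type, and the exact layout of the response (a `Title:` line, then a `---`-delimited YAML block, then the body) are assumptions, not the code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedPage is a hypothetical result type; the PR's real types are not shown.
type parsedPage struct {
	Title   string
	Content string
}

// stripFrontmatter removes a leading YAML frontmatter block delimited by
// "---" lines. Input without a terminated block is returned unchanged.
func stripFrontmatter(s string) string {
	trimmed := strings.TrimLeft(s, "\r\n \t")
	if !strings.HasPrefix(trimmed, "---") {
		return s
	}
	rest := trimmed[len("---"):]
	end := strings.Index(rest, "\n---")
	if end < 0 {
		return s // unterminated frontmatter, keep everything
	}
	rest = rest[end+len("\n---"):]
	// drop whatever is left of the closing delimiter line
	if nl := strings.Index(rest, "\n"); nl >= 0 {
		rest = rest[nl+1:]
	} else {
		rest = ""
	}
	return strings.TrimLeft(rest, "\r\n")
}

// parseMarkdownNewResponse splits the assumed "Title: ..." header line from
// the rest of the response, then strips the frontmatter from the remainder.
func parseMarkdownNewResponse(raw string) parsedPage {
	var page parsedPage
	body := strings.TrimLeft(raw, "\r\n")
	if first, rest, _ := strings.Cut(body, "\n"); strings.HasPrefix(first, "Title:") {
		page.Title = strings.TrimSpace(strings.TrimPrefix(first, "Title:"))
		body = rest
	}
	page.Content = strings.TrimSpace(stripFrontmatter(body))
	return page
}

func main() {
	raw := "Title: Example page\n\n---\nurl: https://example.com\n---\n\n# Example page\n\nSome text.\n"
	page := parseMarkdownNewResponse(raw)
	fmt.Printf("title=%q\ncontent=%q\n", page.Title, page.Content)
}
```

A response without a `Title:` line or without frontmatter falls through unchanged, so the caller still gets the raw markdown as content.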

Deployment notes

- Remove UREADABILITY_API and UREADABILITY_TOKEN env vars from deployment config
- Optionally set MDNEW_API if a different endpoint is needed (default works out of the box, no auth required)

Test plan

- Unit tests for MarkdownNewClient.Get() with mock HTTP server
- Unit tests for response parser (parseMarkdownNewResponse, stripFrontmatter)
- Verified against real markdown.new with https://openai.com/index/introducing-gpt-5-4/: title and content extracted correctly (31K chars of clean markdown)
- All existing tests pass (go test ./app/...)
- End-to-end: run bot locally, send RTJC message with ⚠️ + link, verify summary appears in chat

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

uReadability often fails to extract content from websites that block server-side requests. markdown.new (Cloudflare-powered) provides a three-tier fallback pipeline (content negotiation → Workers AI → browser rendering) that handles JS-heavy and bot-protected sites.

- add MarkdownNewClient implementing uKeeperGetter interface
- parse markdown.new response format (Title header + YAML frontmatter)
- replace UKeeperClient wiring in main.go
- remove --ur-api/--ur-token flags, add --mdnew-api flag
- add unit tests for client and response parser

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

umputun

thx for the PR, the idea of improving content extraction makes sense. I tested markdown.new against real URLs from news.radio-t.com and it works for most sites (4/5), failing only on X/Twitter. couple concerns though:

- wrong level of abstraction - this change belongs in ukeeper-readability, not in super-bot. ukeeper is the content extraction layer, super-bot is the consumer. if we improve how ukeeper extracts content, the uKeeperGetter interface stays the same and super-bot doesn't change at all
- markdown.new is not a cloudflare product - it was built by an independent developer on top of cloudflare APIs. no SLA, no guarantees it stays up or keeps the same response format. cloudflare has an official Browser Rendering /markdown endpoint that does the same thing - fetches any URL, renders JS, returns clean markdown. it works on the free plan (10 min/day browser time, 1 req/10 sec rate limit) and doesn't require the target site to opt in or be on cloudflare. I tested it and it actually handles X/Twitter correctly (which markdown.new can't), returns clean JSON, and doesn't need custom response parsing
- CLAUDE.md rewrite should be a separate PR, unrelated to the content extraction change. it also drops some useful project-specific guidelines

the real problem here is sites behind cloudflare protection returning "just a moment..." to ureadability. the right fix would be upgrading ukeeper to use cloudflare's Browser Rendering /markdown API (POST /accounts/{id}/browser-rendering/markdown with {"url": "..."}) - official, stable, handles JS rendering, works with any URL

umputun · 2026-04-12T21:29:03Z

closing this — the content extraction improvement was addressed at the proper level in ukeeper-readability, which is where this logic belongs. thx for the idea though, it pointed to a real problem.

grayodesa and others added 2 commits March 14, 2026 13:53

update CLAUDE.md with architecture overview and accurate build commands

9155142

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

grayodesa requested a review from umputun as a code owner March 14, 2026 11:54

umputun reviewed Mar 14, 2026

paskal mentioned this pull request Mar 29, 2026

Modularise URL retrieval with Cloudflare Browser Rendering support ukeeper/ukeeper-readability#73

Merged

umputun closed this Apr 12, 2026

grayodesa wants to merge 2 commits into radio-t:master from grayodesa:feature/markdown-new-content-extraction
