thieung/defuddle

Defuddle

A Cloudflare Worker that extracts the main content of any web page and returns clean Markdown. Built on top of Defuddle with special handling for X/Twitter posts, including text, media, polls, quotes, and long-form Articles.

🔗 Live demo: defuddle.thieunv.workers.dev

Examples:

# Regular web page
https://defuddle.thieunv.workers.dev/vividkit.dev

# X/Twitter post
https://defuddle.thieunv.workers.dev/x.com/thieunguyen_it/status/2021461660310044828

# X Article (long-form, with multiple media types)
https://defuddle.thieunv.workers.dev/x.com/trq212/status/2024574133011673516

Supported Platforms

| Platform | Method | Details |
| --- | --- | --- |
| Any web page | Defuddle + Turndown | Smart content extraction, strips ads/nav/footers |
| X / Twitter | FxTwitter API | Posts, articles, media, polls, quotes, threads |
| Facebook | Custom extractor | Public posts and content |
| Substack | Defuddle built-in | Newsletter articles (new in Defuddle 0.15) |
| YouTube | Defuddle built-in | Video metadata, transcripts |
| Reddit | Defuddle built-in | Posts and comments |
| GitHub | Defuddle built-in | Issues, READMEs, discussions |

Features

  • Any web page → Markdown via Defuddle + Turndown
  • X/Twitter posts → rich Markdown via the FxTwitter API
    • Tweet text with t.co link expansion
    • Photos, videos, GIFs with thumbnails & duration
    • X Articles (long-form DraftJS content with inline media)
    • Quote tweets with media
    • Polls with visual progress bars
    • Engagement stats (likes, retweets, replies, views)
    • Community notes, replying-to context, broadcasts
    • External media (YouTube embeds, etc.)
  • Facebook posts → Markdown with custom extractor
  • JSON and Markdown output formats
  • CORS support
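
As an illustration of the "polls with visual progress bars" feature, here is a minimal sketch of how one poll choice could be rendered as a text bar. The `PollChoice` shape, the `renderPollBar` name, and the block characters are assumptions made for this example; the worker's actual renderer may differ.

```typescript
// Hypothetical shape of a single poll choice (not taken from the project source).
interface PollChoice {
  label: string;
  percentage: number; // expected range: 0 to 100
}

// Render one choice as "<label>: <bar> <pct>%", using `width` cells for the bar.
function renderPollBar(choice: PollChoice, width = 20): string {
  // Clamp so out-of-range input cannot produce a negative repeat count.
  const pct = Math.min(100, Math.max(0, choice.percentage));
  const filled = Math.round((pct / 100) * width);
  const bar = "█".repeat(filled) + "░".repeat(width - filled);
  return `${choice.label}: ${bar} ${pct}%`;
}

console.log(renderPollBar({ label: "Yes", percentage: 50 }, 10));
```

Clamping the percentage keeps the bar width stable even if the upstream data is malformed.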

Usage

# Get any web page as Markdown
curl https://<your-worker>.workers.dev/medium.com/@richardhightower/claude-code-todos-to-tasks-5a1b0e351a1c

# Get an X/Twitter post
curl https://<your-worker>.workers.dev/x.com/thieunguyen_it/status/2021461660310044828

# Get an X Article (long-form, with multiple media types)
curl https://<your-worker>.workers.dev/x.com/trq212/status/2024574133011673516

# Get JSON output
curl -H 'Accept: application/json' https://<your-worker>.workers.dev/x.com/thieunguyen_it/status/2021461660310044828
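
The same requests can be made from TypeScript. The sketch below assumes, as the curl examples suggest, that the worker takes the target URL without its scheme as the request path. `buildWorkerUrl` and `fetchExtracted` are hypothetical helper names, not part of the project.

```typescript
// Hypothetical helper: build the worker request URL for a target page.
// Assumes the target is passed scheme-less in the path, as in the curl examples.
function buildWorkerUrl(workerBase: string, target: string): string {
  const base = workerBase.replace(/\/+$/, ""); // drop trailing slashes
  const path = target.replace(/^https?:\/\//i, ""); // drop the scheme if present
  return `${base}/${path}`;
}

// Fetch the extracted content as Markdown (default) or as JSON text.
async function fetchExtracted(
  workerBase: string,
  target: string,
  asJson = false,
): Promise<string> {
  const headers: Record<string, string> = {};
  if (asJson) headers["Accept"] = "application/json";
  const res = await fetch(buildWorkerUrl(workerBase, target), { headers });
  if (!res.ok) throw new Error(`Worker responded with status ${res.status}`);
  return res.text();
}
```

For example, `fetchExtracted("https://<your-worker>.workers.dev", "https://example.com", true)` would request the JSON form of the page.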

Local Development

Prerequisites

Setup

# Clone the repo
git clone <repo-url>
cd defuddle

# Install dependencies
npm install

# Start local dev server
npm run dev

The worker will be available at http://localhost:8787.

# Test locally
curl http://localhost:8787/x.com/thieunguyen_it/status/2021461660310044828

Run Tests

npm test

Deploy to Cloudflare Workers

First-time setup

  1. Log in to the Cloudflare CLI

    npx wrangler login
  2. Deploy

    npm run deploy

    This runs wrangler deploy, which:

    • Bundles the TypeScript source
    • Uploads to Cloudflare Workers
    • Assigns a *.workers.dev subdomain
  3. Verify

    curl https://defuddle.<your-subdomain>.workers.dev/example.com

Custom domain (optional)

  1. Go to Cloudflare Dashboard → Workers & Pages → defuddle → Settings → Domains & Routes
  2. Add a custom domain (must be on Cloudflare DNS) or a route pattern

Configuration

The worker config is in wrangler.jsonc:

{
  "name": "defuddle",                       // Worker name (first label of the workers.dev hostname)
  "main": "src/index.ts",                   // Entry point
  "compatibility_date": "2026-03-01",
  "compatibility_flags": ["nodejs_compat"]  // Required for linkedom
}

Key settings:

  • nodejs_compat: required for the linkedom DOM parser used by Defuddle
  • observability.enabled: enables Workers logs in the dashboard (not part of the excerpt above)

Project Structure

src/
├── index.ts                        # Worker entry point, request routing
├── convert.ts                      # Orchestrator: routes URLs to extractors
├── convert-types.ts                # Shared types (ConvertResult, ConvertOptions)
├── web-page-extractor.ts           # Generic web page extraction (Defuddle + Turndown)
├── x-twitter-fetcher.ts            # X/Twitter post extraction via FxTwitter API
├── x-twitter-types.ts              # X/Twitter type definitions
├── x-twitter-media-renderer.ts     # X/Twitter media to markdown
├── x-twitter-text-processor.ts     # X/Twitter text processing utilities
├── draftjs-to-markdown-converter.ts # DraftJS → Markdown for X Articles
├── facebook-fetcher.ts             # Facebook post extraction
└── polyfill.ts                     # Workers runtime polyfills for DOM APIs
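
To make the orchestrator's role concrete, here is a minimal sketch of hostname-based routing to the three extractor families listed above. The `pickExtractor` name, the `ExtractorName` type, and the hostname rules are assumptions for illustration; the real logic in convert.ts may match URLs differently.

```typescript
// Hypothetical extractor identifiers, loosely mirroring the files above.
type ExtractorName = "x-twitter" | "facebook" | "web-page";

// Choose an extractor from the target's hostname.
// Accepts targets with or without a scheme, as the worker's URL paths do.
function pickExtractor(target: string): ExtractorName {
  const withScheme = /^https?:\/\//i.test(target) ? target : `https://${target}`;
  const host = new URL(withScheme).hostname
    .toLowerCase()
    .replace(/^(www|m|mobile)\./, ""); // ignore common subdomain prefixes
  if (host === "x.com" || host === "twitter.com") return "x-twitter";
  if (host === "facebook.com") return "facebook";
  return "web-page"; // generic Defuddle + Turndown path
}

console.log(pickExtractor("x.com/thieunguyen_it/status/2021461660310044828"));
```

Falling back to the generic extractor keeps unknown hosts working without a dedicated code path.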

API Reference

GET /<url>

Extracts content from the given URL.

Response formats:

  • text/markdown (default) — Markdown with YAML frontmatter
  • application/json — set Accept: application/json header
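
The format selection described above amounts to simple content negotiation on the Accept header. A minimal sketch follows, assuming Markdown is the fallback whenever application/json is not listed. `pickFormat` is a hypothetical name, and the worker's own handling may be stricter or looser.

```typescript
type OutputFormat = "markdown" | "json";

// Return "json" only when the Accept header lists application/json;
// otherwise fall back to the default Markdown output.
function pickFormat(accept: string | null): OutputFormat {
  if (!accept) return "markdown";
  const types = accept
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase()); // drop q-values
  return types.some((t) => t === "application/json") ? "json" : "markdown";
}

console.log(pickFormat("application/json"));
```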

Frontmatter fields:

| Field | Description |
| --- | --- |
| title | Page/tweet title |
| author | Author name |
| published | Publication date |
| source | Original URL |
| domain | Source domain |
| description | Page description or tweet preview |
| word_count | Content word count |
| likes | ❤️ Like count (X/Twitter only) |
| retweets | 🔁 Retweet count (X/Twitter only) |
| replies | 💬 Reply count (X/Twitter only) |
| views | 👁 View count (X/Twitter only) |
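
A client that needs the frontmatter fields listed above can split them from the Markdown body. The sketch below assumes the frontmatter is delimited by `---` lines and holds flat `key: value` pairs; it is not a full YAML parser, and `parseFrontmatter` is a hypothetical helper.

```typescript
interface ParsedDocument {
  fields: Record<string, string>; // all values kept as raw strings
  body: string; // Markdown after the frontmatter block
}

// Split a Markdown response into frontmatter fields and body.
// Assumes flat "key: value" lines between "---" delimiters.
function parseFrontmatter(markdown: string): ParsedDocument {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown);
  if (!match) return { fields: {}, body: markdown };
  const fields: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim().replace(/^"(.*)"$/, "$1");
    if (key) fields[key] = value;
  }
  return { fields, body: markdown.slice(match[0].length) };
}
```

Splitting on the first colon keeps URL values such as the source field intact. Numeric fields like word_count stay strings and can be converted by the caller.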

License

MIT
