# hn-daily

Hacker News daily digest fetcher built on Jina Reader and crawl4ai. It fetches yesterday's top stories, crawls the article content and comments, and saves everything to markdown.

## Features
- Fetches top stories from Hacker News (yesterday) via Algolia API (default 10, configurable)
- For external URLs, fetches article markdown via Jina Reader first, falling back to local crawling
- Crawls story content and comments using crawl4ai
- Saves story markdown files to `drafts/` (configurable via `--output`)
- Daily digest posts are stored in `daily/` as `daily/YYYY/MM/YYYY-MM-DD.md` for the Hugo site
- Rich CLI output with progress tracking
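The story fetch above can be sketched against the public HN Algolia search API. The function and parameter names here (`fetch_top_stories`, `build_params`, `yesterday_range`) are illustrative, not the project's actual code:

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA_SEARCH_URL = "https://hn.algolia.com/api/v1/search"

def yesterday_range(today=None):
    """Return (start, end) unix timestamps covering yesterday, in UTC."""
    today = today or datetime.now(timezone.utc)
    start = (today - timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())

def build_params(limit=10, today=None):
    """Query params for yesterday's stories, ranked by Algolia relevance/points."""
    start, end = yesterday_range(today)
    return {
        "tags": "story",
        "numericFilters": f"created_at_i>={start},created_at_i<{end}",
        "hitsPerPage": limit,
    }

def fetch_top_stories(limit=10):
    """Fetch yesterday's top stories; each hit carries title, url, points, author."""
    url = f"{ALGOLIA_SEARCH_URL}?{urlencode(build_params(limit))}"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)["hits"]
```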
## Installation

```bash
pip install -r requirements.txt
python -m playwright install chromium
```
## Usage

```bash
# Optional: improve Reader throughput and limits
export JINA_API_KEY=your_api_key

# Run with defaults (yesterday's top stories)
python -m hn_daily

# With options
python -m hn_daily --date 2025-01-19 --limit 15 --output my_drafts
```

Markdown files are saved to `drafts/` with the format:

```
{story_title}_{YYYYMMDD}.md
```
Each file contains:
- Story metadata (author, points, URL, date)
- Crawled content
- Comments section
## Project Structure

```
hn-daily/
├── hn_daily/
│   ├── cli.py                  # CLI entry point
│   ├── models.py               # Story, Comment, CrawlResult
│   └── services/
│       ├── story_service.py    # Fetch from HN API
│       ├── comment_service.py  # Fetch comments
│       ├── crawler_service.py  # crawl4ai integration
│       └── storage_service.py  # Save to markdown
├── tests/
├── drafts/
├── requirements.txt
└── pyproject.toml
```
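The Reader-first, crawl-locally-on-failure flow in `crawler_service.py` might look like the sketch below. It assumes Jina Reader's `https://r.jina.ai/<url>` endpoint with an optional `Authorization: Bearer` header, and crawl4ai's `AsyncWebCrawler.arun`; the function names are illustrative:

```python
import os
import urllib.request

def jina_reader_request(url: str) -> urllib.request.Request:
    """Build a Jina Reader request; the API key header is optional."""
    headers = {}
    api_key = os.environ.get("JINA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"https://r.jina.ai/{url}", headers=headers)

def fetch_via_jina(url: str, timeout: int = 60) -> str:
    # Reader returns the page as plain-text markdown
    with urllib.request.urlopen(jina_reader_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

async def fetch_markdown(url: str) -> str:
    """Try Jina Reader first, then fall back to crawling locally with crawl4ai."""
    try:
        return fetch_via_jina(url)
    except Exception:
        # Local fallback; requires the Playwright chromium install from above
        from crawl4ai import AsyncWebCrawler
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url=url)
            return result.markdown
```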
## Requirements

- Python 3.10+
- Playwright browsers (`python -m playwright install chromium`)