Export all your Kindle highlights from Amazon's notebook page to structured JSON.
Note: You must disable/delete Amazon passkey authentication if enabled, as it interferes with password-based login.
- Scrape all highlights from your entire Kindle library
- Secure authentication with 2FA/TOTP support
- Progressive saving with resume capability
- Preserves highlight colors and notes
- Captures page numbers and locations
- Session persistence (no repeated logins)
- Python 3.13+
- uv package manager
brew install uv
- requires-python >= 3.13 (see pyproject.toml), tested locally on CPython 3.13.6 via uv.
- Playwright's Python docs list support through Python 3.13, so playwright>=1.40.0 remains valid on the latest interpreter.
- beautifulsoup4, python-dotenv, and pyotp publish universal wheels with no CPython ABI constraints, so upgrading to 3.13 is safe without pin changes.
# Clone repository
git clone https://github.com/Shane-Neeley/kindle-highlights.git
cd kindle-highlights
# Install dependencies
uv sync
# Install browser
uv run playwright install --with-deps chromium
# Configure credentials
cp .env.example .env
Edit .env with your Amazon credentials:
AMAZON_EMAIL=your-email@example.com
AMAZON_PASSWORD=your-password
AMAZON_TOTP_SECRET=your-totp-secret # Optional, for 2FA
Scrape all books:
uv run kindle-highlights scrape --headful
Scrape specific book:
uv run kindle-highlights scrape --asin B00X57B4JG
Custom output location:
uv run kindle-highlights scrape --out my-highlights.json
Manual 2FA (visible browser):
uv run kindle-highlights scrape --headful
Run the FastAPI service locally (reload for development):
uv run uvicorn app:app --host 0.0.0.0 --port 8000 --app-dir src --reload
Endpoints:
- GET /health - service heartbeat + output path
- GET /books - cached export payload
- GET /highlights - flattened highlight list with book metadata
- POST /scrape - trigger a scrape ({"asin": "<ASIN>", "fresh": true} to rescrape all)
HIGHLIGHTS_PATH (optional) overrides where the API reads/writes the export file (default data/highlights.json).
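As a rough illustration of what a flattened highlight list could look like, the sketch below reads the export file and emits one row per highlight with the book's asin, title, and author attached. It assumes the export structure documented in this README (a top-level books array, each with a highlights array). The helper names resolve_export_path and flatten_highlights are hypothetical and are not taken from the project's source; the actual /highlights response may carry different fields.

```python
import json
import os
from pathlib import Path

# Default location as stated in this README; HIGHLIGHTS_PATH overrides it.
DEFAULT_PATH = "data/highlights.json"


def resolve_export_path() -> Path:
    """Return the export path, honouring HIGHLIGHTS_PATH when it is set."""
    return Path(os.environ.get("HIGHLIGHTS_PATH") or DEFAULT_PATH)


def load_export(path: Path) -> dict:
    """Read the export JSON from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def flatten_highlights(export: dict) -> list:
    """Hypothetical helper: one row per highlight, with book metadata attached."""
    rows = []
    for book in export.get("books", []):
        for highlight in book.get("highlights", []):
            rows.append(
                {
                    "asin": book.get("asin"),
                    "title": book.get("title"),
                    "author": book.get("author"),
                    **highlight,  # id, color, text, page, location, note
                }
            )
    return rows
```

Calling flatten_highlights(load_export(resolve_export_path())) after a scrape gives a list that is easy to filter, for example by color or by asin.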
Build and run the service in a container:
docker build -t kindle-highlights .
docker run --rm -p 8000:8000 --env-file .env \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/playwright:/app/playwright" \
  kindle-highlights
Mounting data/ persists exports, and mounting playwright/ keeps the cached Amazon auth state (.auth/user.json) between runs.
{
"run": {
"timestamp": "2025-11-22T12:34:56Z"
},
"books": [
{
"asin": "B00X57B4JG",
"title": "Why Greatness Cannot Be Planned",
"author": "Kenneth O. Stanley; Joel Lehman",
"cover_url": "https://...",
"highlights": [
{
"id": "highlight-abc123",
"color": "yellow",
"text": "The highlight text...",
"page": 42,
"location": 283,
"note": "Optional note..."
}
]
}
]
}
- Authenticate - Logs into Amazon with Playwright (headless browser)
- Discover - Scrolls through library to find all annotated books
- Extract - For each book, loads and scrolls annotations to get all highlights
- Parse - BeautifulSoup extracts structured data from HTML
- Save - Progressively saves each book to JSON (enables resume)
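The Save step can be pictured with a minimal sketch like the one below: load any previous export, skip ASINs that are already present, and rewrite the file after each book so an interruption loses at most the book in progress. This is an illustration under stated assumptions, not the project's scraper code. The function names and the scrape_book callback are hypothetical, and the atomic temp-file rename is a design choice of this sketch rather than something the README confirms.

```python
import json
import os
import tempfile
from pathlib import Path


def load_export(path: Path) -> dict:
    """Load a previous export, or start an empty one if none exists."""
    if path.exists():
        export = json.loads(path.read_text(encoding="utf-8"))
    else:
        export = {}
    export.setdefault("books", [])
    return export


def save_export(path: Path, export: dict) -> None:
    """Write atomically: dump to a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(export, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_name, path)


def scrape_with_resume(path: Path, asins: list, scrape_book, fresh: bool = False) -> dict:
    """Skip ASINs already saved unless fresh=True; save after every book."""
    export = {"books": []} if fresh else load_export(path)
    done = {book["asin"] for book in export["books"]}
    for asin in asins:
        if asin in done:
            continue  # already processed in an earlier run
        # scrape_book is a stand-in callback returning one book dict.
        export["books"].append(scrape_book(asin))
        save_export(path, export)
    return export
```

Writing to a temporary file and renaming it means a crash mid-write leaves the previous export intact instead of a truncated JSON file.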
By default, scraping resumes from where it left off. Already-processed books are skipped:
# Interrupted? Just run again - it continues from where it stopped
uv run kindle-highlights scrape
# Need a clean export? Disable resume to reprocess every book
uv run kindle-highlights scrape --fresh
# Run test suite
uv run pytest tests/ -v
# Run specific tests
uv run pytest tests/test_parser.py -v
# Format with Ruff
uv run ruff format
# Lint (pycodestyle/pyflakes/async rules/etc.)
uv run ruff check src tests
# Static analysis (Ty)
ty check src tests
# Packaging smoke test
uv build
Install pre-commit once (uv tool install pre-commit) and Ty's CLI (uv tool install ty), then enable the hooks:
pre-commit install
pre-commit run --all-files
Ty is currently in early access; see https://docs.astral.sh/ty/ for the latest usage notes.
This repository is public. Please keep personal work artifacts (notes, scratch output, etc.) inside docs/scratchpad/ so they stay out of version control; the directory is gitignored for that purpose.
| Issue | Solution |
|---|---|
| Login fails | Delete playwright/.auth/user.json and try again |
| 2FA required | Add AMAZON_TOTP_SECRET to .env or use --headful mode |
| Passkey blocking | Disable/delete Amazon passkey in your account settings |
| Missing highlights | Some books have Amazon export limits (scraper will warn) |
| Browser executable missing (Playwright install message) | Run uv run playwright install --with-deps chromium to download Chromium, then retry the scrape |
| General browser issues | Reinstall: uv run playwright install --with-deps chromium |
src/
├── parser.py # HTML parsing with BeautifulSoup
├── scraper.py # Browser automation with Playwright
├── main.py # CLI interface (argparse)
└── app.py # FastAPI app for HTTP access
tests/
├── test_parser.py # Unit tests
└── test_api.py # FastAPI route tests
Dockerfile # Containerized service runner
- AMAZON_EMAIL (required) - Your Amazon account email
- AMAZON_PASSWORD (required) - Your Amazon account password
- AMAZON_TOTP_SECRET (optional) - Base32 TOTP secret for 2FA
- HIGHLIGHTS_PATH (optional) - Override where the API reads/writes the export JSON
If you use authenticator apps for 2FA, the TOTP secret is the base32 key shown when you set up the authenticator. Save it during initial 2FA setup to enable automated authentication.
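To make the role of the base32 secret concrete, here is a minimal standard-library sketch of the RFC 6238 calculation that turns such a key into a six-digit code. The project lists pyotp as a dependency and presumably relies on it for this; the totp function below is a hypothetical stand-in for illustration only and assumes the common defaults (HMAC-SHA1, 30-second step).

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, for_time: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    # Authenticator keys are often displayed with spaces and without padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the Unix epoch.
    now = time.time() if for_time is None else for_time
    counter = int(now) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)
```

With the RFC 6238 test key (the ASCII string 12345678901234567890, base32 GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ), totp(key, for_time=59, digits=8) yields 94287082, which matches the published test vector.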
- Amazon passkeys must be disabled (password-based login required)
- Export limits - Amazon may restrict highlights for some books
- Session cookies - Stored in playwright/.auth/user.json (gitignored)
- Rate limiting - Uses 2-second delays to be respectful to Amazon's servers
See LICENSE file for details.
- Read AGENTS.md for repository-wide coding standards, commit expectations, and security reminders before opening a PR.
- Run uv run pytest -v, uv run ruff check, and ty check src tests plus at least one dry scrape (uv run kindle-highlights scrape --headful when debugging selectors) to validate DOM changes.
- Sanitize or delete data/highlights.json when sharing logs; never include .env or Playwright auth files in commits.
- Keep the pre-commit hooks enabled so Ruff and Ty run automatically before each commit.
Built with Playwright and BeautifulSoup.