Evaluate Scrapling + StealthyFetcher for Planespotters source #17

@michaelmorandi

Background

This issue records the results of evaluating whether adopting Scrapling
with StealthyFetcher would benefit the project.

Finding

The only source where adoption makes sense is Planespotters.
Current pipeline:

  1. Playwright → browser visit → harvest Cloudflare cookies (cached 1h)
  2. curl_cffi → reuse cookies for HTML page requests
  3. BeautifulSoup → 6-method fallback chain to extract aircraft data
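For reference, the "cached 1h" behaviour in step 1 can be sketched as below. This is a minimal illustration, not the code in planespotters.py: the class name CookieCache, the harvest callable, and the injectable clock are all hypothetical and stand in for whatever PlanespottersCookieManager actually does.

```python
import time
from typing import Callable, Dict, Optional

Cookies = Dict[str, str]


class CookieCache:
    """Hypothetical sketch: reuse harvested cookies until they exceed a TTL."""

    def __init__(
        self,
        harvest: Callable[[], Cookies],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._harvest = harvest    # assumed to perform the browser visit
        self._ttl = ttl_seconds    # 1h, matching the cache lifetime in step 1
        self._clock = clock        # injectable so expiry can be tested
        self._cookies: Optional[Cookies] = None
        self._fetched_at = 0.0

    def get(self) -> Cookies:
        """Return cached cookies, re-harvesting when missing or expired."""
        now = self._clock()
        if self._cookies is None or now - self._fetched_at >= self._ttl:
            self._cookies = self._harvest()
            self._fetched_at = now
        return self._cookies
```

The point of the pattern is that the expensive browser visit runs at most once per TTL window, and every HTML request in between reuses the same cookies.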

StealthyFetcher, as evaluated here, runs on Camoufox (a modified Firefox)
and spoofs fingerprints at the engine level, which is stronger than the
current JS injection. Its built-in session class can replace all three
steps. See Notes for a caveat about the underlying engine.

Expected Benefits

  • Remove beautifulsoup4 dependency (only used in planespotters.py)
  • Eliminate ~120 lines of Playwright boilerplate + 6-method HTML parsing fallback
  • Stronger Cloudflare resilience (C++ engine spoofing vs JS injection)
  • Scrapling's selector API replaces manual BeautifulSoup traversal
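To make the "6-method fallback chain" concrete, the general pattern looks roughly like the sketch below. The function and type names are hypothetical and the real extractors in planespotters.py are not shown here; this only illustrates the try-each-in-order shape that a single selector-based extraction would replace.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature: each extractor takes raw HTML and returns
# a dict of aircraft data, or None when its strategy does not match.
Extractor = Callable[[str], Optional[dict]]


def extract_with_fallbacks(
    html: str, extractors: Iterable[Extractor]
) -> Optional[dict]:
    """Return the first non-empty result from the extractors, in order."""
    for extractor in extractors:
        try:
            result = extractor(html)
        except Exception:
            # A failing strategy defers to the next one in the chain.
            continue
        if result:
            return result
    return None
```

Each extra strategy in such a chain is code that has to be kept in sync with the site's markup, which is where most of the expected line savings come from.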

Non-benefits

  • No Docker image size reduction — StealthyFetcher still needs a browser binary, comparable size to current Chromium
  • ADSBexchange: not worth migrating — JSON API + custom TokenBucketRateLimiter; cookie-harvest-once + curl_cffi pattern is more efficient
  • Flightradar24: no change needed
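For readers unfamiliar with the rate-limiting approach named in the ADSBexchange bullet, a token bucket can be sketched as follows. This is a generic illustration under assumed names (TokenBucket, try_acquire), not the project's TokenBucketRateLimiter, whose interface is not shown in this issue.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so refill can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take `tokens` if available; return False instead of blocking."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A limiter like this pairs naturally with plain HTTP requests against a JSON API, which is part of why routing that source through a browser-based fetcher would add cost without adding value.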

Scope (if implemented)

  • planespotters.py — replace PlanespottersCookieManager + curl_cffi session + BeautifulSoup parsing
  • pyproject.toml — add scrapling[fetchers], remove beautifulsoup4
  • Dockerfile — add scrapling install step; verify Firefox system deps

Notes

  • Scrapling recently migrated StealthyFetcher from Camoufox to Patchright — verify current engine before implementing
  • Scrapling API is still evolving; check stability before adopting
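One low-effort way to start the engine check is to look at which distributions are actually installed in the environment. The sketch below uses only the standard library; the helper name installed_versions is hypothetical, and which of these packages scrapling[fetchers] pulls in for a given release is an assumption to confirm against Scrapling's own documentation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    names: Iterable[str] = ("scrapling", "camoufox", "patchright"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the built Docker image shows which browser backend was actually installed, which in turn determines whether the Dockerfile needs Firefox or Chromium system dependencies.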
