
Move TikTok scraper to standalone API with multi-service architecture#74

Open
karilaa-dev wants to merge 10 commits into main from extract-tiktok-media-scraper



@karilaa-dev (Owner) commented Mar 13, 2026

Summary

Closes #134

  • Extract TikTok scraper into a standalone FastAPI service under tt-scrap/
  • Introduce a service registry pattern so adding new platform scrapers (Instagram, YouTube, etc.) requires only creating a new services/<name>/ directory — no changes to core app code
  • Routes are namespaced per service: GET /tiktok/video, GET /tiktok/music, etc.
  • Generic exception hierarchy (ScraperError, ContentDeletedError, etc.) shared across all services
  • TikTok-specific env vars now use TIKTOK_ prefix (e.g., TIKTOK_URL_RESOLVE_MAX_RETRIES)
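To illustrate how a generic exception hierarchy lets one error handler serve every service, here is a minimal sketch. Only `ScraperError` and `ContentDeletedError` come from the PR; `InvalidURLError`, `error_payload`, and the chosen HTTP status codes are hypothetical.

```python
from http import HTTPStatus


class ScraperError(Exception):
    """Base class for errors raised by any platform service."""


class ContentDeletedError(ScraperError):
    """The requested content no longer exists on the platform."""


# Hypothetical subclass for illustration; not named in the PR.
class InvalidURLError(ScraperError):
    """The supplied URL is not recognised by the service."""


# Assumed mapping; the PR does not say which status codes the handler returns.
_STATUS_BY_ERROR: dict[type[ScraperError], HTTPStatus] = {
    ContentDeletedError: HTTPStatus.NOT_FOUND,
    InvalidURLError: HTTPStatus.BAD_REQUEST,
}


def error_payload(exc: ScraperError) -> tuple[int, dict[str, str]]:
    """Map any scraper exception to (status_code, JSON body) without knowing its service."""
    # Walk the MRO so a service-specific subclass inherits its parent's status.
    for cls in type(exc).__mro__:
        if cls in _STATUS_BY_ERROR:
            status = _STATUS_BY_ERROR[cls]
            break
    else:
        status = HTTPStatus.BAD_GATEWAY  # unmapped upstream failure
    return int(status), {"error": type(exc).__name__, "detail": str(exc)}
```

In FastAPI, a function like this would typically sit behind a single `@app.exception_handler(ScraperError)` hook that returns a `JSONResponse`; whether the generic error handler in `app.py` works exactly this way is an assumption.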

Architecture

tt-scrap/app/
├── app.py              # FastAPI app, service registry, generic error handler
├── base_client.py      # BaseClient Protocol (contract for all services)
├── registry.py         # ServiceRegistry + ServiceEntry
├── exceptions.py       # Generic scraper exceptions
├── models.py           # Shared response models
├── proxy_manager.py    # Shared proxy rotation
├── routes/health.py    # GET /health
└── services/
    └── tiktok/         # Self-contained TikTok service
        ├── client.py   # TikTok extraction via yt-dlp
        ├── config.py   # TIKTOK_* env vars
        ├── parser.py   # TikTok response → VideoResponse/MusicResponse
        └── routes.py   # GET /tiktok/video, GET /tiktok/music
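The registry contract can be sketched with the standard library alone. `BaseClient`, `ServiceRegistry`, and `ServiceEntry` are the names from the tree above, but their fields and methods here (`startup`, `shutdown`, `routes`, `endpoints`, `startup_all`) are assumptions, not the PR's actual interface.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class BaseClient(Protocol):
    """Assumed contract for a service client (hypothetical method names)."""

    async def startup(self) -> None: ...
    async def shutdown(self) -> None: ...


@dataclass
class ServiceEntry:
    name: str                 # directory name, reused as the URL prefix
    client: BaseClient
    routes: list[str] = field(default_factory=list)  # e.g. ["/video", "/music"]

    @property
    def prefix(self) -> str:
        return f"/{self.name}"


class ServiceRegistry:
    """Holds one entry per platform; core code never names a platform directly."""

    def __init__(self) -> None:
        self._entries: dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"service {entry.name!r} is already registered")
        # runtime_checkable only verifies that the methods exist, not their signatures.
        if not isinstance(entry.client, BaseClient):
            raise TypeError(f"{entry.name!r} client does not implement BaseClient")
        self._entries[entry.name] = entry

    def get(self, name: str) -> ServiceEntry:
        return self._entries[name]

    def endpoints(self) -> list[str]:
        """Fully qualified paths, e.g. "/tiktok/video"."""
        return sorted(e.prefix + r for e in self._entries.values() for r in e.routes)

    async def startup_all(self) -> None:
        for entry in self._entries.values():
            await entry.client.startup()

    async def shutdown_all(self) -> None:
        for entry in self._entries.values():
            await entry.client.shutdown()
```

In the real app each service's router would presumably be mounted with FastAPI's `app.include_router(router, prefix=...)`, which is what yields paths such as `/tiktok/video`.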

Test plan

  • uvicorn app.app:app starts without errors
  • GET /health → {"status": "ok"}
  • GET /tiktok/video?url=<tiktok_url> → video/slideshow response
  • GET /tiktok/music?video_id=<id> → music response
  • GET /docs → Swagger UI shows /tiktok/video, /tiktok/music, /health
  • Env vars TIKTOK_URL_RESOLVE_MAX_RETRIES and TIKTOK_VIDEO_INFO_MAX_RETRIES respected
  • Old endpoints (/video, /music) no longer exist (404)
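The `TIKTOK_` prefix convention that the env-var check relies on can be sketched as follows. Only the two variable names come from the PR; the `TikTokConfig` class, its default values, and the `from_env` helper are hypothetical.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping

ENV_PREFIX = "TIKTOK_"


@dataclass(frozen=True)
class TikTokConfig:
    # Placeholder defaults for illustration; the PR does not state the real ones.
    url_resolve_max_retries: int = 3
    video_info_max_retries: int = 3

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TikTokConfig":
        """Read TIKTOK_<FIELD_NAME> overrides; unset variables keep the default."""
        overrides = {}
        for f in fields(cls):
            key = ENV_PREFIX + f.name.upper()
            if key in env:
                overrides[f.name] = int(env[key])  # every field in this sketch is an int
        return cls(**overrides)
```

Since pydantic-settings appears among the dependencies, the real `config.py` may instead declare a `BaseSettings` subclass with `model_config = SettingsConfigDict(env_prefix="TIKTOK_")`, which gives the same prefix-based lookup.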

Transient failures (network hiccups, 429 rate limits, 5xx errors) are now
retried, for up to 3 attempts in total, waiting 3s before the second attempt
and 5s before the third, with a tighter per-request timeout (10s total,
3s connect). Non-retryable errors such as 404 still fail immediately.
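A minimal sketch of that retry policy, assuming each request is wrapped in a plain callable. `HTTPStatusError`, `is_retryable`, and `with_retries` are hypothetical names, and the 10s/3s timeouts belong to the HTTP client configuration, so they are not modelled here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Delay before attempt 2 and before attempt 3 (3 attempts in total).
RETRY_DELAYS = (3.0, 5.0)


class HTTPStatusError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(exc: Exception) -> bool:
    if isinstance(exc, HTTPStatusError):
        return exc.status == 429 or 500 <= exc.status <= 599
    # Network hiccups: refused/reset connections and timeouts.
    return isinstance(exc, (ConnectionError, TimeoutError))


def with_retries(fetch: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch`, retrying transient failures; anything else (e.g. 404) raises at once."""
    for delay in (*RETRY_DELAYS, None):
        try:
            return fetch()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable error.
            if delay is None or not is_retryable(exc):
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the policy testable without real waits.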
Show 👨‍💻 reaction after fetching media info and before uploading,
matching the existing TikTok download behavior.

Reorganize TikTok scraper functionality into a reusable `tiktok_scrapper`
package with the following changes:

- Move client, models, exceptions, and proxy_manager from tiktok_api/ to
  tiktok_scrapper/
- Create standalone config system based on environment variables
- Add FastAPI REST API server (app.py) with endpoints:
  * GET /video - Extract video/slideshow metadata and CDN URLs
  * GET /music - Extract music metadata
  * GET /check - Validate TikTok URLs via regex
  * GET /health - Health check
- Add two new client methods for metadata extraction without downloading:
  * extract_video_info() - Get raw video data from TikTok API
  * extract_music_info() - Get raw music data from TikTok API
- Add Pydantic models for JSON API responses
- Add Dockerfile for containerized API deployment
- Keep tiktok_api as backward-compatible shim re-exporting from tiktok_scrapper
- Move yt-dlp and curl_cffi dependencies to tiktok_scrapper package
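The `GET /check` endpoint validates URLs by regex, but the pattern itself isn't shown. A minimal sketch, assuming the common TikTok URL shapes (full `@user/video/<id>` or `@user/photo/<id>` links and `vm.`/`vt.` short links); `TIKTOK_URL_RE`, `check_url`, and the response shape are hypothetical.

```python
import re
from typing import Optional

TIKTOK_URL_RE = re.compile(
    r"""^https?://
        (?:
            (?:www\.|m\.)?tiktok\.com/@[\w.-]+/(?:video|photo)/(?P<id>\d+)  # full link with numeric ID
          | (?:vm|vt)\.tiktok\.com/[A-Za-z0-9]+/?                           # short share link
        )
        (?:[/?#].*)?$                                                       # optional query or fragment
    """,
    re.VERBOSE,
)


def check_url(url: str) -> dict[str, Optional[object]]:
    """Purely syntactic check; short links carry no ID until their redirect is resolved."""
    m = TIKTOK_URL_RE.match(url.strip())
    return {"valid": m is not None, "video_id": m.group("id") if m else None}
```

A check like this cannot tell whether the content still exists; that only surfaces when the client actually fetches it.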
Extract core TikTok client functionality into a new tiktok_api library package.
The tiktok_scrapper service now uses tiktok_api as a dependency, separating
library code from the REST API implementation.

- Move client, models, exceptions, proxy_manager to tiktok_api/
- Remove Pydantic response models from core library (API-specific)
- Restructure tiktok_scrapper/ as standalone FastAPI service
- Update package imports and module structure
- Remove PROXY_DATA_ONLY, MAX_VIDEO_DURATION, STREAMING_DURATION_THRESHOLD, HOST, and PORT config options
- Remove data_only_proxy parameter and feature from TikTokClient
- Remove python-dotenv dependency
- Update fastapi, uvicorn, yt-dlp, curl-cffi, and pydantic-settings to latest versions

…er to app

- Move application code from tiktok_scrapper/tiktok_scrapper/ to tiktok_scrapper/app/
- Remove editable package configuration and build-system requirements
- Update Docker and documentation to reference new module path (tiktok_scrapper.app → app.app)
- Simplify dependency management by treating tiktok_scrapper as an application, not a library package

Rename the package directory and all references from tiktok_scrapper to
tt-scrap. Updates project name in pyproject.toml, documentation, Docker
configuration, and dependency lock files. Removes tt-scrap as a local
dependency from the main project's uv.lock.

…pattern

- Move TikTok client, routes, and config into dedicated service module
- Introduce ServiceRegistry for managing multiple scraper services
- Create BaseClient protocol for service implementations
- Generalize exceptions from TikTok-specific to service-agnostic names
- Replace dependency injection with service registry initialization
- Support dynamic service registration and initialization at startup

Update endpoints to reflect /{service}/... routing, document TIKTOK_
env prefix, and add instructions for adding new services.
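Dynamic registration at startup, combined with the claim that a new `services/<name>/` directory needs no core changes, suggests some form of package discovery. One way to do it with the standard library is sketched below; the `setup` hook and the `discover_services` function are hypothetical, not taken from the PR.

```python
import importlib
import pkgutil
from types import ModuleType


def discover_services(package: ModuleType) -> dict[str, ModuleType]:
    """Import each sub-package of `package` (e.g. app.services) that defines a
    callable `setup`, keyed by its directory name (e.g. "tiktok")."""
    found: dict[str, ModuleType] = {}
    for info in pkgutil.iter_modules(package.__path__):
        if not info.ispkg:
            continue  # plain modules such as helpers.py are not services
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        if callable(getattr(module, "setup", None)):
            found[info.name] = module
    return found
```

Each discovered service's `setup` would then build its client and register it, which keeps the core app free of platform-specific imports.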