Python Scraper Functions
Akshay B edited this page Mar 16, 2026 · 1 revision
This document details scraper routes in PluckIt.Processor.
- Audience: external contributors
- Last reviewed: 2026-03-16
- Scope: scraper contract only
- `GET /api/scraper/sources`
  - Returns configured scraping sources.
  - Source metadata is typically read-only unless managed via POST.
- `POST /api/scraper/sources`
  - Creates a new scraper source configuration.
  - Establishes crawl targets for later lease/run execution.
- `POST /api/scraper/lease/{source_id}`
  - Leases source work for a scheduler cycle.
  - Prevents duplicate concurrent runs for a given source.
- `POST /api/scraper/ingest/reddit`
  - Triggers the Reddit ingestion flow.
  - Accepts a payload specific to Reddit ingestion requests.
- `POST /api/admin/unban/{target_user_id}`
  - Removes the scraper-side restriction for a user in admin context.
- `POST /api/scraper/subscribe/{source_id}`
  - Subscribes the current user to source updates.
  - Writes user-source preferences used by item fanout.
- `DELETE /api/scraper/subscribe/{source_id}`
  - Removes the user's subscription for a source.
  - Stops future delivery for that source in user context.
- `GET /api/scraper/items`
  - Lists scraped items available to the current user context.
  - Represents the current scraped dataset view for authenticated users.
- `POST /api/scraper/items/{item_id}/feedback`
  - Records item feedback.
  - Feeds preference signals into downstream taste and digest behavior.
- `POST /api/scraper/run/{source_id}`
  - Runs the source pipeline on demand.
  - Admin-only operation for manual reprocessing and investigation.
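
The lease route above is described as preventing duplicate concurrent runs for a given source. The sketch below illustrates that idea with a time-limited, in-memory lease keyed by source id. It is not taken from PluckIt.Processor: the `LeaseTable` class, its method names, and the TTL value are all hypothetical, and a real deployment would most likely keep lease state in a shared store rather than in process memory.

```python
import time
from typing import Dict, Optional


class LeaseTable:
    """Hypothetical in-memory lease registry keyed by source id."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        # Assumed lease lifetime; the real value is not documented here.
        self.ttl_seconds = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def try_lease(self, source_id: str, now: Optional[float] = None) -> bool:
        """Grant the lease unless an unexpired lease already exists."""
        now = time.monotonic() if now is None else now
        expires = self._expires_at.get(source_id)
        if expires is not None and expires > now:
            # Another run still holds the lease, so refuse this one.
            return False
        self._expires_at[source_id] = now + self.ttl_seconds
        return True

    def release(self, source_id: str) -> None:
        """Drop the lease early, for example after a run finishes."""
        self._expires_at.pop(source_id, None)
```

With a table like this, a second `try_lease` call for the same source returns `False` until the first lease expires or is released, while other sources can still be leased independently.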
- `POST /api/admin/unban/{target_user_id}` and `POST /api/scraper/run/{source_id}` validate that the caller id is included in `ADMIN_USER_IDS`.
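
A minimal sketch of such a caller check follows. This page does not say how `ADMIN_USER_IDS` is stored, so the sketch assumes it is an environment variable holding a comma-separated list of ids. The helper names `parse_admin_ids` and `is_admin` are hypothetical and are not part of the processor.

```python
import os
from typing import Mapping, Optional, Set


def parse_admin_ids(raw: Optional[str]) -> Set[str]:
    """Split an assumed comma-separated id list, ignoring blanks."""
    if not raw:
        return set()
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_admin(caller_id: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when caller_id appears in ADMIN_USER_IDS.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise in tests.
    """
    env = os.environ if env is None else env
    return caller_id in parse_admin_ids(env.get("ADMIN_USER_IDS"))
```

An admin route could call `is_admin(caller_id)` first and reject the request when it returns `False`. An unset or empty variable yields no admins under this sketch, which fails closed.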
- These routes interact with the timer-based scraper pipeline described in `Python-Background-Processing.md` for delayed processing stages.
- Admin routes can also be used to correct source state and remove user-level restrictions during operational incidents.
- Scraped item and feedback endpoints are part of the active preference signal chain feeding analysis jobs.
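
To show how a client might drive the item and feedback routes, the sketch below builds (but does not send) HTTP requests using only the standard library. Several details are assumptions, since this page does not specify them: the `BASE_URL` host is a placeholder, the feedback body shape (`{"signal": "up"}` in the usage note) is hypothetical, and authentication is omitted entirely.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote

# Placeholder host; substitute the real processor base URL.
BASE_URL = "https://processor.example.invalid"


def build_request(
    method: str, path: str, body: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build a JSON request for a processor route without sending it."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def list_items_request() -> urllib.request.Request:
    """Request for GET /api/scraper/items."""
    return build_request("GET", "/api/scraper/items")


def feedback_request(
    item_id: str, feedback: Dict[str, Any]
) -> urllib.request.Request:
    """Request for POST /api/scraper/items/{item_id}/feedback.

    The feedback payload schema is an assumption for illustration.
    """
    safe_id = quote(item_id, safe="")  # percent-encode the path segment
    return build_request(
        "POST", f"/api/scraper/items/{safe_id}/feedback", body=feedback
    )
```

For example, `feedback_request("abc123", {"signal": "up"})` produces a POST request that could be passed to `urllib.request.urlopen` once the real host, payload schema, and credentials are known.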