CrawlRec is a humanized Playwright-based recorder and extractor that allows you to record element selectors interactively and later re-run automated extractions.
It includes stealth behavior, randomized user agents, and graceful shutdown handling to avoid detection and prevent stuck sessions.
- Interactive Recorder — Capture element selectors directly by clicking within a live browser session.
- Stealth Mode — Implements multiple anti-bot evasion techniques to reduce automation fingerprints.
- Humanized Behavior — Simulates realistic browsing actions with randomized user agents, mouse movement, and interaction delays.
- Reusable Templates — Saves all recorded actions as structured JSON for replay or integration with other tools.
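The humanized behavior listed above can be illustrated with a small standard-library sketch. This is not CrawlRec's actual implementation: the `USER_AGENTS` pool and the helper names `pick_user_agent`, `human_delay`, and `mouse_path` are hypothetical, and the delay range and jitter values are arbitrary assumptions.

```python
import random

# Hypothetical pool of user-agent strings; CrawlRec's real list is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def pick_user_agent(rng: random.Random) -> str:
    """Return one user agent chosen uniformly at random from the pool."""
    return rng.choice(USER_AGENTS)


def human_delay(rng: random.Random, low: float = 0.3, high: float = 1.5) -> float:
    """Return a pause length in seconds, drawn uniformly from [low, high]."""
    return rng.uniform(low, high)


def mouse_path(rng: random.Random, start, end, steps: int = 10, jitter: float = 3.0):
    """Return `steps` points from start toward end with small random offsets."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        # Jitter every intermediate point; the last point lands exactly on the target.
        dx = rng.uniform(-jitter, jitter) if i < steps else 0.0
        dy = rng.uniform(-jitter, jitter) if i < steps else 0.0
        points.append((x0 + (x1 - x0) * t + dx, y0 + (y1 - y0) * t + dy))
    return points
```

In a Playwright session, values like these could plausibly be fed to `browser.new_context(user_agent=...)`, `page.wait_for_timeout(...)` (which takes milliseconds), and `page.mouse.move(x, y)`. Whether CrawlRec wires them up this way is an assumption.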
pip install git+https://github.com/stexz01/crawlrec.git
CrawlRec requires:
- Playwright browsers (installed via playwright install)
Interactively record selectors from any website.
crawlrec record https://example.com -o example.json
During recording:
- Click elements directly in the opened browser to capture them.
- Choose what to extract (text or href) from the prompt.
- Use Ctrl + C or select “Exit & Save” to stop and save your session.
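The recording steps above can be pictured as building one entry per clicked element and writing the whole session out on exit. The sketch below is an assumption about that flow, not CrawlRec's code: `make_action` and `session_to_json` are hypothetical helpers, and the field names follow the sample JSON template in this README.

```python
import json

# The two extraction modes offered by the recorder prompt.
ALLOWED_EXTRACTS = ("text", "href")


def make_action(selector: str, extract: str, text: str) -> dict:
    """Build one recorded action for a clicked element."""
    if extract not in ALLOWED_EXTRACTS:
        raise ValueError(f"extract must be one of {ALLOWED_EXTRACTS}, got {extract!r}")
    return {"selector": selector, "extract": extract, "text": text.strip()}


def session_to_json(url: str, actions: list) -> str:
    """Serialize a recording session in the template layout (url + actions)."""
    return json.dumps({"url": url, "actions": actions}, indent=2)


# Example: one captured link, as it might look after a single click.
actions = [make_action("a[href='/about']", "href", " About Us ")]
template_text = session_to_json("https://example.com", actions)
```

Rejecting unknown `extract` values at record time keeps a later replay from silently skipping an action.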
Run a saved JSON template to extract data from a page.
crawlrec extract -t crawls/example.json
You can override the URL if needed:
crawlrec extract -t crawls/template.json -u https://newpage.com
A recording session produces a JSON file similar to:
{
  "url": "https://example.com",
  "actions": [
    {
      "selector": "a[href='/about']",
      "extract": "href",
      "text": "About Us"
    },
    {
      "selector": "h1.main-title",
      "extract": "text",
      "text": "Welcome to Example"
    }
  ]
}
usage: crawlrec.py [-h] {record,extract} ...
CrawlRec — Humanized Playwright Recorder & Extractor
positional arguments:
  {record,extract}
    record      Record selectors interactively
    extract     Extract data from saved JSON
options:
  -h, --help    Show this help message and exit
stexz01
GitHub: @stexz01
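For readers who want to consume a saved template from their own scripts, here is a rough sketch of what replaying one might look like, including the URL override that the extract command's -u option provides. It is not CrawlRec's implementation: `load_template`, `resolve_url`, and `run_template` are hypothetical names, and the sketch uses plain Playwright calls without any of the stealth or humanized behavior.

```python
import json
from typing import Optional


def load_template(path: str) -> dict:
    """Read a saved recording (a JSON object with 'url' and 'actions')."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_url(template: dict, url_override: Optional[str] = None) -> str:
    """Prefer an explicit override; otherwise use the URL stored in the template."""
    return url_override or template["url"]


def run_template(template: dict, url_override: Optional[str] = None) -> list:
    """Visit the page and pull text or href for each recorded selector."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(resolve_url(template, url_override))
            for action in template["actions"]:
                # Use the first match if the selector hits several elements.
                element = page.locator(action["selector"]).first
                if action["extract"] == "href":
                    value = element.get_attribute("href")
                else:
                    value = element.inner_text()
                results.append({"selector": action["selector"], "value": value})
        finally:
            browser.close()
    return results
```

Calling `run_template(load_template("crawls/example.json"), "https://newpage.com")` would roughly mirror the override example earlier in this README, assuming the new page uses the same selectors.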
Pull requests and feature improvements are welcome.
If you encounter bugs or have feature suggestions, please open an issue on the GitHub repository.
Thank You (:
