A comprehensive Python + Rust automation system for extracting torrent links from javdb.com and automatically adding them to qBittorrent. The system features a high-performance Rust core (via PyO3) for HTML parsing and proxy management, multi-threaded parallel processing, intelligent history tracking, git integration, automated pipeline execution, and duplicate download prevention.
It can serve as an ingestion pipeline in front of automated JAV scraping platforms (e.g. MDC-NG).
English | 简体中文
- Modular spider package (`scripts/spider/`) with 14 specialized modules
- Fetches data in real time from `javdb.com/?vft=2` through `javdb.com/?page=5&vft=2`
- Filters entries with both "含中字磁鏈" and "今日新種" tags (supports multiple language variations)
- Extracts magnet links based on specific categories with priority ordering
- Saves results to timestamped CSV files in the `reports/DailyReport/` directory
- Comprehensive logging with different levels (INFO, WARNING, DEBUG, ERROR)
- Multi-page processing with progress tracking
- Additional metadata extraction (actor, rating, comment count)
- High-performance Rust core extension (`javdb_rust_core`) built with PyO3 + maturin
- "Rust first, Python fallback" pattern — all features work without Rust installed
- HTML parsing 5-10x faster than BeautifulSoup (index, detail, category pages)
- Thread-safe proxy pool management with `Arc<Mutex>`
- Accelerated history management, CSV operations, magnet extraction, and URL helpers
- Automatic detection: system uses Rust when available, falls back to pure Python
- Multi-threaded detail page processing with one worker thread per proxy
- Activated automatically when using proxy pool mode with 2+ proxies
- Task queue / result queue architecture for safe concurrent scraping
- Independent `MovieSleepManager` per worker for rate limiting
- Thread-safe login refresh with `_login_lock`
- Force sequential mode with the `--sequential` flag
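The task-queue/result-queue design with one worker per proxy can be sketched as below. This is a minimal illustration of the architecture described above, not the project's implementation; `fetch_detail` is a caller-supplied function.

```python
import queue
import threading

def run_parallel(movie_urls, proxies, fetch_detail):
    """Scrape detail pages with one worker thread per proxy.

    Tasks and results flow through thread-safe queues, so workers never
    share mutable state. Illustrative sketch only.
    """
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    for url in movie_urls:
        tasks.put(url)

    def worker(proxy):
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, worker exits
            try:
                results.put(fetch_detail(url, proxy))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker, args=(p,)) for p in proxies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]
```

With a single proxy this degenerates to sequential processing, which is why the parallel path only activates with 2+ proxies.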
- 字幕 (subtitle): Magnet links with "Subtitle" tag
- hacked: Magnet links selected in priority order:
  - UC无码破解 (`-UC.无码破解.torrent`) - Highest priority
  - UC (`-UC.torrent`)
  - U无码破解 (`-U.无码破解.torrent`)
  - U (`-U.torrent`) - Lowest priority
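The priority ordering above amounts to a first-match scan over suffixes. A minimal sketch (helper name is illustrative):

```python
# Priority order for the "hacked" category, highest first.
HACKED_PRIORITY = [
    "-UC.无码破解.torrent",
    "-UC.torrent",
    "-U.无码破解.torrent",
    "-U.torrent",
]

def pick_hacked_magnet(candidates):
    """candidates: list of (torrent_name, magnet_link) tuples.

    Returns the magnet whose name matches the highest-priority suffix,
    or None if no candidate matches.
    """
    for suffix in HACKED_PRIORITY:
        for name, magnet in candidates:
            if name.endswith(suffix):
                return magnet
    return None
```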
The spider operates in two modes:
- Uses base URL: `https://javdb.com/?vft=2`
- Saves results to the `reports/DailyReport/` directory
- Checks history by default to avoid re-downloading
- Uses "JavDB" category in qBittorrent
- Activated with the `--url` parameter for custom URLs (actors, tags, etc.)
- Saves results to the `reports/AdHoc/` directory
- Checks history by default to skip already downloaded entries
- Use `--ignore-history` to re-download everything
- Uses "Ad Hoc" category in qBittorrent
- Example: `python3 scripts/spider --url "https://javdb.com/actors/EvkJ"`
- Automatically reads current date's CSV file
- Connects to qBittorrent via Web UI API
- Adds torrents with proper categorization and settings
- Comprehensive logging and progress tracking
- Detailed summary reports
- Automatically filters small files from recently added torrents
- Configurable minimum file size threshold (default: 50MB)
- Sets priority to 0 (do not download) for files below threshold
- Filters out NFO files, samples, screenshots, etc.
- Supports dry-run mode for preview
- Category-based filtering option
- Scheduled via GitHub Actions (2 hours after daily ingestion)
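The size-threshold decision behind the file filter can be sketched as a pure function, with the priority update applied through the qBittorrent Web API v2. The field names follow that API; the surrounding request code is shown only as a comment.

```python
# Illustrative sketch of the small-file filter decision (the real script
# also matches NFO/sample/screenshot name patterns).
MIN_SIZE_MB = 50

def files_to_skip(files, min_size_mb=MIN_SIZE_MB):
    """files: list of dicts as returned by /api/v2/torrents/files.

    Returns the file indexes whose size falls below the threshold.
    """
    threshold = min_size_mb * 1024 * 1024
    return [f["index"] for f in files if f["size"] < threshold]

# Applying the decision via the qBittorrent Web API v2 (sketch, not run):
# requests.post(f"{qb_url}/api/v2/torrents/filePrio",
#               data={"hash": torrent_hash,
#                     "id": "|".join(map(str, skip_ids)),
#                     "priority": 0})  # 0 = do not download
```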
- Automatic Downloaded Detection: Automatically identifies which torrents have been downloaded by checking the history CSV file
- Download Indicators: Adds a `[DOWNLOADED]` prefix to downloaded torrents in daily report CSV files
- Skip Duplicate Downloads: The qBittorrent uploader automatically skips torrents with the `[DOWNLOADED]` indicator
- Multiple Torrent Type Support: Supports four types: hacked_subtitle, hacked_no_subtitle, subtitle, no_subtitle
- Enhanced History Tracking: Tracks create_date (first discovery) and update_date (latest modification) for each movie
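The marking step can be sketched as a pass over the daily-report rows against the set of entries already in the history CSV. Field names (`href`, `video-title`) match the report columns; the helper name is illustrative.

```python
# Sketch of the duplicate-prevention marking: prefix titles that already
# appear in the history CSV with [DOWNLOADED].
def mark_downloaded(rows, downloaded_hrefs):
    """rows: daily-report dicts with 'href' and 'video-title' keys."""
    for row in rows:
        already_marked = row["video-title"].startswith("[DOWNLOADED]")
        if row["href"] in downloaded_hrefs and not already_marked:
            row["video-title"] = "[DOWNLOADED] " + row["video-title"]
    return rows
```

The uploader then only needs a prefix check to skip duplicates.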
- Automated git commit and push functionality
- Incremental commits throughout pipeline execution
- Email notifications with results and logs
- Complete workflow automation
- Automatic session cookie refresh
- Captcha handling (manual input or 2Captcha API)
- Updates config.py automatically
- Supports custom URL scraping with authentication
- See JavDB Login Guide for setup
- Integration with CloudflareBypassForScraping
- Request Mirroring mode for transparent CF bypass
- Automatic cookie caching and management
- Works with both local and remote proxy setups
- Automatically activated as a fallback when direct requests fail
- Install Python dependencies:

```shell
pip install -r requirements.txt
```

- (Optional) Install SOCKS5 proxy support if you want to use SOCKS5 proxies:

```shell
pip install requests[socks]
```

- (Optional) Install the Rust acceleration extension for 5-10x faster HTML parsing:

```shell
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Build and install the extension
cd rust_core
pip install maturin
maturin develop --release
cd ..
```

Note: The Rust extension is optional. All features work without it — the system automatically falls back to pure Python implementations.
- Configure the system by copying and editing the configuration file:

```shell
cp config.py.example config.py
```

- (Optional) For the CloudFlare bypass feature, install and run the CloudflareBypassForScraping service:

```shell
# See the CloudFlare Bypass section below for setup instructions
```

You can also run the application using Docker containers, which simplifies dependency management and deployment.
- Pull the image from GitHub Container Registry:

```shell
docker pull ghcr.io/YOUR_USERNAME/javdb-autospider:latest
```

- Prepare configuration files:

```shell
cp config.py.example config.py
cp env.example .env
# Edit config.py with your settings
```

- Run the container:

```shell
docker run -d \
  --name javdb-spider \
  --restart unless-stopped \
  -v $(pwd)/config.py:/app/config.py:ro \
  -v $(pwd)/logs:/app/logs \
  -v $(pwd)/Ad\ Hoc:/app/Ad\ Hoc \
  -v $(pwd)/Daily\ Report:/app/Daily\ Report \
  --env-file .env \
  ghcr.io/YOUR_USERNAME/javdb-autospider:latest
```

- Use the automated build script:
```shell
./docker/docker-build.sh
```

Or manually:

```shell
# Prepare configuration
cp config.py.example config.py
cp env.example .env
# Build and start
docker-compose -f docker/docker-compose.yml build
docker-compose -f docker/docker-compose.yml up -d
```

- View logs:

```shell
docker-compose -f docker/docker-compose.yml logs -f
```

The Docker image uses multi-stage builds: a Rust builder stage compiles the javdb_rust_core extension, and the runtime stage only includes the compiled wheel.
If you installed via Docker, you can manage the container with the following commands:
```shell
# View container logs
docker logs -f javdb-spider
# View cron logs
docker exec javdb-spider tail -f /var/log/cron.log
# Run spider manually
docker exec javdb-spider python3 scripts/spider --use-proxy
# Run pipeline manually
docker exec javdb-spider python pipeline.py
# Execute commands inside container
docker exec -it javdb-spider bash
# Stop container
docker stop javdb-spider
# Start container
docker start javdb-spider
# Restart container
docker restart javdb-spider
```

```shell
# Start containers
docker-compose -f docker/docker-compose.yml up -d
# Stop containers
docker-compose -f docker/docker-compose.yml down
# View logs
docker-compose -f docker/docker-compose.yml logs -f
# Restart containers
docker-compose -f docker/docker-compose.yml restart
# Rebuild and restart
docker-compose -f docker/docker-compose.yml build --no-cache
docker-compose -f docker/docker-compose.yml up -d
```

Edit the `.env` file to configure scheduled tasks:
```shell
# Spider runs daily at 3:00 AM
CRON_SPIDER=0 3 * * *
SPIDER_COMMAND=cd /app && /usr/local/bin/python scripts/spider --use-proxy >> /var/log/cron.log 2>&1
# Pipeline runs daily at 4:00 AM
CRON_PIPELINE=0 4 * * *
PIPELINE_COMMAND=cd /app && /usr/local/bin/python pipeline.py >> /var/log/cron.log 2>&1
```

After modifying `.env`, restart the container:

```shell
docker-compose -f docker/docker-compose.yml restart
```

Run the spider to extract data:
```shell
python3 scripts/spider
# Or equivalently:
python -m scripts.spider
```

Run the qBittorrent uploader:

```shell
# Daily mode (default)
python qbtorrent_uploader.py
# Ad hoc mode (for custom URL scraping results)
python qbtorrent_uploader.py --mode adhoc
# Use proxy for qBittorrent API requests
python qbtorrent_uploader.py --use-proxy
```

Run the qBittorrent File Filter (filter out small files):
```shell
# Default: filter files smaller than 50MB from last 2 days
python scripts/qb_file_filter.py --min-size 50
# Custom threshold and days
python scripts/qb_file_filter.py --min-size 100 --days 3
# Dry run (preview without changes)
python scripts/qb_file_filter.py --min-size 50 --dry-run
# Filter specific category only
python scripts/qb_file_filter.py --min-size 50 --category JavDB
# With proxy
python scripts/qb_file_filter.py --min-size 50 --use-proxy
```

Run the PikPak bridge (transfer old torrents from qBittorrent to PikPak):
```shell
# Default: process torrents older than 3 days in batch mode
python pikpak_bridge.py
# Custom days threshold
python pikpak_bridge.py --days 7
# Dry run mode (test without actual transfers)
python pikpak_bridge.py --dry-run
# Individual mode (process torrents one by one instead of batch)
python pikpak_bridge.py --individual
# Use proxy for qBittorrent API requests
python pikpak_bridge.py --use-proxy
# Combine options
python pikpak_bridge.py --days 5 --dry-run --use-proxy
```

The JavDB Spider supports various command-line arguments for customization:
```shell
# Dry run mode (no CSV file written)
python3 scripts/spider --dry-run
# Specify custom output filename
python3 scripts/spider --output-file my_results.csv
# Custom page range
python3 scripts/spider --start-page 3 --end-page 10
# Parse all pages until empty page is found
python3 scripts/spider --all
# Run only Phase 1 (subtitle + today/yesterday tags)
python3 scripts/spider --phase 1
# Run only Phase 2 (today/yesterday tags with quality filter)
python3 scripts/spider --phase 2
# Run both phases (default)
python3 scripts/spider --phase all
# Ignore history file and scrape all pages (for both daily and ad hoc modes)
python3 scripts/spider --ignore-history
# Custom URL scraping (saves to reports/AdHoc/, checks history by default)
python3 scripts/spider --url "https://javdb.com/?vft=2"
# Custom URL scraping, ignoring history to re-download everything
python3 scripts/spider --url "https://javdb.com/actors/EvkJ" --ignore-history
# Ignore today/yesterday release date tags and process all matching entries
python3 scripts/spider --ignore-release-date
# Use proxy for all HTTP requests
python3 scripts/spider --use-proxy
# Quick test run with limited pages
python3 scripts/spider --start-page 1 --end-page 3 --dry-run
# Full scrape ignoring history
python3 scripts/spider --all --ignore-history
# Custom URL with specific output file
python3 scripts/spider --url "https://javdb.com/?vft=2" --output-file custom_results.csv
# Phase 1 only with custom page range
python3 scripts/spider --phase 1 --start-page 5 --end-page 15
# Download all subtitle entries regardless of release date
python3 scripts/spider --ignore-release-date --phase 1
# Download all high-quality entries regardless of release date
python3 scripts/spider --ignore-release-date --phase 2 --start-page 1 --end-page 10
# Ad hoc mode: Download specific actor's movies (skips already downloaded)
python3 scripts/spider --url "https://javdb.com/actors/EvkJ" --ignore-release-date
# Ad hoc mode: Re-download everything from an actor (ignores history)
python3 scripts/spider --url "https://javdb.com/actors/EvkJ" --ignore-history --ignore-release-date
# Use proxy to access JavDB (useful for geo-restricted regions)
python3 scripts/spider --use-proxy --start-page 1 --end-page 5
# Combine multiple options: proxy + custom URL + ignore release date
python3 scripts/spider --url "https://javdb.com/actors/EvkJ" --use-proxy --ignore-release-date
```

| Argument | Description | Default | Example |
|---|---|---|---|
| `--dry-run` | Print items without writing CSV | False | `--dry-run` |
| `--output-file` | Custom CSV filename | Auto-generated | `--output-file results.csv` |
| `--start-page` | Starting page number | 1 | `--start-page 5` |
| `--end-page` | Ending page number | 20 | `--end-page 10` |
| `--all` | Parse until empty page | False | `--all` |
| `--ignore-history` | Skip history checking (both daily & ad hoc) | False | `--ignore-history` |
| `--url` | Custom URL to scrape (enables ad hoc mode) | None | `--url "https://javdb.com/?vft=2"` |
| `--phase` | Phase to run (1/2/all) | all | `--phase 1` |
| `--ignore-release-date` | Ignore today/yesterday tags | False | `--ignore-release-date` |
| `--use-proxy` | Enable proxy from config.py | False | `--use-proxy` |
| `--sequential` | Force sequential processing (disable parallel) | False | `--sequential` |
| `--max-movies-phase1` | Limit phase 1 movies (for testing) | None | `--max-movies-phase1 10` |
| `--max-movies-phase2` | Limit phase 2 movies (for testing) | None | `--max-movies-phase2 5` |
| `--use-history` | Enable history filter in ad-hoc mode | False | `--use-history` |
JavDB Auto Login (for custom URL scraping):
```shell
# Run when session cookie expires or before using the --url parameter
python3 javdb_login.py
# The script will:
# 1. Login to JavDB with your credentials
# 2. Handle captcha (manual input or 2Captcha API)
# 3. Extract and update the session cookie in config.py
# 4. Verify the cookie works
# See the JavDB Auto Login section above for setup details
```

Check Proxy Ban Status:

```shell
# View ban records
cat "reports/proxy_bans.csv"
# Ban information is also included in pipeline email reports
```

Run Migration Scripts:
```shell
cd migration
# Clean up duplicate history entries
python3 cleanup_history_priorities.py
# Update history file format (if upgrading from an older version)
python3 update_history_format.py
# Reclassify torrents (after classification rule changes)
python3 reclassify_c_hacked_torrents.py
```

Run the complete workflow:
```shell
# Basic pipeline run
python pipeline_run_and_notify.py
# Pipeline with custom arguments (passed to Javdb_Spider)
python pipeline_run_and_notify.py --start-page 1 --end-page 5
# Pipeline ignoring release date tags
python pipeline_run_and_notify.py --ignore-release-date --phase 1
# Pipeline with custom URL
python pipeline_run_and_notify.py --url "https://javdb.com/actors/EvkJ"
# Pipeline with proxy enabled
python pipeline_run_and_notify.py --use-proxy
# Pipeline with PikPak individual mode (process torrents one by one)
python pipeline_run_and_notify.py --pikpak-individual
```

The pipeline will:
- Run the JavDB Spider to extract data (with provided arguments)
- Commit spider results to GitHub immediately
- Run the qBittorrent Uploader to add torrents
- Commit uploader results to GitHub immediately
- Run PikPak Bridge to handle old torrents (batch mode by default, individual mode with `--pikpak-individual`)
- Perform final commit and push to GitHub
- Analyze logs for critical errors
- Send email notifications with appropriate status
Note: The pipeline accepts the same arguments as scripts/spider and passes them through automatically. Additional pipeline-specific arguments include --pikpak-individual for PikPak Bridge mode control.
The pipeline now includes sophisticated error analysis that distinguishes between:
Critical Errors (email marked as FAILED):
- Cannot access JavDB main site (all pages fail)
- Cannot connect to qBittorrent
- Cannot login to qBittorrent
- All torrent additions failed
- Network completely unreachable
Non-Critical Errors (email marked as SUCCESS):
- Some specific JavDB pages failed (but main site accessible)
- Some individual torrents failed to add (but qBittorrent accessible)
- PikPak API issues (PikPak service problem, not infrastructure)
- No new torrents found (expected behavior)
This ensures you only get FAILED emails when there's a real infrastructure problem that needs attention, not just when there's no new content or minor issues.
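The classification logic can be sketched as a scan of the pipeline logs for infrastructure-level failure patterns. The patterns below are illustrative, not the pipeline's actual log strings.

```python
# Sketch of critical vs. non-critical error analysis: infrastructure
# failures mark the run FAILED; content-level issues do not.
CRITICAL_PATTERNS = [
    "cannot access javdb main site",
    "cannot connect to qbittorrent",
    "cannot login to qbittorrent",
    "all torrent additions failed",
    "network unreachable",
]

def classify_run(log_lines):
    """Return 'FAILED' if any log line matches a critical pattern."""
    for line in log_lines:
        lowered = line.lower()
        if any(pattern in lowered for pattern in CRITICAL_PATTERNS):
            return "FAILED"
    return "SUCCESS"
```

Single-page failures, individual torrent errors, or "no new torrents" never match, so routine runs stay green.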
All configuration settings are now centralized in a single config.py file:
```python
# =============================================================================
# GIT CONFIGURATION
# =============================================================================
GIT_USERNAME = 'your_github_username'
GIT_PASSWORD = 'your_github_token_or_password'
GIT_REPO_URL = 'https://github.com/your_username/your_repo_name.git'
GIT_BRANCH = 'main'

# =============================================================================
# QBITTORRENT CONFIGURATION
# =============================================================================
QB_HOST = 'your_qbittorrent_ip'
QB_PORT = 'your_qbittorrent_port'
QB_USERNAME = 'your_qbittorrent_username'
QB_PASSWORD = 'your_qbittorrent_password'
TORRENT_CATEGORY = 'JavDB'  # Category for daily mode torrents
TORRENT_CATEGORY_ADHOC = 'Ad Hoc'  # Category for adhoc mode torrents
TORRENT_SAVE_PATH = ''
AUTO_START = True
SKIP_CHECKING = False
REQUEST_TIMEOUT = 30
DELAY_BETWEEN_ADDITIONS = 1

# =============================================================================
# SMTP CONFIGURATION (for email notifications)
# =============================================================================
SMTP_SERVER = 'smtp.gmail.com'
SMTP_PORT = 587
SMTP_USER = 'your_email@gmail.com'
SMTP_PASSWORD = 'your_email_password_or_app_password'
EMAIL_FROM = 'your_email@gmail.com'
EMAIL_TO = 'your_email@gmail.com'

# =============================================================================
# PROXY CONFIGURATION
# =============================================================================
# Proxy mode: 'single' (use first proxy only) or 'pool' (automatic failover)
PROXY_MODE = 'single'
# Proxy pool - list of proxies (first one used in single mode, all used in pool mode)
PROXY_POOL = [
    {'name': 'Main-Proxy', 'http': 'http://127.0.0.1:7890', 'https': 'http://127.0.0.1:7890'},
    {'name': 'Backup-Proxy', 'http': 'http://127.0.0.1:7891', 'https': 'http://127.0.0.1:7891'},
]
# Proxy pool behavior (only for pool mode)
PROXY_POOL_COOLDOWN_SECONDS = 691200  # 8 days cooldown for banned proxies
PROXY_POOL_MAX_FAILURES = 3  # Max failures before cooldown
# Legacy proxy config (deprecated - use PROXY_POOL instead)
PROXY_HTTP = None
PROXY_HTTPS = None
# Modular proxy control - which modules use proxy
PROXY_MODULES = ['all']  # 'all' or list: 'spider', 'qbittorrent', 'pikpak'

# =============================================================================
# SPIDER CONFIGURATION
# =============================================================================
START_PAGE = 1
END_PAGE = 20
BASE_URL = 'https://javdb.com'
# Phase 2 filtering criteria
PHASE2_MIN_RATE = 4.0  # Minimum rating score for phase 2 entries
PHASE2_MIN_COMMENTS = 80  # Minimum comment count for phase 2 entries
# Release date filter
IGNORE_RELEASE_DATE_FILTER = False  # Set True to ignore today/yesterday tags
# Sleep time configuration (in seconds)
PAGE_SLEEP = 2  # Sleep between index pages
MOVIE_SLEEP_MIN = 5  # Minimum random sleep between movies
MOVIE_SLEEP_MAX = 15  # Maximum random sleep between movies

# =============================================================================
# JAVDB LOGIN CONFIGURATION (for automatic session cookie refresh)
# =============================================================================
# JavDB login credentials (optional - for custom URL scraping)
JAVDB_USERNAME = ''  # Your JavDB email or username
JAVDB_PASSWORD = ''  # Your JavDB password
# Session cookie (auto-updated by javdb_login.py)
JAVDB_SESSION_COOKIE = ''
# Optional: 2Captcha API key for automatic captcha solving
# Get from: https://2captcha.com/ (~$1 per 1000 captchas)
TWOCAPTCHA_API_KEY = ''  # Leave empty for manual captcha input

# =============================================================================
# CLOUDFLARE BYPASS CONFIGURATION (Optional)
# =============================================================================
# CloudFlare bypass service port (must match the service port)
# See: https://github.com/sarperavci/CloudflareBypassForScraping
CF_BYPASS_SERVICE_PORT = 8000

# =============================================================================
# LOGGING CONFIGURATION
# =============================================================================
LOG_LEVEL = 'INFO'
SPIDER_LOG_FILE = 'logs/spider.log'
UPLOADER_LOG_FILE = 'logs/qb_uploader.log'
PIPELINE_LOG_FILE = 'logs/pipeline.log'
EMAIL_NOTIFICATION_LOG_FILE = 'logs/email_notification.log'

# =============================================================================
# FILE PATHS
# =============================================================================
REPORTS_DIR = 'reports'
DAILY_REPORT_DIR = 'reports/DailyReport'
AD_HOC_DIR = 'reports/AdHoc'
PARSED_MOVIES_CSV = 'parsed_movies_history.csv'

# =============================================================================
# PIKPAK CONFIGURATION (for PikPak Bridge)
# =============================================================================
# PikPak login credentials
PIKPAK_EMAIL = 'your_pikpak_email@example.com'
PIKPAK_PASSWORD = 'your_pikpak_password'
# PikPak settings
PIKPAK_LOG_FILE = 'logs/pikpak_bridge.log'
PIKPAK_REQUEST_DELAY = 3  # Delay between requests (seconds) to avoid rate limiting

# =============================================================================
# qBittorrent File Filter Configuration
# =============================================================================
# Minimum file size threshold in MB
# Files smaller than this will be set to "do not download" priority
# This helps filter out small files like NFO, samples, screenshots, etc.
QB_FILE_FILTER_MIN_SIZE_MB = 50
# Log file for the file filter script
QB_FILE_FILTER_LOG_FILE = 'logs/qb_file_filter.log'
```

Setup Instructions:
- Copy `config.py.example` to `config.py`
- Update all the placeholder values with your actual credentials
- The `config.py` file is automatically excluded from git commits for security
GitHub Authentication Setup:
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with `repo` permissions
- Use this token as `GIT_PASSWORD`
qBittorrent Setup:
- Enable Web UI in qBittorrent settings
- Note the IP address, port, username, and password
- Update the qBittorrent configuration section in `config.py`
Email Setup (Optional):
- For Gmail, use an App Password instead of your regular password
- Enable 2-factor authentication and generate an App Password
- Update the SMTP configuration section in `config.py`
The spider generates CSV files with the following columns:
- `href`: The video page URL
- `video-title`: The video title
- `page`: The page number where the entry was found
- `actor`: The main actor/actress name
- `rate`: The rating score
- `comment_number`: Number of user comments/ratings
- `hacked_subtitle`: Magnet link for hacked version with subtitles
- `hacked_no_subtitle`: Magnet link for hacked version without subtitles
- `subtitle`: Magnet link for subtitle version
- `no_subtitle`: Magnet link for regular version (prefers 4K if available)
- `size_hacked_subtitle`, `size_hacked_no_subtitle`, `size_subtitle`, `size_no_subtitle`: Corresponding sizes
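Consuming a report with these columns is a straightforward `csv.DictReader` pass. A minimal sketch (helper names are illustrative):

```python
import csv

def load_report(path):
    """Read a daily/ad-hoc report CSV into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def magnets(row):
    """Collect whichever magnet columns are populated for one entry."""
    keys = ("hacked_subtitle", "hacked_no_subtitle", "subtitle", "no_subtitle")
    return {k: row[k] for k in keys if row.get(k)}
```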
All report files are organized under the reports/ directory:
```
reports/
├── DailyReport/YYYY/MM/           # Daily report CSV files
│   └── Javdb_TodayTitle_YYYYMMDD.csv
├── AdHoc/YYYY/MM/                 # Ad hoc report CSV files
│   └── Javdb_AdHoc_*.csv
├── Dedup/                         # Rclone dedup reports
├── parsed_movies_history.csv      # History tracking
├── pikpak_bridge_history.csv      # PikPak transfer history
└── proxy_bans.csv                 # Proxy ban records
```
- Daily Report CSV files: `reports/DailyReport/YYYY/MM/Javdb_TodayTitle_YYYYMMDD.csv`
- Ad Hoc CSV files: `reports/AdHoc/YYYY/MM/Javdb_AdHoc_*.csv`
- History file: `reports/parsed_movies_history.csv`
- PikPak history: `reports/pikpak_bridge_history.csv`
- Proxy ban records: `reports/proxy_bans.csv`
- Log files: `logs/` directory (`spider.log`, `qb_uploader.log`, `pipeline.log`)
The spider includes an intelligent history system that tracks which torrent types have been found for each movie:
- Tracks ALL available torrent types per movie (e.g., both `hacked_subtitle` and `subtitle`)
- Prevents redundant processing when movies already have complete torrent collections
- Only searches for torrent types that are missing based on preference rules
- Phase 1: Processes movies with missing torrent types based on preferences
- Phase 2: Only processes movies that can be upgraded from `no_subtitle` to `hacked_no_subtitle` or meet quality criteria
- New Movies: Always processed regardless of history
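The missing-type computation can be sketched as a check against per-category preference lists: a category is satisfied once its top-preference type is in the movie's history. This is a simplified model of the rules described above; the structure and names are illustrative.

```python
# Per-category preference lists, most preferred first (illustrative).
PREFERENCES = {
    "hacked": ["hacked_subtitle", "hacked_no_subtitle"],
    "subtitle": ["subtitle", "no_subtitle"],
}

def missing_types(have):
    """have: set of torrent types already recorded for a movie.

    Returns the top-preference types still worth searching for,
    i.e. categories not yet satisfied by their preferred type.
    """
    missing = []
    for preferred in PREFERENCES.values():
        if preferred[0] not in have:
            missing.append(preferred[0])
    return missing
```

A movie with only `no_subtitle` would still be revisited to look for both upgrades; a movie holding both preferred types is skipped entirely.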
Phase 2 includes configurable quality filtering based on user ratings and comment counts:
- Minimum Rating: Configurable via `PHASE2_MIN_RATE` (default: 4.0)
- Minimum Comments: Configurable via `PHASE2_MIN_COMMENTS` (default: 80)
- Purpose: Ensures only high-quality content is processed in phase 2
- Hacked Category: Always prefer `hacked_subtitle` over `hacked_no_subtitle`
- Subtitle Category: Always prefer `subtitle` over `no_subtitle`
- Complete Collection Goal: Each movie should have both categories represented
By default, the spider filters entries based on release date tags ("今日新種" or "昨日新種"). You can override this behavior in two ways:
```shell
# Ignore release date tags for a single run
python3 scripts/spider --ignore-release-date
# Or via pipeline
python pipeline_run_and_notify.py --ignore-release-date
```

Set `IGNORE_RELEASE_DATE_FILTER = True` in `config.py` to permanently ignore release date tags.
Behavior with --ignore-release-date or IGNORE_RELEASE_DATE_FILTER = True:
- Phase 1: Downloads ALL entries with subtitle tags, regardless of release date
- Phase 2: Downloads ALL entries meeting quality criteria (rate > 4.0, comments > 80), regardless of release date
This is useful when:
- You want to backfill your collection with older content
- You're scraping a custom URL (like an actor's page) where release date is not relevant
- You want to download everything matching the quality criteria
The system supports both single proxy and proxy pool modes for improved reliability:
Configure multiple proxies for automatic failover:
- Automatic Switching: When one proxy fails, automatically switches to another
- Passive Health Checking: Only marks proxies as failed on actual failures (no active probing)
- Cooldown Mechanism: Failed proxies are temporarily disabled to allow recovery (8 days default)
- Ban Detection: Automatically detects when proxies are banned by JavDB
- Persistent Ban Records: Ban history stored in `reports/proxy_bans.csv` and persists across runs
- Statistics Tracking: Detailed success rates and usage statistics for each proxy
- Perfect for JavDB: Respects strict rate limiting while providing redundancy
See PROXY_POOL_GUIDE.md for detailed configuration and usage guide.
Quick Setup:
```python
# In config.py
PROXY_MODE = 'pool'
PROXY_POOL = [
    {'name': 'Proxy-1', 'http': 'http://127.0.0.1:7890', 'https': 'http://127.0.0.1:7890'},
    {'name': 'Proxy-2', 'http': 'http://127.0.0.1:7891', 'https': 'http://127.0.0.1:7891'},
]
PROXY_POOL_COOLDOWN_SECONDS = 691200  # 8 days cooldown (JavDB bans for 7 days)
PROXY_POOL_MAX_FAILURES = 3  # Max failures before cooldown
```

Proxy Ban Management:
The system includes intelligent ban detection and management:
- Automatic Detection: Detects when JavDB blocks a proxy IP
- Persistent Records: Ban history stored in `reports/proxy_bans.csv`
- Exit Code 2: Spider exits with code 2 when proxies are banned (helps with automation)
- Ban Summary: Detailed ban status included in pipeline email reports
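The cooldown bookkeeping reduces to a timestamp comparison against the configured window. A minimal sketch, assuming a ban record carries a unix timestamp (the field name is illustrative):

```python
import time

COOLDOWN_SECONDS = 691200  # 8 days, matching PROXY_POOL_COOLDOWN_SECONDS

def is_available(ban_record, now=None, cooldown=COOLDOWN_SECONDS):
    """Return True if a proxy is usable.

    ban_record: None if never banned, else a dict with 'banned_at'
    (unix timestamp of the ban).
    """
    if ban_record is None:
        return True
    now = time.time() if now is None else now
    return now - ban_record["banned_at"] >= cooldown
```

Since the records live in `reports/proxy_bans.csv`, a proxy stays unavailable across restarts until its window expires.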
Checking Ban Status:
```shell
# Ban records are logged in:
cat "reports/proxy_bans.csv"
# Pipeline emails include ban summary with:
# - Proxy name and IP
# - Ban timestamp
# - Cooldown expiry time
# - Current status (BANNED/AVAILABLE)
```

Then run with the `--use-proxy` flag:

```shell
python3 scripts/spider --use-proxy
```

The spider also supports traditional single proxy configuration for HTTP/HTTPS/SOCKS5 proxies. This is useful if:
- JavDB is geo-restricted in your region
- You need to route traffic through a specific network
- You want to use a VPN or proxy service
1. Configure proxy in config.py:
```python
# HTTP/HTTPS proxy
PROXY_HTTP = 'http://127.0.0.1:7890'
PROXY_HTTPS = 'http://127.0.0.1:7890'
# Or SOCKS5 proxy
PROXY_HTTP = 'socks5://127.0.0.1:1080'
PROXY_HTTPS = 'socks5://127.0.0.1:1080'
# With authentication
PROXY_HTTP = 'http://username:password@proxy.example.com:8080'
PROXY_HTTPS = 'http://username:password@proxy.example.com:8080'
# Control which modules use proxy (modular control)
PROXY_MODULES = ['all']  # Enable for all modules
# PROXY_MODULES = ['spider']  # Only spider module (includes login)
# PROXY_MODULES = ['spider', 'qbittorrent']  # Spider and qBittorrent
# PROXY_MODULES = []  # Disable for all modules
```

2. Enable proxy with command-line flag:
```shell
# Enable proxy for spider
python3 scripts/spider --use-proxy
# Enable proxy for qBittorrent uploader
python qbtorrent_uploader.py --use-proxy
# Enable proxy for PikPak bridge
python pikpak_bridge.py --use-proxy
# Combine with other options
python3 scripts/spider --use-proxy --url "https://javdb.com/actors/EvkJ"
# Via pipeline (enables proxy for all components)
python pipeline_run_and_notify.py --use-proxy
```

Note:
- Proxy is disabled by default. You must pass `--use-proxy` to enable it.
- If `--use-proxy` is set but no proxy is configured in `config.py`, a warning will be logged.
- You can control which parts of the system use the proxy via the `PROXY_MODULES` configuration.
The PROXY_MODULES setting allows fine-grained control over which parts use proxy:
| Module | Description | Use Case |
|---|---|---|
| `spider` | JavDB Spider | Use proxy to access all JavDB pages (index, detail, login/session refresh) |
| `qbittorrent` | qBittorrent Web UI API | Use proxy for qBittorrent API requests |
| `pikpak` | PikPak bridge qBittorrent API | Use proxy for PikPak bridge operations |
| `all` | All modules | Use proxy for everything (default) |
Examples:
```python
# Use proxy for everything
PROXY_MODULES = ['all']
# Only use proxy for the spider module (includes login)
PROXY_MODULES = ['spider']
# Use proxy for spider and qBittorrent
PROXY_MODULES = ['spider', 'qbittorrent']
# Only use proxy for qBittorrent and PikPak, not spider
PROXY_MODULES = ['qbittorrent', 'pikpak']
# Disable proxy for all modules (even if --use-proxy is set)
PROXY_MODULES = []
```

Common Scenarios:
- Geo-restricted JavDB only: `PROXY_MODULES = ['spider']`
- Local qBittorrent behind firewall: `PROXY_MODULES = ['qbittorrent', 'pikpak']`
- Everything through proxy: `PROXY_MODULES = ['all']`
- HTTP: `http://proxy.example.com:8080`
- HTTPS: `https://proxy.example.com:8080`
- SOCKS5: `socks5://proxy.example.com:1080` (requires the `requests[socks]` package)
If you want to use SOCKS5 proxy, install the additional dependency:
```shell
pip install requests[socks]
```

Error: 500 Internal Server Error
- Check if proxy server is running and accessible
- Verify proxy credentials (username/password)
- If the password contains special characters, URL-encode them:

```python
from urllib.parse import quote
password = "My@Pass!"
encoded = quote(password, safe='')
print(encoded)  # Output: My%40Pass%21
```
- Test the proxy manually:

```shell
curl -x http://username:password@proxy:port https://javdb.com
```
Error: Connection refused or timeout
- Check if the proxy server is running: `telnet proxy_ip proxy_port`
- Verify firewall rules allow connection to the proxy
- Check if proxy requires authentication
Proxy works but downloads fail
- Some proxies don't support magnet links or torrents
- Try a different proxy or use a direct connection for qBittorrent/PikPak: `PROXY_MODULES = ['spider']`
Password with special characters. Common special characters that need URL encoding:
- `@` → `%40`
- `:` → `%3A` (only in the password, not after `@`)
- `/` → `%2F`
- `?` → `%3F`
- `#` → `%23`
- `&` → `%26`
- `=` → `%3D`
- `+` → `%2B`
- Space → `%20`
- `!` → `%21`

Example: `http://user:My@Pass!123@proxy:8080` becomes `http://user:My%40Pass%21123@proxy:8080`
The system supports integration with CloudflareBypassForScraping for handling CloudFlare protection on JavDB.
CloudFlare Bypass is an optional feature that helps you access JavDB when CloudFlare protection is enabled. It uses the CloudflareBypassForScraping service which automatically:
- Handles CloudFlare challenges
- Manages cf_clearance cookies
- Provides transparent request forwarding (Request Mirroring mode)
1. Install CloudflareBypassForScraping:
```shell
# Clone the repository
git clone https://github.com/sarperavci/CloudflareBypassForScraping.git
cd CloudflareBypassForScraping
# Install dependencies
pip install -r requirements.txt
# Configure (edit config.json if needed)
# Default port is 8000
```

2. Start the CF Bypass Service:
```bash
# Local setup (default)
python app.py

# Custom port (update CF_BYPASS_SERVICE_PORT in config.py to match)
python app.py --port 8000
```

3. Configure the Spider:
Edit `config.py` to set the CF bypass service port:

```python
# CloudFlare Bypass Configuration
CF_BYPASS_SERVICE_PORT = 8000  # Must match the service port
```

4. CF Bypass Behavior:
CF bypass is automatically activated as a fallback when direct requests fail during the proxy pool fallback mechanism. No command-line flag is needed.
When CF bypass is activated during fallback:
- Request Mirroring: Requests are forwarded through the CF bypass service
- URL Rewriting: The original URL `https://javdb.com/page` becomes `http://localhost:8000/page`
- Host Header: The original hostname is sent via the `x-hostname` header
- Cookie Management: The CF bypass service handles cf_clearance cookies automatically
- Transparent: Your spider code doesn't need any changes
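Conceptually, the mirroring step is a URL rewrite plus one extra header. The sketch below is illustrative; only the `x-hostname` header name comes from the documentation above, and the helper itself is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def mirror_request(url, bypass_base="http://localhost:8000"):
    """Rewrite a JavDB URL to point at the CF bypass service and
    return the headers that carry the original hostname."""
    parts = urlsplit(url)
    base = urlsplit(bypass_base)
    mirrored = urlunsplit((base.scheme, base.netloc, parts.path, parts.query, ''))
    return mirrored, {"x-hostname": parts.hostname}

url, headers = mirror_request("https://javdb.com/v/mOJnXY?locale=en")
print(url)      # http://localhost:8000/v/mOJnXY?locale=en
print(headers)  # {'x-hostname': 'javdb.com'}
```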
Local Setup:
Spider → http://localhost:8000 → CloudFlare Bypass Service → https://javdb.com
With Proxy:
Spider → http://proxy_ip:8000 → CF Bypass on Proxy Server → https://javdb.com
When using proxy pool, the CF bypass service URL automatically adjusts to match the current proxy IP.
```python
# In config.py
CF_BYPASS_SERVICE_PORT = 8000  # CF bypass service port (default: 8000)
```

Service Location Logic:
- No Proxy: Uses `http://localhost:8000`
- With Proxy Pool: Uses `http://{proxy_ip}:8000` (extracts the IP from the current proxy URL)
This allows you to run CF bypass service on the same server as your proxy for better performance.
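The service-location logic described above can be sketched in a few lines (illustrative; the function name is an assumption):

```python
from urllib.parse import urlsplit

CF_BYPASS_SERVICE_PORT = 8000  # from config.py

def cf_bypass_base(current_proxy_url=None):
    """Pick the CF bypass service address: localhost without a proxy,
    otherwise the current proxy's IP."""
    host = "localhost"
    if current_proxy_url:
        host = urlsplit(current_proxy_url).hostname
    return f"http://{host}:{CF_BYPASS_SERVICE_PORT}"

print(cf_bypass_base())                          # http://localhost:8000
print(cf_bypass_base("socks5://10.0.0.5:1080"))  # http://10.0.0.5:8000
```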
Use CloudFlare Bypass when:
- ✅ JavDB shows CloudFlare challenge page
- ✅ You get "Access Denied" or "Checking your browser" errors
- ✅ Direct access works in browser but fails in script
- ✅ Proxy alone doesn't bypass CloudFlare protection
Error: "Connection refused to localhost:8000"
- Make sure CF bypass service is running
- Check if port 8000 is available: `netstat -an | grep 8000`
- Update `CF_BYPASS_SERVICE_PORT` if using a different port
Error: "No movie list found" with CF bypass
- Check CF bypass service logs for errors
- Verify the `x-hostname` header is being sent correctly
- Try restarting the CF bypass service
CF Bypass + Proxy Not Working
- Ensure CF bypass service is running on the proxy server
- Verify proxy IP extraction is correct (check logs)
- Test the CF bypass service directly: `curl http://proxy_ip:8000/`
- First Request: Slower (CF challenge solving)
- Subsequent Requests: Fast (cookie cached)
- Cookie TTL: Varies (usually hours to days)
- Overhead: Minimal after first request
The system includes automatic login functionality to maintain session cookies for custom URL scraping.
When scraping custom URLs (actors, tags, etc.) with the `--url` parameter, JavDB requires a valid session cookie. This cookie expires after some time, causing age-verification or login failures.
Auto login solves this by:
- ✅ Automatically logging into JavDB
- ✅ Handling age verification automatically
- ✅ Extracting and updating session cookie
- ✅ Supporting captcha (manual input or 2Captcha API)
1. Configure credentials in `config.py`:

```python
# JavDB login credentials (for automatic session cookie refresh)
JAVDB_USERNAME = 'your_email@example.com'  # or username
JAVDB_PASSWORD = 'your_password'

# Optional: 2Captcha API key for automatic captcha solving
TWOCAPTCHA_API_KEY = ''  # Leave empty for manual captcha input
```

2. Run the login script:

```bash
python3 javdb_login.py
```

3. Enter the captcha when prompted:
The script will:
- Download and save the captcha image to `javdb_captcha.png`
- Automatically open the image (if possible)
- Prompt you to enter the captcha code
4. Use the spider with custom URLs:
```bash
# Spider with custom URL
python3 scripts/spider --url "https://javdb.com/actors/RdEb4"

# Pipeline with custom URL
python3 pipeline_run_and_notify.py --url "https://javdb.com/actors/RdEb4"
```

Manual Input (Default):
- Script downloads captcha image
- Opens image automatically (platform-dependent)
- You enter the code when prompted
- Simple and free
2Captcha API (Optional):
- Sign up at 2Captcha
- Add the API key to `config.py`: `TWOCAPTCHA_API_KEY = 'your_key'`
- The script automatically solves captchas (~$1 per 1000 captchas)
- Fully automated but costs money
```python
# In config.py

# Login credentials (required)
JAVDB_USERNAME = 'your_email@example.com'
JAVDB_PASSWORD = 'your_password'

# Session cookie (auto-updated by javdb_login.py)
JAVDB_SESSION_COOKIE = ''

# Optional: 2Captcha API key
TWOCAPTCHA_API_KEY = ''  # For automatic captcha solving

# Optional: Manual cookie extraction
# Get from browser DevTools → Application → Cookies → _jdb_session
# JAVDB_SESSION_COOKIE = 'your_session_cookie_here'
```

Re-run `python3 javdb_login.py` when:
- ✅ Session cookie expires (usually after days/weeks)
- ✅ Spider shows "No movie list found" on valid URLs
- ✅ Age verification or login errors appear
- ✅ Before using the `--url` parameter for the first time
Cron Job (Linux/Mac):

```bash
# Refresh cookie every 7 days
0 0 */7 * * cd ~/JAVDB_AutoSpider && python3 javdb_login.py >> logs/javdb_login.log 2>&1
```

Task Scheduler (Windows):
- Set up a scheduled task to run `javdb_login.py` weekly
The script includes an optional OCR-based captcha solver in `utils/login/javdb_captcha_solver.py`:

```python
# Free methods (included)
solve_captcha(image_data, method='ocr')       # Local OCR (Tesseract)
solve_captcha(image_data, method='manual')    # Manual input

# Paid method (requires API key)
solve_captcha(image_data, method='2captcha')  # 2Captcha API
solve_captcha(image_data, method='auto')      # Try OCR first, fall back to 2Captcha
```

Installing Tesseract OCR (Optional):

```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows
# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
```

Login Failed - Incorrect Captcha:
- Captcha is case-sensitive
- Try again for a new captcha
- Consider using 2Captcha API
Login Failed - Invalid Credentials:
- Verify username/password in config.py
- Test credentials in browser first
- Check for typos
Session Cookie Not Working:
- Verify cookie updated in config.py
- Use same proxy/network for login and spider
- Try logging in again
For detailed troubleshooting and manual cookie extraction, see JavDB Login Guide.
The system includes an advanced duplicate download prevention feature that automatically marks downloaded torrents and skips them in future runs.
This feature implements automatic marking of downloaded torrents in daily reports and skips these downloaded torrents in the qBittorrent uploader to avoid duplicate downloads. The system also includes enhanced history tracking with create and update timestamps.
- Automatic Detection of Downloaded Torrents: Automatically identifies which torrents have been downloaded by checking the history CSV file
- Add Indicators: Adds a `[DOWNLOADED]` prefix to downloaded torrents in daily report CSV files
- Skip Duplicate Downloads: The qBittorrent uploader automatically skips torrents with `[DOWNLOADED]` indicators
- Support for Multiple Torrent Types: Supports four types: hacked_subtitle, hacked_no_subtitle, subtitle, no_subtitle
- Enhanced History Tracking: Tracks create_date (first discovery) and update_date (latest modification) for each movie
The history CSV file now uses an enhanced format with individual columns for each torrent type:
Old Format:

```
href,phase,video_code,parsed_date,torrent_type
```

New Format:

```
href,phase,video_code,create_date,update_date,last_visited_datetime,hacked_subtitle,hacked_no_subtitle,subtitle,no_subtitle
```

- `create_date`: When the movie was first discovered and logged
- `update_date`: When the movie was last updated with new torrent types
- `last_visited_datetime`: When the movie detail page was last visited
- `hacked_subtitle`: Download date for the hacked version with subtitles (empty if not downloaded)
- `hacked_no_subtitle`: Download date for the hacked version without subtitles (empty if not downloaded)
- `subtitle`: Download date for the subtitle version (empty if not downloaded)
- `no_subtitle`: Download date for the regular version (empty if not downloaded)
- Backward compatibility is maintained for existing files
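The enhanced format can be read with the standard `csv` module. The sketch below is illustrative (the sample row and helper are not from the codebase); a non-empty date in a torrent-type column means that type was already downloaded:

```python
import csv
import io

# Sample row in the enhanced history format described above (illustrative data).
sample = """href,phase,video_code,create_date,update_date,last_visited_datetime,hacked_subtitle,hacked_no_subtitle,subtitle,no_subtitle
/v/mOJnXY,1,IPZZ-574,2025-07-09 20:00:57,2025-07-09 20:05:30,2025-07-09 20:05:30,,2025-07-09 20:05:30,,
"""

TORRENT_TYPES = ("hacked_subtitle", "hacked_no_subtitle", "subtitle", "no_subtitle")

def downloaded_types(row):
    """Torrent types already downloaded = columns holding a non-empty date."""
    return [t for t in TORRENT_TYPES if row.get(t, "").strip()]

for row in csv.DictReader(io.StringIO(sample)):
    print(row["video_code"], downloaded_types(row))
```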
- Daily Report Generation: Spider generates daily report CSV file
- History Check: Uploader checks history CSV file when starting
- Add Indicators: Add a `[DOWNLOADED]` prefix to downloaded torrents
- Upload New Torrents: Only upload torrents that haven't been downloaded
- Update History: When new torrent types are found, update_date is modified
CSV Before Modification:

```
href,video_code,hacked_subtitle,subtitle
/v/mOJnXY,IPZZ-574,magnet:?xt=...,magnet:?xt=...
```

CSV After Modification:

```
href,video_code,hacked_subtitle,subtitle
/v/mOJnXY,IPZZ-574,[DOWNLOADED] magnet:?xt=...,[DOWNLOADED] magnet:?xt=...
```
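The uploader's skip logic amounts to ignoring any link that carries the prefix. A minimal sketch (sample data and function name are illustrative):

```python
import csv
import io

DOWNLOADED_MARK = "[DOWNLOADED] "  # note the trailing space

sample = """href,video_code,hacked_subtitle,subtitle
/v/mOJnXY,IPZZ-574,[DOWNLOADED] magnet:?xt=aaa,magnet:?xt=bbb
"""

def pending_magnets(csv_text):
    """Yield (video_code, column, magnet) for links not yet downloaded."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in ("hacked_subtitle", "subtitle"):
            link = row.get(col, "")
            if link and not link.startswith(DOWNLOADED_MARK):
                yield row["video_code"], col, link

# Only the unmarked subtitle link survives; the marked one is skipped.
print(list(pending_magnets(sample)))
```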
History File Format:

```
href,phase,video_code,create_date,update_date,hacked_subtitle,hacked_no_subtitle,subtitle,no_subtitle
/v/mOJnXY,1,IPZZ-574,2025-07-09 20:00:57,2025-07-09 20:05:30,2025-07-09 20:05:30,,2025-07-09 20:05:30,
```
Uploader Log:

```
2025-07-09 22:09:23,182 - INFO - Adding downloaded indicators to CSV file...
2025-07-09 22:09:23,183 - INFO - Added downloaded indicators to Daily Report/Javdb_TodayTitle_20250709.csv
2025-07-09 22:09:23,183 - INFO - Found 0 torrent links in Daily Report/Javdb_TodayTitle_20250709.csv
2025-07-09 22:09:23,183 - INFO - Skipped 20 already downloaded torrents
```
- History File Dependency: The feature depends on the `reports/parsed_movies_history.csv` file
- Indicator Format: The downloaded indicator is `[DOWNLOADED] ` (note the trailing space)
- Backward Compatibility: If the history file doesn't exist, the feature degrades gracefully without affecting normal use
- Performance Optimization: History check uses efficient CSV reading, won't significantly impact performance
- Timestamp Tracking: create_date remains constant while update_date changes with each modification
- Torrent Type Merging: When updating existing records, new torrent types are merged with existing ones
The system automatically handles migration from the old format (parsed_date) to the new format (create_date, update_date). Existing files are automatically converted with backward compatibility.
This feature ensures system stability and efficiency, avoiding duplicate downloads while maintaining comprehensive history tracking with enhanced timestamp management.
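The update semantics described above (constant `create_date`, advancing `update_date`, new torrent types merged in without overwriting existing dates) can be sketched as follows; the helper is illustrative, not the project's actual implementation:

```python
from datetime import datetime

def merge_history(existing, new_types, now=None):
    """Merge newly-found torrent types into an existing history record:
    create_date stays fixed, update_date advances only when something
    changes, and dates already recorded are never overwritten."""
    now = now or datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    merged = dict(existing)  # don't mutate the caller's record
    changed = False
    for t in new_types:
        if not merged.get(t):
            merged[t] = now
            changed = True
    if changed:
        merged["update_date"] = now
    return merged

rec = {"create_date": "2025-07-09 20:00:57", "update_date": "2025-07-09 20:00:57",
       "subtitle": "2025-07-09 20:00:57", "hacked_subtitle": ""}
print(merge_history(rec, ["hacked_subtitle"], now="2025-07-10 08:00:00"))
```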
The migration/ directory contains utility scripts for maintaining and upgrading the system:
cleanup_history_priorities.py
- Removes duplicate entries from history file
- Ensures data integrity
- Safe to run multiple times
update_history_format.py
- Migrates old history format to new format
- Converts `parsed_date` to `create_date`/`update_date`
- Automatic backward compatibility
rename_columns_add_last_visited.py
- Renames date columns and adds the `last_visited_datetime` field
- Required when upgrading to support the new history format
migrate_reports_to_dated_dirs.py
- Migrates flat report files into dated `YYYY/MM/` subdirectories
- Required when upgrading to the new reports directory structure
reclassify_c_hacked_torrents.py
- Reclassifies torrents with specific naming patterns
- Updates torrent type classification
- Useful after classification rule changes
Run migration scripts when:
- ✅ Upgrading from older versions
- ✅ History file shows duplicate entries
- ✅ Format changes are introduced
- ✅ Data cleanup is needed
```bash
cd migration
python3 cleanup_history_priorities.py
python3 update_history_format.py
python3 rename_columns_add_last_visited.py
python3 reclassify_c_hacked_torrents.py
```

Note: Always back up your `reports/parsed_movies_history.csv` before running migration scripts.
The system provides comprehensive logging:
- INFO: General progress information with tracking
- WARNING: Non-critical issues
- DEBUG: Detailed debugging information
- ERROR: Critical errors
Progress tracking includes:
- `[Page 1/5]` - Page-level progress
- `[15/75]` - Entry-level progress across all pages
- `[1/25]` - Upload progress for qBittorrent
Spider Issues:
- No entries found: Check if the website structure has changed
- Connection errors: Verify internet connection and website accessibility
- CSV not generated: Check if the `reports/DailyReport` directory exists
qBittorrent Issues:
- Cannot connect: Check if qBittorrent is running and Web UI is enabled
- Login failed: Verify username and password in configuration
- CSV file not found: Run the spider first to generate the CSV file
Git Issues:
- Authentication failed: Verify username and token/password
- Repository not found: Check repository URL and access permissions
- Branch issues: Ensure the branch exists in your repository
Downloaded Indicator Issues:
- Indicators not added: Check if history file exists and has correct format
- Uploader skipping too many torrents: Check if history file contains outdated records
- Import errors: Ensure the `utils/history_manager.py` file exists
- History format issues: Ensure the history file has the correct column structure; backward compatibility is maintained
JavDB Login Issues:
- Login failed: Check credentials in config.py
- Captcha errors: Try again for new captcha, or use 2Captcha API
- Cookie not working: Verify cookie updated in config.py, use same proxy for login and spider
- See JavDB Login Guide for detailed troubleshooting
CloudFlare Bypass Issues:
- Connection refused: Ensure CF bypass service is running
- Port errors: Verify CF_BYPASS_SERVICE_PORT matches service port
- No movie list found: Check CF bypass service logs
- Proxy + CF not working: Ensure CF bypass service runs on proxy server
Proxy Ban Issues:
- All proxies banned: Check `reports/proxy_bans.csv` for ban status
- Spider exits with code 2: Indicates a proxy ban was detected; wait for the cooldown or add new proxies
- Cooldown not working: Default is 8 days, adjust PROXY_POOL_COOLDOWN_SECONDS if needed
- Ban false positives: Check if JavDB is actually accessible from proxy IP
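For automation, the documented exit codes can drive a wrapper script. This sketch only assumes what the README states (exit code 2 = proxy ban); the wrapper itself is hypothetical:

```python
import subprocess
import sys

def classify_exit(rc):
    """Map the spider's documented exit codes to an action."""
    if rc == 0:
        return "ok"
    if rc == 2:
        return "proxy_ban"  # wait for cooldown or add new proxies
    return "error"

def run_spider_once():
    """Run the spider and classify the result (paths/flags as in this README)."""
    rc = subprocess.call([sys.executable, "scripts/spider", "--use-proxy"])
    return classify_exit(rc)
```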
To see detailed operations, you can temporarily increase logging level in the scripts:
```python
# In config.py
LOG_LEVEL = 'DEBUG'  # Shows detailed debug information
```

- Configuration file: `config.py` is automatically excluded from git commits (check `.gitignore`)
- Never commit credentials: GitHub tokens, passwords, and API keys should stay in `config.py` only
- GitHub authentication: Use personal access tokens instead of passwords
- JavDB credentials: Only stored locally in `config.py`, never transmitted except to JavDB
- PikPak credentials: Stored in `config.py`, used only for the PikPak API
- 2Captcha API key: Optional, only used if configured for automatic captcha solving
- Proxy passwords: Use URL encoding for special characters in passwords
- Session cookies: Auto-updated by the login script, expire after some time
- Sensitive logs: The pipeline automatically masks sensitive info in logs and emails
- Environment variables (optional): Consider for production deployments

```python
import os

JAVDB_USERNAME = os.getenv('JAVDB_USER', '')
JAVDB_PASSWORD = os.getenv('JAVDB_PASS', '')
```
- The system includes delays between requests to be respectful to servers:
  - Index pages: 2 seconds (configurable via `PAGE_SLEEP`)
  - Movies: 5-15 seconds random (configurable via `MOVIE_SLEEP_MIN`/`MOVIE_SLEEP_MAX`)
  - Volume-based adjustment: `MovieSleepManager` automatically increases sleep intervals when processing large batches
  - qBittorrent additions: 1 second (configurable via `DELAY_BETWEEN_ADDITIONS`)
  - PikPak requests: 3 seconds (configurable via `PIKPAK_REQUEST_DELAY`)
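The volume-based adjustment can be pictured as scaling the random sleep range with batch size. The class below is a hypothetical sketch, not the real `MovieSleepManager` (whose API may differ):

```python
import random

class MovieSleepManagerSketch:
    """Illustrative volume-based sleep logic: the more movies in a
    batch, the longer the randomized delay between detail requests."""

    def __init__(self, sleep_min=5, sleep_max=15):
        self.sleep_min = sleep_min
        self.sleep_max = sleep_max

    def interval(self, batch_size):
        # Scale the 5-15s base range up to 2x for large batches.
        scale = 1.0 + min(batch_size / 100.0, 1.0)
        return random.uniform(self.sleep_min * scale, self.sleep_max * scale)

mgr = MovieSleepManagerSketch()
print(f"{mgr.interval(10):.1f}s")   # small batch: roughly 5.5-16.5s
print(f"{mgr.interval(200):.1f}s")  # large batch: roughly 10-30s
```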
- The system uses proper headers to mimic a real browser
- CSV files are automatically saved to the `reports/DailyReport/YYYY/MM/` or `reports/AdHoc/YYYY/MM/` directory
- History file tracks all downloaded movies with timestamps
- Rust acceleration is automatically detected and used when available
- Exit code 2 indicates proxy ban detection (useful for automation)
- Logs automatically mask sensitive information (passwords, tokens, etc.)
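The "Rust first, Python fallback" detection boils down to a guarded import. A minimal sketch (the extension's function names here are assumptions for illustration):

```python
# Try the compiled PyO3 extension; fall back to pure Python when absent.
try:
    import javdb_rust_core as _core  # built with PyO3 + maturin
    RUST_AVAILABLE = True
except ImportError:
    _core = None
    RUST_AVAILABLE = False

def parse_index_page(html):
    """Use the fast Rust parser when available, else the Python one."""
    if RUST_AVAILABLE:
        return _core.parse_index_page(html)  # hypothetical function name
    return _parse_index_page_py(html)

def _parse_index_page_py(html):
    ...  # pure-Python (e.g. BeautifulSoup-based) implementation, omitted
```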
- `scripts/spider/`: Spider package (modular architecture)
  - `__main__.py`: Package entry point (`python3 scripts/spider`)
  - `main.py`: Main orchestration flow
  - `cli.py`: Command-line argument parsing
  - `parallel.py`: Multi-threaded detail processing (ProxyWorker)
  - `sequential.py`: Sequential detail processing
  - `index_fetcher.py`: Index page fetching
  - `fallback.py`: Multi-level fallback (proxy/CF/login)
  - `session.py`: Login and session management
  - `sleep_manager.py`: Volume-based sleep management
  - `state.py`: Global state management
  - `csv_builder.py`: CSV row construction
  - `report.py`: Summary report generation
- `rust_core/`: Rust acceleration extension (PyO3 + maturin)
  - `src/scraper/`: HTML parsing (index, detail, category pages)
  - `src/proxy/`: Proxy pool, ban manager, masking
  - `src/requester/`: HTTP request handler
  - `src/history/`: History CSV management
  - `src/csv_writer.rs`, `src/magnet_extractor.rs`, `src/url_helper.rs`
- api/: FastAPI REST API layer
- `reports/`: Contains all report files and history
  - `DailyReport/YYYY/MM/`: Daily scraping results
  - `AdHoc/YYYY/MM/`: Custom URL scraping results
  - `parsed_movies_history.csv`: History tracking
  - `pikpak_bridge_history.csv`: PikPak transfer history
  - `proxy_bans.csv`: Proxy ban records
- `logs/`: Contains all log files
  - `spider.log`: Spider execution logs
  - `qb_uploader.log`: Upload execution logs
  - `pipeline.log`: Pipeline execution logs
  - `pikpak_bridge.log`: PikPak bridge execution logs
  - `qb_file_filter.log`: File filter execution logs
- `migration/`: Contains data migration scripts
- `utils/`: Utility modules (history, parser, proxy pool, etc.)
- `utils/login/`: JavDB login related files and documentation
- `docker/`: Docker configuration files
```bash
# Basic daily scraping
python3 scripts/spider
python3 qbtorrent_uploader.py

# Full automated pipeline
python3 pipeline_run_and_notify.py

# Scrape with proxy (CF bypass activates automatically as fallback)
python3 scripts/spider --use-proxy
python3 pipeline_run_and_notify.py --use-proxy

# Custom URL scraping (requires login)
python3 javdb_login.py  # First time setup
python3 scripts/spider --url "https://javdb.com/actors/RdEb4"
python3 pipeline_run_and_notify.py --url "https://javdb.com/actors/RdEb4"

# Scrape ignoring release date
python3 scripts/spider --ignore-release-date --phase 1
python3 pipeline_run_and_notify.py --ignore-release-date

# Ad hoc mode
python3 scripts/spider --url "https://javdb.com/tags/xyz"
python3 qbtorrent_uploader.py --mode adhoc

# PikPak bridge
python3 pikpak_bridge.py  # Default: 3 days, batch mode
python3 pikpak_bridge.py --days 7 --individual  # Custom days, individual mode

# qBittorrent File Filter
python3 scripts/qb_file_filter.py --min-size 50  # Filter files < 50MB
python3 scripts/qb_file_filter.py --min-size 100 --days 3 --dry-run  # Preview mode
```

- Main config: `config.py` (copy from `config.py.example`)
- History file: `reports/parsed_movies_history.csv`
- Ban records: `reports/proxy_bans.csv`
- Login docs: `utils/login/JAVDB_LOGIN_README.md`
- CloudFlare Bypass Service
- 2Captcha API (optional, for automatic captcha solving)
- JavDB Login Guide
- Rust Installation Guide (macOS)
- API Usage Guide
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is for educational and personal use only. Please respect the terms of service of the websites you scrape.