A Python web scraper built with Selenium WebDriver (Edge) to extract data from vidange.tn.
- 🌐 Selenium WebDriver with Edge browser
- 🔐 Interactive CLI Menu: Choose between TUN or RS plates before starting.
- 📝 Comprehensive Logging: Console and file-based logs.
- 🛡️ Anti-detection: User agents and automation flags.
- 🔄 Automatic WebDriver Management: Uses local driver or downloads if needed.
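The console and file logging listed above could be wired up roughly as follows. This is a minimal sketch rather than the project's actual setup: the function name `setup_logging`, the logger name `plate_scraper`, and the message format are assumptions; only the `scraper.log` file name comes from the project layout.

```python
import logging
import sys


def setup_logging(log_file="scraper.log", level=logging.INFO):
    """Return a logger that writes to both the console and a log file."""
    logger = logging.getLogger("plate_scraper")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    # Assumed format: timestamp, level, message.
    formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")

    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    logger.addHandler(console)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger


if __name__ == "__main__":
    log = setup_logging()
    log.info("Scraper starting")
```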
- Python 3.8 or higher
- Microsoft Edge browser installed
- Internet connection
- Navigate to the project directory
- Install dependencies
pip install -r requirements.txt
- Configure environment variables
Copy .env.example to .env:
copy .env.example .env
- Start the server:
python main.py
The API will be available at http://localhost:8000.
- Scrape via API:
  - TUN Plate: http://localhost:8000/scrape/tun/{serie}/{num}
    curl "http://localhost:8000/scrape/tun/153/3601"
  - RS Plate: http://localhost:8000/scrape/rs/{num_rs}
    curl "http://localhost:8000/scrape/rs/12345"
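The same endpoints can be called from Python instead of curl. The sketch below uses only the standard library; the helper names `build_scrape_url` and `scrape` are hypothetical, and it assumes the API responds with JSON, which is not stated above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"

# Number of path segments each plate type expects:
# /scrape/tun/{serie}/{num} and /scrape/rs/{num_rs}
EXPECTED_PARTS = {"tun": 2, "rs": 1}


def build_scrape_url(plate_type, *parts, base_url=BASE_URL):
    """Build the scrape URL for a TUN or RS plate."""
    if plate_type not in EXPECTED_PARTS:
        raise ValueError(f"unknown plate type: {plate_type!r}")
    if len(parts) != EXPECTED_PARTS[plate_type]:
        raise ValueError(
            f"{plate_type} expects {EXPECTED_PARTS[plate_type]} value(s), got {len(parts)}"
        )
    path = "/".join(quote(str(p), safe="") for p in parts)
    return f"{base_url}/scrape/{plate_type}/{path}"


def scrape(plate_type, *parts, timeout=60):
    """Call the API and decode the body (assumes a JSON response)."""
    with urlopen(build_scrape_url(plate_type, *parts), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server started with `python main.py`.
    print(scrape("tun", 153, 3601))
    print(scrape("rs", 12345))
```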
You can still run the scraper directly for a quick test:
python scraper.py
Edit the .env file to customize:
- TARGET_URL: The website to scrape (default: https://vidange.tn)
- HEADLESS: Run browser in headless mode (True/False)
- BROWSER_TYPE: chrome or edge (default: edge)
- IMPLICIT_WAIT: Seconds to wait for elements (default: 10)
- OUTPUT_DIR: Directory to save scraped data (default: output)
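One way these variables could be read into typed settings is sketched below. The `Settings` class and `load_settings` function are hypothetical, and the sketch reads from the process environment only; how the project loads the .env file itself is not shown here. HEADLESS has no documented default, so the sketch assumes False.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    target_url: str
    headless: bool
    browser_type: str
    implicit_wait: int
    output_dir: str


def load_settings(env=None):
    """Read scraper settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    browser = env.get("BROWSER_TYPE", "edge").strip().lower()
    if browser not in ("chrome", "edge"):
        raise ValueError(f"BROWSER_TYPE must be chrome or edge, got {browser!r}")

    return Settings(
        target_url=env.get("TARGET_URL", "https://vidange.tn"),
        # Assumption: HEADLESS defaults to False when unset.
        headless=env.get("HEADLESS", "False").strip().lower() in ("true", "1", "yes"),
        browser_type=browser,
        implicit_wait=int(env.get("IMPLICIT_WAIT", "10")),
        output_dir=env.get("OUTPUT_DIR", "output"),
    )


if __name__ == "__main__":
    print(load_settings())
```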
The scraper now supports both Chrome and Edge and uses webdriver-manager to automatically download the correct drivers.
- Install Chrome:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
- Configure .env: Set BROWSER_TYPE=chrome and HEADLESS=True in your .env file.
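For orientation, a driver factory that switches between Chrome and Edge with webdriver-manager might look like the sketch below. It is not the code in scraper.py: the function names `browser_arguments` and `build_driver` are hypothetical, and the specific anti-detection flags are common choices assumed for illustration. It requires the selenium and webdriver-manager packages.

```python
def browser_arguments(headless):
    """Command-line flags passed to the browser (assumed, not from scraper.py)."""
    # Commonly used to hide the automation hint from page scripts.
    args = ["--disable-blink-features=AutomationControlled"]
    if headless:
        args.append("--headless=new")
    return args


def build_driver(browser_type="edge", headless=True):
    """Create a Chrome or Edge WebDriver with an auto-downloaded driver binary."""
    if browser_type not in ("chrome", "edge"):
        raise ValueError(f"unsupported browser: {browser_type!r}")

    # Imported lazily so browser_arguments() works without Selenium installed.
    from selenium import webdriver

    if browser_type == "chrome":
        from selenium.webdriver.chrome.service import Service
        from webdriver_manager.chrome import ChromeDriverManager

        options = webdriver.ChromeOptions()
        manager = ChromeDriverManager()
        driver_cls = webdriver.Chrome
    else:
        from selenium.webdriver.edge.service import Service
        from webdriver_manager.microsoft import EdgeChromiumDriverManager

        options = webdriver.EdgeOptions()
        manager = EdgeChromiumDriverManager()
        driver_cls = webdriver.Edge

    for arg in browser_arguments(headless):
        options.add_argument(arg)
    # Drop the "controlled by automated software" switch.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])

    # install() downloads a matching driver if needed and returns its path.
    service = Service(manager.install())
    return driver_cls(service=service, options=options)


if __name__ == "__main__":
    driver = build_driver("chrome", headless=True)
    try:
        driver.get("https://vidange.tn")
        print(driver.title)
    finally:
        driver.quit()
```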
PlateScraper/
├── .env # Environment variables
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
├── scraper.py # Main scraper script
├── README.md # This file
└── scraper.log # Log file
This project is for educational purposes only. Ensure you have permission to scrape the target website.