auto.ria.com Scraper

📖 Russian version of the documentation: README_RU.md

High-performance asynchronous application for collecting used car data from the auto.ria.com platform.

📋 Description

AutoRia Scraper is an efficient tool for collecting car data from the auto.ria.com website. The application uses an asynchronous approach based on httpx+BeautifulSoup4 for maximum performance and resource efficiency.
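
Conceptually, the collection loop boils down to fetching a page asynchronously with httpx and parsing it with BeautifulSoup. A minimal sketch, assuming illustrative names (the real selector lives in app/scraper/parsers/search_page.py):

```python
import asyncio

import httpx
from bs4 import BeautifulSoup

SEARCH_URL = "https://auto.ria.com/uk/car/used/"  # matches SCRAPER_START_URL below

async def fetch_listing_links(url: str) -> list[str]:
    """Download one search page and pull out links to individual car pages."""
    async with httpx.AsyncClient(timeout=30) as client:
        response = await client.get(url)
        response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Placeholder selector; the project's actual parsing logic may differ.
    return [a["href"] for a in soup.select("a.address") if a.get("href")]

if __name__ == "__main__":
    links = asyncio.run(fetch_listing_links(SEARCH_URL))
    print(f"Found {len(links)} listings")
```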

Collected data:

  • 💰 Price information in USD
  • 🔍 Car characteristics (mileage, VIN code, license plate)
  • 👤 Seller contact information (name, phone)
  • 🖼️ Photo and media information
  • 📊 Date and time when the listing was discovered by the scraper

Advantages:

  • ⚡ High performance — asynchronous HTTP requests with httpx
  • 🔄 Resilience — automatic retry attempts on errors (see the sketch after this list)
  • 📈 Scalability — configurable number of concurrent requests
  • 🧠 Intelligent data collection — two-stage collection process (main data + phone)
  • 📝 Detailed logging — tracking all stages of data collection
  • 🗃️ Automatic backups — regular database backups
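
As an illustration of the resilience point, a retry helper might look like the sketch below. This is not the project's actual code (which lives in app/scraper/base.py); the attempt count and backoff values are assumptions:

```python
import asyncio

import httpx

async def get_with_retries(
    client: httpx.AsyncClient,
    url: str,
    attempts: int = 3,   # assumed retry budget
    backoff: float = 2.0,  # assumed base delay in seconds
) -> httpx.Response:
    """Retry transient HTTP failures with a simple linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            response = await client.get(url)
            response.raise_for_status()
            return response
        except (httpx.TransportError, httpx.HTTPStatusError):
            if attempt == attempts:
                raise  # out of retries, let the caller handle it
            await asyncio.sleep(backoff * attempt)
```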

🔧 Technologies

  • Python 3.10 — modern version of the language
  • PostgreSQL — reliable relational database for data storage
  • SQLAlchemy — powerful ORM for database operations (see the model sketch after this list)
  • httpx — next-generation asynchronous HTTP client
  • BeautifulSoup4 — efficient HTML page parser
  • asyncio — library for asynchronous programming
  • Celery — distributed task queue for process automation
  • Docker & Docker Compose — containerization for easy deployment
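
For a sense of how these pieces fit together, here is a minimal sketch of a SQLAlchemy model for a scraped listing. The actual model lives in app/core/models.py; every column name below is an assumption based on the "Collected data" list above:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Car(Base):
    """One scraped listing; columns mirror the 'Collected data' list (assumed names)."""
    __tablename__ = "cars"

    id = Column(Integer, primary_key=True)
    url = Column(String, unique=True, nullable=False)
    price_usd = Column(Integer)
    odometer = Column(Integer)      # mileage in km
    vin = Column(String)
    car_number = Column(String)     # license plate
    username = Column(String)       # seller name
    phone_number = Column(String)
    image_url = Column(String)
    images_count = Column(Integer)
    datetime_found = Column(DateTime, default=datetime.utcnow)
```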

📂 Project Structure

├── .dockerignore
├── .env                # Environment variables (create manually)
├── .env.example        # Example .env file
├── .gitignore
├── Dockerfile          # Application Docker image
├── README.md           # Documentation
├── README_RU.md        # Russian documentation
├── docker-compose.yml  # Docker Compose configuration
├── requirements.txt    # Python dependencies
├── tests/              # Tests
├── logs/               # Application logs
│   └── scraper.log
├── dumps/              # Database dumps
│   └── autoria_dump_YYYY-MM-DD_HH-MM-SS.sql
└── app/                # Main application code
    ├── __init__.py
    ├── main.py         # Entry point
    ├── core/           # Database and models
    │   ├── __init__.py
    │   ├── database.py
    │   └── models.py
    ├── config/         # Configuration
    │   ├── __init__.py
    │   ├── celery_config.py
    │   └── settings.py
    ├── utils/          # Utilities
    │   ├── __init__.py
    │   ├── db_dumper.py
    │   ├── db_utils.py
    │   └── logger.py
    ├── scraper/        # Parsing logic
    │   ├── __init__.py
    │   ├── autoria.py  # Main scraper
    │   ├── base.py     # Base scraper class
    │   └── parsers/
    │       ├── car_page.py    # Car page parser
    │       └── search_page.py # Search page parser
    └── tasks/          # Celery tasks
        ├── __init__.py
        ├── backup.py   # Backup tasks
        └── scraping.py # Data collection tasks

🚀 Installation and Launch

Via Docker (recommended)

  1. Clone the repository:
     git clone https://github.com/ursaloper/auto.ria-scraper
     cd auto.ria-scraper
  2. Create .env file based on .env.example:
     cp .env.example .env
  3. Configure environment variables in .env:
     nano .env
  4. Launch the application:
     docker-compose up -d
  5. View logs:
     docker-compose logs -f

Local Installation

  1. Create a virtual environment:
     python -m venv venv
     source venv/bin/activate  # Linux/macOS
     # or
     venv\Scripts\activate     # Windows
  2. Install dependencies:
     pip install -r requirements.txt
  3. Configure the .env file:
     cp .env.example .env
     nano .env
  4. Launch the application:
     python -m app.main

🤖 Celery Management

For manual task execution and queue monitoring, use the following commands:

Scraping and Backup Tasks

  • Create database dump manually:
    docker-compose exec celery_worker celery -A app call app.tasks.backup.manual_backup
  • Run scraping manually:
    docker-compose exec celery_worker celery -A app call app.tasks.scraping.manual_scrape
  • Run scraping from a specific URL (the task's rough shape is sketched after this list):
    docker-compose exec celery_worker celery -A app call app.tasks.scraping.manual_scrape --args='["https://auto.ria.com/uk/car/mercedes-benz/"]'
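
For orientation, the manual scraping task invoked above could be shaped roughly as follows. This is a sketch only: the decorator usage matches Celery's public API, but the class, constructor, and settings names are assumptions, not the project's actual code:

```python
import asyncio

from celery import shared_task

# Assumed names; the real definitions live in app/tasks/scraping.py.
from app.config.settings import SCRAPER_START_URL
from app.scraper.autoria import AutoRiaScraper

@shared_task(name="app.tasks.scraping.manual_scrape")
def manual_scrape(start_url: str | None = None) -> None:
    """Run a full scrape, optionally starting from a caller-supplied search URL."""
    scraper = AutoRiaScraper(start_url or SCRAPER_START_URL)  # assumed constructor
    asyncio.run(scraper.run())                                # assumed coroutine
```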

Celery Monitoring

  • Show registered tasks:
    docker-compose exec celery_worker celery -A app inspect registered
  • Show queued tasks:
    docker-compose exec celery_worker celery -A app inspect reserved
  • Show active tasks:
    docker-compose exec celery_worker celery -A app inspect active
  • Show revoked tasks:
    docker-compose exec celery_worker celery -A app inspect revoked

⚙️ Configuration

Main settings are located in the .env file:

| Parameter | Description | Example |
|-----------|-------------|---------|
| DATABASE_URL | PostgreSQL connection URL | postgresql://user:password@postgres:5432/autoria |
| SCRAPER_START_TIME | Daily data collection start time | 12:00 |
| DUMP_TIME | Daily database dump creation time | 00:00 |
| SCRAPER_START_URL | Starting page for data collection | https://auto.ria.com/uk/car/used/ |
| MAX_PAGES_TO_PARSE | Maximum number of search pages to parse | 10 |
| MAX_CARS_TO_PROCESS | Maximum number of cars to process | 100 |
| SCRAPER_CONCURRENCY | Maximum number of concurrent requests | 5 |
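
SCRAPER_START_TIME and DUMP_TIME drive Celery beat's daily schedule. A sketch of how app/config/celery_config.py might translate them into crontab entries; the scheduled task names below are assumptions (only the manual_* tasks are documented above):

```python
import os

from celery.schedules import crontab

def _as_crontab(hh_mm: str) -> crontab:
    """Turn an 'HH:MM' string from .env into a daily crontab entry."""
    hour, minute = hh_mm.split(":")
    return crontab(hour=int(hour), minute=int(minute))

beat_schedule = {
    "daily-scrape": {
        "task": "app.tasks.scraping.scheduled_scrape",  # assumed task name
        "schedule": _as_crontab(os.getenv("SCRAPER_START_TIME", "12:00")),
    },
    "daily-db-dump": {
        "task": "app.tasks.backup.scheduled_backup",    # assumed task name
        "schedule": _as_crontab(os.getenv("DUMP_TIME", "00:00")),
    },
}
```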

🚄 Performance

Scraping speed depends on the SCRAPER_CONCURRENCY parameter, which sets the number of concurrent requests. In practice, rate limits and server-side delays on auto.ria.com mean that actual throughput can fall short of the theoretical maximum.
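
Capping concurrency is the kind of thing asyncio's Semaphore expresses directly. A sketch of how SCRAPER_CONCURRENCY can bound in-flight requests (illustrative, not the project's exact code):

```python
import asyncio

import httpx

async def fetch_all(urls: list[str], concurrency: int = 5) -> list[str]:
    """Fetch pages concurrently, never exceeding `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async with httpx.AsyncClient(timeout=30) as client:

        async def fetch(url: str) -> str:
            async with semaphore:  # at most `concurrency` requests at once
                response = await client.get(url)
                response.raise_for_status()
                return response.text

        return await asyncio.gather(*(fetch(u) for u in urls))
```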

Test Results:

  • Processed: 500 cars
  • Added to DB: 495-496 new records
  • Execution time: ~6-7 minutes (360-380 seconds)
  • Efficiency: 99% (percentage of successfully processed listings)

Important:

  • Raising SCRAPER_CONCURRENCY above 5-7 yields almost no additional speed-up, because of rate limits and server-side delays on auto.ria.com.
  • Values that are too high may lead to temporary blocking of your IP address.
  • Values of 5-7 are recommended for stable and safe operation.

💾 Database Dumps

  • Dumps are created automatically every day at the time set by DUMP_TIME
  • Dumps are stored in the dumps/ directory
  • Filename format: autoria_dump_YYYY-MM-DD_HH-MM-SS.sql
  • Old dumps are deleted automatically, with a 30-day retention by default (see the sketch below)
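
A dump routine consistent with these points might shell out to pg_dump and prune old files, roughly as follows. This is a sketch; the actual implementation lives in app/utils/db_dumper.py and may differ:

```python
import os
import subprocess
import time
from datetime import datetime
from pathlib import Path

DUMP_DIR = Path("dumps")
RETENTION_DAYS = 30  # default retention mentioned above

def create_dump() -> Path:
    """Write a timestamped SQL dump matching the documented filename format."""
    DUMP_DIR.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    dump_path = DUMP_DIR / f"autoria_dump_{stamp}.sql"
    # pg_dump reads the connection string from DATABASE_URL (set in .env)
    subprocess.run(
        ["pg_dump", os.environ["DATABASE_URL"], "-f", str(dump_path)],
        check=True,
    )
    return dump_path

def delete_old_dumps() -> None:
    """Remove dumps older than the retention window."""
    cutoff = time.time() - RETENTION_DAYS * 86400
    for dump in DUMP_DIR.glob("autoria_dump_*.sql"):
        if dump.stat().st_mtime < cutoff:
            dump.unlink()
```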

📊 Logging

The logging system provides detailed information about application operation:

  • All logs are written to the logs/scraper.log file
  • Log rotation is configured (maximum file size: 10 MB); see the sketch after this list
  • Separate logging for each module
  • Logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
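
A logger factory consistent with the points above (10 MB rotation, one logger per module) could look like this. A sketch of app/utils/logger.py, with the backup count and format string assumed:

```python
import logging
from logging.handlers import RotatingFileHandler

def get_logger(name: str) -> logging.Logger:
    """Per-module logger writing to logs/scraper.log with 10 MB rotation."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            "logs/scraper.log",
            maxBytes=10 * 1024 * 1024,  # 10 MB, as documented above
            backupCount=5,              # assumed backup count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```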

🛠️ Development

Code Style

The project uses Black for code formatting:

# Format code
black app/

# Check formatting
black --check app/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📞 Support

If you have questions or need help, please open an issue in the repository.

⭐ Star History

If this project helped you, please give it a star! ⭐
