Real-Time Market Data Pipeline

High-performance real-time market data processing system designed for low-latency trading applications. This system ingests, processes, and distributes market data with sub-millisecond latency requirements.

Architecture

Market Data Sources → WebSocket Client → Kafka → Stream Processor → InfluxDB/Redis → API/Dashboard

Components

  • Data Ingestion: WebSocket clients for multiple exchanges
  • Message Queue: Apache Kafka for reliable data streaming
  • Stream Processing: Real-time data normalization and enrichment
  • Storage: InfluxDB for time-series data, Redis for low-latency access
  • API: FastAPI for real-time data access
  • Monitoring: Prometheus metrics and Grafana dashboards
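The ingestion and normalization stages above can be sketched as a single mapping function. This is an illustrative sketch only: the internal `Trade` schema and the Kraken payload shape are assumptions, not the project's actual code (the Binance short keys `s`/`p`/`q`/`T` follow its public trade stream format).

```python
# Sketch of multi-exchange trade normalization. The Trade schema and the
# Kraken payload shape are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Trade:
    symbol: str
    price: float
    size: float
    ts_ms: int       # event time, epoch milliseconds
    exchange: str

def normalize_trade(exchange: str, raw: dict) -> Trade:
    """Map an exchange-specific payload onto one internal Trade record."""
    if exchange == "binance":
        # Binance trade streams use short keys: s=symbol, p=price, q=quantity, T=trade time
        return Trade(raw["s"], float(raw["p"]), float(raw["q"]), int(raw["T"]), "binance")
    if exchange == "kraken":
        # Hypothetical shape for a Kraken adapter
        return Trade(raw["pair"], float(raw["price"]), float(raw["volume"]),
                     int(raw["time"] * 1000), "kraken")
    raise ValueError(f"no adapter for {exchange}")

t = normalize_trade("binance", {"s": "BTCUSDT", "p": "67000.5", "q": "0.25", "T": 1700000000000})
```

Each new exchange then only needs one adapter branch (or adapter class); everything downstream of the normalizer sees a single schema.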

Features

Performance

  • Sub-millisecond processing latency
  • 1M+ messages per second throughput
  • Horizontal scaling capabilities
  • Memory-optimized data structures

Data Processing

  • Multi-exchange data normalization
  • Real-time trade and quote processing
  • Order book reconstruction
  • Market data quality validation
  • Anomaly detection and alerting
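Order book reconstruction typically means applying incremental depth updates to a snapshot. A minimal sketch, assuming price-keyed level maps where a size of zero deletes a level (the project's actual book structure may differ):

```python
# Minimal order book reconstruction sketch: levels are {price: size} maps,
# and an update with size 0 removes that price level.
def apply_update(book: dict, side: str, price: float, size: float) -> None:
    levels = book[side]              # side is "bids" or "asks"
    if size == 0:
        levels.pop(price, None)      # zero size deletes the level
    else:
        levels[price] = size         # insert or overwrite the level

def best_bid_ask(book: dict):
    bid = max(book["bids"]) if book["bids"] else None
    ask = min(book["asks"]) if book["asks"] else None
    return bid, ask

book = {"bids": {67000.0: 1.5}, "asks": {67010.0: 2.0}}
apply_update(book, "bids", 67005.0, 0.4)   # new best bid appears
apply_update(book, "asks", 67010.0, 0.0)   # old best ask is removed
apply_update(book, "asks", 67012.5, 1.1)
```

For production depth feeds the same loop also checks sequence numbers, so a gap can trigger a fresh snapshot instead of silently corrupting the book.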

Technology Stack

  • Python 3.11+: Core application logic
  • uv: Fast Python package and project manager
  • Apache Kafka: Message streaming platform
  • InfluxDB: Time-series database
  • Redis: In-memory data store
  • FastAPI: REST API framework
  • asyncio: Asynchronous programming
  • Docker: Containerization

Getting Started

Prerequisites

  • Docker and Docker Compose
  • Python 3.11+
  • uv (fast Python package manager)
  • 8GB+ RAM recommended

Quick Start

# Clone the repository, then enter the project directory
cd real_time_pipeline


# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Start infrastructure
docker-compose up -d

# Initialize databases
uv run python scripts/setup_db.py

# Generate test data (optional)
uv run python scripts/generate_test_data.py

# Start data pipeline
uv run python main.py

# View dashboard
open http://localhost:8080

Database Setup

The pipeline requires InfluxDB and Redis to be properly initialized before starting:

# Initialize all databases
uv run python scripts/setup_db.py

# Or initialize specific databases
uv run python scripts/setup_db.py --influxdb-only
uv run python scripts/setup_db.py --redis-only

This script will:

  • Create InfluxDB buckets with proper retention policies
  • Set up Redis data structures for symbols and metadata
  • Test connectivity to both databases
  • Provide detailed logging of the initialization process

Test Data Generation

For development and testing, you can generate sample market data:

# Generate all test data
uv run python scripts/generate_test_data.py

# Generate data for specific symbol
uv run python scripts/generate_test_data.py --symbol BTC-USD --count 1000

# Custom output directory
uv run python scripts/generate_test_data.py --output-dir my_test_data

Generated data includes:

  • Trade Data: Realistic trade records with price movements
  • Order Books: Sample bid/ask data for each symbol
  • Ticker Data: 24h/30d volume and price statistics
  • Configuration: Summary of generated data
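Realistic trade records can be produced with a simple random walk. A hedged sketch of what a generator like scripts/generate_test_data.py might do (the real script's schema and parameters may differ):

```python
# Sketch of a random-walk test-trade generator. The record fields and the
# ~5 bps per-tick volatility are illustrative assumptions.
import random

def generate_trades(symbol: str, count: int, start_price: float = 50_000.0, seed: int = 42):
    rng = random.Random(seed)        # fixed seed keeps test data reproducible
    price, trades = start_price, []
    for i in range(count):
        # multiplicative Gaussian step, floored so the price stays positive
        price = max(0.01, price * (1 + rng.gauss(0, 0.0005)))
        trades.append({
            "symbol": symbol,
            "price": round(price, 2),
            "size": round(rng.uniform(0.001, 2.0), 6),
            "side": rng.choice(["buy", "sell"]),
            "seq": i,
        })
    return trades

trades = generate_trades("BTC-USD", 1000)
```

Seeding the generator matters: repeatable data makes performance runs and regression tests comparable across machines.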

Performance Benchmarks

Metric               Value
------               -----
Processing latency   <500 μs
Throughput           1.2M msg/sec
Memory usage         <2 GB
CPU usage            <50% (4 cores)

Configuration

Key configuration options in config.yaml:

  • Exchange connections
  • Kafka topics and partitions
  • Processing batch sizes
  • Storage retention policies
  • Alert thresholds
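The options above might be grouped in config.yaml along these lines. This is an illustrative shape only; the actual keys and values live in the repository's config.yaml:

```yaml
# Illustrative config.yaml sketch; real keys may differ.
exchanges:
  binance:
    symbols: [BTC-USD, ETH-USD]
    reconnect_backoff_s: 5
kafka:
  topics:
    trades: market.trades
  partitions: 12
processing:
  batch_size: 500
storage:
  influxdb_retention: 30d
alerts:
  latency_p99_ms: 1.0
```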

Monitoring

Metrics Tracked

  • Message processing latency (p50, p95, p99)
  • Throughput (messages/second)
  • Error rates and types
  • Memory and CPU utilization
  • Queue depths and processing lags
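The tracked latency percentiles can be derived from raw samples with the standard library alone; a pipeline like this would more likely export them as Prometheus histograms, but the arithmetic is the same:

```python
# Sketch: p50/p95/p99 from microsecond latency samples using the stdlib.
import statistics

def latency_percentiles(samples_us: list[float]) -> dict[str, float]:
    """Return the tracked percentiles from raw latency samples."""
    q = statistics.quantiles(samples_us, n=100)  # 99 cut points: p1..p99
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

pcts = latency_percentiles([float(v) for v in range(1, 1001)])  # 1..1000 µs
```

Tail percentiles (p99) are the ones that matter for a sub-millisecond target: a healthy p50 can hide a p99 that regularly breaches the 1 ms alert threshold.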

Alerts

  • High latency alerts (>1ms)
  • Message loss detection
  • Exchange disconnections
  • System resource limits

Testing

# Unit tests
uv run pytest tests/unit/

# Integration tests
uv run pytest tests/integration/

# Performance tests
uv run python tests/performance/load_test.py

# Generate test data for testing
uv run python scripts/generate_test_data.py

# Test database connectivity
uv run python scripts/setup_db.py --influxdb-only
uv run python scripts/setup_db.py --redis-only

Data Sources

Supported exchanges:

  • Binance (WebSocket)
  • Coinbase Advanced Trade (WebSocket) - using coinbase-advanced-py
  • Kraken (WebSocket)
  • Custom exchange adapters

API Endpoints

  • GET /health - System health check
  • GET /metrics - Prometheus metrics
  • GET /symbols - Available trading symbols
  • GET /trades/{symbol} - Recent trades
  • GET /orderbook/{symbol} - Current order book
  • WebSocket /ws - Real-time data stream
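Clients of the /ws endpoint need a stable message envelope. A hedged sketch of what such a JSON frame could look like; the field names here are assumptions, so check the server code for the real schema:

```python
# Sketch of a JSON envelope for the /ws real-time stream.
# Field names (channel/symbol/ts/data) are illustrative assumptions.
import json
import time

def make_ws_message(channel: str, symbol: str, payload: dict) -> str:
    envelope = {
        "channel": channel,          # e.g. "trades" or "orderbook"
        "symbol": symbol,
        "ts": int(time.time() * 1000),
        "data": payload,
    }
    return json.dumps(envelope, separators=(",", ":"))  # compact wire framing

msg = make_ws_message("trades", "BTC-USD", {"price": 67000.5, "size": 0.25})
```

Compact separators keep frames small, which matters at 1M+ messages per second far more than it does for the REST endpoints.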

Deployment

Production Deployment

# Build images
docker build -t market-data-pipeline .

# Deploy with Kubernetes
kubectl apply -f k8s/

# Scale components
kubectl scale deployment processor --replicas=5

Configuration Management

  • Environment-specific configs
  • Secrets management with Vault
  • Feature flags for gradual rollouts

Development Setup

Using uv (Recommended)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies including dev tools
uv sync --dev

# Run tests
uv run pytest

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy src/

# Linting
uv run flake8 src/

Alternative: Using pip

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e ".[dev,test]"

# Run tests
pytest

Troubleshooting

Common Issues

Database Connection Errors:

# Check if databases are running
docker-compose ps

# Restart databases
docker-compose restart influxdb redis

# Reinitialize databases
uv run python scripts/setup_db.py

Dependency Conflicts:

# Clean and reinstall dependencies
rm -rf .venv uv.lock
uv sync

Port Conflicts:

# Check what's using ports
lsof -i :8000  # API port
lsof -i :9090  # Prometheus port
lsof -i :3000  # Grafana port

Docker Issues:

# Clean Docker resources
docker-compose down -v
docker system prune -f
docker-compose up -d

Logs and Debugging

# View application logs
uv run python main.py --log-level DEBUG

# View Docker logs
docker-compose logs -f kafka
docker-compose logs -f influxdb
docker-compose logs -f redis

# Check Prometheus targets
open http://localhost:9090/targets

Contributing

  1. Follow PEP 8 style guidelines
  2. Add unit tests for new features
  3. Update documentation
  4. Performance test critical paths
  5. Monitor resource usage
  6. Use uv for dependency management
