A production-ready, microservices-based system for real-time cryptocurrency price prediction using streaming data processing and machine learning.
This system implements a complete end-to-end ML pipeline for predicting cryptocurrency prices in real time. It combines a streaming architecture, technical indicators, and machine learning models to produce predictions from live market data.
Key Features:
- Real-time data ingestion from cryptocurrency exchanges
- Technical indicator calculation (RSI, MACD, Bollinger Bands, etc.)
- ML-based price prediction with continuous model training
- Streaming architecture using Quix Streams
- Containerized microservices with Docker and Kubernetes
- Production-ready deployment with monitoring and logging
┌───────────────────────────────────────────────────────────┐
│                       TRADES SERVICE                       │
│         WebSocket → Live Trade Data → Kafka Stream         │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│                       CANDLES SERVICE                      │
│           Aggregates Trades → OHLCV Candlesticks           │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│                TECHNICAL INDICATORS SERVICE                │
│          RSI, MACD, BB, EMA → Feature Engineering          │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│                      PREDICTOR SERVICE                     │
│       ML Models → Price Predictions → Results Stream       │
└───────────────────────────────────────────────────────────┘
                              ↓
┌───────────────────────────────────────────────────────────┐
│                   PREDICTION API SERVICE                   │
│          REST API → Serves Predictions to Clients          │
└───────────────────────────────────────────────────────────┘
- Trades Service: Connects to the cryptocurrency exchange WebSocket and streams live trade data
- Candles Service: Aggregates trades into OHLCV (Open, High, Low, Close, Volume) candlesticks (see the sketch below)
- Technical Indicators: Calculates technical indicators from candles for feature engineering
- Predictor Service: Trains and runs ML models on the enriched data and produces predictions
- Prediction API: Exposes REST endpoints for accessing real-time predictions
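The heart of the Candles Service is folding individual trades into per-window OHLCV candles. A minimal sketch of that logic in plain Python (field names and the 60-second window are illustrative, not taken from the service code):

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """One OHLCV candle for a fixed time window."""
    symbol: str
    open: float
    high: float
    low: float
    close: float
    volume: float
    window_start: int  # epoch seconds, aligned to the window size


def update_candle(candle: Candle | None, symbol: str, price: float, size: float,
                  timestamp: int, window_sec: int = 60) -> Candle:
    """Fold a single trade into the current candle, opening a new candle
    when the trade falls outside the current window."""
    window_start = timestamp - (timestamp % window_sec)
    if candle is None or candle.window_start != window_start:
        # First trade of a new window opens a fresh candle
        return Candle(symbol, price, price, price, price, size, window_start)
    candle.high = max(candle.high, price)
    candle.low = min(candle.low, price)
    candle.close = price
    candle.volume += size
    return candle
```

In the service itself an update like this would typically run per symbol inside the streaming framework's windowing; the per-trade fold is the same idea.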
Real-Time Data Processing
- WebSocket integration with cryptocurrency exchanges
- Kafka-based streaming architecture with Quix Streams
- Low-latency data pipeline (<100ms end-to-end)
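A minimal sketch of how a service in this pipeline wires Kafka topics together, assuming the Quix Streams 2.x `Application` API; topic names and the transformation are placeholders rather than the repo's actual code:

```python
from quixstreams import Application

# Connect to the local Kafka broker used in development
app = Application(broker_address="localhost:9092", consumer_group="candles")

trades_topic = app.topic("trades", value_deserializer="json")
candles_topic = app.topic("candles", value_serializer="json")

sdf = app.dataframe(trades_topic)   # streaming view of the trades topic
sdf = sdf.apply(lambda trade: {     # per-message transformation (placeholder)
    "symbol": trade["symbol"],
    "price": trade["price"],
})
sdf = sdf.to_topic(candles_topic)   # publish results downstream

if __name__ == "__main__":
    app.run(sdf)
```

Each microservice in the pipeline follows roughly this consume-transform-produce shape.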
Technical Analysis
- 20+ Technical indicators (RSI, MACD, Bollinger Bands, EMA, SMA, etc.)
- Customizable indicator parameters
- Real-time calculation on streaming data
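As an illustration of the feature engineering involved, EMA and RSI can be computed from a closing-price series with pandas (a sketch, not the service's actual implementation):

```python
import pandas as pd


def ema(close: pd.Series, span: int = 20) -> pd.Series:
    """Exponential moving average of the closing prices."""
    return close.ewm(span=span, adjust=False).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index using Wilder-style smoothed gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)


candles = pd.DataFrame({"close": [100.0, 101.5, 101.0, 102.3, 103.1, 102.8]})
candles["ema_5"] = ema(candles["close"], span=5)
candles["rsi_5"] = rsi(candles["close"], period=5)
print(candles)
```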
Machine Learning
- Multiple model support (Scikit-learn, XGBoost, LightGBM)
- Automated hyperparameter tuning with Optuna
- Continuous model retraining on new data
- Model versioning with MLflow
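The hyperparameter search loop looks roughly like this (a hedged sketch with synthetic data; the real objective, features, and search space live in the predictor service):

```python
import numpy as np
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Synthetic stand-ins for the engineered features and the price target
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=500)


def objective(trial: optuna.Trial) -> float:
    params = {
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
    }
    model = XGBRegressor(**params)
    # Maximize negative MAE, i.e. minimize the prediction error
    return cross_val_score(model, X, y, cv=3, scoring="neg_mean_absolute_error").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```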
Production Infrastructure
- Kubernetes deployment with Helm charts
- Docker containerization for all services
- RisingWave for stream processing
- PostgreSQL for persistent storage
- Monitoring and logging with structured logs (Loguru)
| Component | Technology |
|---|---|
| Language | Python 3.12+ |
| Streaming | Quix Streams, Kafka |
| ML Framework | Scikit-learn, XGBoost, LightGBM |
| ML Ops | MLflow, Optuna |
| Data Validation | Great Expectations |
| API Framework | FastAPI |
| Database | PostgreSQL, RisingWave |
| Orchestration | Kubernetes, Docker |
| Logging | Loguru |
| Code Quality | Ruff, Pre-commit hooks |
realtime_training/
├── services/                   # Microservices
│   ├── trades/                 # Trade data ingestion
│   ├── candles/                # Candlestick aggregation
│   ├── technical_indicators/   # Technical analysis
│   ├── predictor/              # ML prediction service
│   └── prediction-api/         # REST API
├── deployment/                 # Kubernetes manifests
│   ├── dev/                    # Development environment
│   └── prod/                   # Production environment
├── docker/                     # Dockerfiles
├── scripts/                    # Build and deployment scripts
├── state/                      # State management
├── pyproject.toml              # Root dependencies
├── Makefile                    # Build automation
└── README.md                   # This file
- Python 3.12+
- Docker & Docker Compose
- Kubernetes cluster (kind, minikube, or cloud provider)
- uv package manager
# 1. Clone repository
git clone https://github.com/silsgah/realtime_training.git
cd realtime_training
# 2. Install dependencies
uv sync
# 3. Run a service locally (development)
make dev service=trades
# 4. Build Docker image for a service
make build-for-dev service=trades
# 5. Deploy to Kubernetes (development)
make deploy-for-dev service=trades

Development Mode (Local)
# Run trades service
make dev service=trades
# Run candles service
make dev service=candles
# Run technical indicators service
make dev service=technical_indicators
# Run predictor service
make dev service=predictor

Production Deployment (Kubernetes)
# Build and push to registry
make build-and-push image=trades env=prod
# Deploy to cluster
make deploy service=trades env=prod

Once deployed, the Prediction API exposes:
# Get latest predictions
GET /predictions/{symbol}
# Health check
GET /health
# Metrics
GET /metrics
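A minimal sketch of what the Prediction API's route layer can look like with FastAPI (the endpoint shapes follow the list above; the response model and data access are placeholders, not the repo's actual code):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Prediction API")


class Prediction(BaseModel):
    symbol: str
    predicted_price: float
    horizon_seconds: int


# Placeholder store; the real service would read from its database/stream
_latest: dict[str, Prediction] = {}


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}


@app.get("/predictions/{symbol}", response_model=Prediction)
def get_prediction(symbol: str) -> Prediction:
    prediction = _latest.get(symbol.upper())
    if prediction is None:
        raise HTTPException(status_code=404, detail=f"No prediction for {symbol}")
    return prediction
```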
Create a .env file in the root:

# Kafka/Streaming
KAFKA_BROKER=localhost:9092
QUIX_SDK_TOKEN=your_token_here
# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=crypto_predictions
POSTGRES_USER=admin
POSTGRES_PASSWORD=secret
# ML
MLFLOW_TRACKING_URI=http://localhost:5000
MODEL_RETRAIN_INTERVAL=3600 # seconds
# Exchange API
EXCHANGE_WS_URL=wss://stream.binance.com:9443/ws
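One way these variables can be loaded into a typed settings object is with pydantic-settings; this is an illustrative sketch, not necessarily how the services read their environment:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Typed view of the .env values above; unset fields fall back to defaults."""
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    kafka_broker: str = "localhost:9092"
    postgres_host: str = "localhost"
    postgres_port: int = 5432
    postgres_db: str = "crypto_predictions"
    mlflow_tracking_uri: str = "http://localhost:5000"
    model_retrain_interval: int = 3600  # seconds
    exchange_ws_url: str = "wss://stream.binance.com:9443/ws"


settings = Settings()
print(settings.kafka_broker)
```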
Each service has its own configuration in services/{service}/config.yaml:

# Example: predictor/config.yaml
model:
  type: "xgboost"
  hyperparameters:
    max_depth: 10
    learning_rate: 0.01
    n_estimators: 100
features:
  - rsi_14
  - macd
  - bb_upper
  - ema_20
training:
  retrain_interval: 3600
  validation_split: 0.2
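On the consuming side, the predictor can turn this config into a model roughly like so (a sketch under the assumption of the YAML layout above; the actual model factory in the service may differ):

```python
import yaml
from xgboost import XGBRegressor

with open("services/predictor/config.yaml") as f:
    config = yaml.safe_load(f)

# Build the model from the configured type and hyperparameters
assert config["model"]["type"] == "xgboost"
model = XGBRegressor(**config["model"]["hyperparameters"])

# The feature list selects which indicator columns feed the model
feature_columns = config["features"]
print(model, feature_columns)
```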
# Run all tests
uv run pytest
# Run tests for specific service
uv run pytest services/trades/tests
# Run with coverage
uv run pytest --cov=services --cov-report=html
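A representative unit test exercises an indicator function in isolation. This is a hypothetical example (it inlines an `rsi` helper like the sketch earlier rather than importing from the repo):

```python
# services/technical_indicators/tests/test_rsi.py (hypothetical path)
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # In the repo this would be imported from the service package
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)


def test_rsi_is_100_when_price_only_rises():
    close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
    assert rsi(close, period=2).iloc[-1] == 100.0


def test_rsi_stays_within_bounds():
    close = pd.Series([5.0, 4.0, 6.0, 3.0, 7.0, 2.0])
    values = rsi(close, period=2).dropna()
    assert ((values >= 0) & (values <= 100)).all()
```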
All services use structured logging with Loguru:

from loguru import logger
logger.info("Processing trade {symbol} at {price}", symbol="BTC/USDT", price=50000.0)
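For machine-readable structured output, a JSON sink can be added (illustrative configuration; the services' actual sink setup may differ):

```python
import sys
from loguru import logger

# Emit one JSON object per log record, including any bound context
logger.remove()
logger.add(sys.stderr, serialize=True)
logger.add("logs/predictor.log", rotation="100 MB", retention="7 days", serialize=True)

logger.bind(symbol="BTC/USDT", price=50000.0).info("Processing trade")
```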
- MLflow tracks model performance metrics
- Kubernetes metrics via Prometheus
- Custom application metrics exposed at /metrics
docker build -t trades:latest -f docker/trades.Dockerfile .

docker-compose up -d

# Create kind cluster
kind create cluster --name rwml-34fa
# Deploy service
make deploy-for-dev service=trades
# Check status
kubectl get pods
kubectl logs -f deployment/trades

# Deploy to production cluster
kubectl apply -f deployment/prod/
# Monitor
kubectl get pods -n production
kubectl describe pod <pod-name> -n production

- Latency: <100ms end-to-end (trade ingestion to prediction)
- Throughput: 10,000+ trades/second processed
- Prediction Accuracy: 65-70% directional accuracy (varies by market conditions)
- Uptime: 99.9% availability with Kubernetes
# Lint code
uv run ruff check .
# Format code
uv run ruff format .
# Run pre-commit hooks
pre-commit run --all-files

- Author: Silas Kwabla Gah
- GitHub: @silsgah
- LinkedIn: Silas Gah
- Email: gahsilas@gmail.com
Technical Skills:
- Microservices architecture
- Stream processing (Kafka, Quix Streams)
- Real-time ML inference
- Kubernetes orchestration
- Docker containerization
- Technical analysis and feature engineering
- ML model training and deployment
- REST API development
ML Engineering:
- Real-time data pipelines
- Feature engineering from time-series data
- Model versioning and experiment tracking
- Continuous model training
- Production ML deployment
If you found this project helpful, please star the repository!
