A Python project that retrieves historical OHLC trading data for top cryptocurrency pairs, computes analytical trading metrics, exports results to Excel, and trains a machine learning model to predict future price movements.
- Fetches daily OHLC data from the Binance public API (no API key required)
- Computes configurable look-back and look-forward price metrics
- Exports data to a multi-sheet Excel workbook
- Trains a multi-output regression model to predict future price differences
.
├── crypto_data.py # Core: fetch_crypto_data(), calculate_metrics()
├── ml_model.py # ML: train_model(), predict_outcomes()
├── main.py # Orchestration script
├── crypto_historical_data.xlsx # Generated Excel output
└── Project_Explanation.txt # Approach, decisions, challenges
pip install pandas requests openpyxl scikit-learn numpypython main.pyThis fetches data for 5 pairs, computes metrics, saves the Excel workbook, trains the ML model, and runs a sample prediction.
from crypto_data import fetch_crypto_data, calculate_metrics
# Fetch daily OHLC data
df = fetch_crypto_data("BTC/USDT", "2023-01-01")
# Compute metrics (7-day lookback, 5-day lookahead)
df = calculate_metrics(df, variable1=7, variable2=5)from ml_model import train_model, predict_outcomes
bundle = train_model(df, variable1=7, variable2=5)
prediction = predict_outcomes(
bundle,
days_since_high=1,
pct_diff_from_high=-0.95,
days_since_low=3,
pct_diff_from_low=2.50,
)
# {'%_Diff_From_High_Next_5_Days': ..., '%_Diff_From_Low_Next_5_Days': ...}| Property | Value |
|---|---|
| Provider | Binance Public REST API |
| Endpoint | https://api.binance.com/api/v3/klines |
| Auth | None (public endpoint) |
| Pairs supported | 1000+ (USDT, BTC, ETH quoted) |
| Timeframes | 1m, 5m, 15m, 1h, 4h, 1d, 1w, 1M |
| Data from | Each asset's listing date (BTC: 2017) |
| Rate limit | 1 200 requests/min |
Given variable1 (look-back days) and variable2 (look-forward days):
| Column | Description |
|---|---|
High_Last_{v1}_Days |
Rolling max of High over past v1 days |
Days_Since_High_Last_{v1}_Days |
Days since that high occurred |
%_Diff_From_High_Last_{v1}_Days |
(Close − High_Last) / High_Last × 100 |
Low_Last_{v1}_Days |
Rolling min of Low over past v1 days |
Days_Since_Low_Last_{v1}_Days |
Days since that low occurred |
%_Diff_From_Low_Last_{v1}_Days |
(Close − Low_Last) / Low_Last × 100 |
High_Next_{v2}_Days |
Max of High over next v2 days |
%_Diff_From_High_Next_{v2}_Days |
(High_Next − Close) / Close × 100 |
Low_Next_{v2}_Days |
Min of Low over next v2 days |
%_Diff_From_Low_Next_{v2}_Days |
(Low_Next − Close) / Close × 100 |
Type: Multi-output regression (predicts two targets simultaneously)
Features:
Days_Since_High_Last_{v1}_Days%_Diff_From_High_Last_{v1}_DaysDays_Since_Low_Last_{v1}_Days%_Diff_From_Low_Last_{v1}_Days
Targets:
%_Diff_From_High_Next_{v2}_Days%_Diff_From_Low_Next_{v2}_Days
Two models are trained (RandomForest and GradientBoosting); the one with lower test RMSE is selected. Features are scaled with StandardScaler. Data is split chronologically (80/20) to respect time-series ordering.
The workbook crypto_historical_data.xlsx contains:
- One sheet per crypto pair (
BTC_USDT,ETH_USDT, etc.) - A
Combinedsheet with all pairs merged (used for ML training)