Skip to content

ThunderShadows/Crypto-Historical-Data-Retrieval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Crypto Historical Data Retrieval

A Python project that retrieves historical OHLC trading data for top cryptocurrency pairs, computes analytical trading metrics, exports results to Excel, and trains a machine learning model to predict future price movements.

Features

  • Fetches daily OHLC data from the Binance public API (no API key required)
  • Computes configurable look-back and look-forward price metrics
  • Exports data to a multi-sheet Excel workbook
  • Trains a multi-output regression model to predict future price differences

Project Structure

.
├── crypto_data.py              # Core: fetch_crypto_data(), calculate_metrics()
├── ml_model.py                 # ML: train_model(), predict_outcomes()
├── main.py                     # Orchestration script
├── crypto_historical_data.xlsx # Generated Excel output
└── Project_Explanation.txt     # Approach, decisions, challenges

Setup

pip install pandas requests openpyxl scikit-learn numpy

Usage

Run the full pipeline

python main.py

This fetches data for 5 pairs, computes metrics, saves the Excel workbook, trains the ML model, and runs a sample prediction.

Use modules individually

from crypto_data import fetch_crypto_data, calculate_metrics

# Fetch daily OHLC data
df = fetch_crypto_data("BTC/USDT", "2023-01-01")

# Compute metrics (7-day lookback, 5-day lookahead)
df = calculate_metrics(df, variable1=7, variable2=5)
from ml_model import train_model, predict_outcomes

bundle = train_model(df, variable1=7, variable2=5)

prediction = predict_outcomes(
    bundle,
    days_since_high=1,
    pct_diff_from_high=-0.95,
    days_since_low=3,
    pct_diff_from_low=2.50,
)
# {'%_Diff_From_High_Next_5_Days': ..., '%_Diff_From_Low_Next_5_Days': ...}

API Details

Property Value
Provider Binance Public REST API
Endpoint https://api.binance.com/api/v3/klines
Auth None (public endpoint)
Pairs supported 1000+ (USDT, BTC, ETH quoted)
Timeframes 1m, 5m, 15m, 1h, 4h, 1d, 1w, 1M
Data from Each asset's listing date (BTC: 2017)
Rate limit 1 200 requests/min

Calculated Metrics

Given variable1 (look-back days) and variable2 (look-forward days):

Column Description
High_Last_{v1}_Days Rolling max of High over past v1 days
Days_Since_High_Last_{v1}_Days Days since that high occurred
%_Diff_From_High_Last_{v1}_Days (Close − High_Last) / High_Last × 100
Low_Last_{v1}_Days Rolling min of Low over past v1 days
Days_Since_Low_Last_{v1}_Days Days since that low occurred
%_Diff_From_Low_Last_{v1}_Days (Close − Low_Last) / Low_Last × 100
High_Next_{v2}_Days Max of High over next v2 days
%_Diff_From_High_Next_{v2}_Days (High_Next − Close) / Close × 100
Low_Next_{v2}_Days Min of Low over next v2 days
%_Diff_From_Low_Next_{v2}_Days (Low_Next − Close) / Close × 100

ML Model

Type: Multi-output regression (predicts two targets simultaneously)

Features:

  • Days_Since_High_Last_{v1}_Days
  • %_Diff_From_High_Last_{v1}_Days
  • Days_Since_Low_Last_{v1}_Days
  • %_Diff_From_Low_Last_{v1}_Days

Targets:

  • %_Diff_From_High_Next_{v2}_Days
  • %_Diff_From_Low_Next_{v2}_Days

Two models are trained (RandomForest and GradientBoosting); the one with lower test RMSE is selected. Features are scaled with StandardScaler. Data is split chronologically (80/20) to respect time-series ordering.

Excel Output

The workbook crypto_historical_data.xlsx contains:

  • One sheet per crypto pair (BTC_USDT, ETH_USDT, etc.)
  • A Combined sheet with all pairs merged (used for ML training)

About

Retrieving historical trading data for frequently traded cryptocurrency pairs and calculate several analytical metrics. The main focus is on working with APIs, handling and processing data, and calculating trading metrics over specific timeframes.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages