
FlorianEbner96/AlphaEngine


AlphaEngine

ML-Powered Algorithmic Trading System



Overview

AlphaEngine is an end-to-end algorithmic trading system that combines quantitative research, deep learning, and cloud infrastructure to generate, validate, and explain trading signals.

The project spans the full ML engineering lifecycle, from raw market data ingestion to a deployed, queryable intelligence layer. Core components such as the data pipeline and the backtesting engine are deliberately written from scratch rather than delegated to high-level libraries.

Why this project? Built as a hands-on deep dive into three domains simultaneously: quantitative finance methodology, production ML engineering with PyTorch, and cloud-native deployment on Microsoft Azure.


Architecture

Alpaca Markets API
        ↓
Azure Blob Storage (raw & processed data)
        ↓
Feature Engineering Pipeline (log-returns, rolling volatility, normalization)
        ↓
PyTorch LSTM Model (sequence modeling on time-series data)
        ↓
Azure ML (experiment tracking & model versioning)
        ↓
Backtesting Engine (custom-built, designed to prevent look-ahead bias)
        ↓
LangChain RAG Agent (natural language strategy Q&A)
        ↓
Azure Container App (REST API + Dashboard)
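The feature-engineering stage in the diagram names three transformations: log-returns, rolling volatility and normalization. A minimal, dependency-free sketch of them follows. The function names are hypothetical (not taken from ingestion/preprocessor.py, which presumably operates on pandas DataFrames), and estimating the normalization statistics on the training segment only is an assumption made here to avoid leaking future information into the features.

```python
import math
from statistics import mean, stdev
from typing import Optional


def log_returns(prices: list[float]) -> list[float]:
    """r_t = ln(P_t / P_{t-1}); the result has one element fewer than `prices`."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int) -> list[Optional[float]]:
    """Trailing sample standard deviation of the last `window` returns (window >= 2).

    Entry t only looks at returns[t - window + 1 .. t], so no future values leak in.
    The first window - 1 entries are None because the window is not yet full.
    """
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    return [
        stdev(returns[t + 1 - window : t + 1]) if t + 1 >= window else None
        for t in range(len(returns))
    ]


def zscore(values: list[float], train_values: list[float]) -> list[float]:
    """Normalize with mean/std estimated on the training segment only (avoids leakage)."""
    mu, sigma = mean(train_values), stdev(train_values)
    return [(v - mu) / sigma for v in values]
```

Because log-returns are additive, sum(log_returns(prices)) equals ln(P_last / P_first) up to floating-point error, which is one reason they are preferred over simple returns.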

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Data | Alpaca Markets API | Historical & live market data + paper trading |
| Storage | Azure Blob Storage | Raw and processed data persistence |
| Modeling | PyTorch | LSTM architecture for time-series forecasting |
| Experiment Tracking | Azure ML | Model versioning, training runs, metrics |
| Intelligence Layer | LangChain + RAG | Natural language querying of strategy & documents |
| Vector Search | Azure AI Search | Embedding store for financial documents |
| Deployment | Azure Container Apps | Scalable API hosting |
| Backend | FastAPI | REST API layer |

Project Structure

alphaengine/
│
├── data/
│   ├── raw/                  # Raw market data from Alpaca API
│   └── processed/            # Cleaned, normalized, feature-engineered data
│
├── ingestion/
│   ├── api_loader.py         # Abstracted market data loader (Alpaca / extensible)
│   ├── preprocessor.py       # Feature engineering: log-returns, rolling features
│   └── blob_upload.py        # Azure Blob Storage read/write
│
├── models/
│   ├── dataset.py            # PyTorch Dataset & DataLoader
│   ├── lstm.py               # LSTM model architecture
│   └── train.py              # Training loop with experiment tracking
│
├── strategy/
│   ├── signals.py            # Signal generation from model outputs
│   └── backtester.py         # Custom backtesting engine
│
├── agent/
│   ├── tools.py              # Custom LangChain tools (backtesting, metrics)
│   ├── chain.py              # RAG chain + agent orchestration
│   ├── retriever.py          # Vector search over financial documents
│   └── ingestion/
│       ├── doc_loader.py     # PDF / filing ingestion
│       └── embeddings.py     # Embedding generation & storage
│
├── api/
│   └── app.py                # FastAPI application
│
├── evaluation/
│   └── metrics.py            # Sharpe ratio, max drawdown, hit rate
│
└── memo/
    └── business_case.md      # McKinsey-style business case write-up
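evaluation/metrics.py is described as computing the Sharpe ratio, max drawdown and hit rate. The sketch below is a hedged, stdlib-only version of those three metrics, not the project's code. It assumes daily bars (252 periods per year for annualization) and defines hit rate as the share of non-zero predictions whose sign matches the non-zero realized return; both choices are assumptions.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns: list[float], risk_free_per_period: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean excess return / std of excess return * sqrt(periods)."""
    excess = [r - risk_free_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline of a non-empty equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def hit_rate(predicted: list[float], realized: list[float]) -> float:
    """Fraction of periods where predicted and realized signs agree (zeros are skipped)."""
    pairs = [(p, r) for p, r in zip(predicted, realized) if p != 0 and r != 0]
    if not pairs:
        return float("nan")
    return sum((p > 0) == (r > 0) for p, r in pairs) / len(pairs)
```

For example, an equity curve that rises to 120, falls to 90 and later recovers has a max drawdown of 0.25, regardless of what happens after the recovery.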

Roadmap

✅ Phase 1 – Data Foundation (in progress)

  • Project structure & environment setup
  • Azure account & resource group
  • Alpaca Markets API integration
  • Feature engineering pipeline (log-returns, rolling volatility)
  • Azure Blob Storage persistence
  • End-to-end data pipeline

🔲 Phase 2 – Model Development

  • PyTorch Dataset & DataLoader for time-series
  • LSTM architecture implementation
  • Training loop with Azure ML experiment tracking
  • Overfitting analysis & regularization

🔲 Phase 3 – Strategy & Backtesting

  • Signal generation from model outputs
  • Custom backtesting engine (no look-ahead bias)
  • Performance metrics: Sharpe ratio, max drawdown, hit rate
  • LangChain RAG agent with financial document retrieval
  • Paper trading integration via Alpaca
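Signal generation from model outputs could be as simple as thresholding the LSTM's predicted next-bar return. This is a hypothetical sketch: the dead-band threshold and the long/short/flat encoding are assumptions, not the project's specified rule.

```python
def signals_from_predictions(predicted: list[float], threshold: float = 0.0) -> list[int]:
    """Map predicted next-bar log-returns to positions: +1 long, -1 short, 0 flat.

    Predictions inside the dead band [-threshold, threshold] stay flat, which filters
    out weak forecasts that are unlikely to cover transaction costs.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return [1 if p > threshold else -1 if p < -threshold else 0 for p in predicted]
```

A non-zero threshold trades fewer signals for a higher bar of conviction; its value would itself need to be chosen on the validation split, not the test split.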

🔲 Phase 4 – Deployment

  • FastAPI REST endpoint
  • Azure Container App deployment

Quantitative Foundations

The system is grounded in core quant concepts applied deliberately throughout:

  • Log-returns rather than simple returns, because they are additive across time and, like returns in general (unlike raw prices), closer to stationary
  • Rolling volatility as a key input feature capturing regime changes
  • Sharpe ratio as the primary strategy evaluation metric
  • Look-ahead bias prevention as a first-class concern in the backtesting engine
  • Train/validation/test splits respecting temporal ordering — no random shuffling
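A temporal split can be expressed as contiguous index ranges that preserve time order. The 70/15/15 default proportions below are illustrative assumptions, and the function name is hypothetical.

```python
def chronological_split(n: int, train_frac: float = 0.7,
                        val_frac: float = 0.15) -> tuple[range, range, range]:
    """Index ranges for train / validation / test, in time order and never shuffled.

    Every training index precedes every validation index, which precedes every
    test index, so the model is never fit on data from after its evaluation period.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("need train_frac > 0, val_frac >= 0 and train_frac + val_frac < 1")
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return range(0, train_end), range(train_end, val_end), range(val_end, n)
```

The same ranges should also drive feature normalization: statistics such as mean and standard deviation are estimated on the training range and then applied unchanged to validation and test.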

Key Design Decisions

Abstracted data layer: api_loader.py exposes a standardized DataFrame interface regardless of the underlying data source, so switching from Alpaca to another provider requires changes in this one file only.
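One way to express that contract in Python is an abstract base class that every provider implements. In this sketch the names Bar, MarketDataLoader and InMemoryLoader are hypothetical, and a list of records stands in for the DataFrame the README describes so the example stays dependency-free.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bar:
    """One OHLCV bar in the standardized schema every loader must return."""
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


class MarketDataLoader(ABC):
    """Provider-agnostic interface; downstream code depends only on this."""

    @abstractmethod
    def load_bars(self, symbol: str, start: datetime, end: datetime) -> list[Bar]:
        """Return bars for `symbol` with start <= timestamp < end, sorted by time."""


class InMemoryLoader(MarketDataLoader):
    """Toy provider for tests; a hypothetical AlpacaLoader would implement the same method."""

    def __init__(self, data: dict[str, list[Bar]]):
        self._data = data

    def load_bars(self, symbol: str, start: datetime, end: datetime) -> list[Bar]:
        bars = [b for b in self._data.get(symbol, []) if start <= b.timestamp < end]
        return sorted(bars, key=lambda b: b.timestamp)


def closes(loader: MarketDataLoader, symbol: str, start: datetime, end: datetime) -> list[float]:
    """Example consumer: it knows nothing about which provider sits behind `loader`."""
    return [b.close for b in loader.load_bars(symbol, start, end)]
```

Because consumers such as closes() only see MarketDataLoader, swapping providers means writing one new subclass; nothing downstream changes.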

Custom backtester — built from scratch rather than using off-the-shelf libraries, to ensure full understanding of bias sources and edge cases.
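The core look-ahead safeguard in a bar-by-bar backtester is that a signal decided at the close of bar t can only earn the return of bar t + 1. A minimal sketch, assuming log-return inputs, positions in {-1, 0, +1} and a hypothetical proportional cost per unit of position change; it is an illustration, not the project's backtester.py.

```python
import math


def backtest(log_rets: list[float], signals: list[int],
             cost_per_trade: float = 0.0) -> list[float]:
    """Bar-by-bar backtest returning the strategy's equity curve, starting at 1.0.

    signals[t] uses information up to the close of bar t, so it is applied to the
    return of bar t + 1. Applying it to log_rets[t] instead would be look-ahead bias.
    cost_per_trade is a hypothetical proportional cost per unit of position change.
    """
    if len(log_rets) != len(signals):
        raise ValueError("log_rets and signals must be aligned in time")
    equity = [1.0]
    position = 0
    for t in range(1, len(log_rets)):
        target = signals[t - 1]                 # decided at the close of bar t - 1
        turnover = abs(target - position)       # e.g. long -> short counts as 2
        position = target
        bar_return = math.expm1(log_rets[t])    # simple return of bar t
        growth = (1 + position * bar_return) * (1 - cost_per_trade * turnover)
        equity.append(equity[-1] * growth)
    return equity
```

Changing the final signal never alters the equity curve, since no later bar exists for it to trade; asserting that property in a test is a cheap guard against accidental look-ahead.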

RAG over structured + unstructured data — the LangChain agent combines quantitative backtesting results with unstructured financial documents (SEC filings, research papers) for context-aware strategy explanations.
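The retrieve-then-augment step of such an agent can be sketched without any framework: rank document chunks against the question, then place the top matches next to the backtest metrics in a single prompt. In this hypothetical sketch a bag-of-words cosine similarity is swapped in as a toy stand-in for the embedding search in Azure AI Search, no LangChain API is used, the prompt wording is an assumption, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Toy bag-of-words vector (a real system would use embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = _vectorize(query)
    return sorted(chunks, key=lambda c: _cosine(q, _vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, metrics: dict[str, float], chunks: list[str], k: int = 2) -> str:
    """Combine structured backtest results with retrieved unstructured context."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    stats = "\n".join(f"{name}: {value:.3f}" for name, value in metrics.items())
    return (
        "Answer using only the backtest results and excerpts below.\n\n"
        f"Backtest results:\n{stats}\n\n"
        f"Document excerpts:\n{context}\n\n"
        f"Question: {question}"
    )
```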


Getting Started

# Clone the repository
git clone https://github.com/FlorianEbner96/AlphaEngine.git
cd AlphaEngine

# Create and activate conda environment
conda create -n alphaengine python=3.11
conda activate alphaengine

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Add your Alpaca API keys and Azure credentials to .env
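A fail-fast check at startup gives a clear error when a credential is missing, instead of an authentication failure deep inside a run. In the sketch below the variable names in REQUIRED_KEYS are hypothetical (the real ones are defined by .env.example), and the minimal parser only handles simple KEY=VALUE lines; in practice python-dotenv's load_dotenv() is the usual way to read the file.

```python
import os

# Hypothetical names; the authoritative list is whatever .env.example defines.
REQUIRED_KEYS = ("ALPACA_API_KEY", "ALPACA_SECRET_KEY", "AZURE_STORAGE_CONNECTION_STRING")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(text: str, environ=os.environ) -> dict[str, str]:
    """Merge .env values with the process environment (environment wins); fail on gaps."""
    settings = {**parse_env(text), **{k: environ[k] for k in REQUIRED_KEYS if k in environ}}
    missing = [k for k in REQUIRED_KEYS if not settings.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return settings
```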

Status & Transparency

This project is under active development. It is being built step by step, prioritizing a genuine understanding of the underlying technical concepts and workflows over speed. Code is written from scratch where possible, and every architectural decision is made consciously.

Current phase: Phase 1 — Data Foundation


Background

Built by a Bioinformatics MSc graduate transitioning into ML Engineering / Quant Research. The project deliberately combines quantitative finance, deep learning, and cloud infrastructure to reflect the profile of modern data science and ML engineering roles.

About

End-to-end ML system for algorithmic trading. LSTM-based signal generation, custom backtesting engine, RAG-powered strategy analysis. Built on PyTorch, Azure & LangChain. Work in progress.
