AlphaEngine is an end-to-end algorithmic trading system that combines quantitative research, deep learning, and cloud infrastructure to generate, validate, and explain trading signals.
The project spans the full ML engineering lifecycle, from raw market data ingestion to a deployed, queryable intelligence layer, and is deliberately built from the ground up rather than assembled from high-level abstractions.
Why this project? It is a hands-on deep dive into three domains at once: quantitative finance methodology, production ML engineering with PyTorch, and cloud-native deployment on Microsoft Azure.
```
Alpaca Markets API
        ↓
Azure Blob Storage (raw & processed data)
        ↓
Feature Engineering Pipeline (log-returns, rolling volatility, normalization)
        ↓
PyTorch LSTM Model (sequence modeling on time-series data)
        ↓
Azure ML (experiment tracking & model versioning)
        ↓
Backtesting Engine (custom-built, bias-free)
        ↓
LangChain RAG Agent (natural language strategy Q&A)
        ↓
Azure Container App (REST API + Dashboard)
```
| Layer | Technology | Purpose |
|---|---|---|
| Data | Alpaca Markets API | Historical & live market data + paper trading |
| Storage | Azure Blob Storage | Raw and processed data persistence |
| Modeling | PyTorch | LSTM architecture for time-series forecasting |
| Experiment Tracking | Azure ML | Model versioning, training runs, metrics |
| Intelligence Layer | LangChain + RAG | Natural language querying of strategy & documents |
| Vector Search | Azure AI Search | Embedding store for financial documents |
| Deployment | Azure Container Apps | Scalable API hosting |
| Backend | FastAPI | REST API layer |
```
alphaengine/
│
├── data/
│   ├── raw/                  # Raw market data from Alpaca API
│   └── processed/            # Cleaned, normalized, feature-engineered data
│
├── ingestion/
│   ├── api_loader.py         # Abstracted market data loader (Alpaca / extensible)
│   ├── preprocessor.py       # Feature engineering: log-returns, rolling features
│   └── blob_upload.py        # Azure Blob Storage read/write
│
├── models/
│   ├── dataset.py            # PyTorch Dataset & DataLoader
│   ├── lstm.py               # LSTM model architecture
│   └── train.py              # Training loop with experiment tracking
│
├── strategy/
│   ├── signals.py            # Signal generation from model outputs
│   └── backtester.py         # Custom backtesting engine
│
├── agent/
│   ├── tools.py              # Custom LangChain tools (backtesting, metrics)
│   ├── chain.py              # RAG chain + agent orchestration
│   ├── retriever.py          # Vector search over financial documents
│   └── ingestion/
│       ├── doc_loader.py     # PDF / filing ingestion
│       └── embeddings.py     # Embedding generation & storage
│
├── api/
│   └── app.py                # FastAPI application
│
├── evaluation/
│   └── metrics.py            # Sharpe ratio, max drawdown, hit rate
│
└── memo/
    └── business_case.md      # McKinsey-style business case write-up
```
- Project structure & environment setup
- Azure account & resource group
- Alpaca Markets API integration
- Feature engineering pipeline (log-returns, rolling volatility)
- Azure Blob Storage persistence
- End-to-end data pipeline
- PyTorch Dataset & DataLoader for time-series
- LSTM architecture implementation
- Training loop with Azure ML experiment tracking
- Overfitting analysis & regularization
- Signal generation from model outputs
- Custom backtesting engine (no look-ahead bias)
- Performance metrics: Sharpe ratio, max drawdown, hit rate
- LangChain RAG agent with financial document retrieval
- Paper trading integration via Alpaca
- FastAPI REST endpoint
- Azure Container App deployment
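As a rough sketch of what the LSTM in `models/lstm.py` might look like (the hidden size, layer count, dropout value, and linear head here are illustrative assumptions, not the project's actual architecture):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Many-to-one LSTM: a window of features in, one forecast out."""

    def __init__(self, n_features: int, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,   # input shape: (batch, seq_len, n_features)
            dropout=0.2,        # regularization between stacked layers
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)           # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])    # forecast from the last time step only

model = LSTMForecaster(n_features=3)
batch = torch.randn(8, 30, 3)           # 8 samples, 30-step windows, 3 features
print(model(batch).shape)  # torch.Size([8, 1])
```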
The system is grounded in core quant concepts applied deliberately throughout:
- Log-returns over simple returns for time-series stationarity and additivity
- Rolling volatility as a key input feature capturing regime changes
- Sharpe Ratio as the primary strategy evaluation metric
- Look-ahead bias prevention as a first-class concern in the backtesting engine
- Train/validation/test splits respecting temporal ordering — no random shuffling
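A minimal pandas sketch of the first three concepts; the toy prices, the 3-bar window, and the 252-day annualization convention are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Toy price series standing in for real Alpaca bars
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.0, 101.5])

# Log-returns: additive across time and closer to stationary than raw prices
log_ret = np.log(prices / prices.shift(1)).dropna()

# Rolling volatility over a short window (a real feature might use 20 bars)
rolling_vol = log_ret.rolling(window=3).std()

# Annualized Sharpe ratio, assuming daily bars and a zero risk-free rate
sharpe = np.sqrt(252) * log_ret.mean() / log_ret.std()
print(round(float(sharpe), 2))
```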
Abstracted data layer — api_loader.py exposes a standardized DataFrame interface regardless of the underlying data source. Switching from Alpaca to any other provider requires changes in one file only.
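The interface idea behind `api_loader.py` can be sketched as follows; the `MarketDataSource` protocol, method names, and the in-memory stand-in provider are illustrative, not the project's actual API:

```python
from typing import Protocol
import pandas as pd

class MarketDataSource(Protocol):
    """Anything that can return OHLCV bars as a standardized DataFrame."""
    def fetch_bars(self, symbol: str, start: str, end: str) -> pd.DataFrame: ...

class InMemorySource:
    """Stand-in provider; a real Alpaca-backed source would call its API."""
    def __init__(self, frames: dict[str, pd.DataFrame]):
        self._frames = frames

    def fetch_bars(self, symbol: str, start: str, end: str) -> pd.DataFrame:
        df = self._frames[symbol]
        # Every provider returns the same columns, in the same order
        return df.loc[start:end, ["open", "high", "low", "close", "volume"]]

# Downstream code depends only on the DataFrame contract, not the provider
def load(source: MarketDataSource, symbol: str) -> pd.DataFrame:
    return source.fetch_bars(symbol, "2024-01-01", "2024-01-31")

idx = pd.date_range("2024-01-02", periods=3, freq="D")
bars = pd.DataFrame(
    {"open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5, "volume": 100},
    index=idx,
)
source = InMemorySource({"AAPL": bars})
print(load(source, "AAPL").shape)  # (3, 5)
```

Swapping providers then means writing one new class that satisfies the protocol; nothing downstream changes.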
Custom backtester — built from scratch rather than using off-the-shelf libraries, to ensure full understanding of bias sources and edge cases.
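The core anti-look-ahead idea (shift signals so the position held during a bar was decided from the previous bar's information) can be sketched like this; the frictionless, single-asset setup is an illustrative assumption:

```python
import numpy as np
import pandas as pd

def backtest(prices: pd.Series, signals: pd.Series) -> pd.Series:
    """Return per-bar strategy log-returns for +1 / 0 / -1 signals.

    The shift(1) is the crucial step: the position held during bar t is
    decided from information available at bar t-1, so the strategy can
    never trade on the same bar it predicted from (no look-ahead bias).
    """
    log_ret = np.log(prices / prices.shift(1))
    position = signals.shift(1).fillna(0)   # act one bar after the signal
    return (position * log_ret).fillna(0)

prices = pd.Series([100.0, 102.0, 101.0, 104.0])
signals = pd.Series([1, 1, 0, 1])           # model says long / long / flat / long
strat = backtest(prices, signals)
print(strat.sum())  # cumulative log-return of the strategy
```

Here the strategy is long during bars 1 and 2 and flat during bar 3, so its cumulative log-return equals ln(101/100): exactly the buy-and-hold return over the bars it was actually invested.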
RAG over structured + unstructured data — the LangChain agent combines quantitative backtesting results with unstructured financial documents (SEC filings, research papers) for context-aware strategy explanations.
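Independent of LangChain specifics, the retrieval step reduces to nearest-neighbour search over embeddings. A toy cosine-similarity version, with made-up 2-D vectors standing in for real embedding output:

```python
import numpy as np

def top_k(query_vec, doc_vecs, docs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                           # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]     # highest similarity first
    return [(docs[i], float(scores[i])) for i in order]

docs = ["10-K risk factors", "LSTM research note", "backtest summary"]
doc_vecs = np.array([[1.0, 0.1], [0.2, 1.0], [0.9, 0.3]])
print(top_k(np.array([1.0, 0.2]), doc_vecs, docs, k=1))
```

In the real system, Azure AI Search plays the role of this similarity scan at scale, and the retrieved chunks are passed to the agent as context.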
```bash
# Clone the repository
git clone https://github.com/yourusername/alphaengine.git
cd alphaengine

# Create and activate conda environment
conda create -n alphaengine python=3.11
conda activate alphaengine

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Add your Alpaca API keys and Azure credentials to .env
```

This project is actively under development. It is being built step by step, prioritizing a genuine understanding of each technical concept and workflow over speed. Code is written from scratch where possible, and every architectural decision is made consciously.
Current phase: Phase 1 — Data Foundation
Built by a Bioinformatics MSc graduate transitioning into ML Engineering / Quant Research. The project deliberately combines quantitative finance, deep learning, and cloud infrastructure to mirror the profile of modern data science and ML engineering roles.