Alpha Engine Research

Autonomous investment research pipeline for US equities. 6 sector teams with ReAct tool-calling agents screen the S&P 900 weekly, maintain rolling investment theses, and submit recommendations to a CIO for population selection. Delivers a consolidated morning research brief via email.

Part of Nous Ergon: Alpha Engine.

Role in the System

Research is the signal generator. It produces signals/{date}/signals.json — the file that the Predictor and Executor both depend on. Runs weekly on Monday (06:00 UTC via EventBridge). Each sector team builds on prior theses rather than starting from scratch, creating compounding institutional knowledge over time.

signals.json contains:

universe — list of all stocks with signal (ENTER/HOLD/EXIT), score, rating, conviction, sector_rating, thesis
buy_candidates — subset with ENTER signals and enriched CIO theses
market_regime, sector_ratings — macro context for executor position sizing
population — current 25-stock portfolio ticker list

Research and Predictor Training are independent — no data flows between them. They run sequentially on Monday only to spread API load.

Architecture

PARALLEL (LangGraph Send() fan-out)
├── Technology Team   (Quant→Qual→PeerReview→2-3 picks + thesis maintenance)
├── Healthcare Team   (same pattern)
├── Financials Team
├── Industrials Team
├── Consumer Team
├── Defensives Team
├── Macro Economist   (regime + sector ratings, reflection loop)
└── Exit Evaluator    (determines exits from current population)
     │
SEQUENTIAL (fan-in)
├── Merge Results     (combine team picks + macro + exits → compute open slots)
├── Score Aggregator  (quant × w_quant + qual × w_qual + macro shift + boosts)
├── CIO               (single Sonnet batch: evaluate all picks on 4 dimensions)
├── Population Entry   (place CIO ADVANCE decisions into population)
├── Consolidator      (morning email: macro, sector allocation, notable developments, universe ratings)
├── Archive Writer    (S3 + SQLite + thesis history + IC audit trail + signals.json)
└── Email Sender      (markdown → HTML via formatter, Gmail SMTP primary, SES fallback)

6 Sector Teams

Team	GICS Sectors	~Stocks
Technology	Information Technology	~120
Healthcare	Healthcare	~110
Financials	Financials	~140
Industrials	Industrials, Materials	~215
Consumer	Consumer Disc, Consumer Staples, Comm Services	~190
Defensives	Energy, Utilities, Real Estate	~145

Intra-Team Flow

Each team runs autonomously using ReAct tool-calling agents:

Quant Analyst (Haiku, ReAct with tools, max 8 iterations): Screens the sector universe using tools it chooses — volume filters, technical indicators, analyst consensus, balance sheet, options flow. Produces ranked top 10.
Qualitative Analyst (Haiku, ReAct with tools, max 8 iterations): Reviews quant's top 5 using news, analyst reports, insider activity, SEC filings, prior theses. Produces holistic qual_score (0-100) per stock. May add 0-1 additional candidate.
Peer Review (Haiku): Quant reviews qual's addition (if any). Joint finalization selects 2-3 recommendations.
Thesis Maintenance: For held population stocks, update thesis only when material triggers fire (news spike, price move > 2 ATR, analyst revision, earnings proximity, insider cluster, sector regime change).

CIO Evaluation

The CIO (Sonnet) receives all team recommendations in a single batch call and evaluates on 4 dimensions:

Team conviction — quant+qual agreement, peer review flags
Macro alignment — regime fit, sector rating
Portfolio fit — diversification vs concentration
Catalyst specificity — time-bound vs vague

Selects top N for open slots. Every decision (ADVANCE, REJECT, NO_ADVANCE_DEADLOCK) saved with full rationale to thesis_history table.

Proprietary Source Files

This repo is open-source, but files containing prompts, scoring logic, and orchestration are gitignored. You must create these files yourself to run the pipeline.

Gitignored (Must Create)

File	Purpose
`config.py`	Centralized configuration loader
`config/universe.yaml`	Population size, thresholds, LLM models
`graph/research_graph.py`	LangGraph orchestrator (Send fan-out, sector teams)
`agents/macro_agent.py`	Macro economist with reflection loop
`agents/news_agent.py`	News sentiment agent
`agents/research_agent.py`	Research analysis agent
`agents/scanner_ranking_agent.py`	Scanner ranking agent
`agents/consolidator.py`	Email brief synthesizer
`agents/investment_committee/ic_cio.py`	CIO batch evaluator
`agents/sector_teams/quant_analyst.py`	Quant analyst ReAct agent
`agents/sector_teams/qual_analyst.py`	Qual analyst ReAct agent
`agents/sector_teams/peer_review.py`	Intra-team peer review
`agents/sector_teams/sector_team.py`	Team orchestrator
`scoring/technical.py`	Technical scoring engine
`scoring/performance_tracker.py`	Signal accuracy tracking
`data/scanner.py`	Quant filter pipeline
`data/population_selector.py`	Population management + exit/entry handlers
`thesis/updater.py`	Thesis record builder

Tracked (Infrastructure + Tools)

agents/sector_teams/team_config.py — GICS-to-team mapping, slot allocation
agents/sector_teams/quant_tools.py — LangChain @tool wrappers for quant agent
agents/sector_teams/qual_tools.py — LangChain @tool wrappers for qual agent
agents/sector_teams/material_triggers.py — Thesis update triggers (no LLM)
agents/prompt_loader.py — Loads prompts from config/prompts/
agents/token_guard.py — Token budget validation
scoring/composite.py — Composite scoring (quant × w_quant + qual × w_qual)
agents/json_utils.py — Shared JSON extraction (balanced-brace scanner)
agents/langchain_utils.py — Shared LangGraph message utilities
data/fetchers/ — All 8 data fetchers (price, news, analyst, macro, insider, institutional, options, revision)
data/deduplicator.py — Duplicate headline handling
archive/manager.py — S3 + SQLite CRUD + thesis history + IC audit trail + load_latest_theses() for prior week backfill
retry.py — Exponential-backoff retry decorator (used by archive manager + price fetcher)
emailer/ — Markdown to HTML email + SES/Gmail delivery
lambda/ — AWS Lambda handlers (main pipeline + intraday alerts)
infrastructure/ — Deploy scripts, IAM policies

How Scoring Works

composite = quant_score × w_quant + qual_score × w_qual + macro_shift + signal_boosts

Signal boosts: PEAD (±5), analyst revisions (±3), options flow (±4),
               insider activity (±5), short interest (±4), institutional (±3)
               → aggregate capped at ±10 pts

w_quant and w_qual auto-tuned by backtester via config/scoring_weights.json
Macro shift uses additive bounded formula (not a multiplier)
Ratings: BUY / HOLD / SELL based on configurable thresholds
Predictor veto: high-confidence DOWN from GBM can override BUY signals

Population Management

Target: ~25 stocks
Exits: thesis collapse (score < 40, immediate), score degradation (< 45 with tenure > 2 weeks), forced rotation floor (min 10% weekly turnover)
Entries: CIO selects from sector team recommendations to fill open slots
Tenure protection: 2-week minimum (except thesis collapse)
Max rotations: 10 per run (~40% of population)

Slot Allocation

Based on open slots + macro sector ratings:

Open Slots	Base	Overweight (+1)	Market Weight (0)	Underweight (-1)
0	0	0	0	0
1-3	1	2	1	0
4-7	2	3	2	1
8-10	3	4	3	2

Teams always produce 2-3 picks regardless of allocation. The CIO selects from the full pool.

Weekly Email Format

The morning research brief contains 4 sections:

Macro Regime Summary — full macro narrative (Fed, yield curve, credit, equity, commodities), key risks to monitor
Sector Allocation — 11-sector table with overweight/market weight/underweight ratings and rationale
Notable Developments — high-conviction recommendations, CIO advances, material exits
Universe Ratings — unified table of all stocks with status column:

Status	Meaning
NEW	Entered portfolio this week (CIO advanced)
BUY REC	Buy recommendation, not yet held (no open slot)
UPDATED	Continuing stock with material trigger, fresh thesis
HOLD	Continuing stock, carryover thesis from prior week
EXIT	Dropped from portfolio this week

Sorted by status priority, then score descending. Each stock has rating, score, and thesis rationale.

Data Sources

Data Type	Source	Cost
Price data (OHLCV)	yfinance	Free
Technical indicators	Computed from price data	Free
News headlines	Yahoo Finance RSS	Free
SEC 8-K filings	EDGAR full-text search API	Free
SEC Form 4 insider transactions	EDGAR (EdgarTools)	Free
13F institutional holdings	SEC EDGAR	Free
Analyst consensus	Financial Modeling Prep (FMP)	Free tier (250 req/day)
EPS revision trends	Financial Modeling Prep (FMP)	Free tier
Options chain (IV, put/call)	yfinance	Free
Macro data	FRED CSV API	Free
LLM	Anthropic Claude (Haiku + Sonnet)	~$0.89/week ($46/year)
Email	AWS SES + Gmail SMTP	Free tier
Storage	AWS S3	~$0.50/month

FMP rate-limited: 1 req/sec with global thread lock, 250/day budget tracked in-process. First 429 response immediately disables all FMP calls for remainder of run (graceful degradation — agents proceed without analyst data). yfinance uses sequential batches of 100 with rate-limited session via requests-ratelimiter.

Quick Start

Prerequisites

Python 3.12+
API keys: ANTHROPIC_API_KEY, FMP_API_KEY, FRED_API_KEY
AWS credentials with S3 read/write and SES send permission

Setup

git clone https://github.com/cipher813/alpha-engine-research.git
cd alpha-engine-research
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 1. Environment variables
cp .env.example .env
# Edit .env — add API keys, AWS config, email settings

# 2. Configuration files
cp config/universe.sample.yaml config/universe.yaml
# Edit universe.yaml — set LLM models, thresholds, population config

# 3. Create gitignored source files (see table above)

# 4. Dry run (skips email + S3 write, still calls APIs/LLM)
python local/run.py --dry-run

# 5. Deploy to Lambda
./infrastructure/deploy.sh main

Offline Mode

Run the full pipeline end-to-end with zero external calls — no API keys, no LLM, no S3, no network required. All data fetchers, LLM agents, S3 operations, and email are replaced with synthetic stubs that return structurally valid data.

# Basic offline run (uses today's date)
python local/run.py --offline

# Offline with a specific date
python local/run.py --offline --date 2026-03-23

What gets stubbed:

Category	Stubbed Functions	Synthetic Output
Price data	`fetch_price_data`, `fetch_sp500_sp400_with_sectors`, `yf.download`	252-day synthetic OHLCV for 30 sample tickers
News	`fetch_all_news`	2 synthetic headlines per ticker
Analyst	`fetch_analyst_consensus`, `fetch_revisions`	Randomized consensus ratings + targets
Macro	`fetch_macro_data`, `compute_market_breadth`	Hardcoded neutral macro environment
Insider/Options	`fetch_insider_activity`, `fetch_options_signals`, `fetch_short_interest`, `fetch_institutional_accumulation`	Empty or minimal synthetic data
LLM agents	`run_news_agent`, `run_research_agent`, `run_macro_agent_with_reflection`, `run_scanner_ranking_agent`, `run_consolidator_agent`	Deterministic scores (40-75 range), stub reports
Sector teams	`run_sector_team`, `run_quant_analyst`, `run_qual_analyst`, `run_peer_review`	Synthetic picks with randomized quant/qual scores
CIO	`run_cio`	Advances candidates up to open slot count
S3 / Archive	`download_db`, `upload_db`, `load_predictions_json`, `write_signals_json`, `upload_population_json`	Uses local SQLite (creates empty if none exists), skips all S3 I/O
Email	`send_email`	Prints subject line to stdout

Use cases:

Validate graph topology and node wiring after refactors
Test scoring/aggregation/population logic changes without burning API credits
CI pipeline smoke tests
Develop and debug new graph nodes or scoring features
Onboard new contributors without requiring API keys or AWS credentials

How it works: local/offline_stubs.py monkey-patches all external call sites at import time. Stubs are installed before graph modules load, so from X import Y bindings in the graph pick up the patched functions. The pipeline runs in ~1 second.

Combining with other flags:

# Offline run
python local/run.py --offline

Note: --offline implies --dry-run behavior (no email delivery, no S3 writes). You don't need to pass both.

Database Tables

Core

investment_thesis, agent_reports, population, population_history, scanner_appearances, technical_scores, macro_snapshots, score_performance, news_article_hashes, predictor_outcomes, stock_archive, thesis_history, analyst_resources, memory_episodes, memory_semantic

Testing

source .venv/bin/activate
pytest tests/ -v

Tests cover scoring engine, agent JSON extraction, rotation logic, archive CRUD, and scheduling. No LLM API calls or AWS access required.

Local Pipeline Runs

# Offline mode: synthetic data, no API/LLM/S3 calls
python local/run.py --offline

# No-S3 mode: real APIs, writes signals to local/output/ instead of S3, no email
# Safe for preprod testing — does not affect live executor
python local/run.py --no-s3

# Full local run: real APIs, writes to S3, sends email (same as Lambda)
python local/run.py --local

# Specify date
python local/run.py --offline --date 2026-03-24

Preprod workflow:

--offline — verify pipeline logic runs without crashes
--no-s3 — verify with real APIs, inspect local/output/signals-{date}.json
Validate output: python ~/Development/alpha-engine/tests/validate_signals.py local/output/signals-{date}.json
--local — push to prod S3 and send email

Opportunities

Validation (needs live data)

Quant-only ablation: Run system without LLM agents to prove multi-agent value
Sector balancing hypothesis: Compare Sharpe with vs without sector constraints
Debate conviction vs performance: Track whether CIO conviction correlates with outperformance (needs 200+ samples, ~20 weeks)
Agent tool refinement: Analyze analyst_resources table to understand which tools each sector uses most

Architecture Enhancements

Adaptive slot allocation: Weight team recommendations by historical accuracy
Backtester team performance: Per-team signal accuracy at 10d/30d, fed back to CIO
IC critic: Add a Haiku critic to challenge CIO selections before finalization
Thesis maintenance optimization: Only update theses for stocks with material news ✅ Implemented — material triggers check all held stocks weekly, carryover thesis for unchanged stocks

Performance

Feature store integration: Research computes its own technical indicators (RSI, MACD, MAs, momentum) in fetch_data, duplicating logic the predictor already has in its feature store (s3://alpha-engine-research/features/{date}/). Two options: (A) research reads the predictor's Friday feature store snapshot over the weekend — avoids duplicate computation but couples the modules and uses slightly stale data; (B) research writes its own weekend indicators into the feature store — keeps research independent but makes it a feature store contributor. Either way, the current duplication should be resolved so the feature store is the single source of truth for technical signals.
Price history window: fetch_data pulls only 3 months of price history (period="3mo"). This is sufficient for RSI(14), MACD, and MA50 but insufficient for MA200 or 52-week high/low — indicators the predictor feature store already computes. Resolving the feature store integration above would address this gap.
S3 price caching: Cache weekly price data to avoid re-downloading 900 tickers from yfinance ✅ Predictor maintains S3 price cache; research uses sequential rate-limited batches
Batch S3 I/O: Consolidated JSON alongside per-ticker files
LangGraph state optimization: Return only changed keys from memory-intensive nodes

AI Agent Evals

LangSmith tracing (Phase 1, deployed): Full execution traces for every Monday run — per-agent token usage, latency, tool calls. Set LANGCHAIN_TRACING_V2=true in .env.
Trajectory validation (Phase 2, deployed): Post-pipeline check that the graph executed all 11 required nodes in the correct order with 6 sector teams. Failures logged to CloudWatch.
Trajectory failure alerting (Phase 2.5, future): Surface trajectory validation failures via Telegram message or morning briefing email rather than relying on CloudWatch log inspection. Currently failures are silent unless logs are manually checked.
Output quality evaluation (Phase 3, future): LLM-as-judge faithfulness checks on agent theses using DeepEval — verify claims trace back to actual fetcher data. Custom evaluators for thesis consistency, score calibration, macro regime accuracy. ~1-2 weeks lift.
Feedback loop closure (Phase 4, future): Retroactively label research outputs with backtester outcome data (5d/10d returns). Correlate eval quality scores with actual alpha contribution. Build growing eval dataset for regression testing. ~2-4 weeks lift.

Agent Memory

Macro memory fix + structured prior context (Phase 1, deployed): Macro agent now receives prior report and last 3 macro snapshots for incremental updates. CIO receives prior week's IC decisions for portfolio continuity.
Episodic memory from outcomes (Phase 2, deployed): Failed BUY signals are extracted into lessons via Haiku and stored in memory_episodes table. Qual analysts retrieve relevant lessons when analyzing stocks with similar patterns. Cost-capped at 15 Haiku calls/run.
Cross-agent semantic memory (Phase 3, deployed): Sector teams share observations and the macro agent accumulates regime reasoning via memory_semantic table. Auto-pruned after 8 weeks. Cost-capped at 10 Haiku calls/run.
Procedural memory (Phase 4, future): Consolidated sector-specific strategies learned from 3+ months of episodic data (e.g., "biotech theses degrade after 3 days without material events"). Requires episodic memory running 8-12 weeks first. ~3-4 weeks lift.

Operational Gaps

Token usage tracking: Agent token consumption (Haiku/Sonnet) not logged for cost monitoring. Add LangChain callback handler to track prompt_tokens + completion_tokens per agent, aggregate in consolidator, include in health status and email footer.
Wikipedia constituent cache: Fallback CSV (data/constituents_cache.csv) could become stale. Refresh periodically or add a staleness warning when CSV age exceeds 90 days.

Data Gaps

Survivorship bias: Wikipedia S&P constituents are current-only — need append-only log
Sub-sector analysis: Semiconductors and SaaS treated identically as "Technology"
Archive TTL: Old theses remain indefinitely — add configurable expiration

Related Modules

alpha-engine — Executor (trade execution + system overview)
alpha-engine-predictor — GBM predictor (5-day alpha predictions)
alpha-engine-backtester — Signal quality analysis and parameter optimization
alpha-engine-dashboard — Streamlit monitoring dashboard

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
agents		agents
archive		archive
config		config
data		data
emailer		emailer
evals		evals
graph		graph
infrastructure		infrastructure
lambda		lambda
local		local
memory		memory
rag		rag
scoring		scoring
tests		tests
thesis		thesis
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile.alerts		Dockerfile.alerts
LICENSE		LICENSE
README.md		README.md
flow-doctor.yaml		flow-doctor.yaml
health_status.py		health_status.py
main.py		main.py
polygon_client.py		polygon_client.py
requirements-alerts.txt		requirements-alerts.txt
requirements.txt		requirements.txt
retry.py		retry.py

Folders and files

Latest commit

History

Repository files navigation

Alpha Engine Research

Role in the System

Architecture

6 Sector Teams

Intra-Team Flow

CIO Evaluation

Proprietary Source Files

Gitignored (Must Create)

Tracked (Infrastructure + Tools)

How Scoring Works

Population Management

Slot Allocation

Weekly Email Format

Data Sources

Quick Start

Prerequisites

Setup

Offline Mode

Database Tables

Core

Testing

Local Pipeline Runs

Opportunities

Validation (needs live data)

Architecture Enhancements

Performance

AI Agent Evals

Agent Memory

Operational Gaps

Data Gaps

Related Modules

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages