A production-grade multi-agent financial analysis system built with LangGraph and Groq (LLaMA 3.3 70B). Input a stock ticker, get back a structured investment report with fundamentals, sentiment, risk metrics, a buy/hold/sell recommendation, and a comparison against Wall Street analyst consensus — all generated by a pipeline of specialized AI agents running on live market data.
🟢 Live Demo → financeagent-langgraph-production.up.railway.app
Request:
POST /analyze
{
"ticker": "AAPL",
"timeframe": "3mo",
"asset_class": "equity"
}
Response:
{
"ticker": "AAPL",
"report": {
"summary": "AAPL shows strong fundamentals with a P/E of 31.35 and bullish sentiment score of 0.6, driven by positive news around product launches. Moderate volatility at 24.01% with stable EPS suggests cautious optimism.",
"recommendations": "Buy",
"key_metrics": [
"P/E: 31.351456",
"EPS: 7.91",
"Revenue Growth: 0.157%",
"Debt to Equity: 102.63",
"Beta: 1.116",
"Volatility: 24.01%",
"Sentiment: 0.6 (Bullish)"
],
"confidence": "Medium",
"analyst_agreement": "Agreed — both recommend Buy"
}
}
{
"ticker": "JNJ",
"report": {
"recommendations": "Buy",
"key_metrics": ["P/E: 21.32", "EPS: 11.04", "Beta: 0.326", "Volatility: 16.05%", "Sentiment: 0.6 (Bullish)"],
"confidence": "High",
"analyst_agreement": "Agreed — both recommend Buy"
}
}
{
"ticker": "TSLA",
"report": {
"recommendations": "Hold",
"key_metrics": ["P/E: 343.88", "EPS: 1.07", "Beta: 1.926", "Volatility: 33.74%", "Sentiment: 0.8 (Bullish)"],
"confidence": "Medium",
"analyst_agreement": "Disagreed — pipeline says Hold, analysts say Buy"
}
}
| Metric | JNJ (Defensive) | TSLA (High-Risk) |
|---|---|---|
| P/E Ratio | 21.32 | 343.88 |
| EPS | 11.04 | 1.07 |
| Beta | 0.326 | 1.926 |
| Volatility | 16.05% | 33.74% |
| Revenue Growth | +0.091% | -0.031% |
| Sentiment | Bullish (0.6) | Bullish (0.8) |
| Recommendation | Buy | Hold |
| Analyst Agreement | Agreed | Disagreed |
Same pipeline, same agents, same code, and the two very different risk profiles are correctly identified. The TSLA disagreement reveals a known limitation: analysis based only on current metrics underweights the future growth catalysts that Wall Street analysts model explicitly.
Ran the pipeline across 20 large-cap tickers (S&P 500 constituents) and measured agreement rate against Wall Street analyst consensus using measure_agreement.py. Each ticker was run with majority-vote logic to account for LLM non-determinism, with a 3s sleep between runs to handle Groq rate limiting.
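The voting and agreement logic described above can be sketched roughly as follows. This is not the contents of measure_agreement.py: the function names, the default of three runs per ticker, and the tie-breaking rule (first label seen wins) are assumptions made for illustration. Only the majority vote, the 3s pause, and the comparison against consensus come from the description.

```python
import time
from collections import Counter
from typing import Callable, Dict, List


def majority_vote(votes: List[str]) -> str:
    """Return the most common label; ties go to the label seen first."""
    if not votes:
        raise ValueError("need at least one vote")
    return Counter(votes).most_common(1)[0][0]


def vote_for_ticker(
    run_once: Callable[[str], str],
    ticker: str,
    n_runs: int = 3,       # hypothetical default, not stated in the README
    pause_s: float = 3.0,  # pause between runs to stay under the rate limit
) -> str:
    """Run the pipeline several times for one ticker and take the majority."""
    votes = []
    for i in range(n_runs):
        votes.append(run_once(ticker))
        if i < n_runs - 1:
            time.sleep(pause_s)
    return majority_vote(votes)


def agreement_rate(pipeline: Dict[str, str], consensus: Dict[str, str]) -> float:
    """Fraction of shared tickers where pipeline and analysts match."""
    shared = [t for t in pipeline if t in consensus]
    if not shared:
        raise ValueError("no tickers in common")
    agreed = sum(pipeline[t] == consensus[t] for t in shared)
    return agreed / len(shared)
```

Here `run_once` stands in for whatever callable invokes the pipeline and returns a Buy/Hold/Sell label.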
python measure_agreement.py
User Input (ticker, timeframe, asset_class)
         │
         ▼
┌─────────────────┐
│ DataFetchAgent  │ ── yfinance → price history, financials, news, analyst consensus
└────────┬────────┘    normalizes recommendationKey → Buy/Hold/Sell
         │
         ▼
┌─────────────────────┐
│ FundamentalsAgent   │ ── LLM → P/E, EPS, Revenue Growth, Debt/Equity
├─────────────────────┤    context pruning: 4 keys from 50+ yfinance fields
│ SentimentAgent      │ ── LLM → scores headlines → Bullish/Bearish/Neutral (-1.0 to 1.0)
├─────────────────────┤
│ RiskAgent           │ ── pandas → annualized volatility, beta
└────────┬────────────┘    LLM → risk flag interpretation
         │
         ▼
┌─────────────────┐
│ ReportAgent     │ ── LLM → synthesizes all outputs
└─────────────────┘    compares vs Wall Street consensus → analyst_agreement
         │
         ▼
Structured JSON Report
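The RiskAgent's numeric step (annualized volatility and beta, computed with pandas instead of by the LLM) might look something like the sketch below. The 252-trading-day annualization factor, the use of a separate market benchmark series for beta, and the function names are assumptions. The README does not state which conventions the project uses.

```python
import math

import pandas as pd

TRADING_DAYS = 252  # assumption: common annualization factor for daily data


def annualized_volatility(close: pd.Series, periods: int = TRADING_DAYS) -> float:
    """Sample standard deviation of daily returns, scaled to one year."""
    returns = close.pct_change().dropna()
    return float(returns.std() * math.sqrt(periods))


def beta(stock_close: pd.Series, market_close: pd.Series) -> float:
    """Covariance of stock and market returns divided by market variance."""
    returns = pd.concat(
        [stock_close.pct_change(), market_close.pct_change()],
        axis=1,
        keys=["stock", "market"],
    ).dropna()
    return float(returns["stock"].cov(returns["market"]) / returns["market"].var())
```

Because both functions are deterministic, the same price history always yields the same numbers, which is the property an LLM estimate cannot guarantee.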
State management: All agents share a typed FinanceState (LangGraph TypedDict). Each agent reads what it needs and writes back exactly one output.
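A minimal sketch of this shared-state pattern is shown below. The field names are hypothetical, not the actual contents of financestate.py, and the `apply_update` helper only imitates, for illustration, the way LangGraph merges a node's partial return value into the graph state.

```python
from typing import Any, Dict, TypedDict


class FinanceState(TypedDict, total=False):
    # Inputs supplied by the caller
    ticker: str
    timeframe: str
    asset_class: str
    # One slot per agent (hypothetical names)
    raw_data: Dict[str, Any]
    fundamentals: Dict[str, Any]
    sentiment: Dict[str, Any]
    risk: Dict[str, Any]
    report: Dict[str, Any]


def risk_node(state: FinanceState) -> FinanceState:
    """Read only what this agent needs and return exactly one key."""
    closes = state["raw_data"]["closes"]
    return {"risk": {"n_observations": len(closes)}}


def apply_update(state: FinanceState, update: FinanceState) -> FinanceState:
    """Stand-in for the merge step the graph runtime performs."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state: FinanceState = {"ticker": "AAPL", "raw_data": {"closes": [1.0, 2.0, 3.0]}}
state = apply_update(state, risk_node(state))
```

Keeping each node's return value to a single key makes it easy to see which agent owns which part of the state.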
Structured outputs: Every agent uses Pydantic BaseModel + with_structured_output() — type-safe, validated handoffs. Nullable fields use Optional[float] with default=None to handle missing yfinance data.
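As an illustration of the nullable-field pattern, a schema for the fundamentals output could be written roughly like this. The class and field names are assumptions, not the project's actual schema.

```python
from typing import Optional

from pydantic import BaseModel, Field


class FundamentalsOutput(BaseModel):
    # Every metric may be missing in yfinance, so each defaults to None.
    pe_ratio: Optional[float] = Field(default=None, description="Price/earnings ratio")
    eps: Optional[float] = Field(default=None, description="Earnings per share")
    revenue_growth: Optional[float] = Field(default=None, description="Revenue growth")
    debt_to_equity: Optional[float] = Field(default=None, description="Debt to equity")


# A ticker with only two of the four metrics still validates.
partial = FundamentalsOutput(pe_ratio=21.32, eps=11.04)
print(partial.revenue_growth)  # None
```

With a LangChain chat model bound to a name such as `llm` (hypothetical here), a schema like this would be passed as `llm.with_structured_output(FundamentalsOutput)`.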
Error handling: Custom exception hierarchy (FinanceAgentError → TickerNotFoundError, EmptyDataError, DataFetchRateLimitError, LLMStructuredOutputError). FastAPI returns 422 for invalid inputs, 400 for pipeline errors, 500 for unexpected failures.
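The hierarchy and its status mapping can be sketched in plain Python as follows. The class names come from the description above. The `http_status_for` helper is a hypothetical stand-in for the mapping that the FastAPI handlers perform, and the 422 case is omitted because FastAPI raises it during request validation, before the pipeline runs.

```python
class FinanceAgentError(Exception):
    """Base class for known, expected pipeline failures."""


class TickerNotFoundError(FinanceAgentError):
    pass


class EmptyDataError(FinanceAgentError):
    pass


class DataFetchRateLimitError(FinanceAgentError):
    pass


class LLMStructuredOutputError(FinanceAgentError):
    pass


def http_status_for(exc: Exception) -> int:
    """Known pipeline errors map to 400; anything else is a 500."""
    return 400 if isinstance(exc, FinanceAgentError) else 500


print(http_status_for(TickerNotFoundError("no data for ZZZZ")))  # 400
print(http_status_for(KeyError("unexpected")))                   # 500
```

A single base class lets the API layer catch every known failure in one `except` clause while still logging the specific subclass.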
Benchmarked across 15 sequential runs (5 × 3 sessions) on Groq LLaMA 3.3 70B:
| Metric | Sequential | Parallel |
|---|---|---|
| Average latency | 2.38s | 8.62s |
| Min | 1.77s | 2.52s |
| Max | 3.35s | 12.29s |
Finding: Groq rate limiting under concurrent load makes parallel 3.6x slower. Sequential architecture retained. Documented in benchmark.py.
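For reference, a timing harness along these lines would produce the average/min/max figures in the table. This is only a sketch, and the helper names are hypothetical rather than taken from benchmark.py. The last line reproduces the 3.6x figure from the two reported averages.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], n_runs: int) -> List[float]:
    """Call fn n_runs times and record wall-clock latency for each call."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Collapse raw latencies into the three reported statistics."""
    return {
        "avg": statistics.mean(latencies),
        "min": min(latencies),
        "max": max(latencies),
    }


def slowdown(parallel_avg: float, sequential_avg: float) -> float:
    """How many times slower the parallel mode is on average."""
    return parallel_avg / sequential_avg


print(round(slowdown(8.62, 2.38), 1))  # 3.6
```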
| Field | Valid Values | HTTP Error |
|---|---|---|
| ticker | Non-empty string | 422 |
| timeframe | 1mo, 3mo, 6mo, 1y, 2y | 422 |
| asset_class | equity, crypto, macro | 422 |
Invalid/delisted tickers return a clean 400 with a descriptive message.
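The validation rules in the table could be expressed as a Pydantic request model along the following lines. The model name and the `is_valid` helper are assumptions for illustration. When a model like this is used as a FastAPI request body, FastAPI turns a validation failure into a 422 response automatically.

```python
from typing import Any, Dict, Literal

from pydantic import BaseModel, Field, ValidationError


class AnalyzeRequest(BaseModel):
    ticker: str = Field(..., min_length=1)  # non-empty string
    timeframe: Literal["1mo", "3mo", "6mo", "1y", "2y"]
    asset_class: Literal["equity", "crypto", "macro"]


def is_valid(payload: Dict[str, Any]) -> bool:
    """Return True if the payload satisfies every rule in the table."""
    try:
        AnalyzeRequest(**payload)
        return True
    except ValidationError:
        return False


print(is_valid({"ticker": "AAPL", "timeframe": "3mo", "asset_class": "equity"}))  # True
print(is_valid({"ticker": "AAPL", "timeframe": "5d", "asset_class": "equity"}))   # False
```

A ticker that passes this check but does not exist is only caught later by the data fetch step, which is where the 400 response comes from.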
| Layer | Technology |
|---|---|
| Agent orchestration | LangGraph (StateGraph) |
| LLM | LLaMA 3.3 70B via Groq |
| Market data | yfinance |
| Structured outputs | Pydantic v2 |
| API framework | FastAPI |
| Observability | LangSmith + Python logging |
| LLM interface | LangChain |
| Deployment | Docker + Railway |
| Package manager | uv |
src/
├── states/
│ └── financestate.py # FinanceState TypedDict + 4 Pydantic schemas
├── nodes/
│ ├── data_fetch.py # DataFetchAgent — yfinance + analyst consensus, no LLM
│ ├── fundamentals_agent.py # FundamentalsAgent
│ ├── sentiment_agent.py # SentimentAgent
│ ├── risk_agent.py # RiskAgent
│ └── report_agent.py # ReportAgent + analyst comparison
├── graphs/
│ └── graph_builder.py # GraphBuilder — sequential + parallel modes
├── llms/
│ └── groqllm.py
└── exceptions.py # Custom exception hierarchy
app.py # FastAPI — POST /analyze + serves frontend
frontend/
└── index.html # Bloomberg terminal-style recruiter demo
measure_agreement.py # Analyst consensus agreement across 20 tickers
benchmark.py # Sequential vs parallel latency benchmarking
Dockerfile # Production containerization
Run locally:
git clone https://github.com/aakarsh31/FinanceAgent-LangGraph.git
cd FinanceAgent-LangGraph
uv sync
cp .env.example .env # add GROQ_API_KEY and LANGCHAIN_API_KEY
uvicorn app:app --reload
Run with Docker:
docker build -t financeagent .
docker run -p 8000:8000 --env-file .env financeagent
Test:
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"ticker": "AAPL", "timeframe": "3mo", "asset_class": "equity"}'
Sequential over parallel — benchmarking showed parallel is 3.6x slower due to Groq rate limiting under concurrent load.
Volatility computed in code — deterministic pandas formula beats LLM estimation for numerical accuracy.
Context pruning — FundamentalsAgent extracts 4 relevant keys from 50+ yfinance fields, saving ~800 tokens per request.
Optional Pydantic fields — prevents schema validation crashes when yfinance returns None for obscure tickers.
Custom exception hierarchy — maps known failure modes to meaningful HTTP responses; unknown failures return 500 with internal logging only.
Normalized analyst consensus — yfinance recommendationKey values (strongBuy, underperform) normalized to Buy/Hold/Sell for clean LLM comparison.
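The consensus normalization named in the last point might be implemented roughly as below. Only `strongBuy` and `underperform` are mentioned above. The remaining keys, the choice to map `underperform` to Sell, and the tolerance for underscore or case variants are assumptions made for this sketch.

```python
from typing import Optional

# Assumed mapping from raw consensus labels to the three pipeline labels.
_CONSENSUS_MAP = {
    "strongbuy": "Buy",
    "buy": "Buy",
    "outperform": "Buy",
    "hold": "Hold",
    "neutral": "Hold",
    "underperform": "Sell",
    "sell": "Sell",
    "strongsell": "Sell",
}


def normalize_consensus(key: Optional[str]) -> Optional[str]:
    """Map a raw recommendationKey to Buy/Hold/Sell, or None if unknown."""
    if not key:
        return None
    cleaned = key.replace("_", "").replace(" ", "").lower()
    return _CONSENSUS_MAP.get(cleaned)


print(normalize_consensus("strongBuy"))     # Buy
print(normalize_consensus("underperform"))  # Sell
```

Returning `None` for unrecognized keys lets the report step skip the comparison rather than guess.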
LangSmith tracing enabled for per-agent token cost and latency. Structured Python logging with file/function/line output:
2026-03-22 21:37:26 INFO src.nodes.data_fetch:fetch:28 Fetching Ticker Data AAPL...
2026-03-22 21:37:26 INFO src.nodes.data_fetch:fetch:51 Analyst consensus for AAPL: {'recommendation': 'Buy', 'target_price': 237.45, 'num_analysts': 38}
2026-03-22 21:37:29 INFO app:analyze_stock:59 Successfully analyzed AAPL
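A logging configuration that yields lines in the shape shown above could look like this. The format string is inferred from the sample output rather than copied from the project, so treat it as an approximation.

```python
import logging

# timestamp, level, logger name, function, and line number, then the message
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s:%(funcName)s:%(lineno)d %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level: int = logging.INFO) -> None:
    """Install the format on the root logger (no-op if already configured)."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)


configure_logging()
logger = logging.getLogger("src.nodes.data_fetch")


def fetch(ticker: str) -> None:
    logger.info("Fetching Ticker Data %s...", ticker)


fetch("AAPL")
```

Using `logging.getLogger(__name__)` in each module would give the dotted module path seen in the sample lines.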