leaalonzo/index-rag-agent

Index RAG Agent

A RAG (Retrieval-Augmented Generation) agent for querying index methodology documents and historical constituent data. Built with LlamaIndex, LangGraph, ChromaDB, and DuckDB.

Features

  • Semantic search over ingested PDF methodology documents (MSCI, S&P, FTSE)
  • Natural language SQL queries against quarterly index constituent snapshots
  • Methodology comparison — side-by-side diff of two index documents
  • Changelog generation — what changed between two versions of a methodology
  • MCP server — expose all tools to Claude Desktop via the Model Context Protocol
  • RAGAS evaluation — score pipeline quality against a golden Q&A set

Setup

pip install -r requirements.txt

Create a .env file:

OPENAI_API_KEY=sk-...

# Optional: use Anthropic as the LLM backend
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=sk-ant-...
# ANTHROPIC_MODEL=claude-sonnet-4-5

# Optional: override the OpenAI model
# OPENAI_MODEL=gpt-4o-mini
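
The variables above imply a simple provider switch. The following is a minimal sketch of how such settings might be resolved, assuming OpenAI is the backend whenever LLM_PROVIDER is unset. The function name `resolve_llm_config` and the returned dict shape are hypothetical and are not taken from this repository's code.

```python
import os
from typing import Mapping, Optional


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick provider, API key, and optional model override from env-style settings."""
    env = os.environ if env is None else env
    # Assumption: OpenAI is used unless LLM_PROVIDER says otherwise.
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in ("openai", "anthropic"):
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    prefix = provider.upper()  # "OPENAI" or "ANTHROPIC"
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise KeyError(f"{prefix}_API_KEY is not set")
    # The model override is optional; None means "use whatever the app defaults to".
    return {"provider": provider, "api_key": api_key, "model": env.get(f"{prefix}_MODEL")}
```

Passing a plain dict instead of reading `os.environ` directly keeps the function easy to check without touching the real environment.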

Usage

1. Ingest methodology PDFs

python main.py ingest ./docs

2. Load constituent data

# Generate synthetic data first (if needed)
python data/generate_constituents.py

# Load into DuckDB
python main.py load-data

3. Query

Direct RAG query (semantic search over PDFs):

python main.py query "What are the eligibility criteria for MSCI World?"

Agent query (auto-routes to the best tool):

python main.py agent "Which sectors are overweight in MSCI EM vs S&P 500?"
python main.py agent "What is Apple's weight in the S&P 500?"
python main.py agent "How does MSCI World differ from FTSE 100 methodology?"

4. Evaluate

python main.py eval --qa data/golden_qa.json

Results are stored in eval_results.db (SQLite).

MCP Server (Claude Desktop)

Expose the agent as an MCP server so Claude Desktop can call it as a tool:

python mcp_server.py

Add the server to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "index-rag-agent": {
      "command": "python",
      "args": ["/path/to/index-rag-agent/mcp_server.py"]
    }
  }
}
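
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A small sketch of that merge using only the standard library follows; the helper name `add_mcp_server` is hypothetical, and the script path is the same placeholder used in the JSON above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, script_path: Path) -> dict:
    """Merge one server entry into a Claude Desktop config, keeping existing entries."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" section if it is missing, then add or update the entry.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "python", "args": [str(script_path)]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # macOS location of the Claude Desktop config, as given above.
    cfg = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
    add_mcp_server(cfg, "index-rag-agent", Path("/path/to/index-rag-agent/mcp_server.py"))
```

Restart Claude Desktop after editing the file so the new server is picked up.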

Architecture

main.py          CLI entry point
pipeline.py      Ingest (PDF → ChromaDB) and direct query
agent.py         LangGraph ReAct agent with 4 tools:
                   • search_methodology   — semantic search over PDFs
                   • compare_methodologies — side-by-side diff
                   • summarize_changes    — changelog between versions
                   • query_index_data     — NL→SQL over DuckDB
mcp_server.py    FastMCP wrapper for Claude Desktop
eval.py          RAGAS evaluation harness
chroma_db/       Persistent ChromaDB vector store
index_data.ddb   DuckDB database (quarterly constituent snapshots)
data/            CSV data and golden Q&A for evaluation
docs/            PDF methodology documents

Data

The DuckDB database (index_data.ddb) holds quarterly snapshots for four indices:

Index        Snapshot dates
S&P 500      2023-12-31, 2024-03-31, 2024-06-30, 2024-09-30, 2024-12-31
MSCI World   same as S&P 500
MSCI EM      same as S&P 500
FTSE 100     same as S&P 500

Columns: date, index_name, constituent, ticker, sector, country, weight_pct, market_cap_usd.
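
To illustrate the kind of SQL an NL→SQL question such as "sector weights for one index on one snapshot date" could translate to, here is a self-contained sketch over the columns listed above. It makes several assumptions: the table name `constituents` is hypothetical, the standard-library `sqlite3` module stands in for DuckDB so the example runs without extra dependencies, and the rows are made-up placeholders, not real constituent data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns mirror the list above.
conn.execute("""
    CREATE TABLE constituents (
        "date" TEXT, index_name TEXT, constituent TEXT, ticker TEXT,
        sector TEXT, country TEXT, weight_pct REAL, market_cap_usd REAL
    )
""")
# Placeholder rows for illustration only.
rows = [
    ("2024-12-31", "S&P 500", "Company A", "AAA", "Technology", "US", 6.0, 3.0e12),
    ("2024-12-31", "S&P 500", "Company B", "BBB", "Technology", "US", 4.5, 2.0e12),
    ("2024-12-31", "S&P 500", "Company C", "CCC", "Financials", "US", 1.5, 5.0e11),
    ("2024-12-31", "FTSE 100", "Company D", "DDD", "Financials", "GB", 7.0, 1.5e11),
]
conn.executemany("INSERT INTO constituents VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def sector_weights(conn, index_name, snapshot_date):
    """Return (sector, total weight in percent) for one index on one snapshot date."""
    cur = conn.execute(
        """
        SELECT sector, ROUND(SUM(weight_pct), 2) AS total_weight
        FROM constituents
        WHERE index_name = ? AND "date" = ?
        GROUP BY sector
        ORDER BY total_weight DESC, sector
        """,
        (index_name, snapshot_date),
    )
    return cur.fetchall()


print(sector_weights(conn, "S&P 500", "2024-12-31"))
```

Binding the index name and date as parameters, rather than formatting them into the SQL string, is a sensible default for any query built from user-supplied text.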
