An event-driven architecture framework for building agentic AI systems for internal question answering. This framework demonstrates two distinct approaches to building AI agents: Tool-based Agents and LLM Workflows, both leveraging Domain-Driven Design principles from the "Cosmic Python" book.
This framework showcases event-driven architecture patterns for building AI systems with two main approaches:
The tool agent uses external tools and APIs to gather information and execute actions. This approach:
- Leverages the smolagents library for tool orchestration
- Supports multiple tools for data retrieval, conversion, and analysis
- Uses a planning and execution model with tool selection
- Ideal for tasks requiring real-time data access, calculations, or external system integration
Example use cases:
- Fetching sensor data and creating visualizations
- Querying time-series databases
- Performing calculations on retrieved data
- Integrating with external APIs
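To make the tool-agent flow described above concrete, here is a minimal sketch using the smolagents API with a single hypothetical tool. The tool name, its behaviour, and the model id are illustrative assumptions, not part of this framework:

```python
# Minimal smolagents-style tool agent sketch. The tool below is hypothetical;
# in this framework the tools wrap the external Tool API instead.
from smolagents import CodeAgent, LiteLLMModel, tool


@tool
def get_sensor_daily_max(sensor_name: str, month: str) -> float:
    """Return the daily maximum value recorded for a sensor in a given month.

    Args:
        sensor_name: Sensor identifier, e.g. "PI-P0017".
        month: Month in YYYY-MM format, e.g. "2025-04".
    """
    # Placeholder value; a real tool would query the Tool API here.
    return 42.0


model = LiteLLMModel(model_id="openai/gpt-4o-mini")  # any litellm-supported model
agent = CodeAgent(tools=[get_sensor_daily_max], model=model)

if __name__ == "__main__":
    print(agent.run("What is the daily maximum value of PI-P0017 in April 2025?"))
```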
The LLM workflow uses a state machine pattern to build SQL queries through multiple stages. This approach:
- Implements a structured pipeline with defined stages (Check → Ground → Filter → Aggregate → Join → Construct → Execute)
- Decomposes the question into a series of schema-grounded steps and writes the SQL query from that decomposition
- Includes guardrails at entry and exit points
- Ideal for knowledge-based Q&A and controlled response generation
Example use cases:
- Answering questions from a knowledge base
- SQL query generation and execution
- Document-based question answering
- Controlled and validated response generation
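To illustrate the staged pipeline described above, here is a simplified sketch of the state machine. The stage names match the list above, but the stage bodies are placeholders rather than the framework's actual implementation:

```python
# Simplified sketch of the staged workflow; stage bodies are placeholders.
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    question: str
    entities: dict = field(default_factory=dict)     # grounded schema entities
    filters: list = field(default_factory=list)
    aggregates: list = field(default_factory=list)
    joins: list = field(default_factory=list)
    sql: str = ""
    answer: str = ""


def check(state):         # entry guardrail: is the question in scope and answerable?
    return state

def ground(state):        # map question terms to tables and columns in the schema
    return state

def filter_stage(state):  # derive WHERE clauses
    return state

def aggregate(state):     # derive GROUP BY clauses and aggregate functions
    return state

def join(state):          # infer joins between the grounded tables
    return state

def construct(state):     # assemble the final SQL string
    state.sql = "SELECT ..."
    return state

def execute(state):       # run the query and synthesise the answer (exit guardrail here)
    state.answer = "..."
    return state


PIPELINE = [check, ground, filter_stage, aggregate, join, construct, execute]


def run_workflow(question: str) -> WorkflowState:
    state = WorkflowState(question=question)
    for stage in PIPELINE:
        state = stage(state)
    return state
```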
Both approaches are built on a robust event-driven foundation:
- Message Bus: Central command and event handling system
- Domain Events: Clear separation of commands and events
- Dependency Injection: Clean architecture with swappable adapters
- Real-time Updates: WebSocket support for live status updates
- Observability: Integrated tracing with Langfuse and OpenTelemetry
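As a rough illustration of this foundation, a message bus in the Cosmic Python style can be sketched as follows; the command and event names here are invented for the example and are not the framework's actual message types:

```python
# Minimal message-bus sketch: commands and events are plain dataclasses,
# and the bus dispatches them to registered handlers.
from dataclasses import dataclass


class Command:
    pass


class Event:
    pass


@dataclass
class AnswerQuestion(Command):
    question: str


@dataclass
class QuestionAnswered(Event):
    question: str
    answer: str


def handle_answer_question(cmd: AnswerQuestion, queue: list) -> None:
    answer = f"(answer to: {cmd.question})"           # an agent would run here
    queue.append(QuestionAnswered(cmd.question, answer))


def handle_question_answered(event: QuestionAnswered, queue: list) -> None:
    print("push live update to the UI:", event.answer)  # e.g. over a websocket


COMMAND_HANDLERS = {AnswerQuestion: handle_answer_question}
EVENT_HANDLERS = {QuestionAnswered: [handle_question_answered]}


def handle(message) -> None:
    """Dispatch a command or event, plus any follow-up events it raises."""
    queue = [message]
    while queue:
        msg = queue.pop(0)
        if isinstance(msg, Command):
            COMMAND_HANDLERS[type(msg)](msg, queue)
        else:
            for handler in EVENT_HANDLERS.get(type(msg), []):
                handler(msg, queue)


handle(AnswerQuestion("How many customers do we have?"))
```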
Both agent approaches rely on external services and APIs to function properly.
To run the external services locally, run the following command from the Sim Project root directory:
docker compose up
Repositories for the external services:
- the Sim Project repository: generates the data and the database schema
- the Sim API repository: the API that provides the tools to the agentic AI framework
- the Sim RAG repository: the RAG service that provides the knowledge to the agentic AI framework
- the Sim Frontend repository: the UI for the agentic AI framework
- LLM Providers (for both approaches)
  - Supports multiple LLM providers via litellm: Anthropic, OpenAI, Google Gemini
  - Used for reasoning, query generation, and response synthesis
  - Configurable models for different tasks (main LLM, tools LLM, guardrails LLM)
- PostgreSQL Database (for SQL Agent)
  - Required for SQL query execution
  - Stores business data that agents query
  - Connection configured via environment variables
- Information Retrieval Services (for LLM Workflow)
  - Embedding API for document vectorization
  - Ranking API for relevance scoring
  - Retrieval API for semantic search
  - Default endpoints: http://localhost:5051
- Tool API (for Tool Agent)
  - External API providing data access tools
  - Sensor data, time-series queries, asset information
  - Default endpoint: http://localhost:5000
- Observability Services (optional)
  - Langfuse for LLM tracing and monitoring
  - OpenTelemetry for distributed tracing
  - Configurable via environment variables
All external dependencies are configured through environment variables. See .env.tests for configuration templates. Key configurations include:
- API endpoints and credentials
- Database connection strings
- Model selection and parameters
- Service timeouts and limits
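As an illustration, configuration could be loaded along these lines; the variable names and defaults below are placeholders, and the real keys are the ones listed in .env.tests:

```python
# Illustrative only: reads configuration from the environment.
# The variable names and defaults are placeholders, not the actual keys.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str = os.getenv("DATABASE_URL", "postgresql://localhost:5432/sim")
    tool_api_url: str = os.getenv("TOOL_API_URL", "http://localhost:5000")
    retrieval_api_url: str = os.getenv("RETRIEVAL_API_URL", "http://localhost:5051")
    main_model: str = os.getenv("MAIN_LLM", "anthropic/claude-sonnet-4-20250514")
    guardrails_model: str = os.getenv("GUARDRAILS_LLM", "openai/gpt-4o-mini")
    request_timeout_s: int = int(os.getenv("REQUEST_TIMEOUT_S", "30"))


settings = Settings()
```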
The framework includes Docker configurations to help set up some of these dependencies locally. Use make up to start containerized services.
To run the service manually in debug mode, install the required Python dependencies:
uv sync
You can run the service in dev mode:
via the FastAPI app:
make dev
then access the interactive API docs at http://127.0.0.1:5055/docs
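Once the dev server is running you can also call it over HTTP. The route and payload below are assumptions for illustration; check http://127.0.0.1:5055/docs for the actual endpoints:

```python
# Hypothetical client call: the /ask route and payload shape are assumptions;
# consult the interactive docs for the real API.
import requests

resp = requests.post(
    "http://127.0.0.1:5055/ask",
    json={"question": "How much was produced in the first two weeks of 2025?", "mode": "sql"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```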
via the CLI:
Run the tool agent for real-time data access and calculations:
make run Q="What is the daily maximum value of PI-P0017 in April 2025?" M="tool"
Run the LLM workflow for knowledge-based queries:
make run Q="How much was produced in the first two weeks of 2025?" M="sql"
Tool Agent queries (real-time data, calculations, visualizations):
- "What is the daily maximum value of PI-P0017 in April 2025?"
- "Can you compare PI-P0017 and PI-P0016 for the first 10 days in 2025?"
- "What assets are next to asset BA100?"
- "Can you create a plot for the adjacent sensors of asset BA101 for 1st January 2025?"
- "What is the id of TI-T0022?"
- "What is the name of asset id c831fadb-d620-4007-bdda-4593038c87f9?"
- "Can you provide me the highest value for June 2025 for TI-T0022?"
- "What is the current pressure in the distillation?"
- "What is the level in tank b?"
- "Can you plot me the temperature of the distillation cooler A for the last two weeks?"
- "What is the current temperature in the water tank?"
- "Can you plot me data for 18b04353-839d-40a1-84c1-9b547d09dd80 in February?"
The SQL agent is a specialized LLM workflow implementation for database queries. It uses the multi-stage pipeline described above to answer questions such as:
- "How many customers do we have?"
- "What are the top 5 selling products?"
- "Show me orders from 2024"
- "What is the average order value?"
To start the service:
make up
and to shut down the service:
make down
To run the tests:
uv run python -m pytest --verbose --cov=./
This framework includes a comprehensive evaluation system to assess the quality and performance of both agent approaches. The evaluation framework uses an LLM judge to score responses across multiple dimensions.
- LLM Judge Evaluation
  - Scores responses on 4 dimensions: accuracy, relevance, completeness, and hallucination
  - Configurable thresholds for pass/fail criteria
  - Provides detailed reasoning for each score
  - Results are stored in JSON format in evals/reports/
- Evaluation Types
  Tool Agent Evaluations:
  - make eval_tool_e2e - End-to-end evaluation of the complete tool agent workflow
  - make eval_tool_enhance - Tests question enhancement capabilities
  - make eval_tool_pre_check - Evaluates input validation and guardrails
  - make eval_tool_post_check - Tests output validation
  - make eval_tool_ir - Information retrieval evaluation
  - make eval_tool_tools - Tool selection and execution evaluation
  SQL Agent Evaluations:
  - make eval_sql_e2e - End-to-end SQL workflow evaluation
  - make eval_sql_aggregate - Tests aggregation query generation
  - make eval_sql_construct - SQL query construction evaluation
  - make eval_sql_filter - Filter clause generation testing
  - make eval_sql_grounding - Entity name to schema mapping
  - make eval_sql_join - Join inference evaluation
  - make eval_sql_pre_check - Input validation for SQL queries
- Running All Evaluations
  - make eval_tool - Run all tool agent evaluations
  - make eval_sql - Run all SQL agent evaluations
Evaluation results are stored in evals/reports/ with timestamps and include:
- Individual test results with scores
- Pass/fail status based on configured thresholds
- Detailed reasoning from the LLM judge
- Aggregate statistics for the evaluation run
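For illustration, the judging step could look roughly like the following litellm-based sketch; the prompt, thresholds, and JSON shape are simplified assumptions rather than the framework's actual judge:

```python
# Simplified sketch of an LLM-judge scorer; not the framework's actual judge.
import json
from litellm import completion

DIMENSIONS = ["accuracy", "relevance", "completeness", "hallucination"]
THRESHOLDS = {d: 0.7 for d in DIMENSIONS}  # placeholder pass/fail thresholds


def judge(question: str, answer: str, model: str = "openai/gpt-4o-mini") -> dict:
    """Ask an LLM to score an answer on each dimension (0.0-1.0) with reasoning."""
    prompt = (
        "Score the answer to the question on accuracy, relevance, completeness and "
        "hallucination (1.0 = no hallucination). Respond as JSON: "
        '{"scores": {...}, "reasoning": "..."}\n\n'
        f"Question: {question}\nAnswer: {answer}"
    )
    response = completion(model=model, messages=[{"role": "user", "content": prompt}])
    result = json.loads(response.choices[0].message.content)
    result["passed"] = all(result["scores"][d] >= THRESHOLDS[d] for d in DIMENSIONS)
    return result
```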
The evaluation framework helps ensure consistent quality across different agent implementations and provides insights into areas for improvement.
This is a personal project inspired by my past work, but built independently from scratch. The architecture is based on Domain-Driven Design principles from the "Cosmic Python" book, demonstrating how event-driven patterns can be effectively applied to AI agent systems.

