This project demonstrates how to build an Azure OpenAI-powered chat agent with the Microsoft Agent Framework. The agent can call three deterministic tools – weather, company news, and stock quotes – and can be explored interactively via the Agent Framework DevUI.
Features:
- 🤖 Learning-focused chat agent sample with tool calling
- 🧪 Comprehensive evaluation framework with 6 evaluators
- 📊 24 pytest tests covering all agent capabilities
- 🎨 Beautiful CLI output with Rich formatting
- 🔧 AI Toolkit compatible evaluators
- ✅ CI/CD ready with exit codes and JSON/CSV export
- 🐳 Dev Container support with pre-configured extensions
Prerequisites:
- Python 3.12+
- uv for dependency management (dependencies are installed automatically when you run uv commands)
- An Azure OpenAI resource with a Responses deployment

Alternatively, use the included Dev Container for a pre-configured environment (see Dev Container Setup below).
- Install dependencies:

  uv sync

  This installs all dependencies, including the Agent Framework DevUI.
- Configure Azure credentials:

  Copy .env.example and populate it with your values:

  Copy-Item .env.example .env

  Required variables in .env:

  - AZURE_OPENAI_ENDPOINT – Your Azure OpenAI endpoint URL
  - AZURE_OPENAI_DEPLOYMENT_NAME – Your deployment name (e.g., gpt-4)
  - AZURE_OPENAI_API_KEY – Your API key (or use CLI credentials)

  Optional variables:

  - AZURE_OPENAI_API_VERSION – API version (defaults to the SDK default)
  - AZURE_OPENAI_USE_CLI_CREDENTIAL – Set to true to use az login instead of an API key

  When using CLI credentials, authenticate with az login beforehand. A sample .env is sketched after these steps.
- Run the sample agent from the command line:

  uv run azure-agent-sample "Give me the weather in Seattle and a stock update for Contoso."

  Omitting the prompt starts an interactive chat session.
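A minimal .env, with placeholder values for the variables listed above, might look like this sketch (use either the API key or the CLI credential, depending on your setup):

```
# Required
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4
AZURE_OPENAI_API_KEY=<your-api-key>

# Optional
# AZURE_OPENAI_API_VERSION=<api-version>
# AZURE_OPENAI_USE_CLI_CREDENTIAL=true
```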
Use the pre-configured Dev Container with all tools and extensions installed:

- Prerequisites: Docker and the VS Code Dev Containers extension
- Open in container:
  - Press F1 → Dev Containers: Open Folder in Container
  - Wait for the initial setup (~3-5 minutes the first time)
- Configure .env with your Azure OpenAI credentials
- Press F5 to start debugging!
Three useful launch configurations are defined in .vscode/launch.json:
- Python Debugger: Current File – Standard configuration for debugging the currently open script.
- Azure Agent Sample: CLI – Launches the agent in CLI mode with an interactive chat session. Use F5 to start the agent, and interact with it directly in the terminal while hitting breakpoints in your Python code.
- Azure Agent Sample: DevUI – Launches the Agent Framework DevUI via agent_framework_devui._cli with PYTHONPATH set to include src (a sketch of this configuration appears below). Use F5 to start the DevUI, browse to http://127.0.0.1:8080, and interact with the discovered ContosoInsights agent while hitting breakpoints in your Python code.
If the DevUI cannot find the agent at start-up, ensure the Azure OpenAI environment variables are populated and restart the session.
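Based on the description above, the DevUI launch entry is roughly shaped like the sketch below; the field values are assumptions, so treat .vscode/launch.json in the repository as the authoritative version:

```jsonc
{
  // Hypothetical sketch of the "Azure Agent Sample: DevUI" launch entry
  "name": "Azure Agent Sample: DevUI",
  "type": "debugpy",
  "request": "launch",
  "module": "agent_framework_devui._cli",    // module named in the launcher description
  "env": {
    "PYTHONPATH": "${workspaceFolder}/src"   // makes the ContosoInsights agent discoverable
  }
}
```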
Run the bundled unit tests to validate tool behaviour:
uv run pytest tests/ -v

Comprehensive quality evaluation tests using the Azure AI Evaluation SDK are
available in the eval-tests/ directory. After recent consolidation, the evaluation
framework is streamlined with minimal files and simple CLI options.
What's New After Consolidation:
- 📚 Single comprehensive eval-tests/README.md (replaces 14+ docs)
- 🧪 All 24 tests in test_agent_evaluation.py
- 🚀 Unified runner run_evaluations.py with clean CLI
- 📊 19 test cases in test_data_extended.jsonl
- 🎯 6 evaluators: 3 built-in (Coherence, Relevance, Groundedness) + 3 custom (IntentResolution, ToolCallAccuracy, TaskAdherence)

Option 1: Quick Validation (Recommended for Fast Feedback)
Run 5 representative tests across all 6 evaluators (~7 seconds):
uv run python eval-tests/run_evaluations.py --quick

📊 Sample Output
╭─────────────────────────────────────────╮
│ Azure Agent Evaluation Runner │
│ Quick test run (5 representative cases) │
│ Fast validation of core functionality │
╰─────────────────────────────────────────╯
Running 5 quick tests...
Summary by Evaluator
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Evaluator ┃ Avg Score ┃ Pass Rate ┃ Tests ┃ Avg Duration ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━┩
│ Coherence │ 4.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 2407ms │
│ Groundedness │ 4.60 / 5.0 │ 5/5 (100.0%) │ 5 │ 2584ms │
│ IntentResoluti… │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
│ Relevance │ 4.20 / 5.0 │ 5/5 (100.0%) │ 5 │ 1632ms │
│ TaskAdherence │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
│ ToolCallAccura… │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
└─────────────────┴────────────┴──────────────┴───────┴──────────────┘
╭────────────────────────────────────────────╮
│ Overall: 30/30 evaluations passed (100.0%) │
╰────────────────────────────────────────────╯
✓ Evaluation completed with 100.0% pass rate
What's being tested:
- 5 queries × 6 evaluators = 30 total evaluations
- Validates: weather queries, stock quotes, news, and multi-tool orchestration
- Perfect 100% pass rate with average scores of 4.0/5.0 or higher
- Execution time: ~7 seconds
Option 2: Default Suite (10 In-Scope Tests)
Run the main test suite with beautiful Rich output:
uv run python eval-tests/run_evaluations.py

Option 3: Extended Suite (All 19 Tests)
Run all tests including out-of-scope queries:
uv run python eval-tests/run_evaluations.py --extended

Option 4: Three-Tool Comprehensive Test
Show detailed analysis of the comprehensive multi-tool test:
uv run python eval-tests/run_evaluations.py --extended --show-three-tool

Option 5: Pytest (Recommended for CI/CD)
Run evaluations as unit tests with pass/fail assertions:
# First-time setup: install evaluation dependencies
uv sync --group dev
# Validate environment configuration
uv run python eval-tests/validate_setup.py
# Run all 24 evaluation tests
uv run pytest eval-tests/test_agent_evaluation.py -v
# Run specific test
uv run pytest eval-tests/test_agent_evaluation.py::test_all_three_tools_comprehensive -v

# Show help
uv run python eval-tests/run_evaluations.py --help

🔧 Available Options
usage: run_evaluations.py [-h] [--extended] [--quick] [--show-three-tool]
Azure Agent Evaluation Runner
options:
-h, --help show this help message and exit
--extended Run all 19 test cases (includes out-of-scope queries)
--quick Run 5 representative tests only
--show-three-tool Display three-tool comprehensive test summary
Examples:
python eval-tests/run_evaluations.py # Run default tests (10 in-scope)
python eval-tests/run_evaluations.py --extended # Run all 19 tests
python eval-tests/run_evaluations.py --quick # Run 5 quick tests
python eval-tests/run_evaluations.py --extended --show-three-tool # Show three-tool summary
Mode Comparison:
| Mode | Test Cases | Duration | Use Case |
|---|---|---|---|
| --quick | 5 tests | ~7s | Fast validation during development |
| Default | 10 tests | ~15s | Standard quality checks (in-scope only) |
| --extended | 19 tests | ~30s | Full suite including edge cases |
The framework tests your agent across multiple quality dimensions:
| Evaluator | What It Measures | Score Range |
|---|---|---|
| IntentResolution | Does the agent understand user queries? | 1-5 |
| ToolCallAccuracy | Does it invoke correct tools with right parameters? | 1-5 |
| TaskAdherence | Does it follow system instructions? | 1-5 |
| Relevance | Are responses directly relevant? | 1-5 |
| Coherence | Are responses logically structured? | 1-5 |
| Groundedness | Are responses factually accurate? | 1-5 |
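As a rough illustration of how the built-in evaluators score a response, the sketch below calls the Azure AI Evaluation SDK directly; the model_config values are placeholders and the exact result keys may vary by SDK version:

```python
import os

from azure.ai.evaluation import CoherenceEvaluator  # built-in LLM-judge evaluator

# Judge-model configuration (placeholder values drawn from the same .env settings)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
}

coherence = CoherenceEvaluator(model_config=model_config)

# Score a single query/response pair on the 1-5 scale used in the table above
result = coherence(
    query="What's the weather in Seattle?",
    response="It is currently 12°C with light rain in Seattle.",
)
print(result)  # e.g. {"coherence": 4.0, ...}, depending on SDK version
```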
- ✅ Single tool queries (weather, news, stock)
- ✅ Multi-tool queries (weather + stock)
- ✅ Comprehensive queries (all 3 tools in one request)
- ✅ Out-of-scope queries (graceful decline handling)
- ✅ Edge cases (missing params, unclear intent)
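Each line in eval-tests/test_data_extended.jsonl holds one test case as a JSON object; the exact field names are defined by that file, so the sketch below only loads and inspects the cases without assuming a schema:

```python
import json
from pathlib import Path

# Read all test cases: one JSON object per line (JSONL)
cases = [
    json.loads(line)
    for line in Path("eval-tests/test_data_extended.jsonl").read_text().splitlines()
    if line.strip()
]

print(f"{len(cases)} test cases loaded")  # the docs above mention 19
print(sorted(cases[0].keys()))            # inspect the first case's fields
```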
- eval-tests/README.md – Complete guide (setup, architecture, troubleshooting)
- eval-tests/CONSOLIDATION_SUMMARY.md – Recent refactoring details
- eval-tests/aitk_format/README.md – AI Toolkit compatible evaluators
From a recent validation run:
- Quick mode: 96.7% pass rate (29/30 evaluations)
- Execution time: ~7 seconds (quick mode)
- Average scores: Intent 5.0, ToolAccuracy 4.2, TaskAdherence 5.0, Coherence 4.0
The evaluation framework supports multiple workflows:
- Command Line → Use run_evaluations.py for formatted console output
- CI/CD Pipeline → Use pytest for automated testing with exit codes
- VS Code Test Explorer → Discover and run tests in VS Code UI
- AI Toolkit → Optional - Use evaluators in AI Toolkit for VS Code visual UI (see aitk_format/README.md)
- Custom Integration → Import evaluators directly in your Python code
Note: The AI Toolkit integration is optional. The evaluation framework works fully without it, via the CLI runner or pytest.
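For the custom-integration path, a minimal sketch might look like the following; the class name and call signature are assumptions, so check eval-tests/custom_evaluators.py for the actual names:

```python
import sys

# Make the eval-tests modules importable (assumes you run from the repository root)
sys.path.insert(0, "eval-tests")

# Hypothetical class name; see eval-tests/custom_evaluators.py for the real one
from custom_evaluators import IntentResolutionEvaluator

evaluator = IntentResolutionEvaluator()

# Hypothetical call signature: score how well the response resolves the query's intent
result = evaluator(
    query="What's the weather in Seattle?",
    response="It is currently 12°C with light rain in Seattle.",
)
print(result)
```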
📦 agent-framework/
│
├── 📁 src/ # 🤖 Core agent implementation
│ ├── agent.py # ChatAgent with tool calling
│ ├── cli.py # Command line interface
│ ├── config.py # Azure OpenAI configuration
│ ├── devui.py # DevUI integration
│ └── tools/ # Business logic: weather, news, stocks
│
├── 📁 eval-tests/ # 🧪 Quality assurance framework
│ ├── README.md # Complete evaluation guide (300+ lines)
│ ├── run_evaluations.py # CLI runner with Rich output
│ ├── test_agent_evaluation.py # 24 pytest tests
│ ├── test_data_extended.jsonl # 19 test cases
│ ├── custom_evaluators.py # 3 custom evaluators
│ ├── aitk_evaluators.py # Azure AI SDK wrappers
│ ├── validate_setup.py # Environment validation
│ ├── conftest.py # Pytest configuration
│ ├── CONSOLIDATION_SUMMARY.md # Refactoring documentation
│ └── aitk_format/ # 🔧 Optional: AI Toolkit integration
│ ├── README.md # AI Toolkit setup guide
│ └── *_local_eval.py # 4 evaluators for AI Toolkit UI
│
├── 📁 tests/ # ✅ Unit tests for tools
│
├── .env.example # Environment template
├── .env # Your credentials (gitignored)
├── pyproject.toml # Dependencies and metadata
└── README.md # This file
Key Components:
- src/ - The main agent application that handles user queries and orchestrates tool calls
- eval-tests/ - Comprehensive quality testing with 6 evaluators and 24 tests
- eval-tests/aitk_format/ - Optional integration for visual evaluation in AI Toolkit for VS Code (see aitk_format/README.md)
- tests/ - Unit tests for individual tool functions
Before consolidation: ~30 files in eval-tests/
After consolidation: 10 essential files (70% reduction)
What Changed:
- ✅ Documentation: 16 files → 2 files (README.md + CONSOLIDATION_SUMMARY.md)
- ✅ Tests: 5 files → 2 files (test_agent_evaluation.py + validate_setup.py)
- ✅ Runners: 2 files → 1 file (run_evaluations.py with CLI flags)
- ✅ Data: 2 files → 1 file (test_data_extended.jsonl)
- ✅ CLI: Simplified to 3 flags (--extended, --quick, --show-three-tool)
# Development
uv sync # Install dependencies
uv run azure-agent-sample # Interactive chat
uv run azure-agent-sample "Your query here" # Single query
# Testing
uv run pytest tests/ -v # Unit tests
uv run pytest eval-tests/test_agent_evaluation.py -v # Evaluation tests (24 tests)
# Evaluations (pick your workflow)
uv run python eval-tests/run_evaluations.py --quick # Quick validation (5 tests, 7s)
uv run python eval-tests/run_evaluations.py # Default suite (10 tests)
uv run python eval-tests/run_evaluations.py --extended # Full suite (19 tests)
# DevUI (F5 in VS Code)
uv run -m devui                                        # Start DevUI server

- Agent Framework Documentation – Official framework docs
- Azure AI Evaluation SDK – Evaluation SDK reference
- eval-tests/README.md – Complete evaluation guide
- eval-tests/CONSOLIDATION_SUMMARY.md – Refactoring details
- eval-tests/aitk_format/README.md – AI Toolkit integration
This is a sample project demonstrating Agent Framework capabilities. Feel free to:
- ✅ Add new tools in src/tools/ (see the sketch after this list)
- ✅ Extend evaluation tests in eval-tests/test_agent_evaluation.py
- ✅ Add test cases in eval-tests/test_data_extended.jsonl
- ✅ Improve custom evaluators in eval-tests/custom_evaluators.py
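As an example of the first item, a new deterministic tool might look like the hypothetical sketch below; how it gets registered with the agent depends on src/agent.py, which isn't reproduced here:

```python
# src/tools/exchange_rates.py — hypothetical new tool, kept deterministic in the
# spirit of the existing weather, news, and stock tools: canned data, no external calls.
from typing import Annotated


def get_exchange_rate(
    base: Annotated[str, "Base currency code, e.g. 'USD'"],
    quote: Annotated[str, "Quote currency code, e.g. 'EUR'"],
) -> dict:
    """Return a deterministic exchange rate for a currency pair."""
    canned_rates = {("USD", "EUR"): 0.92, ("USD", "GBP"): 0.79}
    rate = canned_rates.get((base.upper(), quote.upper()), 1.0)
    return {"base": base.upper(), "quote": quote.upper(), "rate": rate}
```

A new tool would also warrant a unit test under tests/ and additional cases in eval-tests/test_data_extended.jsonl so the evaluators exercise it.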
This sample is provided as-is for educational purposes. See the Agent Framework repository for licensing information.
Happy hacking! Fill in your Azure credentials, set breakpoints anywhere inside
src/, and press F5 → Azure Agent Sample: DevUI to debug
live interactions.

