This project demonstrates how to build an Azure OpenAI-powered chat agent with the Microsoft Agent Framework. The agent can call three deterministic tools – weather, company news, and stock quotes – and can be explored interactively via the Agent Framework DevUI.
Features:
- 🤖 Learning-focused chat agent sample with tool calling
- 🧪 Comprehensive evaluation framework with 6 evaluators
- 📊 24 pytest tests covering all agent capabilities
- 🎨 Beautiful CLI output with Rich formatting
- 🔧 AI Toolkit compatible evaluators
- ✅ CI/CD ready with exit codes and JSON/CSV export
- 🐳 Dev Container support with pre-configured extensions
Prerequisites:
- Python 3.12+
- uv for dependency management (dependencies are installed automatically when you run uv commands)
- An Azure OpenAI resource with a Responses deployment

Alternatively, use the included Dev Container for a pre-configured environment (see Dev Container Setup below).
- Install dependencies:

  uv sync

  This installs all dependencies, including the Agent Framework DevUI.
- Configure Azure credentials:

  Copy .env.example and populate it with your values:

  Copy-Item .env.example .env

  Required variables in .env:

  - AZURE_OPENAI_ENDPOINT – Your Azure OpenAI endpoint URL
  - AZURE_OPENAI_DEPLOYMENT_NAME – Your deployment name (e.g., gpt-4)
  - AZURE_OPENAI_API_KEY – Your API key (or use CLI credentials)

  Optional variables:

  - AZURE_OPENAI_API_VERSION – API version (defaults to the SDK default)
  - AZURE_OPENAI_USE_CLI_CREDENTIAL – Set to true to use az login instead of an API key

  When using CLI credentials, authenticate with az login beforehand. A sample .env is sketched after these steps.
- Run the sample agent from the command line:

  uv run azure-agent-sample "Give me the weather in Seattle and a stock update for Contoso."

  Omitting the prompt starts an interactive chat session.
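A minimal .env, with placeholder values for the variables listed above, might look like this sketch (use either the API key or the CLI credential, depending on your setup):

```
# Required
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4
AZURE_OPENAI_API_KEY=<your-api-key>

# Optional
# AZURE_OPENAI_API_VERSION=<api-version>
# AZURE_OPENAI_USE_CLI_CREDENTIAL=true
```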
Use the pre-configured Dev Container with all tools and extensions installed:

- Prerequisites: Docker and the VS Code Dev Containers extension
- Open in container:
  - Press F1 → Dev Containers: Open Folder in Container
  - Wait for the initial setup (~3-5 minutes the first time)
- Configure .env with your Azure OpenAI credentials
- Press F5 to start debugging!
Three useful launch configurations are defined in .vscode/launch.json:
- Python Debugger: Current File – Standard configuration for debugging the currently open script.
- Azure Agent Sample: CLI – Launches the agent in CLI mode with an interactive chat session. Use F5 to start the agent, and interact with it directly in the terminal while hitting breakpoints in your Python code.
- Azure Agent Sample: DevUI – Launches the Agent Framework DevUI via agent_framework_devui._cli with PYTHONPATH set to include src (a sketch of this configuration appears below). Use F5 to start the DevUI, browse to http://127.0.0.1:8080, and interact with the discovered ContosoInsights agent while hitting breakpoints in your Python code.
If the DevUI cannot find the agent at start-up, ensure the Azure OpenAI environment variables are populated and restart the session.
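Based on the description above, the DevUI launch entry is roughly shaped like the sketch below; the field values are assumptions, so treat .vscode/launch.json in the repository as the authoritative version:

```jsonc
{
  // Hypothetical sketch of the "Azure Agent Sample: DevUI" launch entry
  "name": "Azure Agent Sample: DevUI",
  "type": "debugpy",
  "request": "launch",
  "module": "agent_framework_devui._cli",    // module named in the launcher description
  "env": {
    "PYTHONPATH": "${workspaceFolder}/src"   // makes the ContosoInsights agent discoverable
  }
}
```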
Run the bundled unit tests to validate tool behaviour:
uv run pytest tests/ -v

Comprehensive quality evaluation tests using the Azure AI Evaluation SDK are
available in the eval-tests/ directory. After recent consolidation, the evaluation
framework is streamlined with minimal files and simple CLI options.
What's New After Consolidation:
- 📚 Single comprehensive eval-tests/README.md (replaces 14+ docs)
- 🧪 All 24 tests in test_agent_evaluation.py
- 🚀 Unified runner run_evaluations.py with clean CLI
- 📊 19 test cases in test_data_extended.jsonl
- 🎯 6 evaluators: 3 built-in (Coherence, Relevance, Groundedness) + 3 custom (IntentResolution, ToolCallAccuracy, TaskAdherence)

Option 1: Quick Validation (Recommended for Fast Feedback)
Run 5 representative tests across all 6 evaluators (~7 seconds):
uv run python eval-tests/run_evaluations.py --quick

📊 Sample Output
╭─────────────────────────────────────────╮
│ Azure Agent Evaluation Runner │
│ Quick test run (5 representative cases) │
│ Fast validation of core functionality │
╰─────────────────────────────────────────╯
Running 5 quick tests...
Summary by Evaluator
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Evaluator ┃ Avg Score ┃ Pass Rate ┃ Tests ┃ Avg Duration ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━┩
│ Coherence │ 4.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 2407ms │
│ Groundedness │ 4.60 / 5.0 │ 5/5 (100.0%) │ 5 │ 2584ms │
│ IntentResoluti… │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
│ Relevance │ 4.20 / 5.0 │ 5/5 (100.0%) │ 5 │ 1632ms │
│ TaskAdherence │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
│ ToolCallAccura… │ 5.00 / 5.0 │ 5/5 (100.0%) │ 5 │ 0ms │
└─────────────────┴────────────┴──────────────┴───────┴──────────────┘
╭────────────────────────────────────────────╮
│ Overall: 30/30 evaluations passed (100.0%) │
╰────────────────────────────────────────────╯
✓ Evaluation completed with 100.0% pass rate
What's being tested:
- 5 queries × 6 evaluators = 30 total evaluations
- Validates: weather queries, stock quotes, news, and multi-tool orchestration
- Perfect 100% pass rate with average scores of 4.0/5.0 or higher
- Execution time: ~7 seconds
Option 2: Default Suite (10 In-Scope Tests)
Run the main test suite with beautiful Rich output:
uv run python eval-tests/run_evaluations.py

Option 3: Extended Suite (All 19 Tests)
Run all tests including out-of-scope queries:
uv run python eval-tests/run_evaluations.py --extended

Option 4: Three-Tool Comprehensive Test
Show detailed analysis of the comprehensive multi-tool test:
uv run python eval-tests/run_evaluations.py --extended --show-three-tool

Option 5: Pytest (Recommended for CI/CD)
Run evaluations as unit tests with pass/fail assertions:
# First-time setup: install evaluation dependencies
uv sync --group dev
# Validate environment configuration
uv run python eval-tests/validate_setup.py
# Run all 24 evaluation tests
uv run pytest eval-tests/test_agent_evaluation.py -v
# Run specific test
uv run pytest eval-tests/test_agent_evaluation.py::test_all_three_tools_comprehensive -v

# Show help
uv run python eval-tests/run_evaluations.py --help

🔧 Available Options
usage: run_evaluations.py [-h] [--extended] [--quick] [--show-three-tool]
Azure Agent Evaluation Runner
options:
-h, --help show this help message and exit
--extended Run all 19 test cases (includes out-of-scope queries)
--quick Run 5 representative tests only
--show-three-tool Display three-tool comprehensive test summary
Examples:
python eval-tests/run_evaluations.py # Run default tests (10 in-scope)
python eval-tests/run_evaluations.py --extended # Run all 19 tests
python eval-tests/run_evaluations.py --quick # Run 5 quick tests
python eval-tests/run_evaluations.py --extended --show-three-tool # Show three-tool summary
Mode Comparison:
| Mode | Test Cases | Duration | Use Case |
|---|---|---|---|
| --quick | 5 tests | ~7s | Fast validation during development |
| Default | 10 tests | ~15s | Standard quality checks (in-scope only) |
| --extended | 19 tests | ~30s | Full suite including edge cases |
The framework tests your agent across multiple quality dimensions:
| Evaluator | What It Measures | Score Range |
|---|---|---|
| IntentResolution | Does the agent understand user queries? | 1-5 |
| ToolCallAccuracy | Does it invoke correct tools with right parameters? | 1-5 |
| TaskAdherence | Does it follow system instructions? | 1-5 |
| Relevance | Are responses directly relevant? | 1-5 |
| Coherence | Are responses logically structured? | 1-5 |
| Groundedness | Are responses factually accurate? | 1-5 |
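As a rough illustration of how the built-in evaluators score a response, the sketch below calls the Azure AI Evaluation SDK directly; the model_config values are placeholders and the exact result keys may vary by SDK version:

```python
import os

from azure.ai.evaluation import CoherenceEvaluator  # built-in LLM-judge evaluator

# Judge-model configuration (placeholder values drawn from the same .env settings)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
}

coherence = CoherenceEvaluator(model_config=model_config)

# Score a single query/response pair on the 1-5 scale used in the table above
result = coherence(
    query="What's the weather in Seattle?",
    response="It is currently 12°C with light rain in Seattle.",
)
print(result)  # e.g. {"coherence": 4.0, ...}, depending on SDK version
```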
- ✅ Single tool queries (weather, news, stock)
- ✅ Multi-tool queries (weather + stock)
- ✅ Comprehensive queries (all 3 tools in one request)
- ✅ Out-of-scope queries (graceful decline handling)
- ✅ Edge cases (missing params, unclear intent)
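Each line in eval-tests/test_data_extended.jsonl holds one test case as a JSON object; the exact field names are defined by that file, so the sketch below only loads and inspects the cases without assuming a schema:

```python
import json
from pathlib import Path

# Read all test cases: one JSON object per line (JSONL)
cases = [
    json.loads(line)
    for line in Path("eval-tests/test_data_extended.jsonl").read_text().splitlines()
    if line.strip()
]

print(f"{len(cases)} test cases loaded")  # the docs above mention 19
print(sorted(cases[0].keys()))            # inspect the first case's fields
```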
- eval-tests/README.md – Complete guide (setup, architecture, troubleshooting)
- eval-tests/CONSOLIDATION_SUMMARY.md – Recent refactoring details
- eval-tests/aitk_format/README.md – AI Toolkit compatible evaluators
From a recent validation run:
- Quick mode: 96.7% pass rate (29/30 evaluations)
- Execution time: ~7 seconds (quick mode)
- Average scores: Intent 5.0, ToolAccuracy 4.2, TaskAdherence 5.0, Coherence 4.0
The evaluation framework supports multiple workflows:
- Command Line → Use run_evaluations.py for formatted console output
- CI/CD Pipeline → Use pytest for automated testing with exit codes
- VS Code Test Explorer → Discover and run tests in VS Code UI
- AI Toolkit → Optional - Use evaluators in AI Toolkit for VS Code visual UI (see aitk_format/README.md)
- Custom Integration → Import evaluators directly in your Python code
Note: The AI Toolkit integration is optional. The evaluation framework works fully without it, via the CLI runner or pytest.
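For the custom-integration path, a minimal sketch might look like the following; the class name and call signature are assumptions, so check eval-tests/custom_evaluators.py for the actual names:

```python
import sys

# Make the eval-tests modules importable (assumes you run from the repository root)
sys.path.insert(0, "eval-tests")

# Hypothetical class name; see eval-tests/custom_evaluators.py for the real one
from custom_evaluators import IntentResolutionEvaluator

evaluator = IntentResolutionEvaluator()

# Hypothetical call signature: score how well the response resolves the query's intent
result = evaluator(
    query="What's the weather in Seattle?",
    response="It is currently 12°C with light rain in Seattle.",
)
print(result)
```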
📦 agent-framework/
│
├── 📁 src/ # 🤖 Core agent implementation
│ ├── agent.py # ChatAgent with tool calling
│ ├── cli.py # Command line interface
│ ├── config.py # Azure OpenAI configuration
│ ├── devui.py # DevUI integration
│ └── tools/ # Business logic: weather, news, stocks
│
├── 📁 eval-tests/ # 🧪 Quality assurance framework
│ ├── README.md # Complete evaluation guide (300+ lines)
│ ├── run_evaluations.py # CLI runner with Rich output
│ ├── test_agent_evaluation.py # 24 pytest tests
│ ├── test_data_extended.jsonl # 19 test cases
│ ├── custom_evaluators.py # 3 custom evaluators
│ ├── aitk_evaluators.py # Azure AI SDK wrappers
│ ├── validate_setup.py # Environment validation
│ ├── conftest.py # Pytest configuration
│ ├── CONSOLIDATION_SUMMARY.md # Refactoring documentation
│ └── aitk_format/ # 🔧 Optional: AI Toolkit integration
│ ├── README.md # AI Toolkit setup guide
│ └── *_local_eval.py # 4 evaluators for AI Toolkit UI
│
├── 📁 tests/ # ✅ Unit tests for tools
│
├── .env.example # Environment template
├── .env # Your credentials (gitignored)
├── pyproject.toml # Dependencies and metadata
└── README.md # This file
Key Components:
- src/ - The main agent application that handles user queries and orchestrates tool calls
- eval-tests/ - Comprehensive quality testing with 6 evaluators and 24 tests
- eval-tests/aitk_format/ - Optional integration for visual evaluation in AI Toolkit for VS Code (see aitk_format/README.md)
- tests/ - Unit tests for individual tool functions
Before consolidation: ~30 files in eval-tests/
After consolidation: 10 essential files (70% reduction)
What Changed:
- ✅ Documentation: 16 files → 2 files (README.md + CONSOLIDATION_SUMMARY.md)
- ✅ Tests: 5 files → 2 files (test_agent_evaluation.py + validate_setup.py)
- ✅ Runners: 2 files → 1 file (run_evaluations.py with CLI flags)
- ✅ Data: 2 files → 1 file (test_data_extended.jsonl)
- ✅ CLI: Simplified to 3 flags (--extended, --quick, --show-three-tool)
# Development
uv sync # Install dependencies
uv run azure-agent-sample # Interactive chat
uv run azure-agent-sample "Your query here" # Single query
# Testing
uv run pytest tests/ -v # Unit tests
uv run pytest eval-tests/test_agent_evaluation.py -v # Evaluation tests (24 tests)
# Evaluations (pick your workflow)
uv run python eval-tests/run_evaluations.py --quick # Quick validation (5 tests, 7s)
uv run python eval-tests/run_evaluations.py # Default suite (10 tests)
uv run python eval-tests/run_evaluations.py --extended # Full suite (19 tests)
# DevUI (F5 in VS Code)
uv run -m devui                                        # Start DevUI server

- Agent Framework Documentation – Official framework docs
- Azure AI Evaluation SDK – Evaluation SDK reference
- eval-tests/README.md – Complete evaluation guide
- eval-tests/CONSOLIDATION_SUMMARY.md – Refactoring details
- eval-tests/aitk_format/README.md – AI Toolkit integration
This is a sample project demonstrating Agent Framework capabilities. Feel free to:
- ✅ Add new tools in src/tools/ (see the sketch after this list)
- ✅ Extend evaluation tests in eval-tests/test_agent_evaluation.py
- ✅ Add test cases in eval-tests/test_data_extended.jsonl
- ✅ Improve custom evaluators in eval-tests/custom_evaluators.py
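As an example of the first item, a new deterministic tool might look like the hypothetical sketch below; how it gets registered with the agent depends on src/agent.py, which isn't reproduced here:

```python
# src/tools/exchange_rates.py — hypothetical new tool, kept deterministic in the
# spirit of the existing weather, news, and stock tools: canned data, no external calls.
from typing import Annotated


def get_exchange_rate(
    base: Annotated[str, "Base currency code, e.g. 'USD'"],
    quote: Annotated[str, "Quote currency code, e.g. 'EUR'"],
) -> dict:
    """Return a deterministic exchange rate for a currency pair."""
    canned_rates = {("USD", "EUR"): 0.92, ("USD", "GBP"): 0.79}
    rate = canned_rates.get((base.upper(), quote.upper()), 1.0)
    return {"base": base.upper(), "quote": quote.upper(), "rate": rate}
```

A new tool would also warrant a unit test under tests/ and additional cases in eval-tests/test_data_extended.jsonl so the evaluators exercise it.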
This sample is provided as-is for educational purposes. See the Agent Framework repository for licensing information.
Happy hacking! Fill in your Azure credentials, set breakpoints anywhere inside
src/, and press F5 → Azure Agent Sample: DevUI to debug
live interactions.

