Azure Agent Framework Sample

This project demonstrates how to build an Azure OpenAI-powered chat agent with the Microsoft Agent Framework. The agent can call three deterministic tools – weather, company news, and stock quotes – and can be explored interactively via the Agent Framework DevUI.

Features:

  • 🤖 Learning-focused chat agent sample with tool calling
  • 🧪 Comprehensive evaluation framework with 6 evaluators
  • 📊 24 pytest tests covering all agent capabilities
  • 🎨 Beautiful CLI output with Rich formatting
  • 🔧 AI Toolkit compatible evaluators
  • ✅ CI/CD ready with exit codes and JSON/CSV export
  • 🐳 Dev Container support with pre-configured extensions

Prerequisites

  • Python 3.12+
  • uv for dependency management (project dependencies are installed automatically when you run uv commands; see the note below if you need to install uv itself)
  • An Azure OpenAI resource with a deployment that supports the Responses API

Or use the included Dev Container for a pre-configured environment (see Option 2: Dev Container below).
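
If uv is not installed yet, it can be added with pip or with the standalone installer (standard commands; check the uv documentation for your platform):

# Install uv via pip
pip install uv

# Or use the standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh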

Getting Started

Option 1: Local Development

  1. Install dependencies

    uv sync

    This installs all dependencies including the Agent Framework DevUI.

  2. Configure Azure credentials

    Copy .env.example to .env and populate it with your values (PowerShell shown; on macOS/Linux use cp .env.example .env):

    Copy-Item .env.example .env

    Required variables in .env:

    • AZURE_OPENAI_ENDPOINT – Your Azure OpenAI endpoint URL
    • AZURE_OPENAI_DEPLOYMENT_NAME – Your deployment name (e.g., gpt-4)
    • AZURE_OPENAI_API_KEY – Your API key (or use CLI credentials)

    Optional variables:

    • AZURE_OPENAI_API_VERSION – API version (defaults to SDK default)
    • AZURE_OPENAI_USE_CLI_CREDENTIAL – Set to true to use az login instead of API key

    When using CLI credentials, authenticate with az login beforehand.
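
    Putting it together, a minimal .env might look like this (placeholder values; the deployment name must match a deployment in your Azure OpenAI resource):

    AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4
    AZURE_OPENAI_API_KEY=<your-api-key>
    # Optional overrides
    # AZURE_OPENAI_API_VERSION=<api-version>
    # AZURE_OPENAI_USE_CLI_CREDENTIAL=true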

  3. Run the sample agent from the command line

    uv run azure-agent-sample "Give me the weather in Seattle and a stock update for Contoso."

    Omitting the prompt starts an interactive chat session.

Option 2: Dev Container (Recommended for Quick Start)

Use the pre-configured Dev Container with all tools and extensions installed:

  1. Prerequisites:

    • Docker (e.g., Docker Desktop)
    • Visual Studio Code with the Dev Containers extension

  2. Open in container:

    • Press F1 → Dev Containers: Open Folder in Container
    • Wait for initial setup (~3-5 minutes first time)
    • Configure .env with your Azure OpenAI credentials
    • Press F5 to start debugging!

Visual Studio Code Integration

The following launch configurations are defined in .vscode/launch.json:

  • Python Debugger: Current File – Standard configuration for debugging the currently open script.
  • Azure Agent Sample: CLI – Launches the agent in CLI mode with an interactive chat session. Use F5 to start the agent, and interact with it directly in the terminal while hitting breakpoints in your Python code.

CLI Screenshot

  • Azure Agent Sample: DevUI – Launches the Agent Framework DevUI via agent_framework_devui._cli with PYTHONPATH set to include src. Use F5 to start the DevUI, browse to http://127.0.0.1:8080, and interact with the discovered ContosoInsights agent while hitting breakpoints in your Python code.

DevUI Screenshot

If the DevUI cannot find the agent at start-up, ensure the Azure OpenAI environment variables are populated and restart the session.

Tests

Unit Tests

Run the bundled unit tests to validate tool behaviour:

uv run pytest tests/ -v
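
As an illustration of the style used there, a test for a deterministic tool might look like the sketch below (the import path and function name are assumptions, not the project's actual API):

# Hypothetical sketch only, not the actual code in tests/.
# Assumes src/ is on PYTHONPATH and a deterministic weather tool exists;
# real module paths, function names, and return shapes will differ.
from tools.weather import get_weather  # assumed import path


def test_weather_mentions_requested_city():
    result = get_weather("Seattle")
    # A deterministic tool should echo the requested city in its payload.
    assert "Seattle" in str(result)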

Evaluation Tests

Comprehensive quality evaluation tests built on the Azure AI Evaluation SDK live in the eval-tests/ directory. After a recent consolidation, the evaluation framework has been streamlined to a minimal set of files with simple CLI options.

For a summary of what changed during the consolidation, see Key Files After Consolidation below and eval-tests/CONSOLIDATION_SUMMARY.md.

Quick Start: Run Evaluations

Option 1: Quick Validation (Recommended for Fast Feedback)

Run 5 representative tests across all 6 evaluators (~7 seconds):

uv run python eval-tests/run_evaluations.py --quick

📊 Sample output:
╭─────────────────────────────────────────╮
│ Azure Agent Evaluation Runner           │
│ Quick test run (5 representative cases) │
│ Fast validation of core functionality   │
╰─────────────────────────────────────────╯

Running 5 quick tests...

                         Summary by Evaluator                         
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Evaluator       ┃  Avg Score ┃    Pass Rate ┃ Tests ┃ Avg Duration ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━┩
│ Coherence       │ 4.00 / 5.0 │ 5/5 (100.0%) │     5 │       2407ms │
│ Groundedness    │ 4.60 / 5.0 │ 5/5 (100.0%) │     5 │       2584ms │
│ IntentResoluti… │ 5.00 / 5.0 │ 5/5 (100.0%) │     5 │          0ms │
│ Relevance       │ 4.20 / 5.0 │ 5/5 (100.0%) │     5 │       1632ms │
│ TaskAdherence   │ 5.00 / 5.0 │ 5/5 (100.0%) │     5 │          0ms │
│ ToolCallAccura… │ 5.00 / 5.0 │ 5/5 (100.0%) │     5 │          0ms │
└─────────────────┴────────────┴──────────────┴───────┴──────────────┘

╭────────────────────────────────────────────╮
│ Overall: 30/30 evaluations passed (100.0%) │
╰────────────────────────────────────────────╯

✓ Evaluation completed with 100.0% pass rate

What's being tested:

  • 5 queries × 6 evaluators = 30 total evaluations
  • Validates: weather queries, stock quotes, news, and multi-tool orchestration
  • 100% pass rate in the run shown above, with average scores above 4.0/5.0
  • Execution time: ~7 seconds

Option 2: Default Suite (10 In-Scope Tests)

Run the main test suite with beautiful Rich output:

uv run python eval-tests/run_evaluations.py

Option 3: Extended Suite (All 19 Tests)

Run all tests including out-of-scope queries:

uv run python eval-tests/run_evaluations.py --extended

Option 4: Three-Tool Comprehensive Test

Show detailed analysis of the comprehensive multi-tool test:

uv run python eval-tests/run_evaluations.py --extended --show-three-tool

Option 5: Pytest (Recommended for CI/CD)

Run evaluations as unit tests with pass/fail assertions:

# First-time setup: install evaluation dependencies
uv sync --group dev

# Validate environment configuration
uv run python eval-tests/validate_setup.py

# Run all 24 evaluation tests
uv run pytest eval-tests/test_agent_evaluation.py -v

# Run specific test
uv run pytest eval-tests/test_agent_evaluation.py::test_all_three_tools_comprehensive -v
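
The validate_setup.py step checks the environment before any model calls are made. A minimal sketch of that kind of check (not the actual script) could look like this:

# Minimal sketch of an environment check; validate_setup.py may check more.
import os
import sys

REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}")
    sys.exit(1)  # non-zero exit code so CI fails fast
print("Environment configuration looks complete.")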

CLI Options Reference

# Show help
uv run python eval-tests/run_evaluations.py --help

🔧 Available options:
usage: run_evaluations.py [-h] [--extended] [--quick] [--show-three-tool]

Azure Agent Evaluation Runner

options:
  -h, --help         show this help message and exit
  --extended         Run all 19 test cases (includes out-of-scope queries)
  --quick            Run 5 representative tests only
  --show-three-tool  Display three-tool comprehensive test summary

Examples:
  python eval-tests/run_evaluations.py                              # Run default tests (10 in-scope)
  python eval-tests/run_evaluations.py --extended                   # Run all 19 tests
  python eval-tests/run_evaluations.py --quick                      # Run 5 quick tests
  python eval-tests/run_evaluations.py --extended --show-three-tool # Show three-tool summary

Mode Comparison:

  Mode         Test Cases   Duration   Use Case
  --quick      5 tests      ~7s        Fast validation during development
  Default      10 tests     ~15s       Standard quality checks (in-scope only)
  --extended   19 tests     ~30s       Full suite including edge cases

What Gets Evaluated

The framework tests your agent across multiple quality dimensions:

  Evaluator          What It Measures                                     Score Range
  IntentResolution   Does the agent understand user queries?              1-5
  ToolCallAccuracy   Does it invoke correct tools with right parameters?  1-5
  TaskAdherence      Does it follow system instructions?                  1-5
  Relevance          Are responses directly relevant?                     1-5
  Coherence          Are responses logically structured?                  1-5
  Groundedness       Are responses factually accurate?                    1-5

Test Coverage

  • Single tool queries (weather, news, stock)
  • Multi-tool queries (weather + stock)
  • Comprehensive queries (all 3 tools in one request)
  • Out-of-scope queries (graceful decline handling)
  • Edge cases (missing params, unclear intent)

Documentation

For the full evaluation guide see eval-tests/README.md; for AI Toolkit setup see eval-tests/aitk_format/README.md; consolidation details are in eval-tests/CONSOLIDATION_SUMMARY.md.

Performance Metrics

From a recent validation run:

  • Quick mode: 96.7% pass rate (29/30 evaluations)
  • Execution time: ~7 seconds (quick mode)
  • Average scores: Intent 5.0, ToolAccuracy 4.2, TaskAdherence 5.0, Coherence 4.0

Integration Options

The evaluation framework supports multiple workflows:

  1. Command Line → Use run_evaluations.py for formatted console output
  2. CI/CD Pipeline → Use pytest for automated testing with exit codes
  3. VS Code Test Explorer → Discover and run tests in VS Code UI
  4. AI Toolkit (optional) → Use the evaluators in the AI Toolkit for VS Code visual UI (see aitk_format/README.md)
  5. Custom Integration → Import evaluators directly in your Python code (see the sketch below)

Note: The AI Toolkit integration is optional; the evaluation framework works fully without it via the CLI runner or pytest.
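
As an example of option 5, the built-in evaluators from the azure-ai-evaluation package can be imported and called directly. The snippet below is a minimal sketch assuming an API-key-based model configuration; check the SDK documentation for the exact constructor arguments and result keys.

# Minimal sketch of calling an Azure AI Evaluation SDK evaluator directly.
# Verify constructor arguments and result keys against the SDK docs.
import os

from azure.ai.evaluation import RelevanceEvaluator

model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
}

relevance = RelevanceEvaluator(model_config=model_config)
result = relevance(
    query="What's the weather in Seattle?",
    response="It is currently 12 degrees Celsius and raining in Seattle.",
)
print(result)  # expected to contain a relevance score on a 1-5 scale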

Project Layout

📦 agent-framework/
│
├── 📁 src/                         # 🤖 Core agent implementation
│   ├── agent.py                    # ChatAgent with tool calling
│   ├── cli.py                      # Command line interface
│   ├── config.py                   # Azure OpenAI configuration
│   ├── devui.py                    # DevUI integration
│   └── tools/                      # Business logic: weather, news, stocks
│
├── 📁 eval-tests/                  # 🧪 Quality assurance framework
│   ├── README.md                   # Complete evaluation guide (300+ lines)
│   ├── run_evaluations.py          # CLI runner with Rich output
│   ├── test_agent_evaluation.py    # 24 pytest tests
│   ├── test_data_extended.jsonl    # 19 test cases
│   ├── custom_evaluators.py        # 3 custom evaluators
│   ├── aitk_evaluators.py          # Azure AI SDK wrappers
│   ├── validate_setup.py           # Environment validation
│   ├── conftest.py                 # Pytest configuration
│   ├── CONSOLIDATION_SUMMARY.md    # Refactoring documentation
│   └── aitk_format/                # 🔧 Optional: AI Toolkit integration
│       ├── README.md               # AI Toolkit setup guide
│       └── *_local_eval.py         # 4 evaluators for AI Toolkit UI
│
├── 📁 tests/                       # ✅ Unit tests for tools
│
├── .env.example                    # Environment template
├── .env                            # Your credentials (gitignored)
├── pyproject.toml                  # Dependencies and metadata
└── README.md                       # This file

Key Components:

  • src/ - The main agent application that handles user queries and orchestrates tool calls
  • eval-tests/ - Comprehensive quality testing with 6 evaluators and 24 tests
  • eval-tests/aitk_format/ - Optional integration for visual evaluation in AI Toolkit for VS Code (see aitk_format/README.md)
  • tests/ - Unit tests for individual tool functions

Key Files After Consolidation

Before consolidation: ~30 files in eval-tests/
After consolidation: 10 essential files (70% reduction)

What Changed:

  • ✅ Documentation: 16 files → 2 files (README.md + CONSOLIDATION_SUMMARY.md)
  • ✅ Tests: 5 files → 2 files (test_agent_evaluation.py + validate_setup.py)
  • ✅ Runners: 2 files → 1 file (run_evaluations.py with CLI flags)
  • ✅ Data: 2 files → 1 file (test_data_extended.jsonl)
  • ✅ CLI: Simplified to 3 flags (--extended, --quick, --show-three-tool)

Quick Command Reference

# Development
uv sync                                                    # Install dependencies
uv run azure-agent-sample                                  # Interactive chat
uv run azure-agent-sample "Your query here"                # Single query

# Testing
uv run pytest tests/ -v                                    # Unit tests
uv run pytest eval-tests/test_agent_evaluation.py -v       # Evaluation tests (24 tests)

# Evaluations (pick your workflow)
uv run python eval-tests/run_evaluations.py --quick        # Quick validation (5 tests, 7s)
uv run python eval-tests/run_evaluations.py                # Default suite (10 tests)
uv run python eval-tests/run_evaluations.py --extended     # Full suite (19 tests)

# DevUI (F5 in VS Code)
uv run -m devui                                            # Start DevUI server

Contributing

This is a sample project demonstrating Agent Framework capabilities. Feel free to:

  • ✅ Add new tools in src/tools/ (see the sketch after this list)
  • ✅ Extend evaluation tests in eval-tests/test_agent_evaluation.py
  • ✅ Add test cases in eval-tests/test_data_extended.jsonl
  • ✅ Improve custom evaluators in eval-tests/custom_evaluators.py
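
For instance, a new deterministic tool could start as a plain Python function with type hints and a docstring. How it gets registered with the agent depends on src/agent.py, so treat the function and file name below as purely illustrative:

# Hypothetical new tool, e.g. src/tools/exchange.py (illustrative only).
def get_exchange_rate(base_currency: str, quote_currency: str) -> dict:
    """Return a canned exchange rate so demo results stay deterministic."""
    canned_rates = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09}
    pair = (base_currency.upper(), quote_currency.upper())
    return {
        "base": pair[0],
        "quote": pair[1],
        "rate": canned_rates.get(pair, 1.0),  # fall back to parity for unknown pairs
    }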

License

This sample is provided as-is for educational purposes. See the Agent Framework repository for licensing information.


Happy hacking! Fill in your Azure credentials, set breakpoints anywhere inside src/, and press F5 → Azure Agent Sample: DevUI to debug live interactions.
