Lab environment demonstrating Azure OpenAI enterprise patterns. Covers:
- APIM load balancing across multi-region Azure OpenAI instances (the following scenario)
- Responses API routing through APIM and AI Foundry connectors
- Minimum RBAC permissions for Responses API with BYO storage + AI Search (no Contributor needed)
- Working test scripts for hands-on validation of all scenarios
          ┌─────────────────────────────────────────┐
          │               Client App                │
          └───────┬──────────────┬──────────────┬───┘
                  │              │              │
             Direct call     Via APIM    Via AI Foundry
                  │              │              │
                  │    ┌─────────▼──────────┐   │
                  │    │   API Management   │   │
                  │    │  (Load Balancer)   │   │
                  │    │                    │   │
                  │    │  ┌──────────────┐  │   │
                  │    │  │ Backend Pool │  │   │
                  │    │  │  50% / 50%   │  │   │
                  │    │  └──┬───────┬───┘  │   │
                  │    └─────┼───────┼──────┘   │
                  │          │       │          │
       ┌──────────▼──────────▼─┐   ┌─▼──────────▼──────────┐
       │ Azure OpenAI          │   │ Azure OpenAI          │
       │ East US               │   │ East US 2             │
       │ (GPT-4o)              │   │ (GPT-4o)              │
       └───────────────────────┘   └───────────────────────┘
                   ▲                           ▲
                   │     ┌───────────────┐     │
                   └─────┤  AI Foundry   ├─────┘
                         │    Project    │
                         │               │
                         │ Connections:  │
                         │ • aoai-eus    │
                         │ • aoai-eus2   │
                         │ • ai-search   │
                         └───────┬───────┘
                                 │
                 ┌───────────────┼───────────────┐
                 │               │               │
          ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
          │ BYO Storage │ │  AI Search  │ │  Key Vault  │
          │   Account   │ │             │ │             │
          └─────────────┘ └─────────────┘ └─────────────┘
- Azure CLI (`az`) logged in with a subscription that can create resources
- Python 3.10+
- Bash shell
# Basic deployment
./scripts/deploy.sh rg-openai-apim-lab eastus
# With RBAC role assignments for a specific user
USER_OID=$(az ad signed-in-user show --query id -o tsv)
./scripts/deploy.sh rg-openai-apim-lab eastus "$USER_OID"

Note: APIM deployment takes ~15-25 minutes. Other resources deploy in ~5 minutes.
# Use the simplified policy (no Event Hub dependency)
az apim api operation policy create \
--resource-group rg-openai-apim-lab \
--service-name <apim-name> \
--api-id azure-openai \
--operation-id all-operations \
--xml-file apim-policies/openai-load-balancer-simple.xml

cd scripts
pip install -r requirements.txt
# Test Responses API (direct, APIM, AI Foundry)
python test_responses_api.py
# Test load balancing distribution
python test_apim_load_balancing.py
# Validate RBAC permissions
python test_rbac_permissions.py

APIM Load Balancing:
- APIM uses a backend pool (`openai-pool`) with two Azure OpenAI backends
- Requests are distributed via weighted round-robin (configurable 50/50, 70/30, etc.)
- The APIM policy adds retry with failover: if one backend returns 429 or 5xx, the request retries on the other backend
- APIM authenticates to Azure OpenAI using its managed identity (no API keys)
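
The distribution and failover behaviour described above can be approximated locally with a small simulation. This is a minimal sketch, not the lab's APIM policy: the backend names and the `send_with_failover` helper are hypothetical, and weighted random selection stands in for APIM's weighted round-robin. The same shape also works as a starting point if you route at the application layer instead of in APIM.

```python
import random
from collections import Counter

# Hypothetical backend names and weights; the real pool is configured in APIM.
BACKENDS = {"aoai-eastus": 50, "aoai-eastus2": 50}


def is_retryable(status):
    """429 (throttled) and 5xx responses trigger failover to another backend."""
    return status == 429 or 500 <= status <= 599


def pick_backend(weights, rng, exclude=()):
    """Weighted random choice among backends that have not been tried yet."""
    names = [name for name in weights if name not in exclude]
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def send_with_failover(call, weights, rng):
    """Try backends until one returns a non-retryable status or all are tried.

    `call` is any function mapping a backend name to an HTTP status code.
    Returns (last backend used, last status, list of backends tried).
    """
    tried = []
    status = None
    while len(tried) < len(weights):
        backend = pick_backend(weights, rng, exclude=tried)
        tried.append(backend)
        status = call(backend)
        if not is_retryable(status):
            break
    return tried[-1], status, tried


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulate East US being throttled: every request should land on East US 2.
    throttled = lambda backend: 429 if backend == "aoai-eastus" else 200
    served = Counter(
        send_with_failover(throttled, BACKENDS, rng)[0] for _ in range(100)
    )
    print(served)
```

Changing the weights to `{"aoai-eastus": 70, "aoai-eastus2": 30}` models the 70/30 split mentioned above.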
Translation to AI Foundry:
- AI Foundry does not natively provide APIM-style load balancing across connected OpenAI resources
- When you add multiple OpenAI connections to AI Foundry, each connection maps to a specific resource
- The project endpoint routes to whichever connected resource has the requested model deployment
- Recommendation: Keep APIM as the load balancer in front of OpenAI, and optionally connect APIM (as a single endpoint) to AI Foundry — or manage routing at the application layer
See: apim-policies/ and scripts/test_apim_load_balancing.py
Three access patterns are demonstrated:
| Pattern | URL | Auth |
|---|---|---|
| Direct | `https://{resource}.openai.azure.com/openai/responses?api-version=2025-03-01-preview` | Entra ID token (cognitiveservices.azure.com) |
| Via APIM | `https://{apim}.azure-api.net/openai/responses?api-version=2025-03-01-preview` | APIM subscription key (APIM handles Entra auth to backend) |
| Via AI Foundry | Use the SDK: `project_client.inference.get_azure_openai_client()` then call `client.responses.create()` | Entra ID token (SDK handles routing to the correct connected resource) |
Key insight: With AI Foundry, you do NOT specify the raw OpenAI resource URL. The SDK resolves which connected OpenAI resource to use based on the model deployment name and the project's connections.
See: scripts/test_responses_api.py
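
To make the difference between the first two patterns concrete, the sketch below builds (but does not send) the request for each. It is an illustration under assumptions, not the contents of `test_responses_api.py`: the helper names are hypothetical, the body uses the Responses API's `model` and `input` fields, and the APIM header is assumed to be the default `Ocp-Apim-Subscription-Key` (an APIM API can be configured to use a different header name).

```python
API_VERSION = "2025-03-01-preview"


def build_direct_request(resource, token, deployment, prompt):
    """Direct call: the client presents its own Entra ID bearer token.

    The token would typically be obtained with azure-identity for the scope
    https://cognitiveservices.azure.com/.default (not done here).
    """
    return {
        "url": f"https://{resource}.openai.azure.com/openai/responses"
               f"?api-version={API_VERSION}",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"model": deployment, "input": prompt},
    }


def build_apim_request(apim, subscription_key, deployment, prompt):
    """Via APIM: the client sends only a subscription key.

    APIM's managed identity handles Entra auth to the backend, so no
    Authorization header is set by the client. The header name below is
    APIM's default and is an assumption about this lab's configuration.
    """
    return {
        "url": f"https://{apim}.azure-api.net/openai/responses"
               f"?api-version={API_VERSION}",
        "headers": {
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        "json": {"model": deployment, "input": prompt},
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spec = build_apim_request("my-apim", "<subscription-key>", "gpt-4o", "Hello")
    print(spec["url"])
```

Each returned dictionary uses the keyword names accepted by `requests.post`, so with the `requests` package installed a call could be sent as `requests.post(**spec)`.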
The Azure AI User role alone is insufficient. The minimum required roles are:
| Role | Scope | Purpose |
|---|---|---|
| Cognitive Services OpenAI User | Azure OpenAI resource(s) | Inference access |
| Storage Blob Data Contributor | BYO Storage Account | File operations for Responses API tools |
| Search Index Data Contributor | AI Search resource | Search queries for grounding |
| Azure AI Developer | AI Foundry Project | Project endpoint access |
See: docs/rbac-permissions.md for the full guide including CLI commands, troubleshooting, and why Contributor is not needed.
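
As a rough illustration of how the table maps to role assignments, the sketch below generates one `az role assignment create` command per role and scope. It is an assumption-laden helper, not the lab's `rbac.bicep` or the commands in `docs/rbac-permissions.md`: the function name, the scope keys, and the resource IDs are hypothetical placeholders, and the principal is assumed to be a user.

```python
import shlex

# Role name -> key into the caller-supplied scopes mapping (keys are hypothetical).
MINIMUM_ROLES = [
    ("Cognitive Services OpenAI User", "openai"),
    ("Storage Blob Data Contributor", "storage"),
    ("Search Index Data Contributor", "search"),
    ("Azure AI Developer", "project"),
]


def role_assignment_commands(principal_oid, scopes):
    """Return az CLI command strings, one per (role, scope) pair.

    `scopes` maps each key above to a list of Azure resource IDs, so both
    Azure OpenAI resources can receive the same role.
    """
    commands = []
    for role, key in MINIMUM_ROLES:
        for scope in scopes[key]:
            args = [
                "az", "role", "assignment", "create",
                "--assignee-object-id", principal_oid,
                "--assignee-principal-type", "User",
                "--role", role,
                "--scope", scope,
            ]
            commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Placeholder resource IDs; substitute the IDs from your own deployment.
    example_scopes = {
        "openai": ["/subscriptions/<sub>/openai-eus", "/subscriptions/<sub>/openai-eus2"],
        "storage": ["/subscriptions/<sub>/storage"],
        "search": ["/subscriptions/<sub>/search"],
        "project": ["/subscriptions/<sub>/project"],
    }
    for cmd in role_assignment_commands("<user-object-id>", example_scopes):
        print(cmd)
```

Printing the commands rather than running them keeps the sketch side-effect free; review the output before executing anything against a subscription.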
├── README.md  # This file
├── infra/
│   ├── main.bicep  # Main deployment orchestrator
│   ├── main.bicepparam  # Default parameters
│   └── modules/
│       ├── openai.bicep  # Azure OpenAI resource + model deployment
│       ├── apim.bicep  # APIM + backend pool + API definition
│       ├── ai-foundry.bicep  # AI Foundry hub + project + connections
│       ├── ai-search.bicep  # AI Search resource
│       ├── storage.bicep  # BYO storage account
│       └── rbac.bicep  # All RBAC role assignments (documented)
├── apim-policies/
│   ├── openai-load-balancer.xml  # Full policy (with Event Hub logging)
│   └── openai-load-balancer-simple.xml  # Simplified policy (no dependencies)
├── scripts/
│   ├── deploy.sh  # One-command deployment script
│   ├── .env.sample  # Environment variable template
│   ├── requirements.txt  # Python dependencies
│   ├── test_responses_api.py  # Test Responses API (3 access patterns)
│   ├── test_apim_load_balancing.py  # Test APIM load distribution
│   └── test_rbac_permissions.py  # Validate RBAC permissions
└── docs/
    └── rbac-permissions.md  # Detailed RBAC guide for the customer
az group delete --name rg-openai-apim-lab --yes --no-wait