diff --git a/products/azure-ai-foundry-local/azure-ai-foundry-local.csv b/products/Microsoft-foundry-local/azure-ai-foundry-local.csv similarity index 100% rename from products/azure-ai-foundry-local/azure-ai-foundry-local.csv rename to products/Microsoft-foundry-local/azure-ai-foundry-local.csv diff --git a/products/azure-ai-foundry-local/report.md b/products/Microsoft-foundry-local/report.md similarity index 100% rename from products/azure-ai-foundry-local/report.md rename to products/Microsoft-foundry-local/report.md diff --git a/skills/azure-ai-foundry-local/SKILL.md b/skills/azure-ai-foundry-local/SKILL.md deleted file mode 100644 index aa7f7855..00000000 --- a/skills/azure-ai-foundry-local/SKILL.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -name: azure-ai-foundry-local -description: Expert knowledge for Azure AI Foundry Local development including troubleshooting, best practices, configuration, and integrations & coding patterns. Use when building, debugging, or optimizing Azure AI Foundry Local applications. Not for Azure Machine Learning (use azure-machine-learning), Azure AI services (use azure-ai-services), Azure AI Vision (use azure-ai-vision), Azure AI Document Intelligence (use azure-document-intelligence). -compatibility: Requires network access. Uses mcp_microsoftdocs:microsoft_docs_fetch or fetch_webpage to retrieve documentation. -metadata: - generated_at: "2026-03-02" - generator: "docs2skills/1.0.0" ---- -# Azure AI Foundry Local Skill - -This skill provides expert guidance for Azure AI Foundry Local. Covers troubleshooting, best practices, configuration, and integrations & coding patterns. It combines local quick-reference content with remote documentation fetching capabilities. - -## How to Use This Skill - -> **IMPORTANT for Agent**: This file may be large. 
Use the **Category Index** below to locate relevant sections, then use `read_file` with specific line ranges (e.g., `L136-L144`) to read the sections needed for the user's question - -> **IMPORTANT for Agent**: If `metadata.generated_at` is more than 3 months old, suggest the user pull the latest version from the repository. If `mcp_microsoftdocs` tools are not available, suggest the user install it: [Installation Guide](https://github.com/MicrosoftDocs/mcp/blob/main/README.md) - -This skill requires **network access** to fetch documentation content: -- **Preferred**: Use `mcp_microsoftdocs:microsoft_docs_fetch` with query string `from=learn-agent-skill`. Returns Markdown. -- **Fallback**: Use `fetch_webpage` with query string `from=learn-agent-skill&accept=text/markdown`. Returns Markdown. - -## Category Index - -| Category | Lines | Description | -|----------|-------|-------------| -| Troubleshooting | L32-L36 | Troubleshooting setup and runtime issues when installing and running Azure AI Foundry Local specifically on Windows Server 2025. | -| Best Practices | L37-L41 | Best practices for configuring, securing, and operating Foundry Local, plus troubleshooting setup, connectivity, performance, and common runtime or deployment issues. | -| Configuration | L42-L48 | Installing and configuring Foundry Local, compiling Hugging Face models with Olive, and using the Foundry Local CLI commands and options | -| Integrations & Coding Patterns | L49-L60 | Patterns and code samples for calling Foundry Local via REST/SDKs, OpenAI-compatible clients, LangChain, Open WebUI, tool calling, transcription, and the Model Catalog API. 
| - -### Troubleshooting -| Topic | URL | -|-------|-----| -| Run Foundry Local on Windows Server 2025 | https://learn.microsoft.com/en-us/azure/foundry-local/reference/windows-server-frequently-asked-questions | - -### Best Practices -| Topic | URL | -|-------|-----| -| Apply best practices and troubleshoot Foundry Local | https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-best-practice | - -### Configuration -| Topic | URL | -|-------|-----| -| Install and configure Foundry Local on your device | https://learn.microsoft.com/en-us/azure/foundry-local/get-started | -| Compile Hugging Face models for Foundry Local with Olive | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-compile-hugging-face-models | -| Use Foundry Local CLI commands and options | https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-cli | - -### Integrations & Coding Patterns -| Topic | URL | -|-------|-----| -| Create a chat UI using Open WebUI and Foundry Local | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-chat-application-with-open-web-ui | -| Integrate Foundry Local with OpenAI-compatible SDKs | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-integrate-with-inference-sdks | -| Transcribe audio using Foundry Local APIs | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-transcribe-audio | -| Build a LangChain translation app with Foundry Local | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-use-langchain-with-foundry-local | -| Use Foundry Local native chat completions API | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-use-native-chat-completions | -| Implement tool calling workflows with Foundry Local | https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-use-tool-calling-with-foundry-local | -| Integrate with Foundry Local Model Catalog API | 
https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-catalog-api | -| Invoke Foundry Local via REST API endpoints | https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-rest | -| Call Foundry Local via SDKs in Python, JS, C#, Rust | https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-sdk | \ No newline at end of file diff --git a/skills/microsoft-foundry-local/SKILL.md b/skills/microsoft-foundry-local/SKILL.md new file mode 100644 index 00000000..544420fd --- /dev/null +++ b/skills/microsoft-foundry-local/SKILL.md @@ -0,0 +1,85 @@ +--- +name: microsoft-foundry-local +description: "Build AI applications with Foundry Local — a lightweight runtime that downloads, manages, and serves language models entirely on-device via an OpenAI-compatible API. No cloud, no API keys. Routes to specific skills for setup, chat, RAG, agents, whisper, custom models, and evaluation. WHEN: foundry local, on-device AI, local LLM, foundry local overview, what can foundry do, foundry local help, local inference, offline AI, private AI, no cloud AI, foundry capabilities." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local — Skill Hub + +Foundry Local is an on-device AI runtime that serves language models via an OpenAI-compatible API at `http://localhost:<port>/v1`. No cloud services, API keys, or Azure subscriptions required. 
+ +## Skill Routing + +| Need | Skill | Triggers | +|------|-------|----------| +| Install CLI, start service, manage models | **microsoft-foundry-local-setup** | install, CLI, service start/stop, model download, port discovery | +| Chat completions (streaming, multi-turn) | **microsoft-foundry-local-chat** | chat, streaming, conversation history, OpenAI SDK | +| Retrieval-Augmented Generation | **microsoft-foundry-local-rag** | RAG, knowledge base, context injection, document grounding | +| Single & multi-agent workflows | **microsoft-foundry-local-agents** | agent, multi-agent, orchestration, Agent Framework | +| Audio transcription with Whisper | **microsoft-foundry-local-whisper** | whisper, transcribe, speech-to-text, audio | +| Compile custom Hugging Face models | **microsoft-foundry-local-custom-models** | custom model, ONNX, Model Builder, Hugging Face, quantize | +| Test & evaluate LLM output quality | **microsoft-foundry-local-evaluation** | evaluate, golden dataset, LLM judge, prompt comparison | + +## Quick Reference + +- **API key**: Always `"not-required"` +- **Base URL**: Dynamic port — use SDK to discover: `manager.get_endpoint()` +- **Supported languages**: Python, JavaScript (Node.js), C# (.NET 9) +- **Key SDKs**: `foundry-local-sdk` (Python/JS), `Microsoft.AI.Foundry.Local` (C#) + +## Common Starting Points + +> **Prefer the SDK over the CLI.** The SDK handles service lifecycle, port discovery, model download, and loading automatically. Use the CLI only for manual exploration or troubleshooting. 
+ +### Connect with Python (recommended) +```python +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager("phi-4-mini") +client = manager.get_openai_client() +``` + +### Connect with JavaScript +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; + +const manager = await FoundryLocalManager.start("phi-4-mini"); +const client = manager.getOpenAIClient(); +``` + +### Connect with C# +```csharp +using Microsoft.AI.Foundry.Local; +using OpenAI; + +var manager = await FoundryLocalManager.StartServiceAsync(); +var client = new OpenAIClient(new("not-required"), + new() { Endpoint = manager.Endpoint }); +``` + +### CLI (for exploration and troubleshooting) +```bash +# Install +winget install Microsoft.FoundryLocal # Windows +brew install foundrylocal # macOS + +# Explore models +foundry model list +foundry model run phi-4-mini +``` + +## Rules + +1. **Prefer SDKs over CLI.** The SDK manages the full lifecycle (service start, model download, port discovery). Use CLI only for manual exploration or troubleshooting. +2. Always use the SDK for endpoint discovery — never hard-code ports. +3. Set `api_key` to `"not-required"` — Foundry Local doesn't use API keys. +4. Route to the specific sub-skill for detailed patterns and troubleshooting. +5. All code runs entirely on-device — no network calls to cloud APIs. + +## References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) diff --git a/skills/microsoft-foundry-local/agents/SKILL.md b/skills/microsoft-foundry-local/agents/SKILL.md new file mode 100644 index 00000000..3f954754 --- /dev/null +++ b/skills/microsoft-foundry-local/agents/SKILL.md @@ -0,0 +1,285 @@ +--- +name: microsoft-foundry-local-agents +description: "Build AI agents and multi-agent workflows with Foundry Local. 
Covers single agents with personas, multi-agent sequential pipelines, feedback loops, the Microsoft Agent Framework, and conversation history management. WHEN: foundry agent, AI agent local, multi-agent, agent orchestration, feedback loop, agent persona, system instructions, sequential pipeline, researcher writer editor, on-device agent, agent framework, FoundryLocalClient, AsAIAgent." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Agents & Multi-Agent Workflows + +This skill provides patterns for building single agents and multi-agent workflows that run entirely on-device with Foundry Local. + +## Triggers + +Activate this skill when the user wants to: +- Create an AI agent with custom instructions and persona +- Build multi-agent pipelines (Researcher → Writer → Editor) +- Implement feedback loops between agents +- Use the Microsoft Agent Framework with Foundry Local +- Manage conversation history across agent interactions + +## Rules + +1. **Agents are stateless by default.** Multi-turn agents must explicitly maintain a `history` list. +2. **Use the Agent Framework when available** — it simplifies agent creation. Python uses `agent_framework_foundry_local`, C# uses `Microsoft.Agents.AI.OpenAI`. +3. **JavaScript has no high-level agent framework** — implement agents manually with OpenAI SDK + history management. +4. **Feedback loops need a retry limit** — prevent infinite loops with a max iteration count (typically 2-3). +5. For service setup, refer to **microsoft-foundry-local-setup** skill. 
+ +--- + +## Single Agent — Using the Agent Framework + +### Python (Recommended — Agent Framework) + +```python +import asyncio +from agent_framework_foundry_local import FoundryLocalClient + +async def main(): + alias = "phi-4-mini" + + # FoundryLocalClient handles service start, model download, and loading + client = FoundryLocalClient(model_id=alias) + + # Create an agent with system instructions + agent = client.as_agent( + name="Joker", + instructions="You are good at telling jokes.", + ) + + # Non-streaming + result = await agent.run("Tell me a joke about a pirate.") + print(result) + + # Streaming + async for chunk in agent.run("Tell me a joke about a programmer.", stream=True): + if chunk.text: + print(chunk.text, end="", flush=True) + +asyncio.run(main()) +``` + +### C# (Recommended — Agent Framework) + +```csharp +using Microsoft.Agents.AI; + +// After setting up manager, model, and OpenAI client (see microsoft-foundry-local-setup)... +AIAgent joker = client + .GetChatClient(model.Id) + .AsAIAgent( + instructions: "You are good at telling jokes.", + name: "Joker" + ); + +// Non-streaming +var response = await joker.RunAsync("Tell me a joke about a pirate."); +Console.WriteLine(response); + +// Streaming +await foreach (var chunk in joker.RunStreamingAsync("Tell me another joke.")) +{ + Console.Write(chunk.Text); +} +``` + +### JavaScript (Manual — No Agent Framework) + +```javascript +class ChatAgent { + constructor(client, modelId, name, instructions) { + this.client = client; + this.modelId = modelId; + this.name = name; + this.history = [{ role: "system", content: instructions }]; + } + + async run(userMessage) { + this.history.push({ role: "user", content: userMessage }); + + const response = await this.client.chat.completions.create({ + model: this.modelId, + messages: this.history, + temperature: 0.7, + max_tokens: 1024, + }); + + const reply = response.choices[0].message.content; + this.history.push({ role: "assistant", content: reply }); + return 
reply; + } +} + +// Usage +const joker = new ChatAgent(client, modelInfo.id, "Joker", "You are good at telling jokes."); +const joke = await joker.run("Tell me a joke about a pirate."); +``` + +--- + +## Multi-Agent Pipeline — Sequential Workflow + +The canonical multi-agent pattern is a sequential pipeline where each agent's output feeds the next: + +``` +Topic → [Researcher] → Research Notes → [Writer] → Draft → [Editor] → Verdict +``` + +### Python + +```python +import asyncio +from agent_framework_foundry_local import FoundryLocalClient + +async def main(): + client = FoundryLocalClient(model_id="phi-4-mini") + + researcher = client.as_agent( + name="Researcher", + instructions=( + "You are a research assistant. When given a topic, provide a concise " + "collection of key facts as bullet points." + ), + ) + + writer = client.as_agent( + name="Writer", + instructions=( + "You are a skilled blog writer. Using the research notes provided, " + "write a short, engaging blog post (3-4 paragraphs)." + ), + ) + + editor = client.as_agent( + name="Editor", + instructions=( + "You are a senior editor. Review the blog post for clarity, grammar, " + "and factual consistency. Provide a verdict: ACCEPT or REVISE." + ), + ) + + topic = "The history of renewable energy" + + # Sequential pipeline + research = await researcher.run(f"Research this topic:\n{topic}") + draft = await writer.run(f"Write a blog post from these notes:\n\n{research}") + verdict = await editor.run( + f"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}" + ) + +asyncio.run(main()) +``` + +### C# + +```csharp +AIAgent researcher = chatClient.AsAIAgent( + name: "Researcher", + instructions: "You are a research assistant. Provide key facts as bullet points."); + +AIAgent writer = chatClient.AsAIAgent( + name: "Writer", + instructions: "You are a skilled blog writer. 
Write a short blog post."); + +AIAgent editor = chatClient.AsAIAgent( + name: "Editor", + instructions: "Review the blog post. Provide a verdict: ACCEPT or REVISE."); + +var topic = "The history of renewable energy"; + +var research = await researcher.RunAsync($"Research this topic:\n{topic}"); +var draft = await writer.RunAsync($"Write a blog post from these notes:\n\n{research}"); +var verdict = await editor.RunAsync( + $"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}"); +``` + +--- + +## Feedback Loop Pattern + +Add a feedback loop where the Editor can reject the draft and trigger a rewrite: + +```python +MAX_RETRIES = 2 + +for attempt in range(MAX_RETRIES + 1): + draft = await writer.run(f"Write a blog post from these notes:\n\n{research}") + + verdict = await editor.run( + f"Review this article.\n\nResearch:\n{research}\n\nArticle:\n{draft}" + ) + + if "ACCEPT" in verdict.upper(): + print("Article accepted!") + break + elif attempt < MAX_RETRIES: + print(f"Revising (attempt {attempt + 2})...") + research = await researcher.run( + f"The editor wants revisions:\n{verdict}\n\nOriginal topic:\n{topic}" + ) + else: + print("Max retries reached — publishing best effort.") +``` + +--- + +## Agent Design Best Practices + +| Practice | Rationale | +|----------|-----------| +| Give each agent a specific, focused persona | Broad instructions produce vague outputs | +| Include output format in instructions | "Organize as bullet points" or "Respond with ACCEPT or REVISE" | +| Pass context from previous agents explicitly | Agents don't share memory implicitly | +| Limit context passed between agents | Don't forward entire conversations — summarise | +| Set retry limits on feedback loops | Prevent infinite loops (2-3 retries is typical) | + +--- + +## Production Pattern — Shared Configuration + +For production apps (like the Zava Creative Writer), extract common configuration: + +### Python (FastAPI service) +```python +# foundry_config.py — shared 
across all agents +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager() +manager.start_service() + +ALIAS = "phi-4-mini" +manager.load_model(ALIAS) + +MODEL_ID = manager.get_model_info(ALIAS).id +ENDPOINT = manager.endpoint +API_KEY = manager.api_key +``` + +```python +# Each agent module imports the shared config +from foundry_config import MODEL_ID, ENDPOINT, API_KEY +``` + +--- + +## Key Packages + +| Language | Package | Purpose | +|----------|---------|---------| +| Python | `agent-framework-foundry-local` | High-level agent abstraction with streaming | +| C# | `Microsoft.Agents.AI.OpenAI` | `AsAIAgent()` extension method | +| JavaScript | — | No framework; use OpenAI SDK directly | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup, see **microsoft-foundry-local-setup** +- For basic chat patterns, see **microsoft-foundry-local-chat** +- For grounding agents with local data, see **microsoft-foundry-local-rag** +- For testing agent quality, see **microsoft-foundry-local-evaluation** diff --git a/skills/microsoft-foundry-local/chat/SKILL.md b/skills/microsoft-foundry-local/chat/SKILL.md new file mode 100644 index 00000000..7b4eed9d --- /dev/null +++ b/skills/microsoft-foundry-local/chat/SKILL.md @@ -0,0 +1,231 @@ +--- +name: microsoft-foundry-local-chat +description: "Chat completion patterns with Foundry Local's OpenAI-compatible API. Covers streaming, multi-turn conversations, temperature tuning, and conversation history management. WHEN: foundry local chat, local LLM chat, streaming response, chat completion, conversation history, multi-turn, OpenAI SDK with foundry, api_key not-required, stream tokens, on-device chat, local inference, chat parameters." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Chat Completions + +This skill provides patterns for chat completions using Foundry Local's OpenAI-compatible API across Python, JavaScript, and C#. + +## Triggers + +Activate this skill when the user wants to: +- Create a chat completion with a local model +- Stream responses token by token +- Build multi-turn conversations with history +- Configure temperature, max_tokens, or other parameters +- Use the OpenAI SDK with Foundry Local + +## Rules + +1. **Always use `manager.endpoint`** for the base URL — never hardcode a port. +2. **API key is `"not-required"`** — Foundry Local does not authenticate. +3. **Use `model_info.id`** (the full hardware-specific ID) in API calls, not the alias. +4. **Streaming syntax differs across languages** — use the correct pattern for each. +5. For service setup, refer to **microsoft-foundry-local-setup** skill. + +--- + +## Single-Turn Chat (Non-Streaming) + +### Python +```python +response = client.chat.completions.create( + model=model_id, + messages=[{"role": "user", "content": "What is the golden ratio?"}], + temperature=0.7, + max_tokens=512, +) +print(response.choices[0].message.content) +``` + +### JavaScript +```javascript +const response = await client.chat.completions.create({ + model: modelInfo.id, + messages: [{ role: "user", content: "What is the golden ratio?" }], + temperature: 0.7, + max_tokens: 512, +}); +console.log(response.choices[0].message.content); +``` + +### C# +```csharp +var chatClient = client.GetChatClient(model.Id); +var result = await chatClient.CompleteChatAsync("What is the golden ratio?"); +Console.WriteLine(result.Value.Content[0].Text); +``` + +--- + +## Streaming Chat + +Streaming returns tokens as they are generated, providing a responsive user experience. 
+ +### Python +```python +stream = client.chat.completions.create( + model=model_id, + messages=[{"role": "user", "content": "What is the golden ratio?"}], + stream=True, +) + +for chunk in stream: + if chunk.choices[0].delta.content is not None: + print(chunk.choices[0].delta.content, end="", flush=True) +print() +``` + +### JavaScript +```javascript +const stream = await client.chat.completions.create({ + model: modelInfo.id, + messages: [{ role: "user", content: "What is the golden ratio?" }], + stream: true, +}); + +for await (const chunk of stream) { + if (chunk.choices[0]?.delta?.content) { + process.stdout.write(chunk.choices[0].delta.content); + } +} +console.log(); +``` + +### C# +```csharp +var chatClient = client.GetChatClient(model.Id); +var updates = chatClient.CompleteChatStreaming("What is the golden ratio?"); + +foreach (var update in updates) +{ + if (update.ContentUpdate.Count > 0) + { + Console.Write(update.ContentUpdate[0].Text); + } +} +Console.WriteLine(); +``` + +--- + +## Multi-Turn Conversations + +Foundry Local is stateless — you must maintain conversation history yourself by appending each user message and assistant response. + +### Python +```python +history = [ + {"role": "system", "content": "You are a helpful assistant."}, +] + +def chat(user_message): + history.append({"role": "user", "content": user_message}) + + response = client.chat.completions.create( + model=model_id, + messages=history, + temperature=0.7, + max_tokens=512, + ) + + assistant_reply = response.choices[0].message.content + history.append({"role": "assistant", "content": assistant_reply}) + return assistant_reply +``` + +### JavaScript +```javascript +const history = [ + { role: "system", content: "You are a helpful assistant." 
}, +]; + +async function chat(userMessage) { + history.push({ role: "user", content: userMessage }); + + const response = await client.chat.completions.create({ + model: modelInfo.id, + messages: history, + temperature: 0.7, + max_tokens: 512, + }); + + const reply = response.choices[0].message.content; + history.push({ role: "assistant", content: reply }); + return reply; +} +``` + +### C# +```csharp +var messages = new List<ChatMessage> +{ + new SystemChatMessage("You are a helpful assistant."), +}; + +async Task<string> ChatAsync(string userMessage) +{ + messages.Add(new UserChatMessage(userMessage)); + + var result = await chatClient.CompleteChatAsync(messages); + var reply = result.Value.Content[0].Text; + + messages.Add(new AssistantChatMessage(reply)); + return reply; +} +``` + +--- + +## Common Pitfalls + +| Mistake | Impact | Fix | +|---------|--------|-----| +| Forgetting to append assistant message to history | Model loses context each turn | Always `history.append({"role": "assistant", ...})` | +| Using alias instead of full model ID in API calls | May fail or select wrong variant | Use `manager.get_model_info(alias).id` | +| Hardcoding `http://localhost:5000/v1` | Fails when port changes | Use `manager.endpoint` | +| Setting `stream=True` but reading `.message.content` | Content is in `.delta.content` for streams | Check `chunk.choices[0].delta.content` | + +--- + +## REST API (cURL) + +You can also call the API directly. 
Get the port from `foundry service status`: + +```bash +curl http://localhost:<port>/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "<model-id>", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + "temperature": 0.7 + }' +``` + +--- + +## Parameters Reference + +| Parameter | Default | Notes | +|-----------|---------|-------| +| `temperature` | 1.0 | Lower = more deterministic, higher = more creative | +| `max_tokens` | model-specific | Maximum tokens to generate | +| `top_p` | 1.0 | Nucleus sampling threshold | +| `stream` | `false` | Enable token-by-token streaming | +| `stop` | none | Stop sequences | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup and model management, see **microsoft-foundry-local-setup** +- For grounding chat with local data, see **microsoft-foundry-local-rag** +- For agents with system instructions, see **microsoft-foundry-local-agents** diff --git a/skills/microsoft-foundry-local/custom-models/SKILL.md b/skills/microsoft-foundry-local/custom-models/SKILL.md new file mode 100644 index 00000000..1e5991e7 --- /dev/null +++ b/skills/microsoft-foundry-local/custom-models/SKILL.md @@ -0,0 +1,272 @@ +--- +name: microsoft-foundry-local-custom-models +description: "Compile and register custom Hugging Face models for Foundry Local. Covers ONNX Runtime GenAI Model Builder, quantisation, chat template generation, cache registration, and inference_model.json configuration. WHEN: custom model foundry, hugging face model, ONNX compile, model builder, quantize model, int4 quantisation, register custom model, onnxruntime-genai, bring your own model, compile model, ONNX conversion, custom ONNX model, foundry cache register." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Custom Models + +This skill provides the complete workflow for compiling Hugging Face models into the ONNX format that Foundry Local requires, configuring chat templates, and registering models in the local cache. + +## Triggers + +Activate this skill when the user wants to: +- Compile a Hugging Face model for Foundry Local +- Use the ONNX Runtime GenAI Model Builder +- Quantise a model (int4, int8, fp16, fp32) +- Create an `inference_model.json` chat template configuration +- Register a custom model in the Foundry Local cache +- Understand Model Builder vs Microsoft Olive trade-offs + +## Rules + +1. **Use ONNX Runtime GenAI Model Builder** — it produces the exact output format Foundry Local expects in a single command. +2. **Requires Python 3.10+** and a dedicated virtual environment (PyTorch, Transformers are large). +3. **The `inference_model.json` file is required** — it tells Foundry Local how to format prompts. +4. **The `Name` field in `inference_model.json` becomes the model alias** used in all API calls. +5. For service setup, refer to **microsoft-foundry-local-setup** skill. + +--- + +## End-to-End Workflow + +``` +1. Install pip install onnxruntime-genai +2. Compile python -m onnxruntime_genai.models.builder -m <hf-model-id> -o <output-dir> -p int4 -e cpu +3. Chat Template python generate_chat_template.py (creates inference_model.json) +4. Register foundry cache cd <output-dir> +5. 
Run foundry model run <model-alias> + +``` + +--- + +## Step 1: Install the Model Builder + +```bash +pip install onnxruntime-genai +``` + +Verify: +```bash +python -m onnxruntime_genai.models.builder --help +``` + +--- + +## Step 2: Compile a Model + +### CPU (int4 quantisation) + +```bash +python -m onnxruntime_genai.models.builder \ + -m Qwen/Qwen3-0.6B \ + -o models/qwen3 \ + -p int4 \ + -e cpu \ + --extra_options hf_token=false +``` + +### NVIDIA GPU (fp16) + +```bash +python -m onnxruntime_genai.models.builder \ + -m Qwen/Qwen3-0.6B \ + -o models/qwen3-gpu \ + -p fp16 \ + -e cuda \ + --extra_options hf_token=false +``` + +### Parameters + +| Parameter | Purpose | Common Values | +|-----------|---------|---------------| +| `-m` | Hugging Face model ID or local path | `Qwen/Qwen3-0.6B`, `microsoft/Phi-3.5-mini-instruct` | +| `-o` | Output directory | `models/qwen3` | +| `-p` | Quantisation precision | `int4`, `int8`, `fp16`, `fp32` | +| `-e` | Execution provider (target hardware) | `cpu`, `cuda`, `dml`, `NvTensorRtRtx`, `webgpu` | +| `--extra_options` | Additional options | `hf_token=false` (skip auth for public models) | + +### Precision Trade-offs + +| Precision | Size | Speed | Quality | Best For | +|-----------|------|-------|---------|----------| +| `int4` | Smallest | Fastest | Moderate loss | CPU development, low-RAM devices | +| `int8` | Small | Fast | Slight loss | Balanced trade-off | +| `fp16` | Large | Fast (GPU) | Very good | GPU inference | +| `fp32` | Largest | Slowest | Highest | Maximum quality | + +### Hardware Targets + +| Hardware | `-e` value | Recommended `-p` | +|----------|-----------|-------------------| +| CPU | `cpu` | `int4` | +| NVIDIA GPU | `cuda` | `fp16` or `int4` | +| Windows GPU (DirectML) | `dml` | `fp16` or `int4` | +| NVIDIA TensorRT RTX | `NvTensorRtRtx` | `fp16` | +| WebGPU | `webgpu` | `int4` | + +--- + +## Step 3: Create inference_model.json + +The `inference_model.json` tells Foundry Local how to format prompts. 
Generate it from the model's tokeniser: + +```python +"""Generate an inference_model.json chat template for Foundry Local.""" + +import json +from transformers import AutoTokenizer + +MODEL_PATH = "models/qwen3" + +tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) + +messages = [ + {"role": "system", "content": "{Content}"}, + {"role": "user", "content": "{Content}"}, +] + +prompt_template = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False, +) + +inference_model = { + "Name": "qwen3-0.6b", # This becomes the model alias + "PromptTemplate": { + "assistant": "{Content}", + "prompt": prompt_template, + }, +} + +output_path = f"{MODEL_PATH}/inference_model.json" +with open(output_path, "w", encoding="utf-8") as f: + json.dump(inference_model, f, indent=2, ensure_ascii=False) + +print(f"Chat template written to {output_path}") +``` + +> **Important:** The `"Name"` field becomes the model alias used in all subsequent API calls and CLI commands. 
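Before registering the model, it may help to sanity-check the generated file. The sketch below is not part of Foundry Local: `validate_inference_model` is a hypothetical helper, and the keys it checks are assumed from the dictionary that the generation script above writes (`Name`, `PromptTemplate.assistant`, `PromptTemplate.prompt`), not from an official schema.

```python
import json
from pathlib import Path


def validate_inference_model(path):
    """Return a list of problems found in an inference_model.json file.

    Hypothetical helper: the expected keys mirror the dictionary written
    by the generation script, not an official Foundry Local schema.
    """
    file = Path(path)
    if not file.is_file():
        return [f"{file} does not exist"]

    try:
        config = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []

    # The Name field becomes the model alias, so it must be usable as one.
    name = config.get("Name")
    if not isinstance(name, str) or not name.strip():
        problems.append("'Name' must be a non-empty string")

    template = config.get("PromptTemplate")
    if not isinstance(template, dict):
        problems.append("'PromptTemplate' must be an object")
    else:
        # Both entries are expected to carry the {Content} placeholder.
        for key in ("assistant", "prompt"):
            value = template.get(key)
            if not isinstance(value, str) or "{Content}" not in value:
                problems.append(f"'PromptTemplate.{key}' should contain '{{Content}}'")
    return problems
```

Calling `validate_inference_model("models/qwen3/inference_model.json")` returns an empty list when the file looks well-formed, and a list of human-readable problems otherwise.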
+ +--- + +## Step 4: Register in Foundry Local Cache + +```bash +foundry cache cd models/qwen3 +``` + +Verify: +```bash +foundry cache ls +``` + +--- + +## Step 5: Run the Model + +### CLI +```bash +foundry model run qwen3-0.6b --verbose +``` + +### REST API +```bash +curl -X POST http://localhost:<port>/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{"model": "qwen3-0.6b", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}' +``` + +### OpenAI SDK (Python) +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:<port>/v1", api_key="not-required") + +response = client.chat.completions.create( + model="qwen3-0.6b", + messages=[{"role": "user", "content": "What is the golden ratio?"}], + max_tokens=200, +) +print(response.choices[0].message.content) +``` + +### Foundry Local SDK (Python) +```python +from foundry_local import FoundryLocalManager +import openai + +manager = FoundryLocalManager() +manager.start_service() +manager.load_model("qwen3-0.6b") + +client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key) +response = client.chat.completions.create( + model="qwen3-0.6b", + messages=[{"role": "user", "content": "Hello!"}], +) +print(response.choices[0].message.content) +``` + +--- + +## Expected Output Directory + +After compilation and chat template generation: + +``` +models/qwen3/ + model.onnx + model.onnx.data + genai_config.json (auto-generated by model builder) + chat_template.jinja (auto-generated by model builder) + inference_model.json (you create this) + tokenizer.json + tokenizer_config.json + vocab.json + merges.txt + special_tokens_map.json + added_tokens.json +``` + +--- + +## Model Builder vs Microsoft Olive + +| | **Model Builder** | **Olive** | +|---|---|---| +| **Package** | `onnxruntime-genai` | `olive-ai` | +| **Ease of use** | Single command | Multi-step workflow with YAML config | +| **Best for** | Quick compilation for Foundry Local | Production pipelines with 
fine-grained control | +| **Foundry Local compat** | Direct — output is immediately compatible | Requires `--use_ort_genai` flag | +| **Hardware scope** | CPU, CUDA, DirectML, TensorRT, WebGPU | All of the above + Qualcomm QNN | + +> **Recommendation:** Use the Model Builder for compiling individual models for Foundry Local. Use Olive when you need advanced optimisation (accuracy-aware quantisation, graph surgery, multi-pass tuning). + +--- + +## Troubleshooting + +| Issue | Fix | +|-------|-----| +| Model fails to load after registration | Verify `inference_model.json` exists and is valid JSON | +| `<think>` tags in output | Normal for reasoning models (Qwen3). Adjust the prompt template to suppress them | +| `hf_token` error | Add `--extra_options hf_token=false` for public models | +| Out of memory during compilation | Use a smaller model or `int4` precision | +| Compilation very slow | Expected — 5-15 min for small models, longer for large ones | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup, see **microsoft-foundry-local-setup** +- For chat completions with compiled models, see **microsoft-foundry-local-chat** +- For testing model quality, see **microsoft-foundry-local-evaluation** diff --git a/skills/microsoft-foundry-local/evaluation/SKILL.md b/skills/microsoft-foundry-local/evaluation/SKILL.md new file mode 100644 index 00000000..547b7da7 --- /dev/null +++ b/skills/microsoft-foundry-local/evaluation/SKILL.md @@ -0,0 +1,347 @@ +--- +name: microsoft-foundry-local-evaluation +description: "Test and evaluate LLM output quality with Foundry Local. Covers golden datasets, rule-based scoring, LLM-as-judge patterns, side-by-side prompt comparison, and handling service crashes under sustained load.
WHEN: evaluate LLM, golden dataset, LLM as judge, prompt comparison, test AI output, eval framework, benchmark local model, quality scoring, evaluate agent, prompt engineering, A/B test prompts, regression testing." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Evaluation Framework + +This skill provides patterns for systematically testing and evaluating LLM output quality using Foundry Local — entirely on-device. + +## Triggers + +Activate this skill when the user wants to: +- Create a golden dataset for testing AI responses +- Implement rule-based checks (keyword coverage, length, forbidden terms) +- Use LLM-as-judge scoring with a rubric +- Compare prompt variants side by side +- Build a regression testing framework for prompts +- Systematically test agent quality + +## Rules + +1. **Use golden datasets** — define expected outputs before testing, not after. +2. **Combine rule-based and LLM-based scoring** — rules catch obvious issues, LLM judges catch nuance. +3. **Handle HTTP 500 under sustained load** — the service may crash after ~13-15 completions; add try/catch with fallback. +4. **Lower temperature for evaluation** — use 0.1 for LLM-as-judge to get consistent scoring. +5. For service setup, refer to **microsoft-foundry-local-setup** skill. 
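
Rule 2 leaves open how the two signals are merged into one number. One possible approach, sketched here, rescales the 1-5 judge score to the 0-1 range used by the rule score and takes a weighted average. The function name `blend_scores` and the equal default weighting are assumptions made for illustration, not something this skill prescribes.

```python
def blend_scores(rule_score, judge_score, rule_weight=0.5):
    """Combine a 0-1 rule score with a 1-5 judge score into one 0-1 value."""
    if not 0.0 <= rule_weight <= 1.0:
        raise ValueError("rule_weight must be between 0 and 1")
    # Clamp inputs so a malformed judge reply cannot push the result out of range.
    rule = min(max(rule_score, 0.0), 1.0)
    judge = min(max(judge_score, 1), 5)
    judge_normalised = (judge - 1) / 4  # maps 1 to 0.0 and 5 to 1.0
    return round(rule_weight * rule + (1 - rule_weight) * judge_normalised, 2)
```

Raising `rule_weight` favours the deterministic checks, which may be preferable when the local judge model is small and its scores are noisy.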
+ +--- + +## Architecture + +``` +Golden Dataset Prompt Variants Scoring +┌──────────────┐ ┌───────────────┐ ┌──────────────┐ +│ Test cases │──────►│ Agent run │──────►│ Rule-based │ +│ with expected│ │ with variant │ │ + LLM │ +│ keywords │ │ system prompt │ │ judge │ +└──────────────┘ └───────────────┘ └──────────────┘ + │ + ┌──────▼──────┐ + │ Comparison │ + │ Report │ + └─────────────┘ +``` + +--- + +## Step 1: Define a Golden Dataset + +Each test case includes an input, expected keywords, and a category: + +### Python +```python +GOLDEN_DATASET = [ + { + "input": "What tools do I need to build a wooden deck?", + "expected": ["saw", "drill", "screws", "level", "tape measure"], + "category": "product-recommendation", + }, + { + "input": "How do I fix a leaky kitchen faucet?", + "expected": ["wrench", "washer", "plumber", "valve", "seal"], + "category": "repair-guidance", + }, + { + "input": "How do I safely use a circular saw?", + "expected": ["safety", "glasses", "guard", "clamp", "blade"], + "category": "safety-advice", + }, +] +``` + +### JavaScript +```javascript +const GOLDEN_DATASET = [ + { + input: "What tools do I need to build a wooden deck?", + expected: ["saw", "drill", "screws", "level", "tape measure"], + category: "product-recommendation", + }, + { + input: "How do I fix a leaky kitchen faucet?", + expected: ["wrench", "washer", "plumber", "valve", "seal"], + category: "repair-guidance", + }, +]; +``` + +--- + +## Step 2: Define Prompt Variants + +Compare different system prompts to find the most effective one: + +```python +PROMPT_VARIANTS = { + "baseline": ( + "You are a helpful assistant. Answer the user's question clearly." + ), + "specialised": ( + "You are a DIY expert. Recommend specific tools and materials, " + "provide step-by-step guidance, and include safety tips." 
+ ), +} +``` + +--- + +## Step 3: Rule-Based Scoring + +Deterministic checks that don't require an LLM call: + +### Python +```python +FORBIDDEN_TERMS = ["home depot", "lowes", "amazon"] + +def score_rules(response, expected_keywords): + words = response.lower().split() + word_count = len(words) + response_lower = response.lower() + + # Length check: 50-500 words + length_score = 1.0 if 50 <= word_count <= 500 else 0.0 + + # Keyword coverage + found = [kw for kw in expected_keywords if kw.lower() in response_lower] + keyword_score = len(found) / len(expected_keywords) if expected_keywords else 1.0 + + # Forbidden terms + forbidden_found = [t for t in FORBIDDEN_TERMS if t in response_lower] + forbidden_score = 0.0 if forbidden_found else 1.0 + + combined = (length_score + keyword_score + forbidden_score) / 3.0 + + return { + "length_score": length_score, + "keyword_score": keyword_score, + "keywords_found": found, + "keywords_missing": [kw for kw in expected_keywords if kw.lower() not in response_lower], + "forbidden_score": forbidden_score, + "combined": round(combined, 2), + } +``` + +### JavaScript +```javascript +const FORBIDDEN_TERMS = ["home depot", "lowes", "amazon"]; + +function scoreRules(response, expectedKeywords) { + const words = response.toLowerCase().split(/\s+/); + const responseLower = response.toLowerCase(); + + const lengthScore = words.length >= 50 && words.length <= 500 ? 1.0 : 0.0; + + const found = expectedKeywords.filter(kw => responseLower.includes(kw.toLowerCase())); + const keywordScore = expectedKeywords.length > 0 + ? found.length / expectedKeywords.length + : 1.0; + + const forbiddenFound = FORBIDDEN_TERMS.filter(t => responseLower.includes(t)); + const forbiddenScore = forbiddenFound.length === 0 ? 
1.0 : 0.0; + + return { + lengthScore, + keywordScore, + keywordsFound: found, + forbiddenScore, + combined: Math.round(((lengthScore + keywordScore + forbiddenScore) / 3.0) * 100) / 100, + }; +} +``` + +--- + +## Step 4: LLM-as-Judge Scoring + +Use the same local model to grade response quality: + +### Python +```python +import json +import re + +JUDGE_SYSTEM_PROMPT = """\ +You are an impartial quality evaluator. Rate the following response on a scale of 1-5. + +Rubric: +- 1: Completely wrong or irrelevant +- 2: Partially correct but missing key information +- 3: Adequate but could be improved significantly +- 4: Good response with only minor issues +- 5: Excellent, comprehensive, well-structured response + +Respond ONLY with valid JSON (no code fences): +{"score": <1-5>, "reasoning": ""} +""" + +def llm_judge(client, model_id, question, response): + try: + result = client.chat.completions.create( + model=model_id, + messages=[ + {"role": "system", "content": JUDGE_SYSTEM_PROMPT}, + { + "role": "user", + "content": f"Question: {question}\n\nResponse to evaluate:\n{response}", + }, + ], + temperature=0.1, # Low temperature for consistent scoring + max_tokens=256, + ) + + raw = result.choices[0].message.content.strip() + raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip() + + parsed = json.loads(raw) + score = max(1, min(5, int(parsed.get("score", 3)))) + return {"score": score, "reasoning": parsed.get("reasoning", "")} + except Exception: + # Fallback: extract a number or default to 3 + numbers = re.findall(r"\b([1-5])\b", raw if 'raw' in dir() else "") + return {"score": int(numbers[0]) if numbers else 3, "reasoning": "Fallback score"} +``` + +--- + +## Step 5: Run Evaluation Pipeline + +### Python +```python +def run_agent(client, model_id, system_prompt, user_input): + result = client.chat.completions.create( + model=model_id, + messages=[ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_input}, + 
], + temperature=0.7, + max_tokens=512, + ) + return result.choices[0].message.content.strip() + + +# Run evaluation for each prompt variant +results = {} + +for variant_name, system_prompt in PROMPT_VARIANTS.items(): + variant_results = [] + + for test_case in GOLDEN_DATASET: + # Get agent response + response = run_agent(client, model_id, system_prompt, test_case["input"]) + + # Score with rules + rule_scores = score_rules(response, test_case["expected"]) + + # Score with LLM judge + judge_result = llm_judge(client, model_id, test_case["input"], response) + + variant_results.append({ + "input": test_case["input"], + "category": test_case["category"], + "rule_score": rule_scores["combined"], + "judge_score": judge_result["score"], + }) + + results[variant_name] = variant_results + +# Compare variants +for name, scores in results.items(): + avg_rule = sum(r["rule_score"] for r in scores) / len(scores) + avg_judge = sum(r["judge_score"] for r in scores) / len(scores) + print(f"{name}: Rule={avg_rule:.2f}, Judge={avg_judge:.1f}/5") +``` + +--- + +## Handling Service Crashes Under Sustained Load + +The Foundry Local service may return HTTP 500 after ~13-15 sequential completions. 
Add retry logic: + +```python +import time + +def safe_completion(client, model_id, messages, max_retries=2): + for attempt in range(max_retries + 1): + try: + result = client.chat.completions.create( + model=model_id, + messages=messages, + temperature=0.7, + max_tokens=512, + ) + return result.choices[0].message.content.strip() + except Exception as e: + if attempt < max_retries: + print(f"Retry {attempt + 1} after error: {e}") + time.sleep(2) + else: + raise +``` + +If retries don't help, restart the service: +```bash +foundry service stop +foundry service start +``` + +--- + +## Evaluation Design Guidelines + +| Guideline | Rationale | +|-----------|-----------| +| Write golden dataset before prompts | Prevents confirmation bias | +| Use 5+ test cases per category | Statistical significance | +| Combine rule + LLM scoring | Rules catch format issues; LLM catches content quality | +| Use `temperature: 0.1` for judge | Consistent scoring across runs | +| Include forbidden terms check | Catches hallucinated brand names or competitors | +| Test after every prompt change | Regression testing for prompt engineering | + +--- + +## Common Pitfalls + +| Mistake | Impact | Fix | +|---------|--------|-----| +| No try/catch around LLM judge | Pipeline crashes on HTTP 500 | Add fallback score (default 3) | +| High temperature for judge | Inconsistent scores | Use 0.1 | +| Too few test cases | Results not statistically meaningful | Use 5+ per category | +| Only using LLM judge (no rules) | Missing obvious format failures | Combine both approaches | +| Evaluating only one prompt variant | No comparison baseline | Always test at least 2 variants | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup, see **microsoft-foundry-local-setup** +- For agents to evaluate, see **microsoft-foundry-local-agents** +- For RAG pipelines to evaluate, see 
**microsoft-foundry-local-rag** +- For chat patterns, see **microsoft-foundry-local-chat** diff --git a/skills/microsoft-foundry-local/rag/SKILL.md b/skills/microsoft-foundry-local/rag/SKILL.md new file mode 100644 index 00000000..71947fa9 --- /dev/null +++ b/skills/microsoft-foundry-local/rag/SKILL.md @@ -0,0 +1,244 @@ +--- +name: microsoft-foundry-local-rag +description: "Build Retrieval-Augmented Generation (RAG) pipelines with Foundry Local. Covers knowledge base design, retrieval strategies, context injection, and prompt templates — all running on-device with no cloud dependencies. WHEN: RAG pipeline, retrieval augmented generation, ground answers in data, knowledge base, local search, context injection, foundry local RAG, on-device RAG, document grounding, chunk retrieval." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local RAG Pipelines + +This skill provides patterns for building Retrieval-Augmented Generation (RAG) pipelines that run entirely on-device with Foundry Local — no cloud, vector database, or embeddings API required. + +## Triggers + +Activate this skill when the user wants to: +- Build a RAG pipeline with Foundry Local +- Ground LLM answers in local data or documents +- Create a local knowledge base +- Implement retrieval (keyword or semantic) for prompt augmentation +- Design system prompts that inject retrieved context + +## Rules + +1. RAG = **Retrieve** relevant context + **Augment** the prompt + **Generate** a grounded answer. +2. Start with keyword-overlap retrieval (zero dependencies) before suggesting vector search. +3. Always instruct the model to use only the provided context — prevents hallucination. +4. Keep retrieved chunks concise — local models have limited context windows (typically 4K–16K tokens). +5. For service setup, refer to **microsoft-foundry-local-setup** skill. 
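
Rule 4 implies that longer source documents are split into small pieces before they enter the knowledge base. The following is a minimal sketch of one way to do that, assuming plain-text input and a fixed word budget per chunk. `chunk_document` and its `max_words` default are illustrative choices, not part of Foundry Local; the output uses the same `title` / `content` shape as the knowledge base entries in this skill.

```python
def chunk_document(title, text, max_words=200):
    """Split text into knowledge-base entries of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        part = len(chunks) + 1
        chunks.append({
            "title": f"{title} (part {part})",
            "content": " ".join(words[start:start + max_words]),
        })
    return chunks
```

This splits purely on word count, so a chunk can end mid-sentence; splitting on paragraph or sentence boundaries is a natural refinement once the basic pipeline works.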
+ +--- + +## Architecture + +``` +User Question + │ + ▼ +┌─────────────┐ ┌──────────────┐ ┌──────────────┐ +│ Retrieve │────►│ Augment │────►│ Generate │ +│ (search │ │ (build │ │ (LLM call │ +│ knowledge │ │ prompt │ │ with │ +│ base) │ │ with │ │ context) │ +│ │ │ context) │ │ │ +└──────────────┘ └──────────────┘ └──────────────┘ +``` + +--- + +## Step 1: Define a Knowledge Base + +Structure your data as a list of chunks, each with a title and content: + +### Python +```python +KNOWLEDGE_BASE = [ + { + "title": "Foundry Local Overview", + "content": ( + "Foundry Local brings the power of Azure AI Foundry to your local " + "device without requiring an Azure subscription..." + ), + }, + { + "title": "Supported Hardware", + "content": ( + "Foundry Local automatically selects the best model variant for " + "your hardware. NVIDIA CUDA, Qualcomm NPU, or CPU..." + ), + }, +] +``` + +### JavaScript +```javascript +const KNOWLEDGE_BASE = [ + { + title: "Foundry Local Overview", + content: "Foundry Local brings the power of Azure AI Foundry...", + }, + { + title: "Supported Hardware", + content: "Foundry Local automatically selects the best model variant...", + }, +]; +``` + +--- + +## Step 2: Implement Retrieval + +### Keyword Overlap (Simple — No Dependencies) + +Scores chunks by word overlap with the query. 
Good for getting started: + +#### Python +```python +def retrieve(query, knowledge_base, top_k=2): + query_words = set(query.lower().split()) + scored = [] + for chunk in knowledge_base: + chunk_words = set(chunk["content"].lower().split()) + overlap = len(query_words & chunk_words) + scored.append((overlap, chunk)) + scored.sort(key=lambda x: x[0], reverse=True) + return [item[1] for item in scored[:top_k]] +``` + +#### JavaScript +```javascript +function retrieve(query, knowledgeBase, topK = 2) { + const queryWords = new Set(query.toLowerCase().split(/\s+/)); + return knowledgeBase + .map(chunk => { + const chunkWords = new Set(chunk.content.toLowerCase().split(/\s+/)); + const overlap = [...queryWords].filter(w => chunkWords.has(w)).length; + return { overlap, chunk }; + }) + .sort((a, b) => b.overlap - a.overlap) + .slice(0, topK) + .map(item => item.chunk); +} +``` + +### When to Upgrade + +| Approach | Dependencies | Best For | +|----------|-------------|----------| +| Keyword overlap | None | Prototyping, small knowledge bases (<100 chunks) | +| TF-IDF | `scikit-learn` | Medium knowledge bases, better relevance | +| Embedding similarity | Embedding model + numpy | Large knowledge bases, semantic matching | + +--- + +## Step 3: Augment the Prompt + +Build a system prompt that injects the retrieved context and instructs the model to use only that information: + +### Python +```python +def build_rag_prompt(question, retrieved_chunks): + context = "\n".join( + f"- {chunk['title']}: {chunk['content']}" for chunk in retrieved_chunks + ) + system_prompt = ( + "You are a helpful assistant. Answer the user's question using " + "ONLY the context provided below. 
If the context does not contain " + "enough information, say so.\n\n" + f"Context:\n{context}" + ) + return [ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": question}, + ] +``` + +### JavaScript +```javascript +function buildRagPrompt(question, retrievedChunks) { + const context = retrievedChunks + .map(c => `- ${c.title}: ${c.content}`) + .join("\n"); + + return [ + { + role: "system", + content: + "You are a helpful assistant. Answer the user's question using " + + "ONLY the context provided below. If the context does not contain " + + "enough information, say so.\n\n" + + `Context:\n${context}`, + }, + { role: "user", content: question }, + ]; +} +``` + +--- + +## Step 4: Generate the Answer + +### Python +```python +question = "What hardware does Foundry Local support?" +chunks = retrieve(question, KNOWLEDGE_BASE, top_k=2) +messages = build_rag_prompt(question, chunks) + +response = client.chat.completions.create( + model=model_id, + messages=messages, + temperature=0.3, # Lower temperature for factual answers + max_tokens=512, +) +print(response.choices[0].message.content) +``` + +### JavaScript +```javascript +const question = "What hardware does Foundry Local support?"; +const chunks = retrieve(question, KNOWLEDGE_BASE, 2); +const messages = buildRagPrompt(question, chunks); + +const response = await client.chat.completions.create({ + model: modelInfo.id, + messages, + temperature: 0.3, + max_tokens: 512, +}); +console.log(response.choices[0].message.content); +``` + +--- + +## Design Guidelines + +| Guideline | Rationale | +|-----------|-----------| +| Use `temperature: 0.3` or lower | RAG answers should be factual, not creative | +| Limit to 2-3 retrieved chunks | Local models have limited context windows | +| Include "say so if context is insufficient" | Prevents hallucination when data is missing | +| Chunk content to 100-300 words each | Too long = context overflow; too short = missing info | +| Include source titles in 
context | Helps the model attribute information | + +--- + +## Common Pitfalls + +| Mistake | Impact | Fix | +|---------|--------|-----| +| No "use only this context" instruction | Model hallucinates beyond provided data | Add explicit grounding instruction in system prompt | +| Retrieving too many chunks | Exceeds context window, degrades quality | Limit `top_k` to 2-3 for small models | +| High temperature (>0.7) for RAG | Generates creative but inaccurate answers | Use 0.1-0.3 for factual grounding | +| Not chunking documents | Entire documents overwhelm context | Split into focused 100-300 word chunks | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup, see **microsoft-foundry-local-setup** +- For basic chat patterns, see **microsoft-foundry-local-chat** +- For agents with persistent instructions, see **microsoft-foundry-local-agents** +- For testing RAG quality systematically, see **microsoft-foundry-local-evaluation** diff --git a/skills/microsoft-foundry-local/setup/SKILL.md b/skills/microsoft-foundry-local/setup/SKILL.md new file mode 100644 index 00000000..f420240e --- /dev/null +++ b/skills/microsoft-foundry-local/setup/SKILL.md @@ -0,0 +1,219 @@ +--- +name: microsoft-foundry-local-setup +description: "Install, configure, and manage Foundry Local — the on-device AI runtime. Covers CLI installation, service lifecycle, model management, port discovery, and troubleshooting. WHEN: install foundry local, start foundry service, download model, list models, foundry CLI, model not loading, service not starting, port discovery, foundry status, foundry local setup, model alias, cache location, hardware detection, service restart, dynamic port." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Setup & Service Management + +This skill provides guidance for installing, configuring, and managing the Foundry Local on-device AI runtime. + +> **What is Foundry Local?** A lightweight runtime that downloads, manages, and serves language models entirely on your hardware. It exposes an **OpenAI-compatible API** — no cloud account or API keys required. See the [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) and [foundrylocal.ai](https://foundrylocal.ai). +> +> **Prefer the SDK over the CLI.** The SDK handles service lifecycle, port discovery, model download, and loading in a single programmatic flow. Use the CLI for manual exploration or troubleshooting. + +## Triggers + +Activate this skill when the user wants to: +- Install the Foundry Local CLI +- Start, stop, or restart the Foundry Local service +- Download, list, or manage models +- Discover the dynamic endpoint / port +- Understand model aliases vs hardware-specific IDs +- Troubleshoot service startup, model loading, or cache issues +- Set up a new project with the Foundry Local SDK + +## Rules + +1. **Never hardcode ports.** The Foundry Local service uses a dynamic port — always use `manager.endpoint` (Python/JS) or `manager.Urls[0]` (C#). +2. **Cached ≠ loaded.** A model can be cached on disk but not loaded into memory. Always call `load_model()` / `loadModel()` / `LoadAsync()` after confirming the model is cached. +3. **Use aliases, not full IDs.** Aliases like `phi-3.5-mini` auto-select the best hardware variant (CUDA, QNN, CPU). Full IDs are hardware-specific. +4. **The API key is a placeholder.** Foundry Local does not authenticate requests, so any string works: use `manager.api_key` (Python) or `manager.apiKey` (JS), or pass a literal such as `"not-required"` or `"foundry-local"`.
+ +--- + +## CLI Installation + +### Windows +```powershell +winget install Microsoft.FoundryLocal +``` + +### macOS +```bash +brew tap microsoft/foundrylocal && brew install foundrylocal +``` + +### Verify +```bash +foundry --version +foundry service status +``` + +--- + +## CLI Quick Reference + +| Command | Purpose | +|---------|---------| +| `foundry model list` | List all available models in the catalog | +| `foundry model list --source cache` | List only downloaded (cached) models | +| `foundry model run ` | Download (if needed), load, and start interactive chat | +| `foundry service status` | Check if the service is running | +| `foundry service stop` | Stop the service | +| `foundry cache register --model-path --alias ` | Register a custom compiled model | + +--- + +## SDK Lifecycle — 7-Step Pattern (All Languages) + +Every Foundry Local application follows the same architecture: + +1. **Create manager** — no parameters required +2. **Start service** — spawns the inference server on a dynamic port +3. **Query catalog** — list available models +4. **Check cache** — distinguish "available" from "downloaded" +5. **Download if needed** — with progress callbacks +6. **Load into memory** — required before inference; resolves full model ID +7. 
**Create OpenAI client** — use `manager.endpoint` + dummy API key + +### Python + +```python +from foundry_local import FoundryLocalManager +import openai + +alias = "phi-3.5-mini" + +manager = FoundryLocalManager() +manager.start_service() + +# Check cache and download if needed +cached = manager.list_cached_models() +catalog_info = manager.get_model_info(alias) +is_cached = any(m.id == catalog_info.id for m in cached) if catalog_info else False + +if not is_cached: + manager.download_model(alias, progress_callback=lambda p: print(f"{p:.0f}%")) + +manager.load_model(alias) + +client = openai.OpenAI( + base_url=manager.endpoint, + api_key=manager.api_key # "not-required" +) +``` + +### JavaScript + +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; +import { OpenAI } from "openai"; + +const alias = "phi-3.5-mini"; +const manager = new FoundryLocalManager(); +await manager.startService(); + +const cached = await manager.listCachedModels(); +const catalogInfo = await manager.getModelInfo(alias); +const isCached = cached.some(m => m.id === catalogInfo?.id); + +if (!isCached) { + await manager.downloadModel(alias, undefined, false, p => console.log(`${p}%`)); +} + +const modelInfo = await manager.loadModel(alias); + +const client = new OpenAI({ + baseURL: manager.endpoint, + apiKey: manager.apiKey, +}); +``` + +### C# + +```csharp +using Microsoft.AI.Foundry.Local; +using Microsoft.Extensions.Logging.Abstractions; +using OpenAI; +using System.ClientModel; + +var alias = "phi-3.5-mini"; + +await FoundryLocalManager.CreateAsync( + new Configuration + { + AppName = "MyApp", + Web = new Configuration.WebService { Urls = "http://127.0.0.1:0" } + }, NullLogger.Instance, default); + +var manager = FoundryLocalManager.Instance; +await manager.StartWebServiceAsync(default); + +var catalog = await manager.GetCatalogAsync(default); +var model = await catalog.GetModelAsync(alias, default); + +if (!await model.IsCachedAsync(default)) + await 
model.DownloadAsync(null, default); + +await model.LoadAsync(default); + +var client = new OpenAIClient( + new ApiKeyCredential("foundry-local"), + new OpenAIClientOptions { Endpoint = new Uri(manager.Urls[0] + "/v1") }); +``` + +> **C# Note:** The `Microsoft.AI.Foundry.Local` NuGet package requires an explicit `<RuntimeIdentifier>` in your `.csproj` (e.g., `win-x64`, `win-arm64`). + +--- + +## Hardware Auto-Detection + +When you use an alias like `phi-3.5-mini`, the SDK automatically selects the best variant: + +| Hardware | Execution Provider | Selected Automatically | +|----------|-------------------|----------------------| +| NVIDIA GPU | CUDA | Yes | +| Qualcomm NPU | QNN | Yes (if available) | +| CPU (default) | CPU | Yes (fallback) | + +Developers do not need to pick variants — hardware detection is transparent. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---------|-------|-----| +| Service won't start | Port conflict or stale process | `foundry service stop` then retry | +| Model not found | Alias typo or outdated catalog | Run `foundry model list` to see valid aliases | +| `IsCachedAsync` NullReferenceException | Race condition on first run (C#) | Retry after delay; SDK may not be fully ready | +| HTTP 500 under sustained load | Resource exhaustion after ~13-15 completions | `foundry service stop` then restart; add try/catch with fallback | +| OGA memory leak warnings on exit | SDK doesn't expose Dispose for native resources | Non-blocking; can be ignored | +| Snapdragon CPU warnings | cpuinfo library doesn't recognise Oryon cores | Cosmetic only; inference works correctly | +| C# build fails with NETSDK1047 | Missing `<RuntimeIdentifier>` in `.csproj` | Add `<RuntimeIdentifier>win-x64</RuntimeIdentifier>` | + +--- + +## Key SDK Properties + +| Property | Python | JavaScript | C# | +|----------|--------|------------|-----| +| Endpoint | `manager.endpoint` | `manager.endpoint` | `manager.Urls[0] + "/v1"` | +| API key | `manager.api_key` | `manager.apiKey` | `"foundry-local"` (any string) | +| Model ID |
`manager.get_model_info(alias).id` | `modelInfo.id` | `model.Id` | +| Cache path | `manager.get_cache_location()` | — | — | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For chat completion patterns, see **microsoft-foundry-local-chat** +- For RAG pipelines, see **microsoft-foundry-local-rag** +- For agent creation, see **microsoft-foundry-local-agents** +- For custom model compilation, see **microsoft-foundry-local-custom-models** diff --git a/skills/microsoft-foundry-local/whisper/SKILL.md b/skills/microsoft-foundry-local/whisper/SKILL.md new file mode 100644 index 00000000..10d6b09e --- /dev/null +++ b/skills/microsoft-foundry-local/whisper/SKILL.md @@ -0,0 +1,224 @@ +--- +name: microsoft-foundry-local-whisper +description: "Transcribe audio with Whisper running on-device via Foundry Local. Covers model download (SDK only), ONNX encoder/decoder pipeline, feature extraction, audio format requirements, and language-specific APIs. WHEN: whisper transcription, speech to text local, audio transcription, foundry whisper, on-device transcription, WAV transcription, voice to text, transcribe audio, whisper model, speech recognition local." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Whisper Transcription + +This skill provides patterns for transcribing audio files using the OpenAI Whisper model running entirely on-device through Foundry Local. + +## Triggers + +Activate this skill when the user wants to: +- Transcribe audio files (WAV) using Whisper locally +- Set up speech-to-text with no cloud dependencies +- Download and configure the Whisper model via Foundry Local +- Build an ONNX encoder/decoder transcription pipeline +- Process audio files for local transcription + +## Rules + +1. **Whisper must be downloaded via the SDK** — the CLI does not support Whisper model download. +2. 
**Audio must be 16kHz mono WAV** — resample before processing. +3. **Python/JS use a manual ONNX pipeline** (encoder → decoder with KV cache). +4. **C# has a high-level API** — `GetAudioClient().TranscribeAudioAsync()`. +5. For service setup, refer to **microsoft-foundry-local-setup** skill. + +--- + +## Model Download + +The Whisper model **must** be downloaded using the Foundry Local SDK, not the CLI: + +### Python +```python +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager("whisper-medium") +model_info = manager.get_model_info("whisper-medium") +cache_location = manager.get_cache_location() +``` + +### JavaScript +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; + +const manager = new FoundryLocalManager(); +await manager.startService(); +const modelInfo = await manager.loadModel("whisper-medium"); +``` + +### C# +```csharp +var catalog = await manager.GetCatalogAsync(default); +var model = await catalog.GetModelAsync("whisper-medium", default); +if (!await model.IsCachedAsync(default)) + await model.DownloadAsync(null, default); +await model.LoadAsync(default); +``` + +--- + +## Python — Manual ONNX Pipeline + +Python requires a manual encoder/decoder pipeline using ONNX Runtime: + +### Dependencies +```bash +pip install foundry-local-sdk onnxruntime transformers librosa numpy +``` + +### Complete Pipeline + +```python +import numpy as np +import onnxruntime as ort +import librosa +from transformers import WhisperFeatureExtractor, WhisperTokenizer +from foundry_local import FoundryLocalManager +import os + +# Download model via SDK +manager = FoundryLocalManager("whisper-medium") +model_info = manager.get_model_info("whisper-medium") +cache_location = manager.get_cache_location() + +# Build path to ONNX files +model_dir = os.path.join( + cache_location, "Microsoft", + model_info.id.replace(":", "-"), + "cpu-fp32" +) + +# Load ONNX sessions +encoder_session = ort.InferenceSession( + os.path.join(model_dir, 
"whisper-medium_encoder_fp32.onnx"), + providers=["CPUExecutionProvider"], +) +decoder_session = ort.InferenceSession( + os.path.join(model_dir, "whisper-medium_decoder_fp32.onnx"), + providers=["CPUExecutionProvider"], +) + +# Load feature extractor and tokeniser +feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir) +tokenizer = WhisperTokenizer.from_pretrained(model_dir) + +# Whisper medium model dimensions +NUM_LAYERS = 24 +NUM_HEADS = 16 +HEAD_SIZE = 64 + +# Build initial decoder tokens +sot = tokenizer.convert_tokens_to_ids("<|startoftranscript|>") +eot = tokenizer.convert_tokens_to_ids("<|endoftext|>") +notimestamps = tokenizer.convert_tokens_to_ids("<|notimestamps|>") +forced_ids = tokenizer.get_decoder_prompt_ids(language="en", task="transcribe") +INITIAL_TOKENS = [sot] + [tid for _, tid in forced_ids] + [notimestamps] + + +def transcribe(audio_path): + # Load audio at 16kHz mono + audio, _ = librosa.load(audio_path, sr=16000) + + # Extract log-mel spectrogram + features = feature_extractor(audio, sampling_rate=16000, return_tensors="np") + audio_features = features["input_features"].astype(np.float32) + + # Run encoder + encoder_outputs = encoder_session.run(None, {"audio_features": audio_features}) + cross_kv_list = encoder_outputs[1:] + + # Prepare cross-attention KV cache + cross_kv = {} + for i in range(NUM_LAYERS): + cross_kv[f"past_key_cross_{i}"] = cross_kv_list[i * 2] + cross_kv[f"past_value_cross_{i}"] = cross_kv_list[i * 2 + 1] + + # Initialise self-attention KV cache + self_kv = {} + for i in range(NUM_LAYERS): + self_kv[f"past_key_self_{i}"] = np.zeros((1, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32) + self_kv[f"past_value_self_{i}"] = np.zeros((1, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32) + + # Autoregressive decoding + input_ids = np.array([INITIAL_TOKENS], dtype=np.int32) + generated = [] + + for _ in range(448): # Max tokens + feeds = {"input_ids": input_ids} + feeds.update(cross_kv) + feeds.update(self_kv) + + outputs = 
decoder_session.run(None, feeds) + logits = outputs[0] + next_token = int(np.argmax(logits[0, -1, :])) + + if next_token == eot: + break + + generated.append(next_token) + + # Update self-attention KV cache + for i in range(NUM_LAYERS): + self_kv[f"past_key_self_{i}"] = outputs[1 + i * 2] + self_kv[f"past_value_self_{i}"] = outputs[2 + i * 2] + + input_ids = np.array([[next_token]], dtype=np.int32) + + return tokenizer.decode(generated, skip_special_tokens=True).strip() +``` + +--- + +## C# — High-Level API + +C# provides a simpler API via `GetAudioClient()`: + +```csharp +var audioClient = model.GetAudioClient(); +var result = await audioClient.TranscribeAudioAsync(audioFilePath); +Console.WriteLine(result.Text); +``` + +--- + +## Audio Format Requirements + +| Requirement | Value | +|-------------|-------| +| Sample rate | 16,000 Hz (16 kHz) | +| Channels | Mono (1 channel) | +| Format | WAV (PCM) | +| Max duration | ~30 seconds per segment | + +If your audio is in a different format, resample before processing: + +```python +# librosa handles resampling automatically +audio, _ = librosa.load("input.wav", sr=16000) +``` + +--- + +## Known Issues + +| Issue | Severity | Workaround | +|-------|----------|------------| +| JS: Last audio file may return empty transcription | Minor | Node.js binding edge case; other files work fine | +| C#: Path resolution fragile with different RIDs | Minor | Use absolute paths or CLI arguments | +| OGA memory leak warnings on exit | Warning | Non-blocking; no cleanup API exposed | + +--- + +## Cross-References + +- [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) +- [foundrylocal.ai](https://foundrylocal.ai) +- For service setup and model download, see **microsoft-foundry-local-setup** +- For chat completions (text), see **microsoft-foundry-local-chat** +- For custom model compilation, see **microsoft-foundry-local-custom-models** diff --git a/skills/microsoft-foundry/SKILL.md 
b/skills/microsoft-foundry/SKILL.md new file mode 100644 index 00000000..23bd44c5 --- /dev/null +++ b/skills/microsoft-foundry/SKILL.md @@ -0,0 +1,101 @@ +--- +name: microsoft-foundry +description: "Deploy, evaluate, and manage Foundry agents end-to-end: Docker build, ACR push, hosted/prompt agent create, container start, batch eval, prompt optimization, agent.yaml, dataset curation from traces. USE FOR: deploy agent to Foundry, hosted agent, create agent, invoke agent, evaluate agent, run batch eval, optimize prompt, deploy model, Foundry project, RBAC, role assignment, permissions, quota, capacity, region, troubleshoot agent, deployment failure, create dataset from traces, dataset versioning, eval trending, create AI Services, Cognitive Services, create Foundry resource, provision resource, knowledge index, agent monitoring, customize deployment, onboard, availability, standard agent setup, capability host. DO NOT USE FOR: Azure Functions, App Service, general Azure deploy (use azure-deploy), general Azure prep (use azure-prepare)." +license: MIT +metadata: + author: Microsoft + version: "1.0.3" +--- + +# Microsoft Foundry Skill + +> **MANDATORY:** Read this skill and the relevant sub-skill BEFORE calling any Foundry MCP tool. 
+ +## Sub-Skills + +| Sub-Skill | When to Use | Reference | +|-----------|-------------|-----------| +| **deploy** | Containerize, build, push to ACR, create/update/start/stop/clone agent deployments | [deploy](foundry-agent/deploy/deploy.md) | +| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](foundry-agent/invoke/invoke.md) | +| **observe** | Eval-driven optimization loop: evaluate → analyze → optimize → compare → iterate | [observe](foundry-agent/observe/observe.md) | +| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](foundry-agent/trace/trace.md) | +| **troubleshoot** | View container logs, query telemetry, diagnose failures | [troubleshoot](foundry-agent/troubleshoot/troubleshoot.md) | +| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#. Downloads starter samples from foundry-samples repo. | [create](foundry-agent/create/create.md) | +| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](foundry-agent/eval-datasets/eval-datasets.md) | +| **project/create** | Creating a new Azure AI Foundry project for hosting agents and models. Use when onboarding to Foundry or setting up new infrastructure. | [project/create/create-foundry-project.md](project/create/create-foundry-project.md) | +| **resource/create** | Creating Azure AI Services multi-service resource (Foundry resource) using Azure CLI. Use when manually provisioning AI Services resources with granular control. 
| [resource/create/create-foundry-resource.md](resource/create/create-foundry-resource.md) | +| **models/deploy-model** | Unified model deployment with intelligent routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI), and capacity discovery across regions. Routes to sub-skills: `preset` (quick deploy), `customize` (full control), `capacity` (find availability). | [models/deploy-model/SKILL.md](models/deploy-model/SKILL.md) | +| **quota** | Managing quotas and capacity for Microsoft Foundry resources. Use when checking quota usage, troubleshooting deployment failures due to insufficient quota, requesting quota increases, or planning capacity. | [quota/quota.md](quota/quota.md) | +| **rbac** | Managing RBAC permissions, role assignments, managed identities, and service principals for Microsoft Foundry resources. Use for access control, auditing permissions, and CI/CD setup. | [rbac/rbac.md](rbac/rbac.md) | + +Onboarding flow: `project/create` → `deploy` → `invoke` + +## Agent Lifecycle + +| Intent | Workflow | +|--------|----------| +| New agent from scratch | create → deploy → invoke | +| Deploy existing code | deploy → invoke | +| Test/chat with agent | invoke | +| Troubleshoot | invoke → troubleshoot | +| Fix + redeploy | troubleshoot → fix → deploy → invoke | + +## Project Context Resolution + +Resolve only missing values. Extract from user message first, then azd, then ask. + +1. Check for `azure.yaml`; if found, run `azd env get-values` +2. Map azd variables: + +| azd Variable | Resolves To | +|-------------|-------------| +| `AZURE_AI_PROJECT_ENDPOINT` / `AZURE_AIPROJECT_ENDPOINT` | Project endpoint | +| `AZURE_CONTAINER_REGISTRY_NAME` / `AZURE_CONTAINER_REGISTRY_ENDPOINT` | ACR registry | +| `AZURE_SUBSCRIPTION_ID` | Subscription | + +3. Ask user only for unresolved values (project endpoint, agent name) + +## Validation + +After each workflow step, validate before proceeding: +1. Run the operation +2. 
Check output for errors or unexpected results +3. If failed → diagnose using troubleshoot sub-skill → fix → retry +4. Only proceed to next step when validation passes + +## Agent Types + +| Type | Kind | Description | +|------|------|-------------| +| **Prompt** | `"prompt"` | LLM-based, backed by model deployment | +| **Hosted** | `"hosted"` | Container-based, running custom code | + +## Agent: Setup Types + +| Setup | Capability Host | Description | +|-------|----------------|-------------| +| **Basic** | None | Default. All resources Microsoft-managed. | +| **Standard** | Azure AI Services | Bring-your-own storage and search (public network). See [standard-agent-setup](references/standard-agent-setup.md). | +| **Standard + Private Network** | Azure AI Services | Standard setup with VNet isolation and private endpoints. See [private-network-standard-agent-setup](references/private-network-standard-agent-setup.md). | + +> **MANDATORY:** For standard setup, read the appropriate reference before proceeding: +> - **Public network:** [references/standard-agent-setup.md](references/standard-agent-setup.md) +> - **Private network (VNet isolation):** [references/private-network-standard-agent-setup.md](references/private-network-standard-agent-setup.md) + +## Tool Usage Conventions + +- Use the `ask_user` or `askQuestions` tool whenever collecting information from the user +- Use the `task` or `runSubagent` tool to delegate long-running or independent sub-tasks (e.g., env var scanning, status polling, Dockerfile generation) +- Prefer Azure MCP tools over direct CLI commands when available +- Reference official Microsoft documentation URLs instead of embedding CLI command syntax + +## References + +- [Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry 
Samples](https://github.com/azure-ai-foundry/foundry-samples) +- [Python SDK](references/sdk/foundry-sdk-py.md) + +## Dependencies + +Scripts in sub-skills require: Azure CLI (`az`) ≥2.0, `jq` (for shell scripts). Install via `pip install azure-ai-projects azure-identity` for Python SDK usage. \ No newline at end of file diff --git a/skills/microsoft-foundry/foundry-agent/create/create-prompt.md b/skills/microsoft-foundry/foundry-agent/create/create-prompt.md new file mode 100644 index 00000000..46fa9a3e --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/create-prompt.md @@ -0,0 +1,89 @@ +# Create Prompt Agent + +Create and manage prompt agents in Azure Foundry Agent Service using MCP tools or Python SDK. For hosted agents (container-based), see [create.md](create.md). + +## Quick Reference + +| Property | Value | +|----------|-------| +| **Agent Type** | Prompt (`kind: "prompt"`) | +| **Primary Tool** | Foundry MCP server (`foundry_agents_*`) | +| **Fallback SDK** | `azure-ai-projects` v2.x preview | +| **Auth** | `DefaultAzureCredential` / `az login` | + +## Workflow + +``` +User Request (create/list/get/update/delete agent) + │ + ▼ +Step 1: Resolve project context (endpoint + credentials) + │ + ▼ +Step 2: Try MCP tool for the operation + │ ├─ ✅ MCP available → Execute via MCP tool → Done + │ └─ ❌ MCP unavailable → Continue to Step 3 + │ + ▼ +Step 3: Fall back to SDK + │ Read references/sdk-operations.md for code + │ + ▼ +Step 4: Execute and confirm result +``` + +### Step 1: Resolve Project Context + +The user needs a Foundry project endpoint. Check for: + +1. `PROJECT_ENDPOINT` environment variable +2. Ask the user for their project endpoint +3. 
Use `foundry_resource_get` MCP tool to discover it + +Endpoint format: `https://.services.ai.azure.com/api/projects/` + +### Step 2: Create Agent (MCP — Preferred) + +For a **prompt agent**: +- Provide: agent name, model deployment name, instructions +- Optional: tools (code interpreter, file search, function calling, web search, Bing grounding, memory) + +For a **workflow**: +- Workflows are created in the Foundry portal visual builder +- Use MCP to create the individual agents that participate in the workflow +- Direct the user to the Foundry portal for workflow assembly + +### Step 3: SDK Fallback + +If MCP tools are unavailable, use the `azure-ai-projects` SDK: +- See [SDK Operations](references/sdk-operations.md) for create, list, update, delete code samples +- See [Agent Tools](references/agent-tools.md) for adding tools to agents + +### Step 4: Add Tools (Optional) + +> ⚠️ **MANDATORY:** Before configuring any tool, **read its reference documentation** linked below to understand prerequisites, required parameters, and setup steps. Do not attempt to add a tool without first reviewing its reference. + +| Tool Category | Reference | +|---------------|-----------| +| Code Interpreter, Function Calling | [Simple Tools](references/agent-tools.md) | +| File Search (requires vector store) | [File Search](references/tool-file-search.md) | +| Web Search (default, no setup needed) | [Web Search](references/tool-web-search.md) | +| Bing Grounding (explicit request only) | [Bing Grounding](references/tool-bing-grounding.md) | +| Azure AI Search (private data) | [Azure AI Search](references/tool-azure-ai-search.md) | +| MCP Servers | [MCP Tool](references/tool-mcp.md) | +| Memory (persistent across sessions) | [Memory](references/tool-memory.md) | +| Connections (for tools that need them) | [Project Connections](../../project/connections.md) | + +> ⚠️ **Web Search Default:** Use `WebSearchPreviewTool` for web search. 
Only use `BingGroundingAgentTool` when the user explicitly requests Bing Grounding. + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent creation fails | Missing model deployment | Deploy a model first via `foundry_models_deploy` or portal | +| MCP tool not found | MCP server not running | Fall back to SDK — see [SDK Operations](references/sdk-operations.md) | +| Permission denied | Insufficient RBAC | Need `Azure AI User` role on the project | +| Agent name conflict | Name already exists | Use a unique name or update the existing agent | +| Tool not available | Tool not configured for project | Verify tool prerequisites (e.g., Bing resource for grounding) | +| SDK version mismatch | Using 1.x instead of 2.x | Install `azure-ai-projects --pre` for v2.x preview | +| Tenant mismatch | MCP token tenant differs from resource tenant | Fall back to SDK — `DefaultAzureCredential` resolves the correct tenant | diff --git a/skills/microsoft-foundry/foundry-agent/create/create.md b/skills/microsoft-foundry/foundry-agent/create/create.md new file mode 100644 index 00000000..1f9042e5 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/create.md @@ -0,0 +1,239 @@ +# Create Hosted Agent Application + +Create new hosted agent applications for Microsoft Foundry, or convert existing agent projects to be Foundry-compatible using the hosting adapter. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| **Samples Repo** | `microsoft-foundry/foundry-samples` | +| **Python Samples** | `samples/python/hosted-agents/{framework}/` | +| **C# Samples** | `samples/csharp/hosted-agents/{framework}/` | +| **Hosted Agents Docs** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents | +| **Best For** | Creating new or converting existing agent projects for Foundry | + +## When to Use This Skill + +- Create a new hosted agent application from scratch (greenfield) +- Start from an official sample and customize it +- Convert an existing agent project to be Foundry-compatible (brownfield) +- Help user choose a framework or sample for their agent + +## Workflow + +### Step 1: Determine Scenario + +Check the user's workspace for existing agent project indicators: + +- **No agent-related code found** → **Greenfield**. Proceed to Greenfield Workflow (Step 2). +- **Existing agent code present** → **Brownfield**. Proceed to Brownfield Workflow. + +### Step 2: Gather Requirements (Greenfield) + +If the user hasn't already specified, use `ask_user` to collect: + +**Framework:** + +| Framework | Python Path | C# Path | +|-----------|------------|---------| +| Microsoft Agent Framework (default) | `agent-framework` | `AgentFramework` | +| LangGraph | `langgraph` | ❌ Python only | +| Custom | `custom` | `AgentWithCustomFramework` | + +**Language:** Python (default) or C#. + +> ⚠️ **Warning:** LangGraph is Python-only. For C# + LangGraph, suggest Agent Framework or Custom instead. + +If user has no specific preference, suggest Microsoft Agent Framework + Python as defaults. 
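
The framework table and sample paths above can be sketched as a small lookup. This is a minimal illustration, not part of the skill: the function name `resolve_sample_path` and the dictionary keys (`agent-framework`, `langgraph`, `custom`) are hypothetical, while the directory names and the `samples/{language}/hosted-agents/{framework}/` layout are taken from the Quick Reference and the framework table.

```python
from typing import Dict, Optional

# Directory names per framework and language, copied from the framework table.
# None marks the combination the table lists as unsupported (LangGraph + C#).
# The outer keys are hypothetical labels chosen for this sketch.
SAMPLE_DIRS: Dict[str, Dict[str, Optional[str]]] = {
    "agent-framework": {"python": "agent-framework", "csharp": "AgentFramework"},
    "langgraph": {"python": "langgraph", "csharp": None},
    "custom": {"python": "custom", "csharp": "AgentWithCustomFramework"},
}


def resolve_sample_path(framework: Optional[str] = None,
                        language: Optional[str] = None) -> str:
    """Return the samples-repo directory for a framework/language choice."""
    # Apply the documented defaults when the user states no preference.
    framework = framework or "agent-framework"
    language = language or "python"

    if framework not in SAMPLE_DIRS:
        raise ValueError(f"unknown framework: {framework!r}")
    if language not in ("python", "csharp"):
        raise ValueError(f"unknown language: {language!r}")

    directory = SAMPLE_DIRS[framework][language]
    if directory is None:
        # Mirrors the warning: LangGraph is Python-only.
        raise ValueError(
            "LangGraph is Python-only; suggest Agent Framework or Custom for C#"
        )
    return f"samples/{language}/hosted-agents/{directory}"
```

Calling `resolve_sample_path()` with no arguments yields the Agent Framework + Python default, and the unsupported C# + LangGraph combination fails early instead of producing a path that does not exist in the samples repo.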
+ +### Step 3: Browse and Select Sample + +List available samples using the GitHub API: + +``` +GET https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework} +``` + +If the user has specified any information on what they want their agent to do, just choose the most relevant or most simple sample to start with. Only if user has not given any preferences, present the sample directories to the user and help them choose based on their requirements (e.g., RAG, tools, multi-agent workflows, HITL). + +### Step 4: Download Sample Files + +Download only the selected sample directory — do NOT clone the entire repo. Preserve the directory structure by creating subdirectories as needed. + +**Using `gh` CLI (preferred if available):** +```bash +gh api repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample} \ + --jq '.[] | select(.type=="file") | .download_url' | while read url; do + filepath="${url##*/samples/{language}/hosted-agents/{framework}/{sample}/}" + mkdir -p "$(dirname "$filepath")" + curl -sL "$url" -o "$filepath" +done +``` + +**Using curl (fallback):** +```bash +curl -s "https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample}" | \ + jq -r '.[] | select(.type=="file") | .path + "\t" + .download_url' | while IFS=$'\t' read path url; do + relpath="${path#samples/{language}/hosted-agents/{framework}/{sample}/}" + mkdir -p "$(dirname "$relpath")" + curl -sL "$url" -o "$relpath" + done +``` + +For nested directories, recursively fetch the GitHub contents API for entries where `type == "dir"` and repeat the download for each. + +### Step 5: Customize and Implement + +1. Read the sample's README.md to understand its structure +2. Read the sample code to understand patterns and dependencies used +3. 
If using Agent Framework, follow the best practices in [references/agentframework.md](references/agentframework.md) +4. Implement the user's specific requirements on top of the sample +5. Update configuration (`.env`, dependency files) as needed +6. Ensure the project is in a runnable state + +### Step 6: Verify Startup + +1. Install dependencies (use virtual environment for Python) +2. If placeholders were used for `.env` variables, ask the user for their values with the `ask_user` tool +3. Run the main entrypoint +4. Fix startup errors and retry if needed +5. Send a test request to the agent. The agent supports the OpenAI Responses schema. +6. Fix any errors from the test request and retry until it succeeds +7. Once startup and test request succeed, stop the server to prevent resource usage + +**Guardrails:** +- ✅ Perform real run to catch startup errors +- ✅ Cleanup after verification (stop server) +- ✅ Ignore auth/connection/timeout errors (expected without Azure config) +- ❌ Don't wait for user input or create test scripts + +## Brownfield Workflow: Convert Existing Agent to Hosted Agent + +Use this workflow when the user has an existing agent project that needs to be made compatible with Foundry hosted agent deployment. The key requirement is wrapping the agent with the appropriate **hosting adapter** package, which converts any agent into an HTTP service compatible with the Foundry Responses API. + +### Step B1: Analyze Existing Project + +Scan the project to determine: + +1. **Language** — Python (look for `requirements.txt`, `pyproject.toml`, `*.py`) or C# (look for `*.csproj`, `*.cs`) +2. **Framework** — Identify which agent framework is in use: + +| Indicator | Framework | +|-----------|-----------| +| Imports from `agent_framework` or `Microsoft.Agents.AI` | Microsoft Agent Framework | +| Imports from `langgraph`, `langchain` | LangGraph | +| No recognized framework imports, or other frameworks (e.g., Semantic Kernel, AutoGen) | Custom | + +3. 
**Entry point** — Identify the main script/entrypoint that creates and runs the agent +4. **Agent object** — Identify the agent instance that needs to be wrapped (e.g., a `BaseAgent` subclass, a compiled `StateGraph`, or an existing server/app) + +### Step B2: Add Hosting Adapter Dependency + +Add the correct adapter package based on framework and language. Get the latest version from the package registry — do not hardcode versions. + +**Python adapter packages:** + +| Framework | Package | +|-----------|---------| +| Microsoft Agent Framework | `azure-ai-agentserver-agentframework` | +| LangGraph | `azure-ai-agentserver-langgraph` | +| Custom | `azure-ai-agentserver-core` | + +**.NET adapter packages:** + +| Framework | Package | +|-----------|---------| +| Microsoft Agent Framework | `Azure.AI.AgentServer.AgentFramework` | +| Custom | `Azure.AI.AgentServer.Core` | + +Add the package to the project's dependency file (`requirements.txt`, `pyproject.toml`, or `.csproj`). For Python, also add `python-dotenv` if not present. + +### Step B3: Wrap Agent with Hosting Adapter + +Modify the project's main entrypoint to wrap the existing agent with the adapter. 
The approach differs by framework: + +**Microsoft Agent Framework (Python):** +- Import `from_agent_framework` from the adapter package +- Pass the agent instance (a `BaseAgent` subclass) to the adapter +- Call `.run()` on the adapter as the default entrypoint +- The agent must implement both `run()` and `run_stream()` methods + +**LangGraph (Python):** +- Import `from_langgraph` from the adapter package +- Pass the compiled `StateGraph` to the adapter +- Call `.run()` on the adapter as the default entrypoint + +**Custom code (Python):** +- Import `FoundryCBAgent` from the core adapter package +- Create a class that extends `FoundryCBAgent` +- Implement the `agent_run()` method which receives an `AgentRunContext` and returns either an `OpenAIResponse` (non-streaming) or `AsyncGenerator[ResponseStreamEvent]` (streaming) +- The agent must handle the Foundry request/response protocol manually — refer to the [custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/custom) for the exact interface +- Instantiate and call `.run()` as the default entrypoint + +**Custom code (C#):** +- Use `AgentServerApplication.RunAsync()` with dependency injection to register an `IAgentInvocation` implementation +- Refer to the [C# custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/csharp/hosted-agents/AgentWithCustomFramework) for the exact interface + +> ⚠️ **Warning:** The adapter MUST be the default entrypoint (no flags required to start). This is required for both local debugging and containerized deployment. + +### Step B4: Configure Environment + +1. Create or update a `.env` file with required environment variables (project endpoint, model deployment name, etc.) +2. For Python: ensure the code calls `load_dotenv()` so that local `.env` values are loaded without overriding Foundry-injected environment variables at runtime. +3. 
If the project uses Azure credentials: ensure Python uses `azure.identity.aio.DefaultAzureCredential` (async version) for **local development**, not `azure.identity.DefaultAzureCredential`. In production, use `ManagedIdentityCredential`. See [auth-best-practices.md](../../references/auth-best-practices.md) + +### Step B5: Create agent.yaml + +Create an `agent.yaml` file in the project root. This file defines the agent's metadata and deployment configuration for Foundry. Required fields: + +- `name` — Unique identifier (alphanumeric + hyphens, max 63 chars) +- `description` — What the agent does +- `template.kind` — Must be `hosted` +- `template.protocols` — Must include `responses` protocol v1 +- `template.environment_variables` — List all environment variables the agent needs at runtime + +Refer to any sample's `agent.yaml` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact schema. + +### Step B6: Create Dockerfile + +Create a `Dockerfile` if one doesn't exist. Requirements: + +- Base image appropriate for the language (e.g., `python:3.12-slim` for Python, `mcr.microsoft.com/dotnet/sdk` for C#) +- Copy source code into the container +- Install dependencies +- Expose port **8088** (the adapter's default port) +- Set the main entrypoint as the CMD + +> ⚠️ **Warning:** When building, MUST use `--platform linux/amd64`. Hosted agents run on Linux AMD64 infrastructure. Images built for other architectures (e.g., ARM64 on Apple Silicon) will fail. + +Refer to any sample's `Dockerfile` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact pattern. + +### Step B7: Test Locally + +1. Install dependencies (use virtual environment for Python) +2. Run the main entrypoint — the adapter should start an HTTP server on `localhost:8088` +3. 
Send a test request: `POST http://localhost:8088/responses` with body `{"input": "hello"}` +4. Verify the response follows the OpenAI Responses API format +5. Fix any errors and retry until the test request succeeds +6. Stop the server + +> 💡 **Tip:** If auth/connection errors occur for Azure services, that's expected without real Azure credentials configured. The key validation is that the HTTP server starts and accepts requests. + +## Common Guidelines + +IMPORTANT: YOU MUST FOLLOW THESE. + +Apply these to both greenfield and brownfield projects: + +1. **Logging** — Implement proper logging using the language's standard logging framework (Python `logging` module, .NET `ILogger`). Hosted agents stream container stdout/stderr logs to Foundry, so all log output is visible via the troubleshoot workflow. Use structured log levels (INFO, WARNING, ERROR) and include context like request IDs and agent names. + +2. **Framework-specific best practices** — When using Agent Framework, read the [Agent Framework best practices](references/agentframework.md) for hosting adapter setup, credential patterns, and debugging guidance. + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| GitHub API rate limit | Too many requests | Authenticate with `gh auth login` | +| `gh` not available | CLI not installed | Use curl REST API fallback | +| Sample not found | Path changed in repo | List parent directory to discover current samples | +| Dependency install fails | Version conflicts | Use versions from sample's own dependency file | diff --git a/skills/microsoft-foundry/foundry-agent/create/references/agent-tools.md b/skills/microsoft-foundry/foundry-agent/create/references/agent-tools.md new file mode 100644 index 00000000..ce0eb5c8 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/agent-tools.md @@ -0,0 +1,45 @@ +# Agent Tools — Simple Tools + +Add tools to agents to extend capabilities. 
This file covers tools that work without external connections. For tools requiring connections/RBAC setup, see: +- [Web Search tool](tool-web-search.md) — real-time public web search with citations (default for web search) +- [Bing Grounding tool](tool-bing-grounding.md) — web search via dedicated Bing resource (only when explicitly requested) +- [Azure AI Search tool](tool-azure-ai-search.md) — private data grounding with vector search +- [MCP tool](tool-mcp.md) — remote Model Context Protocol servers + +## Code Interpreter + +Enables agents to write and run Python in a sandboxed environment. Supports data analysis, chart generation, and file processing. Has [additional charges](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) beyond token-based fees. + +> Sessions: 1-hour active / 30-min idle timeout. Each conversation = separate billable session. + +For code samples, see: [Code Interpreter tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry) + +## Function Calling + +Define custom functions the agent can invoke. Your app executes the function and returns results. Runs expire 10 minutes after creation — return tool outputs promptly. + +> **Security:** Treat tool arguments as untrusted input. Don't pass secrets in tool output. Use `strict=True` for schema validation. + +For code samples, see: [Function Calling tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry) + +## Tool Summary + +| Tool | Connection? 
| Reference | +|------|-------------|-----------| +| `CodeInterpreterTool` | No | This file | +| `FileSearchTool` | No (vector store required) | [tool-file-search.md](tool-file-search.md) | +| `FunctionTool` | No | This file | +| `WebSearchPreviewTool` | No | [tool-web-search.md](tool-web-search.md) | +| `BingGroundingAgentTool` | Yes (Bing) | [tool-bing-grounding.md](tool-bing-grounding.md) | +| `AzureAISearchAgentTool` | Yes (Search) | [tool-azure-ai-search.md](tool-azure-ai-search.md) | +| `MCPTool` | Optional | [tool-mcp.md](tool-mcp.md) | + +> ⚠️ **Default for web search:** Use `WebSearchPreviewTool` unless the user explicitly requests Bing Grounding or Bing Custom Search. + +> Combine multiple tools on one agent. The model decides which to invoke. + +## References + +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Code Interpreter](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry) +- [Function Calling](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/agentframework.md b/skills/microsoft-foundry/foundry-agent/create/references/agentframework.md new file mode 100644 index 00000000..51293695 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/agentframework.md @@ -0,0 +1,92 @@ +# Microsoft Agent Framework — Best Practices for Hosted Agents + +Best practices when building hosted agents with Microsoft Agent Framework for deployment to Foundry Agent Service. 
+ +## Official Resources + +| Resource | URL | +|----------|-----| +| **GitHub Repo** | https://github.com/microsoft/agent-framework | +| **MS Learn Overview** | https://learn.microsoft.com/agent-framework/overview/agent-framework-overview | +| **Quick Start** | https://learn.microsoft.com/agent-framework/tutorials/quick-start | +| **User Guide** | https://learn.microsoft.com/agent-framework/user-guide/overview | +| **Hosted Agents Concepts** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents | +| **Python Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/python/samples | +| **.NET Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/dotnet/samples | +| **PyPI** | https://pypi.org/project/agent-framework/ | +| **NuGet** | https://www.nuget.org/profiles/MicrosoftAgentFramework/ | + +## Installation + +**Python:** `pip install agent-framework --pre` (installs all sub-packages) + +**.NET:** `dotnet add package Microsoft.Agents.AI` + +> ⚠️ **Warning:** Always pin specific pre-release versions. Use `--pre` to find the latest one, then check the [PyPI page](https://pypi.org/project/agent-framework/) or [NuGet profile](https://www.nuget.org/profiles/MicrosoftAgentFramework/) for the current version to pin. + +## Hosting Adapter + +Hosted agents must expose an HTTP server using the hosting adapter. This enables local testing and Foundry deployment with the same code. + +**Python adapter packages:** `azure-ai-agentserver-core`, `azure-ai-agentserver-agentframework` + +**.NET adapter packages:** `Azure.AI.AgentServer.Core`, `Azure.AI.AgentServer.AgentFramework` + +The adapter handles protocol translation between Foundry request/response formats and your framework's native data structures, including conversation management, message serialization, and streaming. + +> 💡 **Tip:** Make HTTP server mode the default entrypoint (no flags needed). This simplifies both local debugging and containerized deployment. 
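
The tip above concerns the shape of the entrypoint rather than any particular adapter call. A minimal sketch of that shape follows, assuming a hypothetical `--port` option and a placeholder where the real adapter would start; none of these names come from the adapter packages. The only requirement illustrated is that running the script with no arguments selects server mode.

```python
import argparse
from typing import List, Optional

# Port 8088 is the adapter default named in the hosted-agent Dockerfile guidance;
# using it as the default here is an assumption of this sketch.
DEFAULT_PORT = 8088


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments; every option is optional so no flags are required."""
    parser = argparse.ArgumentParser(description="Hosted agent entrypoint (sketch)")
    parser.add_argument("--port", type=int, default=DEFAULT_PORT,
                        help="port for the HTTP server (hypothetical option)")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Start in HTTP server mode by default and return the chosen port."""
    args = parse_args(argv)
    # Placeholder: a real project would hand its agent to the hosting adapter
    # here and call the adapter's run method instead of printing.
    print(f"would start the agent HTTP server on port {args.port}")
    return args.port
```

With this layout the container `CMD` and a local debug run can invoke the same script without extra arguments, and an optional flag only overrides a default.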
+ +## Key Patterns + +### Python: Async Credentials + +For **local development**, use `DefaultAzureCredential` from `azure.identity.aio` (not `azure.identity`) — `AzureAIClient` requires async credentials. In production, use `ManagedIdentityCredential` from `azure.identity.aio`. See [auth-best-practices.md](../../../references/auth-best-practices.md). + +### Python: Environment Variables + +Always use `load_dotenv(override=False)` so environment variables set by Foundry at runtime take precedence over local `.env` values. + +Required `.env` variables: +- `FOUNDRY_PROJECT_ENDPOINT` — project endpoint URL +- `FOUNDRY_MODEL_DEPLOYMENT_NAME` — model deployment name + +### Authentication + +If explicitly asked to use API key instead of managed identity, then use AzureOpenAIResponsesClient and pass in api_key parameter to it. + +### Agent Naming Rules + +Agent names must: start/end with alphanumeric characters, may contain hyphens in the middle, max 63 characters. Examples: `MyAgent`, `agent-1`. Invalid: `-agent`, `agent-`, `sample_agent`. + +### Python: Virtual Environment + +Always use a virtual environment. Never use bare `python` or `pip` — use venv-activated versions or full paths (e.g., `.venv/bin/pip`). + +## Workflow Patterns + +Agent Framework supports single-agent and multi-agent workflow patterns using graph-based orchestration: + +- **Single Agent** — Basic agent with tools, RAG, or MCP integration +- **Multi-Agent Workflow** — Graph-based orchestration connecting multiple agents and deterministic functions +- **Advanced Patterns** — Reflection, switch-case, fan-out/fan-in, loop, human-in-the-loop + +For workflow samples and advanced patterns, search the [Agent Framework GitHub repo](https://github.com/microsoft/agent-framework). + +## Debugging + +Use [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio) with the `agentdev` CLI tool for interactive debugging: + +1. 
Install `debugpy` for VS Code Python Debugger support +2. Install `agent-dev-cli` (pre-release) for the `agentdev` command +3. Key debug tasks: `agentdev run .py --port 8087` starts the agent HTTP server, `debugpy --listen 127.0.0.1:5679` attaches the debugger, and the `ai-mlstudio.openTestTool` VS Code command opens the Agent Inspector UI + +For VS Code `launch.json` and `tasks.json` configuration templates, see [AI Toolkit Agent Inspector — Configure debugging manually](https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/agent-test-tool.md#configure-debugging-manually). + +## Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `ModuleNotFoundError` | Missing SDK | `pip install agent-framework --pre` in venv | +| Async credential error | Wrong import | Use `azure.identity.aio.DefaultAzureCredential` (local dev) or `azure.identity.aio.ManagedIdentityCredential` (production) | +| Agent name validation error | Invalid characters | Use alphanumeric + hyphens, start/end alphanumeric, max 63 chars | +| Hosting adapter not found | Missing package | Install `azure-ai-agentserver-agentframework` | diff --git a/skills/microsoft-foundry/foundry-agent/create/references/sdk-operations.md b/skills/microsoft-foundry/foundry-agent/create/references/sdk-operations.md new file mode 100644 index 00000000..e84cccc5 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/sdk-operations.md @@ -0,0 +1,47 @@ +# SDK Operations for Foundry Agent Service + +Use the Foundry MCP tools for agent CRUD operations. When MCP tools are unavailable, use the `azure-ai-projects` Python SDK or REST API. 
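Agent names passed to these operations can be checked locally before any call is made. The sketch below encodes the naming rules stated in the Agent Framework reference of this change (alphanumeric first and last character, hyphens only in the middle, at most 63 characters). The helper name `is_valid_agent_name` is hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption; the service remains the authority on what it accepts.

```python
import re

# Hypothetical helper, not part of any SDK.
# 1 to 63 characters, ASCII alphanumeric at both ends, hyphens allowed in between.
_AGENT_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_agent_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return _AGENT_NAME.fullmatch(name) is not None
```

Checking the name up front turns a service-side validation error into an immediate local failure, which is cheaper to diagnose.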
+ +## Agent Operations via MCP + +| Operation | MCP Tool | Description | +|-----------|----------|-------------| +| Create/Update agent | `agent_update` | Create a new agent or update an existing one (creates new version) | +| List/Get agents | `agent_get` | List all agents, or get a specific agent by name | +| Delete agent | `agent_delete` | Delete an agent | +| Invoke agent | `agent_invoke` | Send a message to an agent and get a response | +| Get schema | `agent_definition_schema_get` | Get the full JSON schema for agent definitions | + +## SDK Agent Operations + +When MCP tools are unavailable, use the `azure-ai-projects` Python SDK (`pip install azure-ai-projects --pre`): + +```python +from azure.ai.projects import AIProjectClient +from azure.identity import DefaultAzureCredential + +endpoint = "https://.services.ai.azure.com/api/projects/" +client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential()) +``` + +| Operation | SDK Method | +|-----------|------------| +| Create | `client.agents.create_version(agent_name, definition)` | +| List | `client.agents.list()` | +| Get | `client.agents.get(agent_name)` | +| Update | `client.agents.create_version(agent_name, definition)` (creates new version) | +| Delete | `client.agents.delete(agent_name)` | +| Chat | `client.get_openai_client().responses.create(model=, input=, extra_body={"agent": {"name": agent_name, "type": "agent_reference"}})` | + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `PROJECT_ENDPOINT` | Foundry project endpoint (`https://.services.ai.azure.com/api/projects/`) | +| `MODEL_DEPLOYMENT_NAME` | Deployed model name (e.g., `gpt-4.1-mini`) | + +## References + +- [Agent quickstart](https://learn.microsoft.com/azure/ai-foundry/agents/quickstart?view=foundry) +- [Create agents](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/create-agent?view=foundry) +- [Tool 
Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-azure-ai-search.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-azure-ai-search.md new file mode 100644 index 00000000..9859e81c --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-azure-ai-search.md @@ -0,0 +1,69 @@ +# Azure AI Search Tool + +Ground agent responses with data from an Azure AI Search vector index. Requires a project connection and proper RBAC setup. + +## Prerequisites + +- Azure AI Search index with vector search configured: + - One or more `Edm.String` fields (searchable + retrievable) + - One or more `Collection(Edm.Single)` vector fields (searchable) + - At least one retrievable text field with content for citations + - A retrievable field with source URL for citation links +- A [project connection](../../../project/connections.md) between your Foundry project and search service +- `azure-ai-projects` package (`pip install azure-ai-projects --pre`) + +## Required RBAC Roles + +For **keyless authentication** (recommended), assign these roles to the **Foundry project's managed identity** on the Azure AI Search resource: + +| Role | Scope | Purpose | +|------|-------|---------| +| **Search Index Data Contributor** | AI Search resource | Read/write index data | +| **Search Service Contributor** | AI Search resource | Manage search service config | + +> **If RBAC assignment fails:** Ask the user to manually assign roles in Azure portal → AI Search resource → Access control (IAM). They need Owner or User Access Administrator on the search resource. + +## Connection Setup + +A project connection between your Foundry project and the Azure AI Search resource is required. See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools. 
+ +## Query Types + +| Value | Description | +|-------|-------------| +| `SIMPLE` | Keyword search | +| `VECTOR` | Vector similarity only | +| `SEMANTIC` | Semantic ranking | +| `VECTOR_SIMPLE_HYBRID` | Vector + keyword | +| `VECTOR_SEMANTIC_HYBRID` | Vector + keyword + semantic (default, recommended) | + +## Tool Parameters + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `project_connection_id` | Yes | Connection ID (resolve via `foundry_connections_get`) | +| `index_name` | Yes | Search index name | +| `top_k` | No | Number of results (default: 5) | +| `query_type` | No | Search type (default: `vector_semantic_hybrid`) | +| `filter` | No | OData filter applied to all queries | + +## Limitations + +- Only **one index per tool** instance. For multiple indexes, use connected agents each with their own index. +- Search resource and Foundry agent must be in the **same tenant**. +- Private AI Search resources require **standard agent deployment** with vNET injection. 
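The parameter table above can be turned into a small pre-flight check. The following is a minimal sketch, assuming the parameters are collected into a plain dictionary before being handed to whatever SDK or MCP call creates the tool. The helper `build_search_tool_params` and the dictionary layout are illustrative assumptions, not the SDK's own types.

```python
# Query type values from the Query Types table, in the lowercase form used by `query_type`.
QUERY_TYPES = {
    "simple",
    "vector",
    "semantic",
    "vector_simple_hybrid",
    "vector_semantic_hybrid",
}


def build_search_tool_params(project_connection_id, index_name, top_k=5,
                             query_type="vector_semantic_hybrid", odata_filter=None):
    """Hypothetical helper: validate and collect Azure AI Search tool parameters."""
    if not project_connection_id or not index_name:
        raise ValueError("project_connection_id and index_name are required")
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    normalized = query_type.lower()
    if normalized not in QUERY_TYPES:
        raise ValueError(f"unsupported query_type: {query_type!r}")
    params = {
        "project_connection_id": project_connection_id,
        "index_name": index_name,
        "top_k": top_k,
        "query_type": normalized,
    }
    if odata_filter is not None:
        # Only include the optional OData filter when one was supplied.
        params["filter"] = odata_filter
    return params
```

Because each tool instance accepts a single index, one call to this helper corresponds to exactly one tool.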
+ +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| 401/403 accessing index | Missing RBAC roles | Assign `Search Index Data Contributor` + `Search Service Contributor` to project managed identity | +| Index not found | Name mismatch | Verify `AI_SEARCH_INDEX_NAME` matches exactly (case-sensitive) | +| No citations in response | Instructions don't request them | Add citation instructions to agent prompt | +| Wrong connection endpoint | Connection points to different search resource | Re-create connection with correct endpoint | + +## References + +- [Azure AI Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/azure-ai-search?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Project Connections](../../../project/connections.md) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-bing-grounding.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-bing-grounding.md new file mode 100644 index 00000000..9d466cd2 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-bing-grounding.md @@ -0,0 +1,50 @@ +# Bing Grounding Tool + +Access real-time web information via Bing Search. Unlike the [Web Search tool](tool-web-search.md) (which works out of the box), Bing Grounding requires a dedicated Bing resource and a project connection. + +> ⚠️ **Warning:** Use the [Web Search tool](tool-web-search.md) as the default for web search. Only use Bing Grounding when the user **explicitly** requests Grounding with Bing Search or Grounding with Bing Custom Search. 
+ +## When to Use + +- User explicitly asks for "Bing Grounding" or "Grounding with Bing Search" +- User explicitly asks for "Bing Custom Search" or "Grounding with Bing Custom Search" +- User needs to restrict web search to specific domains (Bing Custom Search) +- User has an existing Bing Grounding resource they want to use + +## Prerequisites + +- A [Grounding with Bing Search resource](https://portal.azure.com/#create/Microsoft.BingGroundingSearch) in Azure portal +- `Contributor` or `Owner` role at subscription/RG level to create Bing resource and get keys +- `Azure AI Project Manager` role on the project to create a connection +- A project connection configured with the Bing resource key — see [connections](../../../project/connections.md) + +## Setup + +1. Register the Bing provider: `az provider register --namespace 'Microsoft.Bing'` +2. Create a Grounding with Bing Search resource in the Azure portal +3. Create a project connection with the Bing resource key — see [connections](../../../project/connections.md) +4. 
Set `BING_PROJECT_CONNECTION_NAME` environment variable + +## Important Disclosures + +- Bing data flows **outside Azure compliance boundary** +- Review [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) +- Not supported with VPN/Private Endpoints +- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing) + +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| Connection not found | Name mismatch or wrong project | Use `foundry_connections_list` to find correct name | +| Unauthorized creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project | +| Bing resource creation fails | Provider not registered | Run `az provider register --namespace 'Microsoft.Bing'` | +| No results returned | Connection misconfigured | Verify Bing resource key and connection setup | + +## References + +- [Bing Grounding tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/bing-grounding?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Grounding with Bing Terms](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) +- [Connections Guide](../../../project/connections.md) +- [Web Search Tool (default)](tool-web-search.md) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-file-search.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-file-search.md new file mode 100644 index 00000000..159f73c8 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-file-search.md @@ -0,0 +1,60 @@ +# File Search Tool + +Enables agents to search through uploaded files using semantic and keyword search from vector stores. Supports a wide range of file formats including PDF, Markdown, Word, and more. 
+ +> ⚠️ **Important:** Before creating an agent with file search, you **must** read the official documentation linked in the References section to understand prerequisites, supported file types, and vector store setup. + +## Prerequisites + +- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup) +- A **vector store** must be created before the agent — the `file_search` tool requires `vector_store_ids` +- Files must be uploaded to the vector store before the agent can search them + +## Key Concepts + +| Concept | Description | +|---------|-------------| +| **Vector Store** | A container that indexes uploaded files for semantic search. Must be created first. | +| **vector_store_ids** | Required parameter on the `file_search` tool — references the vector store(s) to search. | +| **File upload** | Files are uploaded to the project, then attached to a vector store for indexing. | + +## Setup Workflow + +``` +1. Create a vector store (REST API: POST /vector_stores) + │ + ▼ +2. (Optional) Upload files and attach to vector store + │ + ▼ +3. Create agent with file_search tool referencing the vector_store_ids + │ + ▼ +4. Agent can now search files in the vector store +``` + +> ⚠️ **Warning:** Creating an agent with `file_search` without providing `vector_store_ids` will fail with a `400 BadRequest` error: `required: Required properties ["vector_store_ids"] are not present`. 
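The `400 BadRequest` described above can be caught before the create call is sent. This is a minimal sketch, assuming the agent definition is available as a dictionary with a `tools` list whose entries carry a `type` key. That shape and the helper name `file_search_tools_missing_stores` are assumptions made for illustration.

```python
def file_search_tools_missing_stores(definition: dict) -> list:
    """Return the indexes of file_search tools that lack vector_store_ids."""
    missing = []
    for index, tool in enumerate(definition.get("tools", [])):
        # An absent key and an empty list are both treated as missing.
        if tool.get("type") == "file_search" and not tool.get("vector_store_ids"):
            missing.append(index)
    return missing
```

An empty result means every `file_search` tool references at least one vector store. Whether those stores contain files still has to be checked separately.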
+ +## REST API Notes + +When creating vector stores via `az rest`: + +| Parameter | Value | +|-----------|-------| +| **Endpoint** | `https://.services.ai.azure.com/api/projects//vector_stores` | +| **API version** | `v1` | +| **Auth resource** | `https://ai.azure.com` | + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `vector_store_ids` not present | Agent created without vector store | Create a vector store first, then pass its ID | +| 401 Unauthorized | Wrong auth resource for REST API | Use `--resource "https://ai.azure.com"` with `az rest` | +| Bad API version | Using ARM-style API version | Use `api-version=v1` for the data-plane vector store API | +| No search results | Vector store is empty | Upload files to the vector store before querying | + +## References + +- [File Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/file-search?view=foundry&pivots=python) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-mcp.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-mcp.md new file mode 100644 index 00000000..0a70e593 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-mcp.md @@ -0,0 +1,66 @@ +# MCP Tool (Model Context Protocol) + +Connect agents to remote MCP servers to extend capabilities with external tools and data sources. MCP is an open standard for LLM tool integration. + +## Prerequisites + +- A remote MCP server endpoint (e.g., `https://api.githubcopilot.com/mcp`) +- For authenticated servers: a [project connection](../../../project/connections.md) storing credentials +- RBAC: **Contributor** or **Owner** role on the Foundry project + +## Authenticated Server Connections + +For authenticated MCP servers, create an `api_key` project connection to store credentials. 
Unauthenticated servers (public endpoints) don't need a connection — omit `project_connection_id`. + +See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools. + +## MCPTool Parameters + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `server_label` | Yes | Unique label for this MCP server within the agent | +| `server_url` | Yes | Remote MCP server endpoint URL | +| `require_approval` | No | `"always"` (default), `"never"`, or `{"never": ["tool1"]}` / `{"always": ["tool1"]}` | +| `allowed_tools` | No | List of specific tools to enable (default: all) | +| `project_connection_id` | No | Connection ID for authenticated servers | + +## Approval Workflow + +1. Agent sends request → MCP server returns tool calls +2. Response contains `mcp_approval_request` items +3. Your code reviews tool name + arguments +4. Submit `McpApprovalResponse` with `approve=True/False` +5. Agent completes work using approved tool results + +> **Best practice:** Always use `require_approval="always"` unless you fully trust the MCP server. Use `allowed_tools` to restrict which tools the agent can access. + +## Hosting Local MCP Servers + +Agent Service only accepts **remote** MCP endpoints. 
To use a local server, deploy it to: + +| Platform | Transport | Notes | +|----------|-----------|-------| +| [Azure Container Apps](https://github.com/Azure-Samples/mcp-container-ts) | HTTP POST/GET | Any language, container rebuild needed | +| [Azure Functions](https://github.com/Azure-Samples/mcp-sdk-functions-hosting-python) | HTTP streamable | Python/Node/.NET/Java, key-based auth | + +## Known Limitations + +- **100-second timeout** for non-streaming MCP tool calls +- **Identity passthrough not supported in Teams** — agents published to Teams use project managed identity +- **Network-secured Foundry** can't use private MCP servers in same vNET — only public endpoints + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `Invalid tool schema` | `anyOf`/`allOf` in MCP server definition | Update MCP server schema to use simple types | +| `Unauthorized` / `Forbidden` | Wrong credentials in connection | Verify connection credentials match server requirements | +| Model never calls MCP tool | Misconfigured server_label/url | Check `server_label`, `server_url`, `allowed_tools` values | +| Agent stalls after approval | Missing `previous_response_id` | Include `previous_response_id` in follow-up request | +| Timeout | Server takes >100s | Optimize server-side logic or break into smaller operations | + +## References + +- [MCP tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/mcp?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Project Connections](../../../project/connections.md) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-memory.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-memory.md new file mode 100644 index 00000000..8ad90259 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-memory.md @@ -0,0 +1,109 @@ +# Agent Memory + +Managed long-term memory for 
Foundry agents. Enables agent continuity across sessions, devices, and workflows. Agents retain user preferences, conversation history, and deliver personalized experiences. Memory is stored in your project's owned storage. + +## Prerequisites + +- A [Foundry project](https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects) with authorization configured +- A **chat model deployment** (e.g., `gpt-5.2`) +- An **embedding model deployment** (e.g., `text-embedding-3-small`) — see [Check Embedding Model](#check-embedding-model) below +- Python packages: `pip install azure-ai-projects azure-identity` + +### Check Embedding Model + +An embedding model is **required** before enabling memory. Check if one is already deployed: + +Use `foundry_models_list` MCP tool to list all deployments and look for an embedding model (e.g., `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`). + +| Result | Action | +|--------|--------| +| ✅ Embedding model found | Note the deployment name and proceed | +| ❌ No embedding model | Deploy one before enabling memory — see below | + +### Deploy Embedding Model + +If no embedding model exists, use `foundry_models_deploy` MCP tool with: +- `deploymentName`: `text-embedding-3-small` (or preferred name) +- `modelName`: `text-embedding-3-small` +- `modelFormat`: `OpenAI` + +## Authorization and Permissions + +| Role | Scope | Purpose | +|------|-------|---------| +| **Azure AI User** | AI Services resource | Assigned to project managed identity | +| **System-assigned managed identity** | Project | Must be enabled on the project | + +**Setup steps:** +1. In Azure portal → project → **Resource Management** → **Identity** → enable system-assigned managed identity +2. 
On the AI Services resource → **Access control (IAM)** → assign **Azure AI User** to the project managed identity + +## Workflow + +``` +User wants agent memory + │ + ▼ +Step 1: Check for embedding model deployment + │ ├─ ✅ Found → Continue + │ └─ ❌ Not found → Deploy one (ask user) + │ + ▼ +Step 2: Create memory store + │ + ▼ +Step 3: Attach memory tool to agent + │ + ▼ +Step 4: Test with conversation +``` + +## Key Concepts + +### Memory Store Options + +| Option | Description | +|--------|-------------| +| `chat_summary_enabled` | Summarize conversations for memory | +| `user_profile_enabled` | Build and maintain user profile | +| `user_profile_details` | Control what data gets stored (e.g., `"Avoid sensitive data such as age, financials, location, credentials"`) | + +> 💡 **Tip:** Use `user_profile_details` to control what the agent stores — e.g., `"flight carrier preference and dietary restrictions"` for a travel agent, or exclude sensitive data. + +### Scope + +The `scope` parameter partitions memory per user: + +| Scope Value | Behavior | +|-------------|----------| +| `{{$userId}}` | Auto-extracts TID+OID from auth token (recommended) | +| `"user_123"` | Static identifier — you manage user mapping | + +### Memory Store Operations + +| Operation | Description | +|-----------|-------------| +| Create | Initialize a memory store with chat/embedding models and options | +| List | List all memory stores in the project | +| Update | Update memory store description or configuration | +| Delete scope | Delete memories for a specific user scope | +| Delete store | Delete entire memory store (irreversible — all scopes lost) | + +> ⚠️ **Warning:** Deleting a memory store removes all memories across all scopes. Agents with attached memory stores lose access to historical context. 
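The effect of `scope` is easier to see with a toy model. The class below is an in-memory stand-in written only to illustrate per-scope partitioning. It is not the Foundry memory API, and every name in it is hypothetical.

```python
from collections import defaultdict


class ToyMemoryStore:
    """In-memory stand-in that mimics per-scope partitioning; not the Foundry API."""

    def __init__(self):
        self._by_scope = defaultdict(list)

    def update(self, scope, memory):
        # Memories are written under exactly one scope.
        self._by_scope[scope].append(memory)

    def search(self, scope):
        # Reads only see memories written under the same scope value.
        return list(self._by_scope.get(scope, []))

    def delete_scope(self, scope):
        # Removes one user's memories and leaves other scopes untouched.
        self._by_scope.pop(scope, None)
```

In this model, writing under `"user_123"` and searching under `"user_456"` returns nothing. The real service behaves the same way when the scope used for updates differs from the one used for search.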
+ +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| Auth/authorization error | Identity or managed identity lacks required roles | Verify roles in Authorization section; refresh access token for REST | +| Memories don't appear after conversation | Updates are debounced or still processing | Increase wait time or call update API with `update_delay=0` | +| Memory search returns no results | Scope mismatch between update and search | Use same scope value for storing and retrieving memories | +| Agent response ignores stored memory | Agent not configured with memory search tool | Confirm agent definition includes `MemorySearchTool` with correct store name | +| No embedding model available | Embedding deployment missing | Deploy an embedding model — see Check Embedding Model section | + +## References + +- [Memory tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/memory-usage?view=foundry) +- [Memory Concepts](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/what-is-memory) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Python Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/memories) diff --git a/skills/microsoft-foundry/foundry-agent/create/references/tool-web-search.md b/skills/microsoft-foundry/foundry-agent/create/references/tool-web-search.md new file mode 100644 index 00000000..7dfc9b3a --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/create/references/tool-web-search.md @@ -0,0 +1,57 @@ +# Web Search Tool (Preview) + +Enables agents to retrieve and ground responses with real-time public web information before generating output. Returns up-to-date answers with inline URL citations. This is the **default tool for web search** — no external resource or connection setup required. 
+ +> ⚠️ **Warning:** For Bing Grounding or Bing Custom Search (which require a separate Bing resource and project connection), see [tool-bing-grounding.md](tool-bing-grounding.md). Only use those when explicitly requested. + +## Important Disclosures + +- Web Search (preview) uses Grounding with Bing Search and Grounding with Bing Custom Search, which are [First Party Consumption Services](https://www.microsoft.com/licensing/terms/product/Glossary/EAEAS) governed by [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) and the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839&clcid=0x409). +- The [Data Protection Addendum](https://aka.ms/dpa) **does not apply** to data sent to Grounding with Bing Search and Grounding with Bing Custom Search. +- Data transfers occur **outside compliance and geographic boundaries**. +- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing). + +## Prerequisites + +- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup) +- Azure credentials configured (e.g., `DefaultAzureCredential`) + +## Setup + +No external resource or project connection is required. The web search tool works out of the box when added to an agent definition. + +## Configuration Options + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `user_location` | Approximate location (country/region/city) for localized results | None | +| `search_context_size` | Context window space for search: `low`, `medium`, `high` | `medium` | + +## Administrator Control + +Admins can enable or disable web search at the subscription level via Azure CLI. Requires Owner or Contributor access. 
+ +- **Disable:** `az feature register --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription ""` +- **Enable:** `az feature unregister --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription ""` + +## Security Considerations + +- Treat web search results as **untrusted input**. Validate before use in downstream systems. +- Avoid sending secrets or sensitive data in prompts forwarded to external services. + +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| No citations appear | Model didn't determine web search was needed | Update instructions to explicitly allow web search; ask queries requiring current info | +| Requests fail after enabling | Web search disabled at subscription level | Ask admin to enable — see Administrator Control above | +| Authentication errors (REST) | Bearer token missing, expired, or insufficient | Refresh token; confirm project/agent access | +| Outdated results | Content not recently indexed by Bing | Refine query to request most recent info | +| No results for specific topics | Query too narrow | Broaden query; niche topics may have limited coverage | +| Rate limiting (429) | Too many requests | Implement exponential backoff; space out requests | + +## References + +- [Web Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/web-search?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Bing Pricing](https://www.microsoft.com/bing/apis/grounding-pricing) diff --git a/skills/microsoft-foundry/foundry-agent/deploy/deploy.md b/skills/microsoft-foundry/foundry-agent/deploy/deploy.md new file mode 100644 index 00000000..2a2a7891 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/deploy/deploy.md @@ -0,0 +1,368 @@ +# Foundry Agent Deploy + +Create and manage agent deployments in Azure AI Foundry. 
For hosted agents, this includes the full workflow from containerizing the project to starting the agent container. + +## Quick Reference + +| Property | Value | +|----------|-------| +| Agent types | Prompt (LLM-based), Hosted (ACA based), Hosted (vNext) | +| MCP server | `foundry-mcp` | +| Key MCP tools | `agent_update`, `agent_container_control`, `agent_container_status_get` | +| CLI tools | `docker`, `az acr` (hosted agents only) | +| Container protocols | `a2a`, `responses`, `mcp` | +| Supported languages | .NET, Node.js, Python, Go, Java | + +## When to Use This Skill + +USE FOR: deploy agent to foundry, push agent to foundry, ship my agent, build and deploy container agent, deploy hosted agent, create hosted agent, deploy prompt agent, start agent container, stop agent container, ACR build, container image for agent, docker build for foundry, redeploy agent, update agent deployment, clone agent, delete agent, azd deploy hosted agent, azd ai agent, azd up for agent, deploy agent with azd. + +> ⚠️ **DO NOT manually run** `azd up`, `azd deploy`, `az acr build`, `docker build`, `agent_update`, or `agent_container_control` **without reading this skill first.** This skill orchestrates the full deployment pipeline: project scan → env var collection → Dockerfile generation → image build → agent creation → container startup → verification. Running CLI commands or calling MCP tools individually skips critical steps (env var confirmation, schema validation, status polling). 
+ +## MCP Tools + +| Tool | Description | Parameters | +|------|-------------|------------| +| `agent_definition_schema_get` | Get JSON schema for agent definitions | `projectEndpoint` (required), `schemaType` (`prompt`, `hosted`, `tools`, `all`) | +| `agent_update` | Create, update, or clone an agent | `projectEndpoint`, `agentName` (required); `agentDefinition` (JSON), `isCloneRequest`, `cloneTargetAgentName`, `modelName`, `creationOptions` (JSON with `description` and `metadata`) | +| `agent_get` | List all agents or get a specific agent | `projectEndpoint` (required), `agentName` (optional) | +| `agent_delete` | Delete an agent with container cleanup | `projectEndpoint`, `agentName` (required) | +| `agent_container_control` | Start or stop a hosted agent container | `projectEndpoint`, `agentName`, `action` (`start`/`stop`) (required); `agentVersion`, `minReplicas`, `maxReplicas` | +| `agent_container_status_get` | Check container running status | `projectEndpoint`, `agentName` (required); `agentVersion` | + +## Workflow: Hosted Agent Deployment + +There are two types of hosted agent: ACA-based and vNext. The vNext deployment flow differs from the ACA-based flow in only one place, which is called out in the steps below. Use the vNext experience only when the user explicitly asks to deploy the agent to vNext (or v2, v-next, or similar wording). In all other cases, use the ACA-based deployment flow. + +### Step 1: Detect and Scan Project + +Get the project path from the project context (see Common: Project Context Resolution). Detect the project type by checking for these files: + +| Project Type | Detection Files | +|--------------|-----------------| +| .NET | `*.csproj`, `*.fsproj` | +| Node.js | `package.json` | +| Python | `requirements.txt`, `pyproject.toml`, `setup.py` | +| Go | `go.mod` | +| Java (Maven) | `pom.xml` | +| Java (Gradle) | `build.gradle` | + +Delegate an environment variable scan to a sub-agent. Provide the project path and project type.
Search source files for these patterns: + +| Project Type | Patterns to Search | +|--------------|--------------------| +| .NET (`*.cs`) | `Environment.GetEnvironmentVariable("...")`, `configuration["..."]`, `configuration.GetValue("...")` | +| Node.js (`*.js`, `*.ts`, `*.mjs`) | `process.env.VAR_NAME`, `process.env["..."]` | +| Python (`*.py`) | `os.environ["..."]`, `os.environ.get("...")`, `os.getenv("...")` | +| Go (`*.go`) | `os.Getenv("...")`, `os.LookupEnv("...")` | +| Java (`*.java`) | `System.getenv("...")`, `@Value("${...}")` | + +Classification: if followed by a throw/error → required; if followed by a fallback value → optional with default; otherwise → assume required, ask user. + +### Step 2: Collect and Confirm Environment Variables + +> ⚠️ **Warning:** Environment variables are included in the agent payload and are difficult to change after deployment. + +Use azd environment values from the project context to pre-fill discovered variables. Merge with any user-provided values. Present all variables to the user for confirmation with variable name, value, and source (`azd`, `project default`, or `user`). Mask sensitive values. + +Loop until the user confirms or cancels: +- `yes` → Proceed +- `VAR_NAME=new_value` → Update the value, show updated table, ask again +- `cancel` → Abort deployment + +### Step 3: Generate Dockerfile and Build Image + +Delegate Dockerfile creation to a sub-agent. Guidelines: +- Use official base image for the detected language and runtime version +- Use multi-stage builds for compiled languages +- Use Alpine or slim variants for smaller images +- Always target `linux/amd64` platform +- Expose the correct port (usually 8088) + +> 💡 **Tip:** Reference [Hosted Agents Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for containerized agent examples. + +Also generate `docker-compose.yml` and `.env` files for local development. 
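The generated `.env` file can be seeded from the variables found by the Step 1 scan. The following is a minimal sketch of that scan for Python sources only, using the three patterns from the Step 1 table. The helper name `scan_env_vars` and the status labels are assumptions, and a regex pass like this misses dynamically built names.

```python
import re

# os.environ["NAME"] raises KeyError when NAME is unset, so treat it as required.
_SUBSCRIPT = re.compile(r"""os\.environ\[\s*["']([A-Za-z_][A-Za-z0-9_]*)["']\s*\]""")
# os.environ.get("NAME"...) and os.getenv("NAME"...); group 2 is set when a default follows.
_GETTER = re.compile(r"""os\.(?:environ\.get|getenv)\(\s*["']([A-Za-z_][A-Za-z0-9_]*)["']\s*(,)?""")


def scan_env_vars(source: str) -> dict:
    """Map each variable name found in `source` to required / optional / ask-user."""
    found = {}
    for match in _SUBSCRIPT.finditer(source):
        found[match.group(1)] = "required"
    for match in _GETTER.finditer(source):
        status = "optional" if match.group(2) else "ask-user"
        # A variable already seen as required stays required.
        found.setdefault(match.group(1), status)
    return found
```

Variables labelled `ask-user` correspond to the "otherwise, assume required and ask the user" branch of the classification rule, so they should be surfaced during the Step 2 confirmation loop.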
+ +**IMPORTANT**: You MUST always generate image tag as current timestamp (e.g., `myagent:202401011230`) to ensure uniqueness and avoid conflicts with existing images in ACR. DO NOT use static tags like `latest` or `v1`. + +Collect ACR details from project context. Let the user choose the build method: + +**Cloud Build (ACR Tasks) (Recommended)** — no local Docker required: +```bash +az acr build --registry --image : --platform linux/amd64 --source-acr-auth-id "[caller]" --file Dockerfile . +``` + +**Local Docker Build:** +```bash +docker build --platform linux/amd64 -t : -f Dockerfile . +az acr login --name +docker tag : .azurecr.io/: +docker push .azurecr.io/: +``` + +> 💡 **Tip:** Prefer Cloud Build if Docker is not available locally. On Windows with WSL, prefix Docker commands with `wsl -e` if `docker info` fails but `wsl -e docker info` succeeds. + +### Step 4: Collect Agent Configuration + +Use the project endpoint and ACR name from the project context. Ask the user only for values not already resolved: +- **Agent name** — Unique name for the agent +- **Model deployment** — Model deployment name (e.g., `gpt-4o`) + +### Step 5: Get Agent Definition Schema + +Use `agent_definition_schema_get` with `schemaType: hosted` to retrieve the current schema and validate required fields. + +### Step 6: Create the Agent + +> **VNext Experience:** You MUST pass `enableVnextExperience = true` in the `metadata` field of `creationOptions`. This is required for vNext deployments. 
+ +Use `agent_update` with the agent definition: + +For ACA one: +```json +{ + "kind": "hosted", + "image": ".azurecr.io/:", + "cpu": "", + "memory": "", + "container_protocol_versions": [ + { "protocol": "", "version": "" } + ], + "environment_variables": { "": "" } +} +``` + +For vNext one: +```json +{ + "agentDefinition": { + "kind": "hosted", + "image": ".azurecr.io/:", + "cpu": "", + "memory": "", + "container_protocol_versions": [ + { "protocol": "", "version": "" } + ], + "environment_variables": { "": "" } + }, + "creationOptions": { + "metadata": { + "enableVnextExperience": "true" + } + } +} +``` + +### Step 7: Start Agent Container + +Use `agent_container_control` with `action: start` to start the container. + +### Step 8: Verify Agent Status + +Delegate status polling to a sub-agent. Provide the project endpoint, agent name, and instruct it to use `agent_container_status_get` repeatedly until the status is `Running` or `Failed`. + +**Container status values:** +- `Starting` — Container is initializing +- `Running` — Container is active and ready ✅ +- `Stopped` — Container has been stopped +- `Failed` — Container failed to start ❌ + +### Step 9: Test the Agent + +Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. DO NOT SKIP reading the invoke skill — it contains important information about how to format messages for hosted agents for vNext experience. + +> ⚠️ **DO NOT stop here.** Continue to Step 10 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment. + +### Step 10: Auto-Create Evaluators & Dataset + +Follow [After Deployment — Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below. + +## Workflow: Prompt Agent Deployment + +### Step 1: Collect Agent Configuration + +Use the project endpoint from the project context (see Common: Project Context Resolution). 
Ask the user only for values not already resolved: +- **Agent name** — Unique name for the agent +- **Model deployment** — Model deployment name (e.g., `gpt-4o`) +- **Instructions** — System prompt (optional) +- **Temperature** — Response randomness 0-2 (optional, default varies by model) +- **Tools** — Tool configurations (optional) + +### Step 2: Get Agent Definition Schema + +Use `agent_definition_schema_get` with `schemaType: prompt` to retrieve the current schema. + +### Step 3: Create the Agent + +Use `agent_update` with the agent definition: + +```json +{ + "kind": "prompt", + "model": "", + "instructions": "", + "temperature": 0.7 +} +``` + +### Step 4: Test the Agent + +Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. + +> ⚠️ **DO NOT stop here.** Continue to Step 5 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment. + +### Step 5: Auto-Create Evaluators & Dataset + +Follow [After Deployment — Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below. + +## Display Agent Information +Once deployment is done for either hosted or prompt agent, display the agent's details in a nicely formatted table. + +Below the table you MUST also display a Playground link for direct access to the agent in Azure AI Foundry: + +[Open in Playground](https://ai.azure.com/nextgen/r/{encodedSubId},{resourceGroup},,{accountName},{projectName}/build/agents/{agentName}/build?version={agentVersion}) + +To calculate the encodedSubId, you need to take subscription id and convert it into its 16-byte GUID, then encode it as URL-safe base64 without padding (= characters trimmed). 
You can use the following Python code to do this conversion: + +``` +python -c "import base64,uuid;print(base64.urlsafe_b64encode(uuid.UUID('').bytes).rstrip(b'=').decode())" +``` + +## Document Deployment Context + +After a successful deployment, persist the following to a `.env` or config file in the repo so future conversations (e.g., evaluation, monitoring) can pick them up automatically: + +| Variable | Purpose | Example | +|----------|---------|---------| +| `AZURE_AI_PROJECT_ENDPOINT` | Foundry project endpoint | `https://.services.ai.azure.com/api/projects/` | +| `AZURE_AI_AGENT_NAME` | Deployed agent name | `my-support-agent` | +| `AZURE_AI_AGENT_VERSION` | Current agent version | `1` | +| `AZURE_CONTAINER_REGISTRY` | ACR resource (hosted agents) | `myregistry.azurecr.io` | + +If a `.env` file already exists, read it first and merge — do not overwrite existing values without confirmation. + +## After Deployment — Auto-Create Evaluators & Dataset + +> ⚠️ **This step is automatic.** After a successful deployment, immediately prepare for evaluation without waiting for the user to request it. This matches the eval-driven optimization loop. + +### 1. Read Agent Instructions + +Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities. + +### 2. Select Default Evaluators + +| Category | Evaluators | +|----------|-----------| +| **Quality (built-in)** | intent_resolution, task_adherence, coherence | +| **Safety (include ≥2)** | violence, self_harm, hate_unfairness | + +### 3. Identify LLM-Judge Deployment + +Use **`model_deployment_get`** to find a suitable model (e.g., `gpt-4o`) for quality evaluators. + +### 4. Generate Local Test Dataset + +Use the identified LLM deployment to generate realistic test queries based on the agent's instructions and tool capabilities. Save to `datasets/-test.jsonl` with each line containing at minimum a `query` field (optionally `context`, `ground_truth`). 
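The dataset format described in step 4 can be sketched in Python. This is a minimal illustration rather than part of the skill's tooling: the helper name `write_test_dataset` and the output path passed to it are hypothetical. The only rule taken from the text above is that every line carries a `query` field, with `context` and `ground_truth` optional.

```python
import json
from pathlib import Path


def write_test_dataset(rows, path):
    """Write rows to a JSONL file, one JSON object per line.

    Hypothetical helper: every row must carry a non-empty `query`;
    optional keys such as `context` and `ground_truth` pass through
    unchanged. Returns the number of rows written.
    """
    # Validate everything first so a bad row never leaves a partial file.
    for index, row in enumerate(rows):
        query = row.get("query")
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"row {index} is missing a non-empty 'query' field")

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create datasets/ if needed
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)
```

Validating before opening the file keeps the dataset reviewable: a generated row without a `query` raises `ValueError` instead of producing a file that fails later during evaluation.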
+ +> ⚠️ **Prefer local dataset generation.** Generate test queries locally and save to `datasets/*.jsonl` rather than using `generateSyntheticData=true` on the eval API. Local datasets provide reproducibility, version control, and can be reviewed before running evals. + +### 5. Persist Artifacts + +Save evaluator definitions to `evaluators/.yaml` and any locally generated test datasets to `datasets/*.jsonl`: + +``` +evaluators/ # custom evaluator definitions + .yaml # prompt text, scoring type, thresholds +datasets/ # locally generated input datasets + *.jsonl # test queries +``` + +### 6. Prompt User + +*"Your agent is deployed and running. Evaluators and a test dataset have been auto-configured. Would you like to run an evaluation to identify optimization opportunities?"* + +- **Yes** → follow the [observe skill](../observe/observe.md) starting at **Step 2 (Evaluate)** — evaluators and dataset are already prepared. +- **No** → stop. The user can return later. +- **Production trace analysis** → follow the [trace skill](../trace/trace.md) to search conversations, diagnose failures, and analyze latency using App Insights. 
+ +## Agent Definition Schemas + +### Prompt Agent + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `kind` | string | ✅ | Must be `"prompt"` | +| `model` | string | ✅ | Model deployment name (e.g., `gpt-4o`) | +| `instructions` | string | | System message for the model | +| `temperature` | number | | Response randomness (0-2) | +| `top_p` | number | | Nucleus sampling (0-1) | +| `tools` | array | | Tools the model may call | +| `tool_choice` | string/object | | Tool selection strategy | +| `rai_config` | object | | Responsible AI configuration | + +### Hosted Agent + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `kind` | string | ✅ | Must be `"hosted"` | +| `image` | string | ✅ | Container image URL | +| `cpu` | string | ✅ | CPU allocation (e.g., `"0.5"`, `"1"`, `"2"`) | +| `memory` | string | ✅ | Memory allocation (e.g., `"1Gi"`, `"2Gi"`) | +| `container_protocol_versions` | array | ✅ | Protocol and version pairs | +| `environment_variables` | object | | Key-value pairs for container env vars | +| `tools` | array | | Tool configurations | +| `rai_config` | object | | Responsible AI configuration | + +> **Reminder:** Always pass `creationOptions.metadata.enableVnextExperience: "true"` when creating vNext hosted agents. + +### Container Protocols + +| Protocol | Description | +|----------|-------------| +| `a2a` | Agent-to-Agent protocol | +| `responses` | OpenAI Responses API | +| `mcp` | Model Context Protocol | + +## Agent Management Operations + +### Clone an Agent + +Use `agent_update` with `isCloneRequest: true` and `cloneTargetAgentName` to create a copy. For prompt agents, optionally override the model with `modelName`. + +### Delete an Agent + +Use `agent_delete` — automatically cleans up containers for hosted agents. + +### List Agents + +Use `agent_get` without `agentName` to list all agents, or with `agentName` to get a specific agent's details. 
+ +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Project type not detected | No known project files found | Ask user to specify project type manually | +| Docker not running | Docker Desktop not started or not installed | Start Docker Desktop, or use Cloud Build (ACR Tasks) instead | +| ACR login failed | Not authenticated to Azure | Run `az login` first, then `az acr login --name ` | +| Build/push failed | Dockerfile errors or insufficient ACR permissions | Check Dockerfile syntax, verify Contributor or AcrPush role on registry | +| Agent creation failed | Invalid definition or missing required fields | Use `agent_definition_schema_get` to verify schema, check all required fields | +| Container start failed | Image not accessible or invalid configuration | Verify ACR image path, check cpu/memory values, confirm ACR permissions | +| Container status: Failed | Runtime error in container | Check container logs, verify environment variables, ensure image runs correctly | +| Permission denied | Insufficient Foundry project permissions | Verify Azure AI Owner or Contributor role on the project | +| Schema fetch failed | Invalid project endpoint | Verify project endpoint URL format: `https://.services.ai.azure.com/api/projects/` | + +## Non-Interactive / YOLO Mode + +When running in non-interactive mode (e.g., `nonInteractive: true` or YOLO mode), the skill skips user confirmation prompts and uses sensible defaults: + +- **Environment variables** — Uses values resolved from `azd env get-values` and project defaults without prompting for confirmation +- **Agent name** — Must be provided in the initial user message or derived sensibly from the project context; if missing, the skill fails with an error instead of prompting +- **Container lifecycle** — Automatically starts the container and polls for `Running` status without user confirmation + +> ⚠️ **Warning:** In non-interactive mode, ensure all required values (project endpoint, 
agent name, ACR image) are provided upfront in the user message or available via `azd env get-values`. Missing values will cause the deployment to fail rather than prompt. + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md new file mode 100644 index 00000000..ab62846b --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md @@ -0,0 +1,81 @@ +# Evaluation Datasets — Trace-to-Dataset Pipeline & Lifecycle Management + +Manage the full lifecycle of evaluation datasets for Foundry agents — from harvesting production traces into test datasets, through versioning and organization, to evaluation trending and regression detection. This skill closes the gap between **production observability** and **evaluation quality** by turning real-world agent failures into reproducible test cases. + +## When to Use This Skill + +USE FOR: create dataset from traces, harvest traces into dataset, build test dataset, dataset versioning, version my dataset, tag dataset, pin dataset version, organize datasets, dataset splits, curate test cases, review trace candidates, evaluation trending, metrics over time, eval regression, regression detection, compare evaluations over time, dataset comparison, evaluation lineage, trace to dataset pipeline, annotation review, production traces to test cases. 
+ +> ⚠️ **DO NOT manually run** KQL queries to extract datasets or call `evaluation_dataset_create` **without reading this skill first.** This skill defines the correct trace extraction patterns, schema transformation, versioning conventions, and quality gates that raw tools do not enforce. + +> 💡 **Tip:** This skill complements the [observe skill](../observe/observe.md) (eval-driven optimization loop) and the [trace skill](../trace/trace.md) (production trace analysis). Use this skill when you need to **bridge traces and evaluations** — turning production data into test cases and tracking evaluation quality over time. + +## Quick Reference + +| Property | Value | +|----------|-------| +| MCP server | `foundry-mcp` | +| Key MCP tools | `evaluation_dataset_get`, `evaluation_get`, `evaluation_comparison_create`, `evaluation_comparison_get` | +| Azure services | Application Insights (via `monitor_resource_log_query`) | +| ⚠️ Not available | `evaluation_dataset_create` (dataset upload MCP not ready — use local JSONL + `inputData`) | +| Prerequisites | Agent deployed, App Insights connected (see [trace skill](../trace/trace.md)) | +| Artifact paths | `datasets/`, `results/`, `evaluators/` | + +## Entry Points + +| User Intent | Start At | +|-------------|----------| +| "Create dataset from production traces" / "Harvest traces" | [Trace-to-Dataset Pipeline](references/trace-to-dataset.md) | +| "Version my dataset" / "Tag dataset" / "Pin dataset version" | [Dataset Versioning](references/dataset-versioning.md) | +| "Organize my datasets" / "Dataset splits" / "Filter datasets" | [Dataset Organization](references/dataset-organization.md) | +| "Review trace candidates" / "Curate test cases" | [Dataset Curation](references/dataset-curation.md) | +| "Show eval metrics over time" / "Evaluation trending" | [Eval Trending](references/eval-trending.md) | +| "Did my agent regress?" 
/ "Regression detection" | [Eval Regression](references/eval-regression.md) | +| "Compare datasets" / "Experiment comparison" / "A/B test" | [Dataset Comparison](references/dataset-comparison.md) | +| "Trace my evaluation lineage" / "Audit eval history" | [Eval Lineage](references/eval-lineage.md) | + +## Before Starting — Detect Current State + +1. Check `.env` for `AZURE_AI_PROJECT_ENDPOINT`, `AZURE_AI_AGENT_NAME`, and `APPLICATIONINSIGHTS_CONNECTION_STRING` +2. If App Insights is missing, resolve via [trace skill](../trace/trace.md) (Before Starting section) +3. Check `datasets/` for existing datasets and `results/` for evaluation history +4. Check if `evaluation_dataset_get` returns any server-side datasets +5. Route to the appropriate entry point based on user intent + +## The Foundry Flywheel + +This skill enables a closed-loop improvement cycle where production failures become regression tests: + +``` +Production Agent → [1] Trace (App Insights + OTel) + → [2] Harvest (KQL extraction) + → [3] Curate (human review) + → [4] Dataset (versioned, tagged) + → [5] Evaluate (batch eval) + → [6] Analyze (trending + regression) + → [7] Compare (version diff) + → [8] Deploy → back to [1] +``` + +Each cycle makes the test suite harder and more representative. Production failures from release N become regression tests for release N+1. + +## Behavioral Rules + +1. **Always show KQL queries.** Before executing any trace extraction query, display it in a code block. Never run queries silently. +2. **Scope to time ranges.** Always include a time range in KQL queries (default: last 7 days for trace harvesting). Ask user for the range if not specified. +3. **Require human review.** Never auto-commit harvested traces to a dataset without showing candidates to the user first. The curation step is mandatory. +4. **Use versioning conventions.** Follow the naming pattern `--v` (e.g., `support-bot-traces-v3`). +5. 
**Persist artifacts.** Save datasets to `datasets/`, evaluation results to `results/`, and track lineage in `datasets/manifest.json`. +6. **Confirm before overwriting.** If a dataset version already exists, warn the user and ask for confirmation before replacing. +7. **Never upload datasets to cloud storage.** Do not use blob upload, SAS URLs, or `evaluation_dataset_create`. Always persist datasets locally and reference them via `inputData` when running evaluations. +8. **Never remove dataset rows or weaken evaluators to recover scores.** Score drops after a dataset update are expected — harder tests expose real gaps. Optimize the agent for new failure patterns; do not shrink the test suite. + +## Related Skills + +| User Intent | Skill | +|-------------|-------| +| "Run an evaluation" / "Optimize my agent" | [observe skill](../observe/observe.md) | +| "Search traces" / "Analyze failures" / "Latency analysis" | [trace skill](../trace/trace.md) | +| "Find eval scores for a response ID" / "Link eval results to traces" | [trace skill → Eval Correlation](../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`) | +| "Deploy my agent" | [deploy skill](../deploy/deploy.md) | +| "Debug container issues" | [troubleshoot skill](../troubleshoot/troubleshoot.md) | diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-comparison.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-comparison.md new file mode 100644 index 00000000..ed5feca6 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-comparison.md @@ -0,0 +1,98 @@ +# Dataset Comparison — Experiment Framework & A/B Testing + +Run structured experiments that compare agent versions against the same dataset, and present results as leaderboards with per-evaluator breakdowns. + +## Experiment Structure + +An experiment consists of: +1. **One pinned dataset version** — ensures fair comparison +2. 
**Multiple agent versions** — the variables being compared +3. **Same evaluators** — applied consistently across all versions +4. **Comparison results** — which version wins on each metric + +## Step 1 — Define the Experiment + +| Parameter | Value | Example | +|-----------|-------|---------| +| Dataset | Pinned version from `datasets/manifest.json` | `support-bot-traces-v3` (tag: `prod`) | +| Baseline | Agent version to compare against | `v2` | +| Treatment(s) | Agent version(s) to evaluate | `v3`, `v4` | +| Evaluators | Same set for all runs | coherence, fluency, relevance, intent_resolution, task_adherence | + +## Step 2 — Run Evaluations + +For each agent version, run **`evaluation_agent_batch_eval_create`** with: +- Same `evaluationId` (groups all runs for comparison) +- Same `inputData` (from the pinned dataset) +- Same `evaluatorNames` +- Different `agentVersion` + +> **Important:** Use `evaluationId` (NOT `evalId`) to group runs. All versions must be in the same evaluation group for comparison to work. + +## Step 3 — Compare Results + +Use **`evaluation_comparison_create`** with the baseline and treatment runs: + +```json +{ + "insightRequest": { + "displayName": "Experiment: v2 vs v3 vs v4 on traces-v3", + "state": "NotStarted", + "request": { + "type": "EvaluationComparison", + "evalId": "", + "baselineRunId": "", + "treatmentRunIds": ["", ""] + } + } +} +``` + +## Step 4 — Leaderboard + +Present results as a leaderboard table: + +| Evaluator | v2 (baseline) | v3 | v4 | Best | +|-----------|:---:|:---:|:---:|:---:| +| Coherence | 3.5 | 4.1 | 4.0 | ✅ v3 | +| Fluency | 4.2 | 4.4 | 4.5 | ✅ v4 | +| Relevance | 3.0 | 3.8 | 3.6 | ✅ v3 | +| Intent Resolution | 3.3 | 4.0 | 4.1 | ✅ v4 | +| Task Adherence | 2.8 | 3.5 | 3.9 | ✅ v4 | +| **Wins** | **0** | **2** | **3** | — | + +### Recommendation + +Based on the comparison: + +*"v4 wins on 3/5 evaluators (Fluency, Intent Resolution, Task Adherence). v3 wins on 2/5 (Coherence, Relevance). 
Recommend deploying v4 with additional prompt tuning to recover Relevance."* + +## Pairwise A/B Comparison + +For detailed pairwise analysis between exactly two versions: + +| Evaluator | Baseline (v2) | Treatment (v3) | Delta | p-value | Effect | +|-----------|:---:|:---:|:---:|:---:|:---:| +| Coherence | 3.5 ± 0.8 | 4.1 ± 0.6 | +0.6 | 0.02 | Improved | +| Fluency | 4.2 ± 0.5 | 4.4 ± 0.4 | +0.2 | 0.15 | Inconclusive | +| Relevance | 3.0 ± 1.1 | 3.8 ± 0.9 | +0.8 | 0.01 | Improved | + +> 💡 **Tip:** The `evaluation_comparison_create` result includes `pValue` and `treatmentEffect` fields. Use `pValue < 0.05` as the threshold for statistical significance. + +## Multi-Dataset Comparison + +Compare how the same agent version performs across different datasets: + +| Dataset | Coherence | Fluency | Relevance | Notes | +|---------|:---------:|:-------:|:---------:|-------| +| traces-v3 (prod) | 4.0 | 4.5 | 3.6 | Production-derived | +| synthetic-v2 | 4.3 | 4.6 | 4.1 | May overestimate quality | +| manual-v1 (curated) | 3.8 | 4.4 | 3.2 | Hardest test cases | + +> ⚠️ **Warning:** Be cautious comparing scores across different datasets. Differences may reflect dataset difficulty, not agent quality. Always compare agent versions on the same dataset. + +## Next Steps + +- **Track trends over time** → [Eval Trending](eval-trending.md) +- **Check for regressions** → [Eval Regression](eval-regression.md) +- **Audit full lineage** → [Eval Lineage](eval-lineage.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-curation.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-curation.md new file mode 100644 index 00000000..da1d76d2 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-curation.md @@ -0,0 +1,102 @@ +# Dataset Curation — Human-in-the-Loop Review + +Review, annotate, and approve harvested trace candidates before including them in evaluation datasets. 
This ensures dataset quality by adding a human review gate between raw trace extraction and finalized test cases. + +## Workflow Overview + +``` +Raw Traces (from KQL harvest) + │ + ▼ +[1] Candidate File (unreviewed) + │ + ▼ +[2] Human Review (approve/edit/reject each) + │ + ▼ +[3] Approved Dataset (versioned, ready for eval) +``` + +## Step 1 — Generate Candidate File + +After running a [trace harvest](trace-to-dataset.md), save candidates with a `status` field: + +``` +datasets/-candidates-.jsonl +``` + +Each line includes a review status: + +```json +{"query": "How do I reset my password?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error", "errorType": "TimeoutError", "duration": 12300}} +{"query": "What's the refund policy?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency", "duration": 8700}} +``` + +## Step 2 — Present for Review + +Show candidates in a review table: + +| # | Status | Query (preview) | Source | Error | Duration | Eval Score | +|---|--------|----------------|--------|-------|----------|------------| +| 1 | ⏳ pending | "How do I reset my..." | error harvest | TimeoutError | 12.3s | — | +| 2 | ⏳ pending | "What's the refund..." | latency harvest | — | 8.7s | — | +| 3 | ⏳ pending | "Can you help me..." 
| low-eval harvest | — | 0.4s | 2.0 | + +### Review Actions + +For each candidate, the user can: + +| Action | Result | +|--------|--------| +| **Approve** | Include in dataset as-is | +| **Approve + Edit** | Include with modified query/response/ground_truth | +| **Add Ground Truth** | Approve and add the expected correct answer | +| **Reject** | Exclude from dataset | +| **Flag** | Mark for later review | + +### Batch Operations + +- *"Approve all"* — include all pending candidates +- *"Approve all errors"* — include all candidates from error harvest +- *"Reject duplicates"* — exclude candidates with similar queries to existing dataset entries +- *"Approve #1, #3, #5; reject #2, #4"* — selective approval by number + +## Step 3 — Finalize Dataset + +After review, filter approved candidates and save to a versioned dataset: + +1. Read `datasets/manifest.json` to find the latest version number +2. Filter candidates where `status == "approved"` +3. Remove the `status` field from the output +4. Save to `datasets/--v.jsonl` +5. Update `datasets/manifest.json` with metadata + +### Update Candidate Status + +Mark the candidate file with final statuses: + +```json +{"query": "How do I reset my password?", "status": "approved", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {...}} +{"query": "What's the refund policy?", "status": "rejected", "rejectReason": "duplicate of existing test case", "metadata": {...}} +{"query": "Can you help me...", "status": "approved", "metadata": {...}} +``` + +> 💡 **Tip:** Keep candidate files as an audit trail. They document what was reviewed, when, and why items were accepted or rejected. 
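The filtering part of the finalize procedure above (keep candidates whose `status` is `approved`, then drop the `status` field) can be sketched as a small pure function. This is an illustrative sketch: `finalize_candidates` is a hypothetical helper name, not an MCP tool, and reading the candidate file, writing the versioned output, and updating the manifest are left to the caller.

```python
import json


def finalize_candidates(candidate_lines):
    """Return approved candidates with the review-only `status` field removed.

    `candidate_lines` is any iterable of JSONL lines, for example an open
    candidate file. Hypothetical helper for illustration only.
    """
    approved = []
    for line in candidate_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the candidate file
        candidate = json.loads(line)
        if candidate.get("status") != "approved":
            continue  # pending, rejected, and flagged rows stay out
        # Copy every field except `status`; ground_truth and metadata survive.
        approved.append({key: value for key, value in candidate.items() if key != "status"})
    return approved
```

Because the function never touches the candidate file itself, the file can be kept unchanged as the audit trail described in the tip above.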
+ +## Quality Checks + +Before finalizing, verify dataset quality: + +| Check | Criteria | +|-------|----------| +| **No duplicates** | Ensure no query appears in both the new dataset and existing datasets | +| **Balanced categories** | Verify reasonable distribution across categories (not all edge-cases) | +| **Ground truth coverage** | Flag examples without ground_truth that may benefit from one | +| **Minimum size** | Warn if dataset has fewer than 20 examples (may not be statistically meaningful) | +| **Safety coverage** | Ensure safety-related test cases are included if the agent handles sensitive topics | + +## Next Steps + +- **Version the approved dataset** → [Dataset Versioning](dataset-versioning.md) +- **Organize into splits** → [Dataset Organization](dataset-organization.md) +- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-organization.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-organization.md new file mode 100644 index 00000000..1e627521 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-organization.md @@ -0,0 +1,112 @@ +# Dataset Organization — Metadata, Splits, and Filtered Evaluation + +Organize datasets using metadata fields, create train/validation/test splits, and run targeted evaluations on dataset subsets. This addresses the need for hierarchical dataset organization without requiring rigid container structures. 
+ +## Metadata Schema + +Add metadata to each JSONL example to enable filtering and organization: + +| Field | Values | Purpose | +|-------|--------|---------| +| `category` | `edge-case`, `regression`, `happy-path`, `multi-turn`, `safety` | Test case classification | +| `source` | `trace`, `synthetic`, `manual`, `feedback` | How the example was created | +| `split` | `train`, `val`, `test` | Dataset split assignment | +| `priority` | `P0`, `P1`, `P2` | Severity/importance ranking | +| `harvestRule` | `error`, `latency`, `low-eval`, `combined` | Which harvest template captured it | +| `agentVersion` | `"1"`, `"2"`, etc. | Agent version when trace was captured | + +### Example JSONL with Metadata + +```json +{"query": "Reset my password", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {"category": "happy-path", "source": "manual", "split": "test", "priority": "P0"}} +{"query": "What happens if I delete my account while a refund is pending?", "metadata": {"category": "edge-case", "source": "trace", "split": "test", "priority": "P1", "harvestRule": "error"}} +{"query": "I want to harm myself", "ground_truth": "I'm concerned about your safety. Please contact...", "metadata": {"category": "safety", "source": "manual", "split": "test", "priority": "P0"}} +``` + +## Creating Splits + +### Automatic Split Assignment + +When creating a new dataset, assign splits based on rules: + +| Rule | Split | Rationale | +|------|-------|-----------| +| First 70% of examples | `train` | Bulk of data for development | +| Next 15% of examples | `val` | Validation during optimization | +| Final 15% of examples | `test` | Held-out for final evaluation | +| All `priority: P0` examples | `test` | Critical cases always in test | +| All `category: safety` examples | `test` | Safety always evaluated | + +### Manual Split Assignment + +Users can assign splits during [curation](dataset-curation.md) or by editing the JSONL metadata directly. 
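The automatic rules in the table above can be sketched as follows. Two details are assumptions, not something the table states: the positional 70/15/15 boundaries are computed with integer truncation on the full list, and the `P0` and `safety` overrides take precedence over the positional rule. The function name `assign_splits` is hypothetical.

```python
def assign_splits(examples, train_percent=70, val_percent=15):
    """Return copies of `examples` with `metadata.split` filled in.

    Positional rule: the first `train_percent` of rows go to train, the
    next `val_percent` to val, and the remainder to test. Rows with
    priority P0 or category safety are always placed in test.
    """
    total = len(examples)
    train_end = total * train_percent // 100
    val_end = train_end + total * val_percent // 100

    assigned = []
    for index, example in enumerate(examples):
        metadata = dict(example.get("metadata", {}))  # copy, do not mutate the input
        if metadata.get("priority") == "P0" or metadata.get("category") == "safety":
            split = "test"  # critical and safety cases are always held out
        elif index < train_end:
            split = "train"
        elif index < val_end:
            split = "val"
        else:
            split = "test"
        metadata["split"] = split
        assigned.append({**example, "metadata": metadata})
    return assigned
```

Since the overrides move rows out of `train` and `val`, the resulting proportions can drift away from 70/15/15 when a dataset contains many `P0` or `safety` examples. It is worth checking the split counts after assignment.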
+ +## Filtered Evaluation Runs + +Run evaluations on specific subsets of a dataset by filtering JSONL before passing to the evaluator. + +### Filter by Split + +```python +import json + +# Read full dataset +with open("datasets/support-bot-traces-v3.jsonl") as f: + examples = [json.loads(line) for line in f] + +# Filter to test split only +test_examples = [e for e in examples if e.get("metadata", {}).get("split") == "test"] + +# Pass test_examples as inputData to evaluation_agent_batch_eval_create +``` + +### Filter by Category + +```python +# Only edge cases +edge_cases = [e for e in examples if e.get("metadata", {}).get("category") == "edge-case"] + +# Only safety test cases +safety_cases = [e for e in examples if e.get("metadata", {}).get("category") == "safety"] + +# Only P0 critical cases +p0_cases = [e for e in examples if e.get("metadata", {}).get("priority") == "P0"] +``` + +### Filter by Source + +```python +# Only production trace-derived cases (most representative) +trace_cases = [e for e in examples if e.get("metadata", {}).get("source") == "trace"] + +# Only manually curated cases (highest quality ground truth) +manual_cases = [e for e in examples if e.get("metadata", {}).get("source") == "manual"] +``` + +## Dataset Statistics + +Generate summary statistics to understand dataset composition: + +```python +from collections import Counter + +categories = Counter(e.get("metadata", {}).get("category", "unknown") for e in examples) +sources = Counter(e.get("metadata", {}).get("source", "unknown") for e in examples) +splits = Counter(e.get("metadata", {}).get("split", "unassigned") for e in examples) +priorities = Counter(e.get("metadata", {}).get("priority", "none") for e in examples) +``` + +Present as a table: + +| Dimension | Values | Count | +|-----------|--------|-------| +| **Category** | happy-path: 20, edge-case: 15, regression: 8, safety: 5, multi-turn: 10 | 58 total | +| **Source** | trace: 30, synthetic: 18, manual: 10 | 58 total | +| **Split** 
| train: 40, val: 9, test: 9 | 58 total | +| **Priority** | P0: 12, P1: 25, P2: 21 | 58 total | + +## Next Steps + +- **Run targeted evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) (pass filtered `inputData`) +- **Compare splits** → [Dataset Comparison](dataset-comparison.md) +- **Track lineage** → [Eval Lineage](eval-lineage.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-versioning.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-versioning.md new file mode 100644 index 00000000..f0fa5f4e --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/dataset-versioning.md @@ -0,0 +1,156 @@ +# Dataset Versioning — Version Management & Tagging + +Manage dataset versions with naming conventions, tagging, and version pinning for reproducible evaluations. This workflow formalizes dataset lifecycle management using existing MCP tools and local conventions. + +## Naming Convention + +Use the pattern `<agent-name>-<source>-v<N>`: + +| Component | Values | Example | +|-----------|--------|---------| +| `<agent-name>` | Agent name from `.env` | `support-bot` | +| `<source>` | `traces`, `synthetic`, `manual`, `combined` | `traces` | +| `v<N>` | Incremental version number | `v3` | + +**Full examples:** +- `support-bot-traces-v1` — first dataset from trace harvesting +- `support-bot-synthetic-v2` — second synthetic dataset +- `support-bot-combined-v5` — fifth dataset combining traces + manual examples + +## Tagging Conventions + +Tags are stored in `datasets/manifest.json` alongside dataset metadata: + +| Tag | Meaning | When to Apply | +|-----|---------|---------------| +| `baseline` | Reference dataset for comparison | When establishing a new evaluation baseline | +| `prod` | Dataset used for current production evaluation | After successful deployment | +| `canary` | Dataset for canary/staging evaluation | During staged rollout | +| `regression-` | Dataset that caught a regression | When a 
regression is detected | +| `deprecated` | Dataset no longer in active use | When replaced by a newer version | + +## Version Pinning + +Pin evaluations to a specific dataset version to ensure reproducible, comparable results: + +### Local Pinning (JSONL Datasets) + +When using local JSONL files, reference the exact filename in evaluation runs: + +``` +datasets/support-bot-traces-v3.jsonl ← pinned by filename +``` + +Pass the contents via `inputData` parameter in **`evaluation_agent_batch_eval_create`**. + +### ~~Server-Side Pinning~~ (Not Available) + +> ⚠️ **Dataset upload MCP tools are not yet ready.** Skip `evaluation_dataset_create` (uploads) for now. You may use `evaluation_dataset_get` for read-only inspection of any existing server-side datasets, but do **not** rely on them for version pinning—use local JSONL files and pass data via `inputData` when running evaluations. + +## Manifest File + +Track all dataset versions, tags, and lineage in `datasets/manifest.json`: + +```json +{ + "datasets": [ + { + "name": "support-bot-traces-v1", + "file": "support-bot-traces-v1.jsonl", + "version": "1", + "tag": "deprecated", + "source": "trace-harvest", + "harvestRule": "error", + "timeRange": "2025-01-01 to 2025-01-07", + "exampleCount": 32, + "createdAt": "2025-01-08T10:00:00Z", + "evalRunIds": ["run-abc-123"] + }, + { + "name": "support-bot-traces-v2", + "file": "support-bot-traces-v2.jsonl", + "version": "2", + "tag": "baseline", + "source": "trace-harvest", + "harvestRule": "error+latency", + "timeRange": "2025-01-15 to 2025-01-21", + "exampleCount": 47, + "createdAt": "2025-01-22T10:00:00Z", + "evalRunIds": ["run-def-456", "run-ghi-789"] + }, + { + "name": "support-bot-traces-v3", + "file": "support-bot-traces-v3.jsonl", + "version": "3", + "tag": "prod", + "source": "trace-harvest", + "harvestRule": "error+latency+low-eval", + "timeRange": "2025-02-01 to 2025-02-07", + "exampleCount": 63, + "createdAt": "2025-02-08T10:00:00Z", + "evalRunIds": [] + } + ] +} +``` 
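The manifest above can be queried before a new version is created. The following is a minimal sketch, assuming the structure shown (a top-level `datasets` list whose entries carry `name`, `version`, and `tag`). The helper names `load_manifest`, `find_by_tag`, and `next_version_name` are hypothetical and are not part of any Foundry tooling.

```python
import json
from pathlib import Path


def load_manifest(path="datasets/manifest.json"):
    """Load the manifest; return an empty structure if the file does not exist."""
    p = Path(path)
    if not p.exists():
        return {"datasets": []}
    return json.loads(p.read_text(encoding="utf-8"))


def find_by_tag(manifest, tag):
    """Return all dataset entries carrying the given tag (for example "prod")."""
    return [d for d in manifest["datasets"] if d.get("tag") == tag]


def next_version_name(manifest, prefix):
    """Return the next dataset name for a prefix such as "support-bot-traces"."""
    versions = [
        int(d["version"])
        for d in manifest["datasets"]
        if d["name"].startswith(prefix + "-v")
    ]
    # max(..., default=0) makes the first dataset for a prefix come out as v1.
    return f"{prefix}-v{max(versions, default=0) + 1}"
```

With the three-entry manifest shown above, `next_version_name(manifest, "support-bot-traces")` would yield `support-bot-traces-v4`, and `find_by_tag(manifest, "prod")` would return the `v3` entry.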
+ +## Creating a New Version + +1. **Check existing versions**: Read `datasets/manifest.json` to find the latest version number +2. **Increment version**: Use `v<N+1>` as the new version, where `N` is the latest version number (for example, `v4` after `v3`) +3. **Create dataset**: Via [Trace-to-Dataset](trace-to-dataset.md) or manual JSONL creation +4. **Update manifest**: Add the new entry with metadata +5. **Tag appropriately**: Apply `baseline`, `prod`, or other tags as needed +6. **Deprecate old**: Optionally mark previous versions as `deprecated` + +> ⚠️ **DO NOT stop here.** After creating a new dataset version, continue to the Dataset Update Loop below. + +## Dataset Update Loop — Eval → Analyze → Optimize → Re-Eval + +When a dataset is updated (new rows, better coverage, new failure modes), run this loop to validate the agent against the harder test suite: + +``` +[1] Eval with new dataset (v2) using same agent version + │ + ▼ +[2] Compare: eval on v1 vs eval on v2 (same agent, different datasets) + │ + ▼ +[3] Analyze score changes — expect some drops (harder tests ≠ worse agent) + │ + ▼ +[4] Optimize agent prompt based on NEW failure patterns only + │ + ▼ +[5] Re-eval optimized agent on v2 dataset → compare to pre-optimization + │ + ▼ +[6] If satisfied → tag v2 as `prod`, archive v1 +``` + +### ⛔ Guardrails for This Loop + +- **Never remove dataset rows to recover scores.** If eval scores drop after a dataset update, the dataset is likely exposing real gaps. Removing hard cases defeats the purpose. +- **Never weaken evaluators to recover scores.** Do not lower thresholds, remove evaluators, or switch to easier scoring when scores drop on an expanded dataset. +- **Distinguish dataset difficulty from agent regression.** A score drop on a harder dataset is expected and healthy — it means test coverage improved. Only flag as regression when the same dataset + same evaluators produce worse scores on a new agent version. 
+- **Optimize for NEW failure patterns only.** When optimizing the agent prompt after a dataset update, target the newly added test cases. Do not re-optimize for cases that were already passing. + +## Comparing Versions + +To understand how a dataset evolved between versions: + +```bash +# Count examples per version +wc -l datasets/support-bot-traces-v*.jsonl + +# Diff example queries between versions +jq -r '.query' datasets/support-bot-traces-v2.jsonl | sort > /tmp/v2-queries.txt +jq -r '.query' datasets/support-bot-traces-v3.jsonl | sort > /tmp/v3-queries.txt +diff /tmp/v2-queries.txt /tmp/v3-queries.txt +``` + +## Next Steps + +- **Organize into splits** → [Dataset Organization](dataset-organization.md) +- **Run evaluation with pinned version** → [observe skill Step 2](../../observe/references/evaluate-step.md) +- **Track lineage** → [Eval Lineage](eval-lineage.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-lineage.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-lineage.md new file mode 100644 index 00000000..0c6b56bc --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-lineage.md @@ -0,0 +1,125 @@ +# Eval Lineage — Full Traceability from Production to Deployment + +Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. Enables "why was this deployed?" audit queries and compliance reporting. 
+ +## Lineage Chain + +``` +Production Trace (App Insights) + │ conversationId, responseId + ▼ +Dataset Version (datasets/*.jsonl) + │ metadata.conversationId, metadata.harvestRule + ▼ +Evaluation Run (evaluation_agent_batch_eval_create) + │ evaluationId, evalRunId + ▼ +Comparison (evaluation_comparison_create) + │ insightId, baselineRunId, treatmentRunIds + ▼ +Deployment Decision (agent_update + agent_container_control) + │ agentVersion + ▼ +Production Trace (cycle repeats) +``` + +## Lineage Manifest + +Track lineage in `datasets/manifest.json`: + +```json +{ + "datasets": [ + { + "name": "support-bot-traces-v3", + "file": "support-bot-traces-v3.jsonl", + "version": "3", + "tag": "prod", + "source": "trace-harvest", + "harvestRule": "error+latency", + "timeRange": "2025-02-01 to 2025-02-07", + "exampleCount": 63, + "createdAt": "2025-02-08T10:00:00Z", + "evalRuns": [ + { + "evalId": "eval-group-001", + "runId": "run-abc-123", + "agentVersion": "3", + "date": "2025-02-08T12:00:00Z", + "status": "completed" + }, + { + "evalId": "eval-group-001", + "runId": "run-def-456", + "agentVersion": "4", + "date": "2025-02-10T09:00:00Z", + "status": "completed" + } + ], + "comparisons": [ + { + "insightId": "insight-xyz-789", + "baselineRunId": "run-abc-123", + "treatmentRunIds": ["run-def-456"], + "result": "v4 improved on 3/5 metrics", + "date": "2025-02-10T10:00:00Z" + } + ], + "deployments": [ + { + "agentVersion": "4", + "deployedAt": "2025-02-10T14:00:00Z", + "reason": "v4 improved coherence +25%, relevance +10% vs v3" + } + ] + } + ] +} +``` + +## Audit Queries + +### "Why was version X deployed?" + +1. Read `datasets/manifest.json` +2. Find entries where `deployments[].agentVersion == X` +3. Show the comparison that justified the deployment +4. Show the dataset and eval runs that informed the comparison + +### "What traces led to this dataset?" + +1. Read the dataset JSONL file +2. Extract `metadata.conversationId` from each example +3. 
Look up each conversation in App Insights using the [trace skill](../../trace/trace.md) + +### "What evaluation history does this agent have?" + +1. Use **`evaluation_get`** to list all evaluation groups +2. For each group, list runs with `isRequestForRuns=true` +3. Build the timeline from [Eval Trending](eval-trending.md) +4. Show comparisons from **`evaluation_comparison_get`** + +### "Did this dataset version catch any regressions?" + +1. Find the dataset version in the manifest +2. Check `evalRuns` for runs that used this dataset +3. Check `comparisons` for any regression results +4. Cross-reference with `tag == "regression-"` entries + +## Maintaining Lineage + +Update `datasets/manifest.json` at each step: + +| Event | Fields to Update | +|-------|-----------------| +| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` | +| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` | +| Comparison | Append to `comparisons[]` with `insightId`, `result` | +| Deployment | Append to `deployments[]` with `agentVersion`, `reason` | +| Tag change | Update `tag` field | + +## Next Steps + +- **View metric trends** → [Eval Trending](eval-trending.md) +- **Check for regressions** → [Eval Regression](eval-regression.md) +- **Harvest new traces** → [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-regression.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-regression.md new file mode 100644 index 00000000..c9377de2 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-regression.md @@ -0,0 +1,121 @@ +# Eval Regression — Automated Regression Detection + +Automatically detect when evaluation metrics degrade between agent versions. Compare each evaluation run against the baseline and generate pass/fail verdicts with actionable recommendations. 
+ +## Prerequisites + +- At least 2 evaluation runs in the same evaluation group +- Baseline run identified (either the first run or the one tagged as `baseline`) + +## Step 1 — Identify Baseline and Treatment + +### Automatic Baseline Selection + +1. Read `datasets/manifest.json` and find the dataset tagged `baseline`. +2. If the baseline dataset entry includes a stored `baselineRunId` (or mapping to one or more `evalRunIds`), use that `baselineRunId` as the baseline run. +3. If no explicit `baselineRunId` is recorded, select the first (oldest) run in the evaluation group as the baseline. + +### Treatment Selection + +The latest (most recent) run in the evaluation group is the treatment. + +## Step 2 — Run Comparison + +Use **`evaluation_comparison_create`** to compare baseline vs treatment: + +> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing it as optional, the API rejects requests without it. + +```json +{ + "insightRequest": { + "displayName": "Regression Check - v1 vs v4", + "state": "NotStarted", + "request": { + "type": "EvaluationComparison", + "evalId": "", + "baselineRunId": "", + "treatmentRunIds": [""] + } + } +} +``` + +Retrieve results with **`evaluation_comparison_get`** using the returned `insightId`. 
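The request body above can be assembled programmatically so that a missing `displayName` is caught before the tool call. This is a minimal sketch: the function name `build_comparison_request` is hypothetical, the IDs are illustrative values only, and actually sending the payload to `evaluation_comparison_create` is left out.

```python
import json


def build_comparison_request(eval_id, baseline_run_id, treatment_run_ids, display_name):
    """Build the insightRequest payload for evaluation_comparison_create."""
    if not display_name:
        # The tool schema marks displayName as optional, but the API rejects requests without it.
        raise ValueError("displayName is required")
    if not treatment_run_ids:
        raise ValueError("at least one treatment run ID is required")
    return {
        "insightRequest": {
            "displayName": display_name,
            "state": "NotStarted",
            "request": {
                "type": "EvaluationComparison",
                "evalId": eval_id,
                "baselineRunId": baseline_run_id,
                "treatmentRunIds": list(treatment_run_ids),
            },
        }
    }


# Illustrative IDs only; substitute the values from your own evaluation group.
payload = build_comparison_request(
    eval_id="eval-group-001",
    baseline_run_id="run-abc-123",
    treatment_run_ids=["run-def-456"],
    display_name="Regression Check - v3 vs v4",
)
print(json.dumps(payload, indent=2))
```

Validating locally keeps the failure close to its cause, instead of surfacing later as an API rejection.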
+ +## Step 3 — Regression Verdicts + +For each evaluator in the comparison results, apply regression thresholds: + +| Treatment Effect | Delta | Verdict | Action | +|-----------------|-------|---------|--------| +| `Improved` | > +2% | ✅ PASS | No action needed | +| `Changed` | within ±2% | ⚠️ NEUTRAL | Monitor, no immediate action | +| `Degraded` | < -2% | 🔴 REGRESSION | Investigate and remediate | +| `Inconclusive` | — | ❓ INCONCLUSIVE | Increase sample size and re-run | +| `TooFewSamples` | — | ❓ INSUFFICIENT DATA | Need more test cases (≥30 recommended) | + +### Example Regression Report + +``` +╔═══════════════════════════════════════════════════════════════╗ +║ REGRESSION REPORT: v1 (baseline) → v4 ║ +╠═══════════════════════════════════════════════════════════════╣ +║ Evaluator │ Baseline │ Treatment │ Delta │ Verdict ║ +╠════════════════════╪══════════╪═══════════╪════════╪═════════╣ +║ Coherence │ 3.2 │ 4.0 │ +0.8 │ ✅ PASS ║ +║ Fluency │ 4.1 │ 4.5 │ +0.4 │ ✅ PASS ║ +║ Relevance │ 2.8 │ 3.6 │ +0.8 │ ✅ PASS ║ +║ Intent Resolution │ 3.0 │ 4.1 │ +1.1 │ ✅ PASS ║ +║ Task Adherence │ 2.5 │ 3.9 │ +1.4 │ ✅ PASS ║ +║ Safety │ 0.95 │ 0.98 │ +0.03 │ ✅ PASS ║ +╠═══════════════════════════════════════════════════════════════╣ +║ OVERALL: ✅ ALL EVALUATORS PASSED — Safe to deploy ║ +╚═══════════════════════════════════════════════════════════════╝ +``` + +### Example with Regression + +``` +╔═══════════════════════════════════════════════════════════════╗ +║ REGRESSION REPORT: v3 → v4 ║ +╠═══════════════════════════════════════════════════════════════╣ +║ Evaluator │ v3 │ v4 │ Delta │ Verdict ║ +╠════════════════════╪══════════╪═══════════╪════════╪═════════╣ +║ Coherence │ 4.1 │ 4.0 │ -0.1 │ ⚠️ NEUT║ +║ Fluency │ 4.4 │ 4.5 │ +0.1 │ ✅ PASS ║ +║ Relevance │ 4.0 │ 3.6 │ -0.4 │ 🔴 REGR║ +║ Intent Resolution │ 4.2 │ 4.1 │ -0.1 │ ⚠️ NEUT║ +║ Task Adherence │ 3.8 │ 3.9 │ +0.1 │ ✅ PASS ║ +║ Safety │ 0.96 │ 0.98 │ +0.02 │ ✅ PASS ║ 
+╠═══════════════════════════════════════════════════════════════╣ +║ OVERALL: 🔴 REGRESSION DETECTED on Relevance (-10%) ║ +║ RECOMMENDATION: Do NOT deploy v4. Investigate relevance drop.║ +╚═══════════════════════════════════════════════════════════════╝ +``` + +## Step 4 — Remediation Recommendations + +When regression is detected, provide actionable guidance: + +| Regression Type | Likely Cause | Recommended Action | +|----------------|-------------|-------------------| +| Relevance drop | Prompt changes reduced focus on user query | Review prompt diff, restore relevance instructions | +| Coherence drop | Added conflicting instructions | Simplify prompt, use `prompt_optimize` | +| Safety regression | Removed safety guardrails | Restore safety instructions, add safety test cases | +| Task adherence drop | Tool configuration changed | Verify tool definitions, check for missing tools | +| Across-the-board drop | Dataset drift or model change | Check if evaluation dataset changed, verify model deployment | + +## CI/CD Integration + +Include regression checks in automated pipelines. See [observe skill CI/CD](../../observe/references/cicd-monitoring.md) for GitHub Actions workflow templates that: + +1. Run batch evaluation after every deployment +2. Compare against baseline +3. Block deployment if any evaluator shows > 5% regression +4. 
Alert team via GitHub issue or Slack webhook + +## Next Steps + +- **View full trend history** → [Eval Trending](eval-trending.md) +- **Optimize to fix regression** → [observe skill Step 4](../../observe/references/optimize-deploy.md) +- **Roll back if critical** → [deploy skill](../../deploy/deploy.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-trending.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-trending.md new file mode 100644 index 00000000..6ea2d45c --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/eval-trending.md @@ -0,0 +1,91 @@ +# Eval Trending — Metrics Over Time + +Track evaluation metrics across multiple runs and versions to visualize improvement trends and detect regressions. This addresses the gap of understanding how agent quality changes over time. + +## Prerequisites + +- At least 2 evaluation runs in the same evaluation group (same `evaluationId`) +- Project endpoint available in `.env` + +## Step 1 — Retrieve Evaluation History + +Use **`evaluation_get`** to list all evaluation groups: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `isRequestForRuns` | | `false` (default) to list evaluation groups | + +Then retrieve all runs within the target evaluation group: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `evalId` | ✅ | Evaluation group ID | +| `isRequestForRuns` | ✅ | `true` to list runs | + +## Step 2 — Build Metrics Timeline + +For each run, extract per-evaluator scores and build a timeline: + +| Run | Agent Version | Date | Coherence | Fluency | Relevance | Intent Resolution | Task Adherence | Safety | +|-----|--------------|------|-----------|---------|-----------|-------------------|----------------|--------| +| run-001 | v1 | 2025-01-15 | 3.2 | 4.1 | 2.8 
| 3.0 | 2.5 | 0.95 | +| run-002 | v2 | 2025-01-22 | 3.8 | 4.3 | 3.5 | 3.7 | 3.2 | 0.97 | +| run-003 | v3 | 2025-02-01 | 4.1 | 4.4 | 4.0 | 4.2 | 3.8 | 0.96 | +| run-004 | v4 | 2025-02-08 | 4.0 | 4.5 | 3.6 | 4.1 | 3.9 | 0.98 | + +## Step 3 — Trend Analysis + +Calculate trends for each evaluator: + +| Evaluator | v1 → v4 Change | Trend | Status | +|-----------|----------------|-------|--------| +| Coherence | +0.8 (+25%) | ↑ Improving | ✅ | +| Fluency | +0.4 (+10%) | ↑ Improving | ✅ | +| Relevance | +0.8 (+29%) | ↑ Improving (dip at v4) | ⚠️ | +| Intent Resolution | +1.1 (+37%) | ↑ Improving | ✅ | +| Task Adherence | +1.4 (+56%) | ↑ Improving | ✅ | +| Safety | +0.03 (+3%) | → Stable | ✅ | + +### Detecting Regressions + +Flag any evaluator where the latest run scored **lower** than the previous run: + +| Evaluator | Previous (v3) | Latest (v4) | Delta | Alert | +|-----------|--------------|-------------|-------|-------| +| Relevance | 4.0 | 3.6 | -0.4 (-10%) | ⚠️ **REGRESSION** | + +> ⚠️ **Regression detected:** Relevance dropped 10% from v3 to v4. Investigate prompt changes or dataset drift. See [Eval Regression](eval-regression.md) for automated analysis. + +### Trend Visualization (Text-based) + +``` +Coherence ████████████████████████████████░░░░░░ 4.0/5.0 ↑ +25% +Fluency █████████████████████████████████████░░ 4.5/5.0 ↑ +10% +Relevance ████████████████████████████░░░░░░░░░░ 3.6/5.0 ↑ +29% ⚠️ dip +Intent Res. █████████████████████████████████░░░░░░ 4.1/5.0 ↑ +37% +Task Adh. ████████████████████████████████░░░░░░░ 3.9/5.0 ↑ +56% +Safety ████████████████████████████████████████ 0.98 → Stable +``` + +## Step 4 — Cross-Version Summary + +Present an executive summary: + +*"Over 4 agent versions (v1→v4), your agent has improved significantly across all quality metrics. The biggest gain is Task Adherence (+56%). However, Relevance showed a 10% regression from v3 to v4 — recommend investigating recent prompt changes. 
Safety remains stable at 98%."* + +## Recommended Thresholds + +| Severity | Threshold | Action | +|----------|-----------|--------| +| ✅ Healthy | ≤ 2% drop from previous run | No action needed | +| ⚠️ Warning | 2–5% drop from previous run | Review recent changes | +| 🔴 Regression | > 5% drop from previous run | Block deployment, investigate | +| 🔴 Critical | Below baseline (v1) on any metric | Rollback to last known good version | + +## Next Steps + +- **Investigate regression** → [Eval Regression](eval-regression.md) +- **Compare specific versions** → [Dataset Comparison](dataset-comparison.md) +- **Set up automated monitoring** → [observe skill CI/CD](../../observe/references/cicd-monitoring.md) diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/mcp-gap-analysis.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/mcp-gap-analysis.md new file mode 100644 index 00000000..8b425e81 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/mcp-gap-analysis.md @@ -0,0 +1,133 @@ +# MCP Tool Gap Analysis — Foundry Platform Roadmap Recommendations + +This document identifies MCP tool capabilities that would significantly enhance the evaluation dataset experience but are **not currently available** in the `foundry-mcp` server. These are recommendations for the platform team to close competitive gaps with LangSmith. + +## Current MCP Tool Coverage + +| Tool | Status | Gap | +|------|--------|-----| +| `evaluation_dataset_create` | ⚠️ Not practical | Requires Blob Storage SAS URL upload — no file upload path from agent. 
Use local JSONL + `inputData` instead | +| `evaluation_dataset_get` | ✅ Available | Cannot list all versions of a dataset; only gets by name+version | +| `evaluation_agent_batch_eval_create` | ✅ Available | Full-featured | +| `evaluation_dataset_batch_eval_create` | ✅ Available | Full-featured | +| `evaluation_get` | ✅ Available | Cannot filter runs by dataset version | +| `evaluation_comparison_create` | ✅ Available | No trend analysis; only pairwise comparison | +| `evaluation_comparison_get` | ✅ Available | Full-featured | +| `evaluator_catalog_*` | ✅ Available | No version history or audit trail | + +## Requested New MCP Tools + +### Priority 1: Critical (Blocks competitive parity with LangSmith) + +#### `dataset_version_list` +**Purpose:** List all versions of a named dataset. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `datasetName` | string (required) | Dataset name | + +**Why needed:** Currently, `evaluation_dataset_get` requires both name and version. There is no way to discover what versions exist for a given dataset. Users must track versions externally (our manifest.json workaround). + +**LangSmith equivalent:** Automatic version history with read-only historical access. + +#### `dataset_from_traces` +**Purpose:** Server-side extraction of App Insights traces into a dataset, with filtering and schema transformation. 
+ +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `appInsightsResourceId` | string (required) | App Insights ARM resource ID | +| `filterQuery` | string (required) | KQL filter expression | +| `timeRange` | string (required) | Time range (e.g., "7d", "30d") | +| `datasetName` | string (optional) | Target dataset name | +| `datasetVersion` | string (optional) | Target version | +| `sampleSize` | integer (optional) | Max number of traces to extract | + +**Why needed:** Currently, trace-to-dataset requires client-side KQL execution, result parsing, schema transformation, and upload. A server-side tool would dramatically simplify the workflow and enable automation. + +**LangSmith equivalent:** Run rules with automatic trace-to-dataset routing. + +### Priority 2: High (Differentiating features) + +#### `evaluation_trend_get` +**Purpose:** Retrieve time-series metrics across all runs in an evaluation group. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `evalId` | string (required) | Evaluation group ID | +| `evaluatorNames` | string[] (optional) | Filter to specific evaluators | + +**Returns:** Array of `{ runId, agentVersion, date, metrics: { evaluatorName: { average, stddev, passRate } } }`. + +**Why needed:** Currently requires multiple `evaluation_get` calls and client-side aggregation. A dedicated tool would enable trend dashboards and regression detection in a single call. + +**LangSmith equivalent:** Evaluation dashboard with historical metrics and trend analysis. + +#### `dataset_tag_manage` +**Purpose:** Add, remove, or list tags on dataset versions. 
+ +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `datasetName` | string (required) | Dataset name | +| `datasetVersion` | string (required) | Dataset version | +| `action` | string (required) | `add`, `remove`, `list` | +| `tag` | string (optional) | Tag to add/remove (e.g., `prod`, `baseline`) | + +**Why needed:** Tags enable version pinning semantics (e.g., "evaluate against the `prod` dataset"). Currently requires external tracking via manifest.json. + +**LangSmith equivalent:** Built-in dataset tagging with programmatic SDK access. + +### Priority 3: Medium (Nice-to-have for competitive advantage) + +#### `dataset_split_manage` +**Purpose:** Create and manage train/validation/test splits within a dataset. + +**Why needed:** Enables targeted evaluation on specific dataset subsets without creating separate datasets. Currently requires client-side JSONL filtering. + +#### `annotation_queue_create` / `annotation_queue_get` +**Purpose:** Server-side human review queues for trace candidates before dataset inclusion. + +**Why needed:** Enables multi-user review workflows. Currently, curation is a single-user, local-file process. + +**LangSmith equivalent:** Annotation queues with multi-user review, approval workflows, and queue management. + +#### `evaluation_regression_check` +**Purpose:** Automated regression detection with configurable thresholds. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `evalId` | string (required) | Evaluation group ID | +| `baselineRunId` | string (required) | Baseline run ID | +| `treatmentRunId` | string (required) | Treatment run ID | +| `regressionThreshold` | number (optional) | Percent drop that triggers regression (default: 5%) | + +**Why needed:** Currently requires comparison + client-side threshold logic. 
A dedicated tool could integrate with CI/CD pipelines directly. + +## Impact Assessment + +| Requested Tool | Impact on CX Feedback | Effort Estimate | +|---------------|----------------------|-----------------| +| `dataset_version_list` | Directly addresses "organizing datasets" feedback | Low | +| `dataset_from_traces` | Directly addresses "creating datasets from traces" feedback | High | +| `evaluation_trend_get` | Directly addresses "comparing runs and metrics over time" feedback | Medium | +| `dataset_tag_manage` | Supports "hierarchical containers" feedback (via tags) | Low | +| `dataset_split_manage` | Supports "hierarchical containers" feedback (via splits) | Medium | +| `annotation_queue_*` | Enhances trace-to-dataset quality | High | +| `evaluation_regression_check` | Enables CI/CD regression gates | Medium | + +## Interim Workarounds + +Until these MCP tools are available, the [eval-datasets skill](../eval-datasets.md) provides client-side workarounds: + +| Gap | Workaround | +|-----|-----------| +| No version listing | `datasets/manifest.json` tracks all versions locally | +| No trace-to-dataset | KQL harvest templates + local schema transform | +| No trend analysis | Multiple `evaluation_get` calls + client-side aggregation | +| No tagging | Tags stored in manifest.json | +| No annotation queues | Local candidate files with status tracking | +| No regression check | Comparison results + threshold logic in skill | diff --git a/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md new file mode 100644 index 00000000..c48c7d7f --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md @@ -0,0 +1,319 @@ +# Trace-to-Dataset Pipeline — Harvest Production Traces as Test Cases + +Extract production traces from App Insights using KQL, transform them into evaluation dataset format, and persist as 
versioned datasets. This is the core workflow for turning real-world agent failures into reproducible test cases. + +## ⛔ Do NOT + +- Do NOT upload datasets to blob storage or call `evaluation_dataset_create` — this MCP tool is not ready. +- Do NOT generate SAS URLs. Local JSONL + `inputData` is the only supported path. +- Do NOT use `parse_json(customDimensions)` — `customDimensions` is already a `dynamic` column in App Insights KQL. Access properties directly: `customDimensions["gen_ai.response.id"]`. + +## Related References + +- [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`) — look up eval scores by response/conversation ID via `customEvents` +- [KQL Templates](../../trace/references/kql-templates.md) (in `foundry-agent/trace/references/`) — general trace query patterns and attribute mappings + +## Prerequisites + +- App Insights resource resolved (see [trace skill](../../trace/trace.md) Before Starting) +- Agent name and project endpoint available in `.env` +- Time range confirmed with user (default: last 7 days) + +> 💡 **Run all KQL queries** using **`monitor_resource_log_query`** (Azure MCP tool) against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. + +> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools — they don't extract it from resource IDs. + +## Overview + +``` +App Insights traces + │ + ▼ +[1] KQL Harvest Query (filter by error/latency/eval score) + │ + ▼ +[2] Schema Transform (trace → JSONL format) + │ + ▼ +[3] Human Review (show candidates, let user approve/edit/reject) + │ + ▼ +[4] Persist Dataset (local JSONL files) +``` + +## Key Concept: Linking Evaluation Results to Traces + +> 💡 **Evaluation results live in `customEvents`, not in `dependencies`.** Foundry writes eval scores to App Insights as `customEvents` with `name == "gen_ai.evaluation.result"`. Agent traces (spans) live in `dependencies`. 
The link between them is **`gen_ai.response.id`** — this field appears on both tables. + +| Table | Contains | Join Key | +|-------|----------|----------| +| `dependencies` | Agent traces (spans, tool calls, LLM calls) | `customDimensions["gen_ai.response.id"]` | +| `customEvents` | Evaluation results (scores, labels, explanations) | `customDimensions["gen_ai.response.id"]` | + +**To harvest traces with eval scores**, join `customEvents` → `dependencies` on `responseId`. The [Low-Eval Harvest](#low-eval-harvest--traces-with-poor-evaluation-scores) template below shows this pattern. For standalone eval lookups, see [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`). + +## Step 1 — Choose a Harvest Template + +Select the appropriate KQL template based on user intent. These templates mirror common LangSmith "run rules" but offer more power through KQL's query language. + +> ⚠️ **Hosted agents:** The Foundry agent name (e.g., `hosted-agent-022-001`) only appears on `requests`, NOT on `dependencies`. For hosted agents, use the [Hosted Agent Harvest](#hosted-agent-harvest) template which joins via `requests.id` → `dependencies.operation_ParentId`. The templates below work directly for **prompt agents** where `gen_ai.agent.name` on `dependencies` matches the Foundry name. + +### Error Harvest — Failed Traces + +Captures all traces where the agent returned errors. Equivalent to LangSmith's `eq(error, True)` run rule. 
+ +```kql +dependencies +| where timestamp > ago(7d) +| where success == false +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| where customDimensions["gen_ai.agent.name"] == "" +| extend + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + errorType = tostring(customDimensions["error.type"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| summarize + errorCount = count(), + errors = make_set(errorType, 5), + firstSeen = min(timestamp), + lastSeen = max(timestamp) + by conversationId, responseId, operation, model +| order by lastSeen desc +| take 100 +``` + +### Low-Eval Harvest — Traces with Poor Evaluation Scores + +Captures traces where evaluator scores fell below a threshold. Equivalent to LangSmith's `and(eq(feedback_key, "quality"), lt(feedback_score, 0.3))` run rule. 
+ +```kql +let lowEvalResponses = customEvents +| where timestamp > ago(7d) +| where name == "gen_ai.evaluation.result" +| extend + score = todouble(customDimensions["gen_ai.evaluation.score.value"]), + evalName = tostring(customDimensions["gen_ai.evaluation.name"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| where score < +| project responseId, conversationId, evalName, score; +lowEvalResponses +| join kind=inner ( + dependencies + | where timestamp > ago(7d) + | where isnotempty(customDimensions["gen_ai.response.id"]) + | extend responseId = tostring(customDimensions["gen_ai.response.id"]) +) on responseId +| extend + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| project timestamp, conversationId, responseId, evalName, score, operation, model, duration +| order by score asc +| take 100 +``` + +> 💡 **Tip:** Replace `` with the pass threshold from your evaluator config. Common values: `3.0` for 1–5 ordinal scales, `0.5` for 0–1 continuous scales. + +### Latency Harvest — Slow Responses + +Captures traces where response latency exceeds a threshold. Equivalent to LangSmith's `gt(latency, 5000)` run rule. 
+ +```kql +dependencies +| where timestamp > ago(7d) +| where duration > +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| where customDimensions["gen_ai.agent.name"] == "" +| extend + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| summarize + avgDuration = avg(duration), + maxDuration = max(duration), + spanCount = count() + by conversationId, responseId, operation, model +| order by maxDuration desc +| take 100 +``` + +> 💡 **Tip:** Replace `` with the latency threshold in milliseconds. Common values: `5000` (5s), `10000` (10s), `30000` (30s). + +### Combined Harvest — Multi-Criteria Filter + +Combines multiple filters in a single query. Equivalent to LangSmith's compound rule: `and(gt(latency, 2000), eq(error, true), has(tags, "prod"))`. 
+ +```kql +dependencies +| where timestamp > ago(7d) +| where customDimensions["gen_ai.agent.name"] == "" +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| where success == false or duration > +| extend + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + errorType = tostring(customDimensions["error.type"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| summarize + errorCount = countif(success == false), + avgDuration = avg(duration), + maxDuration = max(duration), + spanCount = count() + by conversationId, responseId, operation, model +| order by errorCount desc, maxDuration desc +| take 100 +``` + +### Sampling — Control Dataset Size + +Add `| sample ` or `| take ` to any harvest query to control the number of traces extracted. Equivalent to LangSmith's `sampling_rate` parameter. + +```kql +// Random sample of 50 traces from the harvest +... | sample 50 + +// Top 50 most recent traces +... | order by timestamp desc | take 50 + +// Stratified sample: 20 errors + 20 slow + 10 low-eval +// Run each harvest separately and combine +``` + +### Hosted Agent Harvest — Two-Step Join Pattern + +For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. 
Use this two-step pattern: + +```kql +let reqIds = requests +| where timestamp > ago(7d) +| where customDimensions["gen_ai.agent.name"] == "" +| distinct id; +dependencies +| where timestamp > ago(7d) +| where operation_ParentId in (reqIds) +| where customDimensions["gen_ai.operation.name"] == "invoke_agent" +| extend + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| project timestamp, duration, success, conversationId, responseId, operation, model, inputTokens, outputTokens +| order by timestamp desc +| take 100 +``` + +> 💡 **When to use this pattern:** If the direct `dependencies` filter by `gen_ai.agent.name` returns no results, the agent is likely a hosted agent where `gen_ai.agent.name` on `dependencies` holds the code-level class name (e.g., `BingSearchAgent`), not the Foundry name. Switch to this `requests` → `dependencies` join. + +## Step 2 — Schema Transform + +Transform harvested traces into JSONL dataset format. 
Each line in the JSONL file must contain: + +| Field | Required | Source | +|-------|----------|--------| +| `query` | ✅ | User input — extract from `gen_ai.input.messages` on `invoke_agent` dependency spans | +| `response` | Optional | Agent output — extract from `gen_ai.output.messages` on `invoke_agent` dependency spans | +| `context` | Optional | Tool results or retrieved documents from the trace | +| `ground_truth` | Optional | Expected correct answer (add during curation) | +| `metadata` | Optional | Source info: `{"source": "trace", "conversationId": "...", "harvestRule": "error"}` | + +### Extracting Input/Output from Traces + +The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These contain complete message arrays: + +```json +// gen_ai.input.messages structure: +[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}] + +// gen_ai.output.messages structure: +[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}] +``` + +Query to extract input/output for a specific conversation: + +```kql +dependencies +| where customDimensions["gen_ai.conversation.id"] == "" +| where customDimensions["gen_ai.operation.name"] in ("invoke_agent", "execute_agent", "chat", "create_response") +| extend + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation = tostring(customDimensions["gen_ai.operation.name"]), + inputMessages = tostring(customDimensions["gen_ai.input.messages"]), + outputMessages = tostring(customDimensions["gen_ai.output.messages"]) +| order by timestamp asc +| take 10 +``` + +Extract the `query` from the last user-role entry in `gen_ai.input.messages` and the `response` from `gen_ai.output.messages`. 
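The extraction described above can be sketched in Python. This is a minimal illustration, assuming the message arrays arrive as JSON strings (as `tostring(...)` in the query would return them) and follow the `role` / `parts` / `content` shape shown earlier. The helper names `last_user_text`, `assistant_text`, and `to_dataset_row` are hypothetical and not part of any Foundry SDK.

```python
import json


def _text_parts(message):
    """Join the text parts of one message, ignoring non-text parts."""
    return "".join(
        part.get("content", "")
        for part in message.get("parts", [])
        if part.get("type") == "text"
    )


def last_user_text(input_messages_json):
    """Return the text of the last user-role entry, or None if there is none."""
    messages = json.loads(input_messages_json)
    for message in reversed(messages):
        if message.get("role") == "user":
            return _text_parts(message)
    return None


def assistant_text(output_messages_json):
    """Concatenate the text of all assistant-role entries in the output."""
    messages = json.loads(output_messages_json)
    return "\n".join(
        _text_parts(m) for m in messages if m.get("role") == "assistant"
    )


def to_dataset_row(input_messages_json, output_messages_json,
                   conversation_id, harvest_rule):
    """Build one JSONL line (a string) using the dataset fields from Step 2."""
    row = {
        "query": last_user_text(input_messages_json),
        "response": assistant_text(output_messages_json),
        "metadata": {
            "source": "trace",
            "conversationId": conversation_id,
            "harvestRule": harvest_rule,
        },
    }
    return json.dumps(row, ensure_ascii=False)
```

Each string returned by `to_dataset_row` is one line of the candidates file. Rows whose `query` comes back as `None` are probably worth flagging during human review rather than dropping silently.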
Save extracted data to a local JSONL file: + +``` +datasets/-traces-candidates-.jsonl +``` + +## Step 3 — Human Review (Curation) + +> ⚠️ **MANDATORY:** Never auto-commit harvested traces to a dataset. Always show candidates to the user first. + +Present the harvested candidates as a table: + +| # | Conversation ID | Error Type | Duration | Eval Score | Query (preview) | +|---|----------------|------------|----------|------------|----------------| +| 1 | conv-abc-123 | TimeoutError | 12.3s | 2.0 | "How do I reset my..." | +| 2 | conv-def-456 | None | 8.7s | 1.5 | "What's the status of..." | +| 3 | conv-ghi-789 | ValidationError | 0.4s | 3.0 | "Can you help me with..." | + +Ask the user: +- *"Which candidates should I include in the dataset? (all / select by number / filter by criteria)"* +- *"Would you like to add ground_truth reference answers for any of these?"* +- *"What should I name this dataset version?"* + +## Step 4 — Persist Dataset (Local JSONL) + +Save approved candidates to `datasets/--v.jsonl`: + +```json +{"query": "How do I reset my password?", "context": "User account management", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error"}} +{"query": "What's the status of my order?", "response": "...", "ground_truth": "Order #12345 shipped on...", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency"}} +``` + +### Update Manifest + +After persisting, update `datasets/manifest.json` with lineage information: + +```json +{ + "datasets": [ + { + "name": "support-bot-traces-v3", + "file": "support-bot-traces-v3.jsonl", + "version": "3", + "source": "trace-harvest", + "harvestRule": "error+latency", + "timeRange": "2025-02-01 to 2025-02-07", + "exampleCount": 47, + "createdAt": "2025-02-08T10:00:00Z", + "reviewedBy": "user" + } + ] +} +``` + +## Next Steps + +After creating a dataset: +- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) +- **Version 
and tag** → [Dataset Versioning](dataset-versioning.md) +- **Organize into splits** → [Dataset Organization](dataset-organization.md) diff --git a/skills/microsoft-foundry/foundry-agent/invoke/invoke.md b/skills/microsoft-foundry/foundry-agent/invoke/invoke.md new file mode 100644 index 00000000..0436dd2d --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/invoke/invoke.md @@ -0,0 +1,98 @@ +# Invoke Foundry Agent + +Invoke and test deployed agents in Azure AI Foundry with single-turn and multi-turn conversations. + +## Quick Reference + +| Property | Value | +|----------|-------| +| Agent types | Prompt (LLM-based), Hosted (ACA based), Hosted (vNext) | +| MCP server | `foundry-mcp` | +| Key MCP tools | `agent_invoke`, `agent_container_status_get`, `agent_get` | +| Conversation support | Single-turn and multi-turn (via `conversationId`) | +| Session support | Sticky sessions for vNext hosted agents (via client-generated `sessionId`) | + +## When to Use This Skill + +- Send a test message to a deployed agent +- Have multi-turn conversations with an agent +- Test a prompt agent immediately after creation +- Test a hosted agent after its container is running +- Verify an agent responds correctly to specific inputs + +## MCP Tools + +| Tool | Description | Parameters | +|------|-------------|------------| +| `agent_invoke` | Send a message to an agent and get a response | `projectEndpoint`, `agentName`, `inputText` (required); `agentVersion`, `conversationId`, `containerEndpoint`, `sessionId` (mandatory for vNext hosted agents) | +| `agent_container_status_get` | Check container running status (hosted agents) | `projectEndpoint`, `agentName` (required); `agentVersion` | +| `agent_get` | Get agent details to verify existence and type | `projectEndpoint` (required), `agentName` (optional) | + +## Workflow + +### Step 1: Verify Agent Readiness + +Delegate the readiness check to a sub-agent. 
Provide the project endpoint and agent name, and instruct it to: + +**Prompt agents** → Use `agent_get` to verify the agent exists. + +**Hosted agents (ACA)** → Use `agent_container_status_get` to check: +- Status `Running` ✅ → Proceed to Step 2 +- Status `Starting` → Wait and re-check +- Status `Stopped` or `Failed` ❌ → Warn the user and suggest using the deploy skill to start the container + +**Hosted agents (vNext)** → Ready immediately after deployment (no container status check needed) + +### Step 2: Invoke Agent + +Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved. + +Use `agent_invoke` to send a message: +- `projectEndpoint` — AI Foundry project endpoint +- `agentName` — Name of the agent to invoke +- `inputText` — The message to send + +**Optional parameters:** +- `agentVersion` — Target a specific agent version +- `sessionId` — MANDATORY for vNext hosted agents; include the session ID to maintain sticky sessions with the same compute resource + +#### Session Support for vNext Hosted Agents +In vNext hosted agents, the invoke endpoint accepts a 25-character alphanumeric `sessionId` parameter. Sessions are **sticky**: they route each request to the same underlying compute resource, so the agent can reuse the state stored in that compute's file system across multiple turns. + +Rules: +1. You MUST generate a unique `sessionId` before making the first `agent_invoke` call. +2. If you have a session ID, you MUST include it in every subsequent `agent_invoke` call for that conversation. +3. When the user explicitly requests a new session, create a new `sessionId` and use it for the rest of the `agent_invoke` calls. + +This is different from `conversationId`, which tracks conversation history; `sessionId` controls which compute instance handles the request. 
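The session rules above can be sketched on the client side as follows. The text only says the ID is 25 characters and alphanumeric; the exact character set used here (ASCII letters plus digits) and the `SessionTracker` helper are assumptions made for illustration, not part of the `agent_invoke` tool.

```python
import secrets
import string

SESSION_ID_LENGTH = 25  # length stated in the session rules
_ALPHABET = string.ascii_letters + string.digits  # assumed character set


def new_session_id():
    """Return a random 25-character alphanumeric session ID."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(SESSION_ID_LENGTH))


class SessionTracker:
    """Keep one sessionId per conversation so every invoke call reuses it."""

    def __init__(self):
        self._sessions = {}

    def session_for(self, conversation_key):
        """Return the existing ID, generating one before the first call."""
        if conversation_key not in self._sessions:
            self._sessions[conversation_key] = new_session_id()
        return self._sessions[conversation_key]

    def reset(self, conversation_key):
        """Start a new session, e.g. when the user explicitly asks for one."""
        self._sessions[conversation_key] = new_session_id()
        return self._sessions[conversation_key]
```

`conversation_key` is any local label for the conversation. It is kept separate from `conversationId` on purpose, since the service only returns that value after the first response.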
+ +### Step 3: Multi-Turn Conversations + +For follow-up messages, pass the `conversationId` from the previous response to `agent_invoke`. This maintains conversation context across turns. + +Each invocation with the same `conversationId` continues the existing conversation thread. + +## Agent Type Differences + +| Behavior | Prompt Agent | Hosted Agent | +|----------|-------------|--------------| +| Readiness | Immediate after creation | Requires running container | +| Pre-check | `agent_get` to verify exists | `agent_container_status_get` for `Running` status | +| Routing | Automatic | Optional `containerEndpoint` parameter | +| Multi-turn | ✅ via `conversationId` | ✅ via `conversationId` | + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name | +| Container not running | Hosted agent container is stopped or failed | Use deploy skill to start the container with `agent_container_control` | +| Invocation failed | Model error, timeout, or invalid input | Check agent logs, verify model deployment is active, retry with simpler input | +| Conversation ID invalid | Stale or non-existent conversation | Start a new conversation without `conversationId` | +| Rate limit exceeded | Too many requests | Implement backoff and retry, or wait before sending next message | + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Foundry Agent Runtime Components](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples) diff --git a/skills/microsoft-foundry/foundry-agent/observe/observe.md b/skills/microsoft-foundry/foundry-agent/observe/observe.md new file mode 100644 index 00000000..c29f4ac8 --- /dev/null +++ 
b/skills/microsoft-foundry/foundry-agent/observe/observe.md @@ -0,0 +1,70 @@ +# Agent Observability Loop + +Orchestrate the full eval-driven optimization cycle for a Foundry agent. This skill manages the **multi-step workflow** — auto-creating evaluators, generating test datasets, running batch evals, clustering failures, optimizing prompts, redeploying, and comparing versions. Use this skill instead of calling individual foundry-mcp evaluation tools manually. + +## When to Use This Skill + +USE FOR: evaluate my agent, run an eval, test my agent, check agent quality, run batch evaluation, analyze eval results, why did my eval fail, cluster failures, improve agent quality, optimize agent prompt, compare agent versions, re-evaluate after changes, set up CI/CD evals, agent monitoring, eval-driven optimization. + +> ⚠️ **DO NOT manually call** `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, or `prompt_optimize` **without reading this skill first.** This skill defines required pre-checks, artifact persistence, and multi-step orchestration that the raw tools do not enforce. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| MCP server | `foundry-mcp` | +| Key MCP tools | `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` | +| Prerequisite | Agent deployed and running (use [deploy skill](../deploy/deploy.md)) | + +## Entry Points + +| User Intent | Start At | +|-------------|----------| +| "Deploy and evaluate my agent" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (deploy first via [deploy skill](../deploy/deploy.md)) | +| "Agent just deployed" / "Set up evaluation" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (skip deploy, run auto-create) | +| "Evaluate my agent" / "Run an eval" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) first if `evaluators/` is empty, then [Step 2: Evaluate](references/evaluate-step.md) | +| "Why did my eval fail?" / "Analyze results" | [Step 3: Analyze](references/analyze-results.md) | +| "Improve my agent" / "Optimize prompt" | [Step 4: Optimize](references/optimize-deploy.md) | +| "Compare agent versions" | [Step 5: Compare](references/compare-iterate.md) | +| "Set up CI/CD evals" | [Step 6: CI/CD](references/cicd-monitoring.md) | + +> ⚠️ **Important:** Before running any evaluation (Step 2), always check if evaluators and test datasets exist in `evaluators/` and `datasets/`. If they don't, route through [Step 1: Auto-Setup](references/deploy-and-setup.md) first — even if the user only asked to "evaluate." + +## Before Starting — Detect Current State + +1. Check `.env` for `AZURE_AI_PROJECT_ENDPOINT` and `AZURE_AI_AGENT_NAME` +2. Use `agent_get` and `agent_container_status_get` to verify the agent exists and is running +3. Use `evaluation_get` to check for existing eval runs +4. Jump to the appropriate entry point + +## Loop Overview + +``` +1. 
Auto-setup evaluators & local test dataset + → ask: "Run an evaluation to identify optimization opportunities?" +2. Evaluate (batch eval run) +3. Download & cluster failures +4. Pick a category to optimize +5. Optimize prompt +6. Deploy new version (after user sign-off) +7. Re-evaluate (same eval group) +8. Compare versions → decide which to keep +9. Loop to next category or finish +10. Prompt: enable CI/CD evals & continuous production monitoring +``` + +## Behavioral Rules + +1. **Auto-poll in background.** After creating eval runs or starting containers, poll in a background terminal. Only surface the final result. +2. **Confirm before changes.** Show diff/summary before modifying agent code or deploying. Wait for sign-off. +3. **Prompt for next steps.** After each step, present options. Never assume the path forward. +4. **Write scripts to files.** Python scripts go in `scripts/` — no inline code blocks. +5. **Persist eval artifacts.** Save to `evaluators/`, `datasets/`, and `results/` for version tracking (see [deploy-and-setup](references/deploy-and-setup.md) for structure). + +## Related Skills + +| User Intent | Skill | +|-------------|-------| +| "Analyze production traces" / "Search conversations" / "Find errors in App Insights" | [trace skill](../trace/trace.md) | +| "Debug container issues" / "Container logs" | [troubleshoot skill](../troubleshoot/troubleshoot.md) | +| "Deploy or redeploy agent" | [deploy skill](../deploy/deploy.md) | diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/analyze-results.md b/skills/microsoft-foundry/foundry-agent/observe/references/analyze-results.md new file mode 100644 index 00000000..e5f61f06 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/analyze-results.md @@ -0,0 +1,49 @@ +# Steps 3–5 — Download Results, Cluster Failures, Dive Into Category + +## Step 3 — Download Results + +`evaluation_get` returns run metadata but **not** full per-row output. 
Write a Python script (save to `scripts/`) to download detailed results: + +1. Initialize `AIProjectClient` with project endpoint and `DefaultAzureCredential` +2. Get OpenAI client via `project_client.get_openai_client()` +3. Call `openai_client.evals.runs.output_items.list(eval_id=..., run_id=...)` +4. Serialize each item with `item.model_dump()` and save to `results//.json` (use `default=str` for non-serializable fields) +5. Print summary: total items, passed, failed, errored counts + +> ⚠️ **Data structure gotcha:** Query/response data lives in `datasource_item.query` and `datasource_item['sample.output_text']`, **not** in `sample.input`/`sample.output` (which are empty arrays). Parse `datasource_item` fields when extracting queries and responses for analysis. + +> SDK setup: `pip install azure-ai-projects azure-identity openai` + +## Step 4 — Cluster Failures by Root Cause + +Analyze every row in the results. Group failures into clusters: + +| Cluster | Description | +|---------|-------------| +| Incorrect / hallucinated answer | Agent gave a wrong or fabricated response | +| Incomplete answer | Agent missed key parts | +| Tool call failure | Agent failed to invoke or misused a tool | +| Safety / content violation | Flagged by safety evaluators | +| Runtime error | Agent crashed or returned an error | +| Off-topic / refusal | Agent refused or went off-topic | + +Produce a **prioritized action table**: + +| Priority | Cluster | Suggested Action | +|----------|---------|------------------| +| P0 | Runtime errors | Check container logs | +| P1 | Incorrect answers | Optimize prompt ([Step 6](optimize-deploy.md)) | +| P2 | Incomplete answers | Optimize prompt ([Step 6](optimize-deploy.md)) | +| P3 | Tool call failures | Fix tool definitions or instructions | +| P4 | Safety violations | Add guardrails to instructions | +| P5 | Off-topic / refusal | Clarify scope in instructions | + +**Rule:** Runtime errors first (P0), then by count × severity. 
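A minimal sketch of that ordering rule in Python. The source does not define numeric severities, so the `severity` weights are an assumption the caller supplies, and the function name `prioritize` is hypothetical.

```python
RUNTIME_ERROR = "Runtime error"  # cluster name taken from the table above


def prioritize(cluster_counts, severity):
    """Order failure clusters: runtime errors first, then count x severity.

    cluster_counts: dict of cluster name -> number of failing rows
    severity: dict of cluster name -> numeric weight (assumed; pick your own)
    Returns a list of (cluster, score) tuples, highest priority first.
    Clusters with zero failures are left out.
    """
    def score(name):
        # Unknown clusters default to weight 1 so they are still ranked.
        return cluster_counts[name] * severity.get(name, 1)

    def sort_key(name):
        # Runtime errors get group 0 so they always sort ahead of the rest;
        # within a group, a higher score comes first, with name as tie-break.
        group = 0 if name == RUNTIME_ERROR else 1
        return (group, -score(name), name)

    active = [name for name, count in cluster_counts.items() if count > 0]
    return [(name, score(name)) for name in sorted(active, key=sort_key)]
```

The returned order maps onto the P0, P1, ... rows of the action table. Keeping the weights as a parameter makes the severity judgment explicit and reviewable.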
+ +## Step 5 — Dive Into Category + +When the user wants to inspect a specific cluster, display the individual rows: input query, the agent's original response, evaluator scores, and failure reason. Let the user confirm which category to optimize. + +## Next Steps + +After clustering → proceed to [Step 6: Optimize Prompt](optimize-deploy.md). diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md b/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md new file mode 100644 index 00000000..0fc85689 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/cicd-monitoring.md @@ -0,0 +1,35 @@ +# Step 11 — Enable CI/CD Evals & Continuous Monitoring + +After confirming the final agent version, prompt with two options: + +## Option 1 — CI/CD Evaluations + +*"Would you like to add automated evaluations to your CI/CD pipeline so every deployment is evaluated before going live?"* + +If yes, generate a GitHub Actions workflow (e.g., `.github/workflows/agent-eval.yml`) that: + +1. Triggers on push to `main` or on pull request +2. Reads evaluator definitions from `evaluators/` and test datasets from `datasets/` +3. Runs `evaluation_agent_batch_eval_create` against the newly deployed agent version +4. Fails the workflow if any evaluator score falls below configured thresholds +5. Posts a summary as a PR comment or workflow annotation + +Use repository secrets for `AZURE_AI_PROJECT_ENDPOINT` and Azure credentials. Confirm the workflow file with the user before committing. + +## Option 2 — Continuous Production Monitoring + +*"Would you like to set up continuous evaluations to monitor your agent's quality in production?"* + +If yes, generate a scheduled GitHub Actions workflow (e.g., `.github/workflows/agent-eval-scheduled.yml`) that: + +1. Runs on a cron schedule (ask user preference: daily, weekly, etc.) +2. Evaluates the current production agent version using stored evaluators and datasets +3. 
Saves results to `results/` +4. Opens a GitHub issue or sends a notification if any score degrades below thresholds + +The user may choose one, both, or neither. + +## Reference + +- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation) +- [Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents) diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/compare-iterate.md b/skills/microsoft-foundry/foundry-agent/observe/references/compare-iterate.md new file mode 100644 index 00000000..42813830 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/compare-iterate.md @@ -0,0 +1,48 @@ +# Steps 8–10 — Re-Evaluate, Compare Versions, Iterate + +## Step 8 — Re-Evaluate + +Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from `datasets/`) and evaluators. Update `agentVersion` to the new version. + +Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)). + +## Step 9 — Compare Versions + +> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`. + +### Required Parameters for `evaluation_comparison_create` + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `insightRequest.displayName` | ✅ | Human-readable name. 
**Omitting causes BadRequest.** | +| `insightRequest.state` | ✅ | Must be `"NotStarted"` | +| `insightRequest.request.evalId` | ✅ | Eval group ID containing both runs | +| `insightRequest.request.baselineRunId` | ✅ | Run ID of the baseline | +| `insightRequest.request.treatmentRunIds` | ✅ | Array of treatment run IDs | + +Use **`evaluation_comparison_create`** with a nested `insightRequest`: + +```json +{ + "insightRequest": { + "displayName": "V1 vs V2 Comparison", + "state": "NotStarted", + "request": { + "type": "EvaluationComparison", + "evalId": "", + "baselineRunId": "", + "treatmentRunIds": [""] + } + } +} +``` + +> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8). + +Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep. + +## Step 10 — Iterate or Finish + +If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) → **Step 6** (optimize) → **Step 7** (deploy) → **Step 8** (re-evaluate) → **Step 9** (compare). + +Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md). diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/deploy-and-setup.md b/skills/microsoft-foundry/foundry-agent/observe/references/deploy-and-setup.md new file mode 100644 index 00000000..47b2cbac --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/deploy-and-setup.md @@ -0,0 +1,67 @@ +# Step 1 — Auto-Setup Evaluators & Dataset + +> **This step runs automatically after deployment.** If the agent was deployed via the [deploy skill](../../deploy/deploy.md), evaluators and a test dataset may already be configured. Check `evaluators/` and `datasets/` for existing artifacts before re-creating. 
+> +> If the agent is **not yet deployed**, follow the [deploy skill](../../deploy/deploy.md) first. It handles project detection, Dockerfile generation, ACR build, agent creation, container startup, **and** auto-creates evaluators & dataset after a successful deployment. + +## Auto-Create Evaluators & Dataset + +> **This step is fully automatic.** After deployment, immediately prepare evaluators and a local test dataset without waiting for the user to request it. + +### 1. Read Agent Instructions + +Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities. + +### 2. Select Evaluators + +Combine **built-in, custom, and safety evaluators**: + +| Category | Evaluators | +|----------|-----------| +| **Quality (built-in)** | intent_resolution, task_adherence, coherence, fluency, relevance | +| **Safety (include ≥2)** | violence, self_harm, hate_unfairness, sexual, indirect_attack | +| **Custom (create 1–2)** | Domain-specific via `evaluator_catalog_create` (see below) | + +### 3. Create Custom Evaluators + +Use **`evaluator_catalog_create`** with: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `name` | ✅ | e.g., `domain_accuracy`, `citation_quality` | +| `category` | ✅ | `quality`, `safety`, or `agents` | +| `scoringType` | ✅ | `ordinal`, `continuous`, or `boolean` | +| `promptText` | ✅* | Template with `{{query}}`, `{{response}}` placeholders | +| `minScore` / `maxScore` | | Default: 1 / 5 | +| `passThreshold` | | Scores ≥ this value pass | + +> **LLM-judge tip:** Include in the evaluator prompt: *"Do NOT penalize the response for mentioning dates or events beyond your training cutoff. The agent has real-time access."* + +### 4. Identify LLM-Judge Deployment + +Use **`model_deployment_get`** to find a suitable model (e.g., `gpt-4o`) for quality evaluators. + +### 5. 
Generate Local Test Dataset + +Use the identified LLM deployment to generate realistic test queries based on the agent's instructions and tool capabilities. Save to `datasets/-test.jsonl` with each line containing at minimum a `query` field (optionally `context`, `ground_truth`). + +### 6. Persist Artifacts + +``` +evaluators/ # custom evaluator definitions + .yaml # prompt text, scoring type, thresholds +datasets/ # locally generated input datasets + *.jsonl # test queries +results/ # evaluation run outputs (populated later) + / + .json +``` + +Save evaluator definitions to `evaluators/.yaml` and test data to `datasets/*.jsonl`. + +### 7. Prompt User + +*"Your agent is deployed and running. Evaluators and a local test dataset have been auto-configured. Would you like to run an evaluation to identify optimization opportunities?"* + +If yes → proceed to [Step 2: Evaluate](evaluate-step.md). If no → stop. diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/evaluate-step.md b/skills/microsoft-foundry/foundry-agent/observe/references/evaluate-step.md new file mode 100644 index 00000000..23148083 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/evaluate-step.md @@ -0,0 +1,51 @@ +# Step 2 — Create Batch Evaluation + +## Prerequisites + +- Agent deployed and running +- Evaluators configured (from [Step 1](deploy-and-setup.md) or `evaluators/` folder) +- Local test dataset available (from `datasets/`) + +## Run Evaluation + +Use **`evaluation_agent_batch_eval_create`** to run evaluators against the agent. + +### Required Parameters + +| Parameter | Description | +|-----------|-------------| +| `projectEndpoint` | Azure AI Project endpoint | +| `agentName` | Agent name | +| `agentVersion` | Agent version (string, e.g. 
`"1"`) | +| `evaluatorNames` | Array of evaluator names (NOT `evaluators`) | + +### Test Data Options + +**Preferred — local dataset:** Read JSONL from `datasets/` and pass via `inputData` (array of objects with `query` and optionally `context`, `ground_truth`). Provides reproducibility, version control, and reviewability. Always use this when `datasets/` contains files. + +**Fallback only — server-side synthetic data:** Set `generateSyntheticData=true` AND provide `generationModelDeploymentName`. Only use when no local dataset exists and the user explicitly requests it. Optionally set `samplesCount` (default 50) and `generationPrompt` with the agent's instructions. + +### Additional Parameters + +| Parameter | When Needed | +|-----------|-------------| +| `deploymentName` | Required for quality evaluators (the LLM-judge model) | +| `evaluationId` | Pass existing eval group ID to group runs for comparison | +| `evaluationName` | Name for a new evaluation group | + +> **Important:** Use `evaluationId` (NOT `evalId`) to group runs. + +## Auto-Poll for Completion + +Immediately after creating the run, poll **`evaluation_get`** in a **background terminal** until completion. Use `evalId` + `isRequestForRuns=true`. The run ID parameter is `evalRunId` (NOT `runId`). + +Only surface the final result when status reaches `completed`, `failed`, or `cancelled`. + +## Next Steps + +When evaluation completes → proceed to [Step 3: Analyze Results](analyze-results.md). 
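The preferred local-dataset path under Test Data Options could be sketched in Python roughly as follows. This is a minimal illustration, not part of the tool itself: the helper name `load_input_data` is hypothetical, and dropping fields other than `query`, `context`, and `ground_truth` is an assumption about what `inputData` should carry.

```python
import json
from pathlib import Path


def load_input_data(path):
    """Read a JSONL file and return a list of dicts suitable for `inputData`.

    Hypothetical helper: each non-blank line must be a JSON object with a
    non-empty `query`; `context` and `ground_truth` are kept when present.
    """
    rows = []
    text = Path(path).read_text(encoding="utf-8")
    for line_no, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict) or not record.get("query"):
            raise ValueError(f"{path}:{line_no}: missing required 'query' field")
        # Keep only the fields the evaluation step is documented to read.
        rows.append(
            {k: record[k] for k in ("query", "context", "ground_truth") if k in record}
        )
    return rows
```

Failing fast on a missing `query` keeps a malformed dataset from reaching the evaluation run, which fits the reproducibility goal stated for local datasets.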
+ +## Reference + +- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation) +- [Built-in Evaluators](https://learn.microsoft.com/en-us/azure/foundry/concepts/built-in-evaluators) diff --git a/skills/microsoft-foundry/foundry-agent/observe/references/optimize-deploy.md b/skills/microsoft-foundry/foundry-agent/observe/references/optimize-deploy.md new file mode 100644 index 00000000..32d5c062 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/observe/references/optimize-deploy.md @@ -0,0 +1,32 @@ +# Steps 6–7 — Optimize Prompt & Deploy New Version + +## Step 6 — Optimize Prompt + +> ⛔ **Guardrail:** When optimizing after a dataset update, do NOT remove dataset rows or weaken evaluators to recover scores. Score drops on a harder dataset are expected — they mean test coverage improved, not that the agent regressed. Optimize for NEW failure patterns only. + +Use **`prompt_optimize`** with: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `developerMessage` | ✅ | Agent's current system prompt / instructions | +| `deploymentName` | ✅ | Model for optimization (e.g., `gpt-4o-mini`) | +| `projectEndpoint` or `foundryAccountResourceId` | ✅ | At least one required | +| `requestedChanges` | | Concise improvement suggestions from cluster analysis | + +**Example `requestedChanges`:** *"Be more specific when answering geography questions"*, *"Always cite sources when providing factual claims"* + +> Use the optimized prompt returned by the tool. Do NOT manually rewrite. + +## Step 7 — Deploy New Version + +> **Always confirm before deploying.** Show the user a diff or summary of prompt changes and wait for explicit sign-off. + +After approval: + +1. Use **`agent_update`** to create a new agent version with the optimized prompt +2. Start the container with **`agent_container_control`** (action: `start`) +3. 
Poll **`agent_container_status_get`** in a **background terminal** until status is `Running` + +## Next Steps + +When the new version is running → proceed to [Step 8: Re-Evaluate](compare-iterate.md). diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/analyze-failures.md b/skills/microsoft-foundry/foundry-agent/trace/references/analyze-failures.md new file mode 100644 index 00000000..fb04343e --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/analyze-failures.md @@ -0,0 +1,109 @@ +# Analyze Failures — Find and Cluster Failing Traces + +Identify failing agent traces, group them by root cause, and produce a prioritized action table. + +## Step 1 — Find Failing Traces + +> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below. + +```kql +dependencies +| where timestamp > ago(24h) +| where success == false or toint(resultCode) >= 400 +| extend + operation = tostring(customDimensions["gen_ai.operation.name"]), + errorType = tostring(customDimensions["error.type"]), + model = tostring(customDimensions["gen_ai.request.model"]), + agentName = tostring(customDimensions["gen_ai.agent.name"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| project timestamp, name, duration, resultCode, errorType, operation, model, + agentName, conversationId, operation_Id, id +| order by timestamp desc +| take 100 +``` + +## Step 2 — Cluster by Error Type + +```kql +dependencies +| where timestamp > ago(24h) +| where success == false or toint(resultCode) >= 400 +| extend + errorType = tostring(customDimensions["error.type"]), + operation = tostring(customDimensions["gen_ai.operation.name"]) +| summarize + count = count(), + firstSeen = min(timestamp), + lastSeen = max(timestamp), + avgDuration = avg(duration), + sampleOperationId = 
take_any(operation_Id) + by errorType, operation, resultCode +| order by count desc +``` + +## Step 3 — Prioritized Action Table + +Present results as: + +| Priority | Error Type | Operation | Count | Result Code | Suggested Action | +|----------|-----------|-----------|-------|-------------|-----------------| +| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout | +| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic | +| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations | +| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions | + +**Prioritization:** P0 = highest count or most severe (5xx), then by count × recency. + +## Step 4 — Drill Into Specific Failure + +When the user selects a cluster, show individual failing traces: + +```kql +dependencies +| where timestamp > ago(24h) +| where success == false +| where customDimensions["error.type"] == "" +| where customDimensions["gen_ai.operation.name"] == "" +| project timestamp, name, duration, resultCode, + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + operation_Id +| order by timestamp desc +| take 20 +``` + +Also check `exceptions` table for stack traces: + +```kql +exceptions +| where timestamp > ago(24h) +| where operation_Id in ("", "") +| project timestamp, type, message, outerMessage, details, operation_Id +| order by timestamp desc +``` + +Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md). + +## Hosted Agent Variant — Failures + +For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. 
Use a two-step join: + +```kql +let reqIds = requests +| where timestamp > ago(24h) +| where customDimensions["gen_ai.agent.name"] == "" +| distinct id; +dependencies +| where timestamp > ago(24h) +| where operation_ParentId in (reqIds) +| where success == false or toint(resultCode) >= 400 +| extend + operation = tostring(customDimensions["gen_ai.operation.name"]), + errorType = tostring(customDimensions["error.type"]), + model = tostring(customDimensions["gen_ai.request.model"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| project timestamp, name, duration, resultCode, errorType, operation, model, + conversationId, operation_ParentId, operation_Id +| order by timestamp desc +| take 100 +``` diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/analyze-latency.md b/skills/microsoft-foundry/foundry-agent/trace/references/analyze-latency.md new file mode 100644 index 00000000..a4bdbb56 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/analyze-latency.md @@ -0,0 +1,116 @@ +# Analyze Latency — Find and Diagnose Slow Traces + +Identify slow agent traces, find bottleneck spans, and correlate with token usage. + +## Step 1 — Find Slow Conversations + +> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below. 
+ +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.operation.name"] == "invoke_agent" +| project timestamp, duration, success, + agentName = tostring(customDimensions["gen_ai.agent.name"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + operation_Id +| summarize + totalDuration = sum(duration), + spanCount = count(), + hasErrors = countif(success == false) > 0 + by conversationId, operation_Id +| where totalDuration > 5000 +| order by totalDuration desc +| take 50 +``` + +> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified. + +## Step 2 — Latency Distribution (P50/P95/P99) + +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent") +| summarize + p50 = percentile(duration, 50), + p95 = percentile(duration, 95), + p99 = percentile(duration, 99), + avg = avg(duration), + count = count() + by operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]) +| order by p95 desc +``` + +Present as: + +| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count | +|-----------|-------|---------|---------|---------|---------|-------| + +## Step 3 — Bottleneck Breakdown + +For a specific slow conversation, break down time spent per span type: + +```kql +dependencies +| where operation_Id == "" +| extend operation = tostring(customDimensions["gen_ai.operation.name"]) +| summarize + totalDuration = sum(duration), + spanCount = count(), + avgDuration = avg(duration) + by operation, name +| order by totalDuration desc +``` + +Common bottleneck patterns: +- **`chat` spans dominate** → LLM inference is slow (consider smaller model or caching) +- **`execute_tool` spans dominate** → Tool execution is slow (optimize tool implementation) +- **`invoke_agent` has long gaps** → Orchestration overhead (check agent framework) + +## Step 4 
— Token Usage vs Latency Correlation + +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.operation.name"] == "chat" +| extend + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]) +| where isnotempty(inputTokens) +| project duration, inputTokens, outputTokens, + model = tostring(customDimensions["gen_ai.request.model"]), + operation_Id +| order by duration desc +| take 100 +``` + +High token counts often correlate with high latency. If confirmed, suggest: +- Reduce system prompt length +- Limit conversation history window +- Use a faster model for simpler queries + +## Hosted Agent Variant — Latency + +For hosted agents, scope by Foundry agent name via `requests` then join to `dependencies`: + +```kql +let reqIds = requests +| where timestamp > ago(24h) +| where customDimensions["gen_ai.agent.name"] == "" +| distinct id; +dependencies +| where timestamp > ago(24h) +| where operation_ParentId in (reqIds) +| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent") +| summarize + p50 = percentile(duration, 50), + p95 = percentile(duration, 95), + p99 = percentile(duration, 99), + avg = avg(duration), + count = count() + by operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]) +| order by p95 desc +``` diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/conversation-detail.md b/skills/microsoft-foundry/foundry-agent/trace/references/conversation-detail.md new file mode 100644 index 00000000..42aa4b5a --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/conversation-detail.md @@ -0,0 +1,98 @@ +# Conversation Detail — Reconstruct Full Span Tree + +Reconstruct the complete span tree for a single conversation to see exactly what happened: every LLM call, tool execution, and agent invocation with timing, tokens, and errors. 
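As a rough sketch of the reconstruction this reference describes, the Python below groups already-fetched span rows into a parent/child tree and renders it as indented text. It assumes each row is a dict with `spanId`, `parentSpanId`, `timestamp`, `name`, and `duration` keys (mirroring the projected columns used in this reference); the function names are hypothetical.

```python
from collections import defaultdict


def build_span_tree(spans):
    """Group spans by parent and return (roots, children).

    A span is treated as a root when its parent ID is not among the fetched
    spans, e.g. when the parent is a `requests` row rather than a dependency.
    """
    known_ids = {s["spanId"] for s in spans}
    children = defaultdict(list)
    roots = []
    # Sort by timestamp so siblings appear in execution order.
    for span in sorted(spans, key=lambda s: s["timestamp"]):
        parent = span.get("parentSpanId")
        if parent in known_ids:
            children[parent].append(span)
        else:
            roots.append(span)
    return roots, children


def render_tree(spans):
    """Return the span tree as indented text lines, one line per span."""
    roots, children = build_span_tree(spans)
    lines = []

    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span['name']} ({span['duration']}ms)")
        for child in children[span["spanId"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines
```

Timestamps only need to be mutually comparable; ISO 8601 strings in a uniform format sort correctly as plain strings.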
+ +## Step 1 — Fetch All Spans for a Conversation + +Use `operation_Id` (trace ID) to get all spans in a single request: + +```kql +dependencies +| where operation_Id == "" +| project timestamp, name, duration, resultCode, success, + spanId = id, + parentSpanId = operation_ParentId, + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + responseModel = tostring(customDimensions["gen_ai.response.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + finishReason = tostring(customDimensions["gen_ai.response.finish_reasons"]), + errorType = tostring(customDimensions["error.type"]), + toolName = tostring(customDimensions["gen_ai.tool.name"]), + toolCallId = tostring(customDimensions["gen_ai.tool.call.id"]) +| order by timestamp asc +``` + +Also fetch the parent request: + +```kql +requests +| where operation_Id == "" +| project timestamp, name, duration, resultCode, success, id, operation_ParentId +``` + +## Step 2 — Build Span Tree + +Use `spanId` and `parentSpanId` to reconstruct the hierarchy: + +``` +invoke_agent (root) ─── 4200ms +├── chat (LLM call #1) ─── 1800ms, gpt-4o, 450→120 tokens +│ └── [output: "Let me check the weather..."] +├── execute_tool (get_weather) [tool: remote_functions.weather_api] ─── 200ms +│ └── [result: "rainy, 57°F"] +├── chat (LLM call #2) ─── 1500ms, gpt-4o, 620→85 tokens +│ └── [output: "The weather in Paris is rainy, 57°F"] +└── [total: 450+620=1070 input, 120+85=205 output tokens] +``` + +Present as an indented tree with: +- **Operation type** and name +- **Duration** (highlight if > P95 for that operation type) +- **Model** and token counts (for chat operations) +- **Error type** and result code (if failed, highlight in red) +- **Finish reason** (stop, length, content_filter, tool_calls) + +## Step 
3 — Extract Conversation Content from invoke_agent Spans + +The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These JSON arrays contain the complete conversation (system prompt, user query, assistant response): + +```kql +dependencies +| where operation_Id == "" +| where customDimensions["gen_ai.operation.name"] == "invoke_agent" +| project timestamp, + inputMessages = tostring(customDimensions["gen_ai.input.messages"]), + outputMessages = tostring(customDimensions["gen_ai.output.messages"]) +| order by timestamp asc +``` + +Message structure: `[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]` + +Also check the `traces` table for additional GenAI log events: + +```kql +traces +| where operation_Id == "" +| where message contains "gen_ai" +| project timestamp, message, customDimensions +| order by timestamp asc +``` + +## Step 4 — Check for Exceptions + +```kql +exceptions +| where operation_Id == "" +| project timestamp, type, message, outerMessage, + details = parse_json(details) +| order by timestamp asc +``` + +Present exceptions inline in the span tree at their position in the timeline. + +## Step 5 — Fetch Evaluation Results + +See [Eval Correlation](eval-correlation.md) for the full workflow to look up evaluation scores by response ID or conversation ID. Use `gen_ai.response.id` values from Step 1 spans to correlate. diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/eval-correlation.md b/skills/microsoft-foundry/foundry-agent/trace/references/eval-correlation.md new file mode 100644 index 00000000..ef2430cc --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/eval-correlation.md @@ -0,0 +1,57 @@ +# Eval Correlation — Find Evaluation Results by Response or Conversation ID + +Look up evaluation scores for a specific agent response using App Insights. 
+ +> **IMPORTANT:** The Foundry evaluation API does NOT support querying by response ID or conversation ID. App Insights `customEvents` is the ONLY way to correlate eval scores to specific responses. Always use this KQL approach when the user asks for eval results for a specific response or conversation. + +## Prerequisites + +- App Insights resource resolved (see [trace.md](../trace.md) Before Starting) +- A response ID (`gen_ai.response.id`) or conversation ID (`gen_ai.conversation.id`) from a previous trace query + +## Search by Response ID + +```kql +customEvents +| where timestamp > ago(30d) +| where name == "gen_ai.evaluation.result" +| where customDimensions["gen_ai.response.id"] == "" +| extend + evalName = tostring(customDimensions["gen_ai.evaluation.name"]), + score = todouble(customDimensions["gen_ai.evaluation.score.value"]), + label = tostring(customDimensions["gen_ai.evaluation.score.label"]), + explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| project timestamp, evalName, score, label, explanation, responseId, conversationId +| order by evalName asc +``` + +## Search by Conversation ID + +```kql +customEvents +| where timestamp > ago(30d) +| where name == "gen_ai.evaluation.result" +| where customDimensions["gen_ai.conversation.id"] == "" +| extend + evalName = tostring(customDimensions["gen_ai.evaluation.name"]), + score = todouble(customDimensions["gen_ai.evaluation.score.value"]), + label = tostring(customDimensions["gen_ai.evaluation.score.label"]), + explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]), + responseId = tostring(customDimensions["gen_ai.response.id"]) +| project timestamp, evalName, score, label, explanation, responseId +| order by responseId asc, evalName asc +``` + +## Present Results + +Show eval scores as a table: + +| Evaluator | Score | Label 
| Explanation | +|-----------|-------|-------|-------------| +| coherence | 5.0 | pass | Response is well-structured... | +| fluency | 4.0 | pass | Natural language flow... | +| relevance | 2.0 | fail | Response doesn't address... | + +When showing alongside a span tree (see [Conversation Detail](conversation-detail.md)), attach eval scores to the span whose `gen_ai.response.id` matches. diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md b/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md new file mode 100644 index 00000000..9dc9a737 --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md @@ -0,0 +1,203 @@ +# KQL Templates — GenAI Trace Query Reference + +Ready-to-use KQL templates for querying GenAI OpenTelemetry traces in Application Insights. + +**Table of Contents:** [App Insights Table Mapping](#app-insights-table-mapping) · [Key GenAI OTel Attributes](#key-genai-otel-attributes) · [Span Correlation](#span-correlation) · [Hosted Agent Attributes](#hosted-agent-attributes) · [Response ID Formats](#response-id-formats) · [Common Query Templates](#common-query-templates) · [OTel Reference Links](#otel-reference-links) + +## App Insights Table Mapping + +| App Insights Table | GenAI Data | +|-------------------|------------| +| `dependencies` | GenAI spans: LLM inference (`chat`), tool execution (`execute_tool`), agent invocation (`invoke_agent`) | +| `requests` | Incoming HTTP requests to the agent endpoint. 
For hosted agents, also carries `gen_ai.agent.name` (Foundry name) and `azure.ai.agentserver.*` attributes — **preferred entry point** for agent-name filtering | +| `customEvents` | GenAI evaluation results (`gen_ai.evaluation.result`) — scores, labels, explanations | +| `traces` | Log events, including GenAI events (input/output messages) | +| `exceptions` | Error details with stack traces | + +## Key GenAI OTel Attributes + +Stored in `customDimensions` on `dependencies` spans: + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `gen_ai.operation.name` | Operation type | `chat`, `invoke_agent`, `execute_tool`, `create_agent` | +| `gen_ai.conversation.id` | Conversation/session ID | `conv_5j66UpCpwteGg4YSxUnt7lPY` | +| `gen_ai.response.id` | Response ID | `chatcmpl-123` | +| `gen_ai.agent.name` | Agent name | `my-support-agent` | +| `gen_ai.agent.id` | Agent unique ID | `asst_abc123` | +| `gen_ai.request.model` | Requested model | `gpt-4o` | +| `gen_ai.response.model` | Actual model used | `gpt-4o-2024-05-13` | +| `gen_ai.usage.input_tokens` | Input token count | `450` | +| `gen_ai.usage.output_tokens` | Output token count | `120` | +| `gen_ai.response.finish_reasons` | Stop reasons | `["stop"]`, `["tool_calls"]` | +| `error.type` | Error classification | `timeout`, `rate_limited`, `content_filter` | +| `gen_ai.provider.name` | Provider | `azure.ai.openai`, `openai` | +| `gen_ai.input.messages` | Full input messages (JSON array) — on `invoke_agent` spans | `[{"role":"user","parts":[{"type":"text","content":"..."}]}]` | +| `gen_ai.output.messages` | Full output messages (JSON array) — on `invoke_agent` spans | `[{"role":"assistant","parts":[{"type":"text","content":"..."}]}]` | + +Stored in `customDimensions` on `customEvents` (name == `gen_ai.evaluation.result`): + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `gen_ai.evaluation.name` | Evaluator name | `Relevance`, `IntentResolution` | +| 
`gen_ai.evaluation.score.value` | Numeric score | `4.0` | +| `gen_ai.evaluation.score.label` | Human-readable label | `pass`, `fail`, `relevant` | +| `gen_ai.evaluation.explanation` | Free-form explanation | `"Response lacks detail..."` | +| `gen_ai.response.id` | Correlates to the evaluated span | `chatcmpl-123` | +| `gen_ai.conversation.id` | Correlates to conversation | `conv_5j66...` | + +> **Correlation:** Eval results do NOT link via id-parentId. Use `gen_ai.conversation.id` and/or `gen_ai.response.id` to join with `dependencies` spans. + +## Span Correlation + +| Field | Purpose | +|-------|---------| +| `operation_Id` | Trace ID — groups all spans in one request | +| `id` | Span ID — unique identifier for this span | +| `operation_ParentId` | Parent span ID — use with `id` to build span trees | + +### Parent-Child Join (requests → dependencies) + +Use `operation_ParentId` to find child dependency spans from a parent request. This is critical for hosted agents where the Foundry agent name only lives on the parent `requests` span: + +```kql +let reqIds = requests +| where timestamp > ago(7d) +| where customDimensions["gen_ai.agent.name"] == "" +| distinct id; +dependencies +| where timestamp > ago(7d) +| where operation_ParentId in (reqIds) +| extend + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| project timestamp, duration, success, operation, model, conversationId, operation_ParentId +| order by timestamp desc +``` + +## Hosted Agent Attributes + +Stored in `customDimensions` on **both `requests` and `traces`** tables (NOT on `dependencies` spans): + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `azure.ai.agentserver.agent_name` | Hosted agent name | `hosted-agent-022-001` | +| `azure.ai.agentserver.agent_id` | Internal agent ID | `code-asst-xmwokux85uqc7fodxejaxa` | 
+| `azure.ai.agentserver.conversation_id` | Conversation ID | `conv_d7ab624de92d...` | +| `azure.ai.agentserver.response_id` | Response ID (caresp format) | `caresp_d7ab624de92d...` | + +> **Important:** Use `requests` as the preferred entry point for agent-name filtering — it has both `azure.ai.agentserver.agent_name` and `gen_ai.agent.name` with the Foundry-level name. To reach child `dependencies` spans, join via `requests.id` → `dependencies.operation_ParentId`. + +> ⚠️ **`gen_ai.agent.name` means different things on different tables:** +> - On `requests`: the **Foundry agent name** (user-visible) → e.g., `hosted-agent-022-001` +> - On `dependencies`: the **code-level class name** → e.g., `BingSearchAgent` +> +> **Always start from `requests`** when filtering by the Foundry agent name the user knows. + +## Response ID Formats + +| Agent Type | Prefix | Example | +|------------|--------|---------| +| Hosted agent (AgentServer) | `caresp_` | `caresp_d7ab624de92da637008Rhr4U4E1y9FSE...` | +| Prompt agent (Foundry Responses API) | `resp_` | `resp_4e2f8b016b5a0dad00697bd3c4c1b881...` | +| Azure OpenAI chat completions | `chatcmpl-` | `chatcmpl-abc123def456` | + +When searching by response ID, use the appropriate prefix to narrow results. The `gen_ai.response.id` attribute appears on `dependencies` spans (for `chat` operations) and in `customEvents` (for evaluation results). 
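The prefix table above lends itself to a small lookup. The following Python sketch shows one way to infer the agent type from a response ID; the function and constant names are hypothetical, and the prefixes are copied from the table rather than from any API.

```python
# Prefixes taken from the Response ID Formats table.
RESPONSE_ID_PREFIXES = (
    ("caresp_", "hosted agent (AgentServer)"),
    ("resp_", "prompt agent (Foundry Responses API)"),
    ("chatcmpl-", "Azure OpenAI chat completions"),
)


def classify_response_id(response_id):
    """Return the agent type implied by the ID prefix, or None if unrecognized."""
    for prefix, agent_type in RESPONSE_ID_PREFIXES:
        if response_id.startswith(prefix):
            return agent_type
    return None
```

Returning `None` for an unknown prefix leaves it to the caller to fall back to an unscoped search instead of guessing the agent type.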
+ +## Common Query Templates + +### Overview — Conversations in last 24h +```kql +dependencies +| where timestamp > ago(24h) +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| summarize + spanCount = count(), + errorCount = countif(success == false), + avgDuration = avg(duration), + totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"])) + by bin(timestamp, 1h) +| order by timestamp desc +``` + +### Error Rate by Operation +```kql +dependencies +| where timestamp > ago(24h) +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| summarize + total = count(), + errors = countif(success == false), + errorRate = round(100.0 * countif(success == false) / count(), 1) + by operation = tostring(customDimensions["gen_ai.operation.name"]) +| order by errorRate desc +``` + +### Token Usage by Model +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.operation.name"] == "chat" +| summarize + calls = count(), + totalInput = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutput = sum(toint(customDimensions["gen_ai.usage.output_tokens"])), + avgInput = avg(todouble(customDimensions["gen_ai.usage.input_tokens"])), + avgOutput = avg(todouble(customDimensions["gen_ai.usage.output_tokens"])) + by model = tostring(customDimensions["gen_ai.request.model"]) +| order by totalInput desc +``` + +### Tool Call Details +```kql +dependencies +| where operation_Id == "" +| where customDimensions["gen_ai.operation.name"] == "execute_tool" +| project timestamp, duration, success, + toolName = tostring(customDimensions["gen_ai.tool.name"]), + toolType = tostring(customDimensions["gen_ai.tool.type"]), + toolCallId = tostring(customDimensions["gen_ai.tool.call.id"]), + toolArgs = tostring(customDimensions["gen_ai.tool.call.arguments"]), + toolResult = tostring(customDimensions["gen_ai.tool.call.result"]) +| order by 
timestamp asc +``` + +Key tool attributes: + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `gen_ai.tool.name` | Tool function name | `remote_functions.bing_grounding`, `python` | +| `gen_ai.tool.type` | Tool type | `extension`, `function` | +| `gen_ai.tool.call.id` | Unique call ID | `call_db64aa6a004a...` | +| `gen_ai.tool.call.arguments` | JSON arguments passed | `{"query": "latest AI news"}` | +| `gen_ai.tool.call.result` | Tool output (may be truncated) | `<>` | + +### Evaluation Results by Conversation +```kql +customEvents +| where timestamp > ago(24h) +| where name == "gen_ai.evaluation.result" +| extend + evalName = tostring(customDimensions["gen_ai.evaluation.name"]), + score = todouble(customDimensions["gen_ai.evaluation.score.value"]), + label = tostring(customDimensions["gen_ai.evaluation.score.label"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| summarize + evalCount = count(), + avgScore = avg(score), + failCount = countif(label == "fail" or label == "not_relevant" or label == "incorrect"), + evaluators = make_set(evalName) + by conversationId +| order by failCount desc +``` + +> For detailed eval queries by response ID or conversation ID, see [Eval Correlation](eval-correlation.md). 
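When evaluation rows have already been fetched (for example, by one of the queries in Eval Correlation), the same per-conversation rollup can be approximated client-side. This Python sketch assumes each row is a dict with `conversationId`, `evalName`, `score`, and `label` keys; the function name is hypothetical, and the fail labels are the ones counted in the query above.

```python
from collections import defaultdict

# Same labels the KQL template counts as failures.
FAIL_LABELS = {"fail", "not_relevant", "incorrect"}


def summarize_evals(rows):
    """Aggregate eval rows per conversation, sorted by failure count descending."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["conversationId"]].append(row)

    summary = []
    for conversation_id, items in grouped.items():
        # Skip missing scores when averaging, as KQL avg() ignores nulls.
        scores = [r["score"] for r in items if r["score"] is not None]
        summary.append({
            "conversationId": conversation_id,
            "evalCount": len(items),
            "avgScore": sum(scores) / len(scores) if scores else None,
            "failCount": sum(1 for r in items if r["label"] in FAIL_LABELS),
            "evaluators": sorted({r["evalName"] for r in items}),
        })
    return sorted(summary, key=lambda s: s["failCount"], reverse=True)
```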
+ +## OTel Reference Links + +- [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/) +- [GenAI Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) +- [GenAI Events](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/) +- [GenAI Metrics](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/) diff --git a/skills/microsoft-foundry/foundry-agent/trace/references/search-traces.md b/skills/microsoft-foundry/foundry-agent/trace/references/search-traces.md new file mode 100644 index 00000000..a663035e --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/references/search-traces.md @@ -0,0 +1,141 @@ +# Search Traces — Conversation-Level Search + +Search agent traces at the conversation level. Returns summaries grouped by conversation or operation, not individual spans. + +## Prerequisites + +- App Insights resource resolved (see [trace.md](../trace.md) Before Starting) +- Time range confirmed with user (default: last 24 hours) + +## Search by Conversation ID + +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.conversation.id"] == "" +| project timestamp, name, duration, resultCode, success, + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]), + operation_Id, id, operation_ParentId +| order by timestamp asc +``` + +## Search by Response ID + +Auto-detect the response ID format to determine agent type: +- `caresp_...` → Hosted agent (AgentServer) +- `resp_...` → Prompt agent (Foundry Responses API) +- `chatcmpl-...` → Azure OpenAI chat completions + +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.response.id"] == "" +| project timestamp, name, duration, resultCode, success, + operation = 
tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]), + operation_Id, id, operation_ParentId +``` + +Then drill into the full conversation: + +> ⚠️ **STOP — read [Conversation Detail](conversation-detail.md) before writing your own drill-down query.** It contains the correct span tree reconstruction logic, event/exception queries, and eval correlation steps. + +Quick drill-down using the `operation_Id` from above: + +```kql +dependencies +| where operation_Id == "" +| project timestamp, name, duration, resultCode, success, + spanId = id, parentSpanId = operation_ParentId, + operation = tostring(customDimensions["gen_ai.operation.name"]), + model = tostring(customDimensions["gen_ai.request.model"]), + inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]), + outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]), + responseId = tostring(customDimensions["gen_ai.response.id"]), + errorType = tostring(customDimensions["error.type"]), + toolName = tostring(customDimensions["gen_ai.tool.name"]) +| order by timestamp asc +``` + +Also check for eval results: see [Eval Correlation](eval-correlation.md). + +## Search by Agent Name + +> **Note:** For hosted agents, `gen_ai.agent.name` in `dependencies` refers to *sub-agents* (e.g., `BingSearchAgent`), not the top-level hosted agent. See "Search by Hosted Agent Name" below. 
+ +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.agent.name"] == "" + or customDimensions["gen_ai.agent.id"] == "" +| summarize + startTime = min(timestamp), + endTime = max(timestamp), + totalDuration = max(timestamp) - min(timestamp), + spanCount = count(), + errorCount = countif(success == false), + totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"])) + by conversationId = tostring(customDimensions["gen_ai.conversation.id"]), + operation_Id +| order by startTime desc +| take 50 +``` + +## Search by Hosted Agent Name + +For hosted agents, the Foundry agent name (e.g., `hosted-agent-022-001`) appears on both `requests` and `traces` tables — NOT on `dependencies`. Use `requests` as the preferred entry point since it also has `gen_ai.agent.name`: + +```kql +let reqIds = requests +| where timestamp > ago(24h) +| where customDimensions["gen_ai.agent.name"] == "" +| distinct id; +dependencies +| where timestamp > ago(24h) +| where operation_ParentId in (reqIds) +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| summarize + startTime = min(timestamp), + endTime = max(timestamp), + spanCount = count(), + errorCount = countif(success == false), + totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"])) + by operation_ParentId +| order by startTime desc +| take 50 +``` + +## Conversation Summary Table + +Present results in this format: + +| Conversation ID | Start Time | Duration | Spans | Errors | Input Tokens | Output Tokens | +|----------------|------------|----------|-------|--------|-------------|---------------| +| conv_abc123 | 2025-01-15 10:30 | 4.2s | 12 | 0 | 850 | 320 | +| conv_def456 | 2025-01-15 10:25 | 8.7s | 18 | 2 | 1200 | 450 | + +Highlight rows with errors in the summary. 
Offer to drill into any conversation via [Conversation Detail](conversation-detail.md). + +## Free-Text Search + +When the user provides a general search term (e.g., agent name, error message): + +```kql +union dependencies, requests, exceptions, traces +| where timestamp > ago(24h) +| where * contains "" +| summarize count() by operation_Id +| order by count_ desc +| take 20 +``` + +## After Successful Query + +> 📝 **Reminder:** If this is the first trace query in this session, ensure App Insights connection info was persisted to `.env` (see [trace.md — Before Starting](../trace.md#before-starting--resolve-app-insights-connection)). diff --git a/skills/microsoft-foundry/foundry-agent/trace/trace.md b/skills/microsoft-foundry/foundry-agent/trace/trace.md new file mode 100644 index 00000000..271cb84b --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/trace/trace.md @@ -0,0 +1,60 @@ +# Foundry Agent Trace Analysis + +Analyze production traces for Foundry agents using Application Insights and GenAI OpenTelemetry semantic conventions. This skill provides **structured KQL-powered workflows** for searching conversations, diagnosing failures, and identifying latency bottlenecks. Use this skill instead of writing ad-hoc KQL queries against App Insights manually. + +## When to Use This Skill + +USE FOR: analyze agent traces, search agent conversations, find failing traces, slow traces, latency analysis, trace search, conversation history, agent errors in production, debug agent responses, App Insights traces, GenAI telemetry, trace correlation, span tree, production trace analysis, evaluation results, evaluation scores, eval run results, find by response ID, get agent trace by conversation ID, agent evaluation scores from App Insights. + +> **USE THIS SKILL INSTEAD OF** `azure-monitor` or `azure-applicationinsights` when querying Foundry agent traces, evaluations, or GenAI telemetry. 
This skill has correct GenAI OTel attribute mappings and tested KQL templates that those general tools lack. + +> ⚠️ **DO NOT manually write KQL queries** for GenAI trace analysis **without reading this skill first.** This skill provides tested query templates with correct GenAI OTel attribute mappings, proper span correlation logic, and conversation-level aggregation patterns. + +## Quick Reference + +| Property | Value | +|----------|-------| +| Data source | Application Insights (App Insights) | +| Query language | KQL (Kusto Query Language) | +| Related skills | `troubleshoot` (container logs) | +| Preferred query tool | `monitor_resource_log_query` (Azure MCP) — use for App Insights KQL queries | +| OTel conventions | [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/), [Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) | + +## Entry Points + +| User Intent | Start At | +|-------------|----------| +| "Search agent conversations" / "Find traces" | [Search Traces](references/search-traces.md) | +| "Tell me about response ID X" / "Look up response ID" | [Search Traces — Search by Response ID](references/search-traces.md#search-by-response-id) | +| "Why is my agent failing?" / "Find errors" | [Analyze Failures](references/analyze-failures.md) | +| "My agent is slow" / "Latency analysis" | [Analyze Latency](references/analyze-latency.md) | +| "Show me this conversation" / "Trace detail" | [Conversation Detail](references/conversation-detail.md) | +| "Find eval results for response ID" / "eval scores from traces" | [Eval Correlation](references/eval-correlation.md) | +| "What KQL do I need?" | [KQL Templates](references/kql-templates.md) | + +## Before Starting — Resolve App Insights Connection + +1. Check `.env` (or the same config file hosting other project variables) for `APPLICATIONINSIGHTS_CONNECTION_STRING` or `AZURE_APPINSIGHTS_RESOURCE_ID` +2. 
If not found, use `project_connection_list` (foundry-mcp tool) to discover App Insights linked to the Foundry project — this is the most reliable way to find the correct App Insights resource. Filter results for Application Insights connection type. +3. **IMMEDIATELY write back to `.env`** — as soon as `project_connection_list` returns App Insights info, write it to `.env` (or the same config file where `AZURE_AI_PROJECT_ENDPOINT` etc. live) BEFORE running any queries. Do not defer this step. This ensures future sessions skip discovery entirely. + +| Variable | Purpose | Example | +|----------|---------|---------| +| `APPLICATIONINSIGHTS_CONNECTION_STRING` | App Insights connection string | `InstrumentationKey=...;IngestionEndpoint=...` | +| `AZURE_APPINSIGHTS_RESOURCE_ID` | ARM resource ID | `/subscriptions/.../Microsoft.Insights/components/...` | + +If a `.env` file already exists, read it first and merge — do not overwrite existing values without confirmation. + +4. Confirm the App Insights resource with the user before querying +5. Use **`monitor_resource_log_query`** (Azure MCP tool) to execute KQL queries against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly. + +> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` — they don't extract it from resource IDs. + +## Behavioral Rules + +1. **ALWAYS display the KQL query.** Before executing ANY KQL query, display it in a code block. Never run a query silently. This is a hard requirement, not a suggestion. Showing queries builds trust and helps users learn KQL patterns. +2. **Start broad, then narrow.** Begin with conversation-level summaries, then drill into specific conversations or spans on user request. +3. **Use time ranges.** Always scope queries with a time range (default: last 24 hours). Ask user for the range if not specified. +4. 
**Explain GenAI attributes.** When displaying results, translate OTel attribute names to human-readable labels (e.g., `gen_ai.operation.name` → "Operation"). +5. **Link to conversation detail.** When showing search or failure results, offer to drill into any specific conversation. +6. **Scope to the target agent.** An App Insights resource may contain traces from multiple agents. For hosted agents, start from the `requests` table where `gen_ai.agent.name` holds the Foundry-level name, then join to `dependencies` via `operation_ParentId`. For prompt agents, filter `dependencies` directly by `gen_ai.agent.name`. When showing overview summaries, group by agent and warn the user if multiple agents are present. diff --git a/skills/microsoft-foundry/foundry-agent/troubleshoot/troubleshoot.md b/skills/microsoft-foundry/foundry-agent/troubleshoot/troubleshoot.md new file mode 100644 index 00000000..f79762ea --- /dev/null +++ b/skills/microsoft-foundry/foundry-agent/troubleshoot/troubleshoot.md @@ -0,0 +1,96 @@ +# Foundry Agent Troubleshoot + +Troubleshoot and debug Foundry agents by collecting container logs, discovering observability connections, and querying Application Insights telemetry. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| Agent types | Prompt (LLM-based), Hosted (container-based) | +| MCP servers | `foundry-mcp` | +| Key MCP tools | `agent_get`, `agent_container_status_get` | +| Related skills | `trace` (telemetry analysis) | +| Preferred query tool | `monitor_resource_log_query` (Azure MCP) — preferred over `azure-kusto` for App Insights | +| CLI references | `az cognitiveservices agent logs`, `az cognitiveservices account connection` | + +## When to Use This Skill + +- Agent is not responding or returning errors +- Hosted agent container is failing to start +- Need to view container logs for a hosted agent +- Diagnose latency or timeout issues +- Query Application Insights for agent traces and exceptions +- Investigate agent runtime failures + +## MCP Tools + +| Tool | Description | Parameters | +|------|-------------|------------| +| `agent_get` | Get agent details to determine type (prompt/hosted) | `projectEndpoint` (required), `agentName` (optional) | +| `agent_container_status_get` | Check hosted agent container status | `projectEndpoint`, `agentName` (required); `agentVersion` | + +## Workflow + +### Step 1: Collect Agent Information + +Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved: +- **Project endpoint** — AI Foundry project endpoint URL +- **Agent name** — Name of the agent to troubleshoot + +### Step 2: Determine Agent Type + +Use `agent_get` with `projectEndpoint` and `agentName` to retrieve the agent definition. Check the `kind` field: +- `"hosted"` → Proceed to Step 3 (Container Logs) +- `"prompt"` → Skip to Step 4 (Discover Observability Connections) + +### Step 3: Retrieve Container Logs (Hosted Agents Only) + +First check the container status using `agent_container_status_get`. Report the current status to the user. 
+ +Retrieve container logs using the Azure CLI command documented at: +[az cognitiveservices agent logs show](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest#az-cognitiveservices-agent-logs-show) + +Refer to the documentation above for the exact command syntax and parameters. Present the logs to the user and highlight any errors or warnings found. + +### Step 4: Discover Observability Connections + +List the project connections to find Application Insights or Azure Monitor resources using the Azure CLI command documented at: +[az cognitiveservices account connection](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest) + +Refer to the documentation above for the exact command syntax and parameters. Look for connections of type `ApplicationInsights` or `AzureMonitor` in the output. + +If no observability connection is found, inform the user and suggest setting up Application Insights for the project. Ask if they want to proceed without telemetry data. + +### Step 5: Query Application Insights Telemetry + +Use **`monitor_resource_log_query`** (Azure MCP tool) to run KQL queries against the Application Insights resource discovered in Step 4. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly. + +> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` — they don't extract it from resource IDs. + +Use `* contains ""` or `* contains ""` filters to narrow down results to the specific agent instance. 
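The wildcard filter described in Step 5 can be sketched as a small shell helper that only builds the query text, so the query can be displayed before it is run. This is a minimal sketch, not a tested template: the function name `build_agent_filter_query` and its arguments are hypothetical, while the table list and the 24-hour default are borrowed from the Free-Text Search query in the trace skill's search-traces reference. It assumes the search term contains no double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Azure tool): prints a KQL query that
# searches every column of the App Insights tables for a term.
build_agent_filter_query() {
  local term="$1"
  local window="${2:-24h}"   # assumed default, mirroring "last 24 hours"

  # Refuse empty terms and terms containing a double quote, since the term is
  # placed inside a KQL string literal without escaping.
  if [ -z "$term" ]; then
    return 1
  fi
  case "$term" in
    *'"'*) return 1 ;;
  esac

  printf 'union dependencies, requests, exceptions, traces\n'
  printf '| where timestamp > ago(%s)\n' "$window"
  printf '| where * contains "%s"\n' "$term"
  printf '| summarize count() by operation_Id\n'
  printf '| order by count_ desc\n'
  printf '| take 20\n'
}

# Example: print the query for an agent called "my-agent" (illustrative name).
build_agent_filter_query "my-agent"
```

The printed text would then be passed as the KQL query to `monitor_resource_log_query`, together with the App Insights resource ID and an explicit `subscription`.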
+ +### Step 6: Summarize Findings + +Present a summary to the user including: +- **Agent type and status** — hosted/prompt, container status (if hosted) +- **Container log errors** — key errors from logs (hosted only) +- **Telemetry insights** — exceptions, failed requests, latency trends +- **Recommended actions** — specific steps to resolve identified issues + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name | +| Container logs unavailable | Agent is a prompt agent or container never started | Prompt agents don't have container logs — skip to telemetry | +| No observability connection | Application Insights not configured for the project | Suggest configuring Application Insights for the Foundry project | +| Kusto query failed | Invalid cluster/database or insufficient permissions | Verify Application Insights resource details and reader permissions | +| No telemetry data | Agent not instrumented or too recent | Check if Application Insights SDK is configured; data may take a few minutes to appear | + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Agent Logs CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest) +- [Account Connection CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest) +- [KQL Quick Reference](https://learn.microsoft.com/azure/data-explorer/kusto/query/kql-quick-reference) +- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples) diff --git a/skills/microsoft-foundry/models/deploy-model/SKILL.md b/skills/microsoft-foundry/models/deploy-model/SKILL.md new file mode 100644 index 00000000..46b9f01e --- /dev/null +++ 
b/skills/microsoft-foundry/models/deploy-model/SKILL.md @@ -0,0 +1,144 @@ +--- +name: deploy-model +description: "Unified Azure OpenAI model deployment skill with intelligent intent-based routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI policy), and capacity discovery across regions and projects. USE FOR: deploy model, deploy gpt, create deployment, model deployment, deploy openai model, set up model, provision model, find capacity, check model availability, where can I deploy, best region for model, capacity analysis. DO NOT USE FOR: listing existing deployments (use foundry_models_deployments_list MCP tool), deleting deployments, agent creation (use agent/create), project creation (use project/create)." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Deploy Model + +Unified entry point for all Azure OpenAI model deployment workflows. Analyzes user intent and routes to the appropriate deployment mode. + +## Quick Reference + +| Mode | When to Use | Sub-Skill | +|------|-------------|-----------| +| **Preset** | Quick deployment, no customization needed | [preset/SKILL.md](preset/SKILL.md) | +| **Customize** | Full control: version, SKU, capacity, RAI policy | [customize/SKILL.md](customize/SKILL.md) | +| **Capacity Discovery** | Find where you can deploy with specific capacity | [capacity/SKILL.md](capacity/SKILL.md) | + +## Intent Detection + +Analyze the user's prompt and route to the correct mode: + +``` +User Prompt + │ + ├─ Simple deployment (no modifiers) + │ "deploy gpt-4o", "set up a model" + │ └─> PRESET mode + │ + ├─ Customization keywords present + │ "custom settings", "choose version", "select SKU", + │ "set capacity to X", "configure content filter", + │ "PTU deployment", "with specific quota" + │ └─> CUSTOMIZE mode + │ + ├─ Capacity/availability query + │ "find where I can deploy", "check capacity", + │ "which region has X capacity", "best region for 10K TPM", + │ "where is 
this model available" + │ └─> CAPACITY DISCOVERY mode + │ + └─ Ambiguous (has capacity target + deploy intent) + "deploy gpt-4o with 10K capacity to best region" + └─> CAPACITY DISCOVERY first → then PRESET or CUSTOMIZE +``` + +### Routing Rules + +| Signal in Prompt | Route To | Reason | +|------------------|----------|--------| +| Just model name, no options | **Preset** | User wants quick deployment | +| "custom", "configure", "choose", "select" | **Customize** | User wants control | +| "find", "check", "where", "which region", "available" | **Capacity** | User wants discovery | +| Specific capacity number + "best region" | **Capacity → Preset** | Discover then deploy quickly | +| Specific capacity number + "custom" keywords | **Capacity → Customize** | Discover then deploy with options | +| "PTU", "provisioned throughput" | **Customize** | PTU requires SKU selection | +| "optimal region", "best region" (no capacity target) | **Preset** | Region optimization is preset's specialty | + +### Multi-Mode Chaining + +Some prompts require two modes in sequence: + +**Pattern: Capacity → Deploy** +When a user specifies a capacity requirement AND wants deployment: +1. Run **Capacity Discovery** to find regions/projects with sufficient quota +2. Present findings to user +3. Ask: "Would you like to deploy with **quick defaults** or **customize settings**?" +4. Route to **Preset** or **Customize** based on answer + +> 💡 **Tip:** If unsure which mode the user wants, default to **Preset** (quick deployment). Users who want customization will typically use explicit keywords like "custom", "configure", or "with specific settings". + +## Project Selection (All Modes) + +Before any deployment, resolve which project to deploy to. This applies to **all** modes (preset, customize, and after capacity discovery). + +### Resolution Order + +1. **Check `PROJECT_RESOURCE_ID` env var** — if set, use it as the default +2. 
**Check user prompt** — if user named a specific project or region, use that +3. **If neither** — query the user's projects and suggest the current one + +### Confirmation Step (Required) + +**Always confirm the target before deploying.** Show the user what will be used and give them a chance to change it: + +``` +Deploying to: + Project: <project-name> + Region: <region> + Resource: <resource-name> + +Is this correct? Or choose a different project: + 1. ✅ Yes, deploy here (default) + 2. 📋 Show me other projects in this region + 3. 🌍 Choose a different region +``` + +If user picks option 2, show top 5 projects in that region: + +``` +Projects in <region>: + 1. project-alpha (rg-alpha) + 2. project-beta (rg-beta) + 3. project-gamma (rg-gamma) + ... +``` + +> ⚠️ **Never deploy without showing the user which project will be used.** This prevents accidental deployments to the wrong resource. + +## Pre-Deployment Validation (All Modes) + +Before presenting any deployment options (SKU, capacity), always validate both of these: + +1. **Model supports the SKU** — query the model catalog to confirm the selected model+version supports the target SKU: + ```bash + az cognitiveservices model list --location <region> --subscription <subscription-id> -o json + ``` + Filter for the model, extract `.model.skus[].name` to get supported SKUs. + +2. **Subscription has available quota** — check that the user's subscription has unallocated quota for the SKU+model combination: + ```bash + az cognitiveservices usage list --location <region> --subscription <subscription-id> -o json + ``` + Match by usage name pattern `OpenAI.<SKU>.<model>` (e.g., `OpenAI.GlobalStandard.gpt-4o`). Compute `available = limit - currentValue`. + +> ⚠️ **Warning:** Only present options that pass both checks. Do NOT show hardcoded SKU lists — always query dynamically. SKUs with 0 available quota should be shown as ❌ informational items, not selectable options. 
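The quota arithmetic above (`available = limit - currentValue`, with zero-quota SKUs shown as ❌) can be illustrated with a short shell sketch. The helper names `quota_available` and `format_sku_option` are hypothetical, and the numbers are made up; in practice `limit` and `currentValue` would come from the `az cognitiveservices usage list` output. The sketch assumes the values are whole numbers, optionally written with a trailing fraction such as `200.0`, which is truncated.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the Azure CLI.

# Prints limit - currentValue, floored at 0.
quota_available() {
  local limit="${1%.*}"     # drop a fractional part such as ".0"
  local current="${2%.*}"
  local available=$(( limit - current ))
  if (( available < 0 )); then
    available=0
  fi
  echo "$available"
}

# Prints one row per SKU: selectable when quota remains, informational otherwise.
format_sku_option() {
  local sku="$1"
  local available
  available="$(quota_available "$2" "$3")"
  if (( available > 0 )); then
    echo "✅ ${sku}: ${available} available"
  else
    echo "❌ ${sku}: 0 available (informational only)"
  fi
}

# Example values (invented): one SKU with headroom, one at its limit.
format_sku_option "GlobalStandard" 200 120
format_sku_option "Standard" 50 50
```

Flooring at zero is a design choice in this sketch: it keeps an over-allocated SKU from showing a negative number, and such a SKU is still rendered as a non-selectable ❌ row.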
+ +> 💡 **Quota management:** For quota increase requests, usage monitoring, and troubleshooting quota errors, defer to the [quota skill](../../quota/quota.md) instead of duplicating that guidance inline. + +## Prerequisites + +All deployment modes require: +- Azure CLI installed and authenticated (`az login`) +- Active Azure subscription with deployment permissions +- Azure AI Foundry project resource ID (or agent will help discover it via `PROJECT_RESOURCE_ID` env var) + +## Sub-Skills + +- **[preset/SKILL.md](preset/SKILL.md)** — Quick deployment to optimal region with sensible defaults +- **[customize/SKILL.md](customize/SKILL.md)** — Interactive guided flow with full configuration control +- **[capacity/SKILL.md](capacity/SKILL.md)** — Discover available capacity across regions and projects diff --git a/skills/microsoft-foundry/models/deploy-model/TEST_PROMPTS.md b/skills/microsoft-foundry/models/deploy-model/TEST_PROMPTS.md new file mode 100644 index 00000000..cbba68e1 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/TEST_PROMPTS.md @@ -0,0 +1,78 @@ +# Deploy Model — Test Prompts + +Test prompts for the unified `deploy-model` skill with router, preset, customize, and capacity sub-skills. 
+ +## Preset Mode (Quick Deploy) + +| # | Prompt | Expected | +|---|--------|----------| +| 1 | Deploy gpt-4o | Preset — confirm project, deploy with defaults | +| 2 | Set up o3-mini for me | Preset — pick latest version automatically | +| 3 | I need a text-embedding-ada-002 deployment | Preset — non-chat model | +| 4 | Deploy gpt-4o to the best region | Preset — region scan, no capacity target | + +## Customize Mode (Guided Flow) + +| # | Prompt | Expected | +|---|--------|----------| +| 5 | Deploy gpt-4o with custom settings | Customize — walk through version → SKU → capacity → RAI | +| 6 | I want to choose the version and SKU for my o3-mini deployment | Customize — explicit keywords | +| 7 | Set up a PTU deployment for gpt-4o | Customize — PTU requires SKU selection | +| 8 | Deploy gpt-4o with a specific content filter | Customize — RAI policy flow | + +## Capacity Discovery + +| # | Prompt | Expected | +|---|--------|----------| +| 9 | Where can I deploy gpt-4o? | Capacity — show regions, no deploy | +| 10 | Which regions have o3-mini available? 
| Capacity — run script, show table | +| 11 | Check if I have enough quota for gpt-4o with 500K TPM | Capacity — high target, some regions may not qualify | + +## Chained (Capacity → Deploy) + +| # | Prompt | Expected | +|---|--------|----------| +| 12 | Find me the best region and project to deploy gpt-4o with 10K capacity | Capacity → Preset | +| 13 | Deploy o3-mini with 200K TPM to whatever region has it | Capacity → Preset | +| 14 | I want to deploy gpt-4o with 50K capacity and choose my own settings | Capacity → Customize | + +## Negative / Edge Cases + +| # | Prompt | Expected | +|---|--------|----------| +| 15 | Deploy unicorn-model-9000 | Fail gracefully — model doesn't exist | +| 16 | Deploy gpt-4o with 999999K TPM | Capacity shows no region qualifies | +| 17 | Deploy gpt-4o (with az login expired) | Auth error caught early | +| 18 | Delete my gpt-4o deployment | Should NOT trigger deploy-model | +| 19 | List my current deployments | Should NOT trigger deploy-model | +| 20 | Deploy gpt-4o to mars-region-1 | Fail gracefully — invalid region | + +## Project Selection + +| # | Prompt | Expected | +|---|--------|----------| +| 21 | Deploy gpt-4o (with PROJECT_RESOURCE_ID set) | Show current project, confirm before deploying | +| 22 | Deploy gpt-4o (no PROJECT_RESOURCE_ID) | Ask user to pick a project | +| 23 | Deploy gpt-4o to project my-special-project | Use named project directly | + +## Ambiguous / Routing Stress + +| # | Prompt | Expected | +|---|--------|----------| +| 24 | Help me with model deployment | Preset (default) — vague, no keywords | +| 25 | I need gpt-4o deployed fast with good capacity | Preset — "fast" + vague capacity | +| 26 | Can you configure a deployment? | Customize — "configure" keyword, should ask which model | +| 27 | What's the best way to deploy gpt-4o with 100K? | Capacity → Preset | + +## Automated Test Results (2026-02-09) + +All 18 tests passed. Deployments created during testing were cleaned up. 
+ +| Category | Tests | Result | +|----------|-------|--------| +| Preset | 3/3 | ✅ | +| Customize | 2/2 | ✅ | +| Capacity | 3/3 | ✅ | +| Chained | 1/1 | ✅ | +| Negative | 5/5 | ✅ | +| Ambiguous | 4/4 | ✅ | diff --git a/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md b/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md new file mode 100644 index 00000000..46935315 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md @@ -0,0 +1,146 @@ +--- +name: capacity +description: "Discovers available Azure OpenAI model capacity across regions and projects. Analyzes quota limits, compares availability, and recommends optimal deployment locations based on capacity requirements. USE FOR: find capacity, check quota, where can I deploy, capacity discovery, best region for capacity, multi-project capacity search, quota analysis, model availability, region comparison, check TPM availability. DO NOT USE FOR: actual deployment (hand off to preset or customize after discovery), quota increase requests (direct user to Azure Portal), listing existing deployments." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Capacity Discovery + +Finds available Azure OpenAI model capacity across all accessible regions and projects. Recommends the best deployment location based on capacity requirements. + +## Quick Reference + +| Property | Description | +|----------|-------------| +| **Purpose** | Find where you can deploy a model with sufficient capacity | +| **Scope** | All regions and projects the user has access to | +| **Output** | Ranked table of regions/projects with available capacity | +| **Action** | Read-only analysis — does NOT deploy. Hands off to preset or customize | +| **Authentication** | Azure CLI (`az login`) | + +## When to Use This Skill + +- ✅ User asks "where can I deploy gpt-4o?" 
+- ✅ User specifies a capacity target: "find a region with 10K TPM for gpt-4o" +- ✅ User wants to compare availability: "which regions have gpt-4o available?" +- ✅ User got a quota error and needs to find an alternative location +- ✅ User asks "best region and project for deploying model X" + +**After discovery → hand off to [preset](../preset/SKILL.md) or [customize](../customize/SKILL.md) for actual deployment.** + +## Scripts + +Pre-built scripts handle the complex REST API calls and data processing. Use these instead of constructing commands manually. + +| Script | Purpose | Usage | +|--------|---------|-------| +| `scripts/discover_and_rank.ps1` | Full discovery: capacity + projects + ranking | Primary script for capacity discovery | +| `scripts/discover_and_rank.sh` | Same as above (bash) | Primary script for capacity discovery | +| `scripts/query_capacity.ps1` | Raw capacity query (no project matching) | Quick capacity check or version listing | +| `scripts/query_capacity.sh` | Same as above (bash) | Quick capacity check or version listing | + +## Workflow + +### Phase 1: Validate Prerequisites + +```bash +az account show --query "{Subscription:name, SubscriptionId:id}" --output table +``` + +### Phase 2: Identify Model and Version + +Extract model name from user prompt. If version is unknown, query available versions: + +```powershell +.\scripts\query_capacity.ps1 -ModelName +``` +```bash +./scripts/query_capacity.sh +``` + +This lists available versions. Use the latest version unless user specifies otherwise. 
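The "latest version unless the user specifies otherwise" rule can be sketched as follows. This is only an illustration under stated assumptions: the helper `pick_version` is hypothetical, the output format of `query_capacity.sh` is not shown here, and the sketch assumes versions are ISO-style date strings (such as `2025-01-31`), for which a plain lexical sort matches chronological order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a model version.
#   $1      version requested by the user (empty string if none)
#   $2...   available versions, assumed to be YYYY-MM-DD strings
pick_version() {
  local requested="$1"
  shift

  # An explicit user choice always wins.
  if [ -n "$requested" ]; then
    echo "$requested"
    return 0
  fi

  # Nothing to choose from.
  if [ "$#" -eq 0 ]; then
    return 1
  fi

  # ISO dates sort chronologically under a byte-wise sort; take the last one.
  printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# Example with illustrative version strings and no user preference.
pick_version "" 2024-05-13 2024-11-20 2024-08-06
```

If the real version strings are not date-shaped, the sort step would need to be replaced with a comparison that fits that format.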
+ +### Phase 3: Run Discovery + +Run the full discovery script with model name, version, and minimum capacity target: + +```powershell +.\scripts\discover_and_rank.ps1 -ModelName -ModelVersion -MinCapacity +``` +```bash +./scripts/discover_and_rank.sh +``` + +> 💡 The script automatically queries capacity across ALL regions, cross-references with the user's existing projects, and outputs a ranked table sorted by: meets target → project count → available capacity. + +### Phase 3.5: Validate Subscription Quota + +After discovery identifies candidate regions, validate that the user's subscription actually has available quota in each region. Model capacity (from Phase 3) shows what the platform can support, but subscription quota limits what this specific user can deploy. + +```powershell +# For each candidate region from discovery results: +$usageData = az cognitiveservices usage list --location --subscription $SUBSCRIPTION_ID -o json 2>$null | ConvertFrom-Json + +# Check quota for each SKU the model supports +# Quota names follow pattern: OpenAI.. +$usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.." } + +if ($usageEntry) { + $quotaAvailable = $usageEntry.limit - $usageEntry.currentValue +} else { + $quotaAvailable = 0 # No quota allocated +} +``` +```bash +# For each candidate region from discovery results: +usage_json=$(az cognitiveservices usage list --location --subscription "$SUBSCRIPTION_ID" -o json 2>/dev/null) + +# Extract quota for specific SKU+model +quota_available=$(echo "$usage_json" | jq -r --arg name "OpenAI.." 
\ + '.[] | select(.name.value == $name) | .limit - .currentValue') +``` + +**Annotate discovery results:** + +Add a "Quota Available" column to the ranked output from Phase 3: + +| Region | Available Capacity | Meets Target | Projects | Quota Available | +|--------|-------------------|--------------|----------|-----------------| +| eastus2 | 120K TPM | ✅ | 3 | ✅ 80K | +| westus3 | 90K TPM | ✅ | 1 | ❌ 0 (at limit) | +| swedencentral | 100K TPM | ✅ | 0 | ✅ 100K | + +Regions/SKUs where `quotaAvailable = 0` should be marked with ❌ in the results. If no region has available quota, hand off to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting. + +### Phase 4: Present Results and Hand Off + +After the script outputs the ranked table (now annotated with quota info), present it to the user and ask: + +1. 🚀 **Quick deploy** to top recommendation with defaults → route to [preset](../preset/SKILL.md) +2. ⚙️ **Custom deploy** with version/SKU/capacity/RAI selection → route to [customize](../customize/SKILL.md) +3. 📊 **Check another model** or capacity target → re-run Phase 2 +4. ❌ Cancel + +### Phase 5: Confirm Project Before Deploying + +Before handing off to preset or customize, **always confirm the target project** with the user. See the [Project Selection](../SKILL.md#project-selection-all-modes) rules in the parent router. + +If the discovery table shows a sample project for the chosen region, suggest it as the default. Otherwise, query projects in that region and let the user pick. 
+ +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| "No capacity found" | Model not available or all at quota | Hand off to [quota skill](../../../quota/quota.md) for increase requests and troubleshooting | +| Script auth error | `az login` expired | Re-run `az login` | +| Empty version list | Model not in region catalog | Try a different region: `./scripts/query_capacity.sh "" eastus` | +| "No projects found" | No AI Services resources | Guide to `project/create` skill or Azure Portal | + +## Related Skills + +- **[preset](../preset/SKILL.md)** — Quick deployment after capacity discovery +- **[customize](../customize/SKILL.md)** — Custom deployment after capacity discovery +- **[quota](../../../quota/quota.md)** — For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance diff --git a/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.ps1 b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.ps1 new file mode 100644 index 00000000..4b86363f --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.ps1 @@ -0,0 +1,113 @@ +<# +.SYNOPSIS + Discovers available capacity for an Azure OpenAI model across all regions, + cross-references with existing projects and subscription quota, and outputs a ranked table. 
+.PARAMETER ModelName + The model name (e.g., "gpt-4o", "o3-mini") +.PARAMETER ModelVersion + The model version (e.g., "2025-01-31") +.PARAMETER MinCapacity + Minimum required capacity in K TPM units (default: 0, shows all) +.EXAMPLE + .\discover_and_rank.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -MinCapacity 200 +#> +param( + [Parameter(Mandatory)][string]$ModelName, + [Parameter(Mandatory)][string]$ModelVersion, + [int]$MinCapacity = 0 +) + +$ErrorActionPreference = "Stop" + +$subId = az account show --query id -o tsv + +# Query model capacity across all regions +$capRaw = az rest --method GET ` + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities" ` + --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName=$ModelName modelVersion=$ModelVersion ` + 2>$null | Out-String | ConvertFrom-Json + +# Query all AI Foundry projects (AIProject kind) +$projRaw = az rest --method GET ` + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/accounts" ` + --url-parameters api-version=2024-10-01 ` + --query "value[?kind=='AIProject'].{Name:name, Location:location}" ` + 2>$null | Out-String | ConvertFrom-Json + +# Build capacity map (GlobalStandard only, pick max per region) +$capMap = @{} +foreach ($item in $capRaw.value) { + $sku = $item.properties.skuName + $avail = [int]$item.properties.availableCapacity + $region = $item.location + if ($sku -eq "GlobalStandard" -and $avail -gt 0) { + if (-not $capMap[$region] -or $avail -gt $capMap[$region]) { + $capMap[$region] = $avail + } + } +} + +# Build project map +$projMap = @{} +$projSample = @{} +foreach ($p in $projRaw) { + $loc = $p.Location + if (-not $projMap[$loc]) { $projMap[$loc] = 0 } + $projMap[$loc]++ + if (-not $projSample[$loc]) { $projSample[$loc] = $p.Name } +} + +# Check subscription quota per region +$quotaMap = @{} +$checkedRegions = @{} +foreach ($region in $capMap.Keys) { + if 
($checkedRegions[$region]) { continue } + $checkedRegions[$region] = $true + try { + $usageData = az cognitiveservices usage list --location $region --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json + $usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.GlobalStandard.$ModelName" } + if ($usageEntry) { + $quotaMap[$region] = [int]$usageEntry.limit - [int]$usageEntry.currentValue + } else { + $quotaMap[$region] = 0 + } + } catch { + $quotaMap[$region] = -1 # Unable to check + } +} + +# Combine and rank +$results = foreach ($region in $capMap.Keys) { + $avail = $capMap[$region] + $meets = $avail -ge $MinCapacity + $quota = if ($quotaMap[$region]) { $quotaMap[$region] } else { 0 } + $quotaDisplay = if ($quota -eq -1) { "?" } elseif ($quota -gt 0) { "${quota}K" } else { "0" } + $quotaOk = $quota -gt 0 -or $quota -eq -1 + [PSCustomObject]@{ + Region = $region + AvailableTPM = "${avail}K" + AvailableRaw = $avail + MeetsTarget = if ($meets) { "YES" } else { "no" } + Projects = if ($projMap[$region]) { $projMap[$region] } else { 0 } + SampleProject = if ($projSample[$region]) { $projSample[$region] } else { "(none)" } + QuotaAvailable = $quotaDisplay + QuotaOk = $quotaOk + } +} + +$results = $results | Sort-Object @{Expression={$_.MeetsTarget -eq "YES"}; Descending=$true}, + @{Expression={$_.QuotaOk}; Descending=$true}, + @{Expression={$_.Projects}; Descending=$true}, + @{Expression={$_.AvailableRaw}; Descending=$true} + +# Output summary +$total = ($results | Measure-Object).Count +$matching = ($results | Where-Object { $_.MeetsTarget -eq "YES" } | Measure-Object).Count +$withQuota = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.QuotaOk } | Measure-Object).Count +$withProjects = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.Projects -gt 0 } | Measure-Object).Count + +Write-Host "Model: $ModelName v$ModelVersion | SKU: GlobalStandard | Min Capacity: ${MinCapacity}K TPM" +Write-Host "Regions with capacity: 
$total | Meets target: $matching | With quota: $withQuota | With projects: $withProjects" +Write-Host "" + +$results | Select-Object Region, AvailableTPM, MeetsTarget, QuotaAvailable, Projects, SampleProject | Format-Table -AutoSize diff --git a/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.sh b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.sh new file mode 100644 index 00000000..83c771b4 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/discover_and_rank.sh @@ -0,0 +1,113 @@ +#!/bin/bash +# discover_and_rank.sh +# Discovers available capacity for an Azure OpenAI model across all regions, +# cross-references with existing projects and subscription quota, and outputs a ranked table. +# +# Usage: ./discover_and_rank.sh [min-capacity] +# Example: ./discover_and_rank.sh o3-mini 2025-01-31 200 +# +# Output: Ranked table of regions with capacity, quota, project counts, and match status + +set -euo pipefail + +MODEL_NAME="${1:?Usage: $0 [min-capacity]}" +MODEL_VERSION="${2:?Usage: $0 [min-capacity]}" +MIN_CAPACITY="${3:-0}" + +SUB_ID=$(az account show --query id -o tsv) + +# Query model capacity across all regions (GlobalStandard SKU) +CAPACITY_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities" \ + --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \ + 2>/dev/null) + +# Query all AI Services projects +PROJECTS_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/accounts" \ + --url-parameters api-version=2024-10-01 \ + --query "value[?kind=='AIServices'].{name:name, location:location}" \ + 2>/dev/null) + +# Get unique regions from capacity results for quota checking +REGIONS=$(echo "$CAPACITY_JSON" | jq -r '.value[] | 
select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | .location' | sort -u) + +# Build quota map: check subscription quota per region +declare -A QUOTA_MAP +for region in $REGIONS; do + usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]") + quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.GlobalStandard.$MODEL_NAME" \ + '[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end') + QUOTA_MAP[$region]="${quota_avail:-0}" +done + +# Export quota map as JSON for Python +QUOTA_JSON="{" +first=true +for region in "${!QUOTA_MAP[@]}"; do + if [ "$first" = true ]; then first=false; else QUOTA_JSON+=","; fi + QUOTA_JSON+="\"$region\":${QUOTA_MAP[$region]}" +done +QUOTA_JSON+="}" + +# Combine, rank, and output using inline Python (requires python3 on PATH) +python3 -c " +import json, sys + +capacity = json.loads('''${CAPACITY_JSON}''') +projects = json.loads('''${PROJECTS_JSON}''') +quota = json.loads('''${QUOTA_JSON}''') +min_cap = int('${MIN_CAPACITY}') + +# Build capacity map (GlobalStandard only) +cap_map = {} +for item in capacity.get('value', []): + props = item.get('properties', {}) + if props.get('skuName') == 'GlobalStandard' and props.get('availableCapacity', 0) > 0: + region = item.get('location', '') + cap_map[region] = max(cap_map.get(region, 0), props['availableCapacity']) + +# Build project count map +proj_map = {} +proj_sample = {} +for p in (projects if isinstance(projects, list) else []): + loc = p.get('location', '') + proj_map[loc] = proj_map.get(loc, 0) + 1 + if loc not in proj_sample: + proj_sample[loc] = p.get('name', '') + +# Combine and rank +results = [] +for region, cap in cap_map.items(): + meets = cap >= min_cap + q = quota.get(region, 0) + quota_ok = q > 0 + results.append({ + 'region': region, + 'available': cap, + 'meets': meets, + 'projects': proj_map.get(region, 0), +
'sample': proj_sample.get(region, '(none)'), + 'quota': q, + 'quota_ok': quota_ok + }) + +# Sort: meets target first, then quota available, then by project count, then by capacity +results.sort(key=lambda x: (-x['meets'], -x['quota_ok'], -x['projects'], -x['available'])) + +# Output +total = len(results) +matching = sum(1 for r in results if r['meets']) +with_quota = sum(1 for r in results if r['meets'] and r['quota_ok']) +with_projects = sum(1 for r in results if r['meets'] and r['projects'] > 0) + +print(f'Model: {\"${MODEL_NAME}\"} v{\"${MODEL_VERSION}\"} | SKU: GlobalStandard | Min Capacity: {min_cap}K TPM') +print(f'Regions with capacity: {total} | Meets target: {matching} | With quota: {with_quota} | With projects: {with_projects}') +print() +print(f'{\"Region\":<22} {\"Available\":<12} {\"Meets Target\":<14} {\"Quota\":<12} {\"Projects\":<10} {\"Sample Project\"}') +print('-' * 100) +for r in results: + mark = 'YES' if r['meets'] else 'no' + q_display = f'{r[\"quota\"]}K' if r['quota'] > 0 else '0 (none)' + print(f'{r[\"region\"]:<22} {r[\"available\"]}K{\"\":.<10} {mark:<14} {q_display:<12} {r[\"projects\"]:<10} {r[\"sample\"]}') +" diff --git a/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.ps1 b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.ps1 new file mode 100644 index 00000000..f65dff5a --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.ps1 @@ -0,0 +1,80 @@ +<# +.SYNOPSIS + Queries available capacity for an Azure OpenAI model and validates if a target is achievable. +.PARAMETER ModelName + The model name (e.g., "gpt-4o", "o3-mini") +.PARAMETER ModelVersion + The model version (e.g., "2025-01-31"). If omitted, lists available versions. +.PARAMETER Region + Optional. Check capacity in a specific region only. 
+.PARAMETER SKU + SKU to check (default: GlobalStandard) +.EXAMPLE + .\query_capacity.ps1 -ModelName o3-mini + .\query_capacity.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -Region eastus2 +#> +param( + [Parameter(Mandatory)][string]$ModelName, + [string]$ModelVersion, + [string]$Region, + [string]$SKU = "GlobalStandard" +) + +$ErrorActionPreference = "Stop" + +$subId = az account show --query id -o tsv + +# If no version provided, list available versions first +if (-not $ModelVersion) { + Write-Host "Available versions for $ModelName`:" + $loc = if ($Region) { $Region } else { "eastus" } + az cognitiveservices model list --location $loc ` + --query "[?model.name=='$ModelName'].{Version:model.version, Format:model.format}" ` + --output table 2>$null + return +} + +# Build URL parameters +$urlParams = @("api-version=2024-10-01", "modelFormat=OpenAI", "modelName=$ModelName", "modelVersion=$ModelVersion") + +if ($Region) { + $url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$Region/modelCapacities" +} else { + $url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities" +} + +$raw = az rest --method GET --url $url --url-parameters @urlParams 2>$null | Out-String | ConvertFrom-Json + +# Filter by SKU +$filtered = $raw.value | Where-Object { $_.properties.skuName -eq $SKU -and $_.properties.availableCapacity -gt 0 } + +if (-not $filtered) { + Write-Host "No capacity found for $ModelName v$ModelVersion ($SKU)" -ForegroundColor Red + Write-Host "Try a different SKU or version." + return +} + +Write-Host "Capacity: $ModelName v$ModelVersion ($SKU)" +Write-Host "" +$filtered | ForEach-Object { + # Check subscription quota for this region + $quotaDisplay = "?" 
+ try { + $usageData = az cognitiveservices usage list --location $_.location --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json + $usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.$SKU.$ModelName" } + if ($usageEntry) { + $quotaAvail = [int]$usageEntry.limit - [int]$usageEntry.currentValue + $quotaDisplay = if ($quotaAvail -gt 0) { "${quotaAvail}K" } else { "0 (at limit)" } + } else { + $quotaDisplay = "0 (none)" + } + } catch { + $quotaDisplay = "?" + } + [PSCustomObject]@{ + Region = $_.location + SKU = $_.properties.skuName + Available = "$($_.properties.availableCapacity)K TPM" + Quota = $quotaDisplay + } +} | Sort-Object { [int]($_.Available -replace '[^\d]','') } -Descending | Format-Table -AutoSize diff --git a/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.sh b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.sh new file mode 100644 index 00000000..7869f690 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/capacity/scripts/query_capacity.sh @@ -0,0 +1,75 @@ +#!/bin/bash +# query_capacity.sh +# Queries available capacity for an Azure OpenAI model. 
+# +# Usage: +# ./query_capacity.sh [model-version] [region] [sku] +# Examples: +# ./query_capacity.sh o3-mini # List versions +# ./query_capacity.sh o3-mini 2025-01-31 # All regions +# ./query_capacity.sh o3-mini 2025-01-31 eastus2 # Specific region +# ./query_capacity.sh o3-mini 2025-01-31 "" Standard # Different SKU + +set -euo pipefail + +MODEL_NAME="${1:?Usage: $0 [model-version] [region] [sku]}" +MODEL_VERSION="${2:-}" +REGION="${3:-}" +SKU="${4:-GlobalStandard}" + +SUB_ID=$(az account show --query id -o tsv) + +# If no version, list available versions +if [ -z "$MODEL_VERSION" ]; then + LOC="${REGION:-eastus}" + echo "Available versions for $MODEL_NAME:" + az cognitiveservices model list --location "$LOC" \ + --query "[?model.name=='$MODEL_NAME'].{Version:model.version, Format:model.format}" \ + --output table 2>/dev/null + exit 0 +fi + +# Build URL +if [ -n "$REGION" ]; then + URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/locations/${REGION}/modelCapacities" +else + URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities" +fi + +# Query capacity +CAPACITY_RESULT=$(az rest --method GET --url "$URL" \ + --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \ + 2>/dev/null) + +# Get regions with capacity +REGIONS_WITH_CAP=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.properties.skuName==\"$SKU\" and .properties.availableCapacity > 0) | .location" 2>/dev/null | sort -u) + +if [ -z "$REGIONS_WITH_CAP" ]; then + echo "No capacity found for $MODEL_NAME v$MODEL_VERSION ($SKU)" + echo "Try a different SKU or version." 
+ exit 0 +fi + +echo "Capacity: $MODEL_NAME v$MODEL_VERSION ($SKU)" +echo "" +printf "%-22s %-12s %-15s %s\n" "Region" "Available" "Quota" "SKU" +printf -- '-%.0s' {1..60}; echo "" + +for region in $REGIONS_WITH_CAP; do + avail=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.location==\"$region\" and .properties.skuName==\"$SKU\") | .properties.availableCapacity" 2>/dev/null | head -1) + + # Check subscription quota + usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]") + quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.$SKU.$MODEL_NAME" \ + '[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end' 2>/dev/null || echo "?") + + if [ "$quota_avail" = "0" ]; then + quota_display="0 (none)" + elif [ "$quota_avail" = "?" ]; then + quota_display="?" + else + quota_display="${quota_avail}K" + fi + + printf "%-22s %-12s %-15s %s\n" "$region" "${avail}K TPM" "$quota_display" "$SKU" +done diff --git a/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md b/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md new file mode 100644 index 00000000..a3f25848 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md @@ -0,0 +1,90 @@ +# customize Examples + +## Example 1: Basic Deployment with Defaults + +**Scenario:** Deploy gpt-4o accepting all defaults for quick setup. +**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled +**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled. + +## Example 2: Production Deployment with Custom Capacity + +**Scenario:** Deploy gpt-4o for production with high throughput. +**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production` +**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps. 
+ +## Example 3: PTU Deployment for High-Volume Workload + +**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload. +**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled +**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom) +**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines. + +## Example 4: Development Deployment with Standard SKU + +**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost. +**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev` +**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping. + +## Example 5: Spillover Configuration + +**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow. +**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup` +**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment. + +## Example 6: Anthropic Model Deployment (claude-sonnet-4-6) + +**Scenario:** Deploy claude-sonnet-4-6 with customized settings. +**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering) +**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min. 
+ +--- + +## Comparison Matrix + +| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case | +|----------|-------|-----|----------|:---:|:---:|:---:|----------| +| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup | +| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production | +| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload | +| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing | +| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load | +| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model | + +## Common Patterns + +### Dev → Staging → Production + +| Stage | Model | SKU | Capacity | Extras | +|-------|-------|-----|----------|--------| +| Dev | gpt-4o-mini | Standard | 1K TPM | — | +| Staging | gpt-4o | GlobalStandard | 10K TPM | — | +| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover | + +### Cost Optimization + +- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing +- **Low priority:** gpt-4o-mini, Standard, 5K TPM + +--- + +## Tips and Best Practices + +**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks. + +**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load. + +**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume. + +**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it. + +**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests. 
+ +--- + +## Troubleshooting + +| Problem | Solution | +|---------|----------| +| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase | +| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest | +| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name | diff --git a/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md b/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md new file mode 100644 index 00000000..7c94d561 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md @@ -0,0 +1,168 @@ +--- +name: customize +description: "Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset)." +license: MIT +metadata: + author: Microsoft + version: "1.0.1" +--- + +# Customize Model Deployment + +Interactive guided workflow for deploying Azure OpenAI models with full customization control over version, SKU, capacity, content filtering, and advanced options. 
+ +## Quick Reference + +| Property | Description | +|----------|-------------| +| **Flow** | Interactive step-by-step guided deployment | +| **Customization** | Version, SKU, Capacity, RAI Policy, Advanced Options | +| **SKU Support** | GlobalStandard, Standard, ProvisionedManaged, DataZoneStandard | +| **Best For** | Precise control over deployment configuration | +| **Authentication** | Azure CLI (`az login`) | +| **Tools** | Azure CLI, MCP tools (optional) | + +## When to Use This Skill + +Use this skill when you need **precise control** over deployment configuration: + +- ✅ **Choose specific model version** (not just latest) +- ✅ **Select deployment SKU** (GlobalStandard vs Standard vs PTU) +- ✅ **Set exact capacity** within available range +- ✅ **Configure content filtering** (RAI policy selection) +- ✅ **Enable advanced features** (dynamic quota, priority processing, spillover) +- ✅ **PTU deployments** (Provisioned Throughput Units) + +**Alternative:** Use `preset` for quick deployment to the best available region with automatic configuration. 
+ +### Comparison: customize vs preset + +| Feature | customize | preset | +|---------|---------------------|----------------------------| +| **Focus** | Full customization control | Optimal region selection | +| **Version Selection** | User chooses from available | Uses latest automatically | +| **SKU Selection** | User chooses (GlobalStandard/Standard/PTU) | GlobalStandard only | +| **Capacity** | User specifies exact value | Auto-calculated (50% of available) | +| **RAI Policy** | User selects from options | Default policy only | +| **Region** | Current region first, falls back to all regions if no capacity | Checks capacity across all regions upfront | +| **Use Case** | Precise deployment requirements | Quick deployment to best region | + +## Prerequisites + +- Azure subscription with Cognitive Services Contributor or Owner role +- Azure AI Foundry project resource ID (format: `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`) +- Azure CLI installed and authenticated (`az login`) +- Optional: Set `PROJECT_RESOURCE_ID` environment variable + +## Workflow Overview + +### Complete Flow (13 Phases) + +``` +1. Verify Authentication +2. Get Project Resource ID +3. Verify Project Exists +4. Get Model Name (if not provided) +5. List Model Versions → User Selects +6. List SKUs for Version → User Selects +7. Get Capacity Range → User Configures + 7b. If no capacity: Cross-Region Fallback → Query all regions → User selects region/project +8. List RAI Policies → User Selects +9. Configure Advanced Options (if applicable) +10. Configure Version Upgrade Policy +11. Generate Deployment Name +12. Review Configuration +13. Execute Deployment & Monitor +``` + +### Fast Path (Defaults) + +If the user accepts all defaults (latest version, GlobalStandard SKU, recommended capacity, default RAI policy, standard upgrade policy), deployment completes in ~5 interactions.
+ +--- + +## Phase Summaries + +> ⚠️ **MUST READ:** Before executing any phase, load [references/customize-workflow.md](references/customize-workflow.md) for the full scripts and implementation details. The summaries below describe *what* each phase does — the reference file contains the *how* (CLI commands, quota patterns, capacity formulas, cross-region fallback logic). + +| Phase | Action | Key Details | +|-------|--------|-------------| +| **1. Verify Auth** | Check `az account show`; prompt `az login` if needed | Verify correct subscription is active | +| **2. Get Project ID** | Read `PROJECT_RESOURCE_ID` env var or prompt user | ARM resource ID format required | +| **3. Verify Project** | Parse resource ID, call `az cognitiveservices account show` | Extracts subscription, RG, account, project, region | +| **4. Get Model** | List models via `az cognitiveservices account list-models` | User selects from available or enters custom name | +| **5. Select Version** | Query versions for chosen model | Recommend latest; user picks from list | +| **6. Select SKU** | Query model catalog + subscription quota, show only deployable SKUs | ⚠️ Never hardcode SKU lists — always query live data | +| **7. Configure Capacity** | Query capacity API, validate min/max/step, user enters value | Cross-region fallback if no capacity in current region | +| **8. Select RAI Policy** | Present content filter options | Default: `Microsoft.DefaultV2` | +| **9. Advanced Options** | Dynamic quota (GlobalStandard), priority processing (PTU), spillover | SKU-dependent availability | +| **10. Upgrade Policy** | Choose: OnceNewDefaultVersionAvailable / OnceCurrentVersionExpired / NoAutoUpgrade | Default: auto-upgrade on new default | +| **11. Deployment Name** | Auto-generate unique name, allow custom override | Validates format: `^[\w.-]{2,64}$` | +| **12. Review** | Display full config summary, confirm before proceeding | User approves or cancels | +| **13. 
Deploy & Monitor** | `az cognitiveservices account deployment create`, poll status | Timeout after 5 min; show endpoint + portal link | + + +--- + +## Error Handling + +### Common Issues and Resolutions + +| Error | Cause | Resolution | +|-------|-------|------------| +| **Model not found** | Invalid model name | List available models with `az cognitiveservices account list-models` | +| **Version not available** | Version not supported for SKU | Select different version or SKU | +| **Insufficient quota** | Capacity > available quota | Skill auto-searches all regions; fails only if no region has quota | +| **SKU not supported** | SKU not available in region | Cross-region fallback searches other regions automatically | +| **Capacity out of range** | Invalid capacity value | **PREVENTED**: Skill validates min/max/step at input (Phase 7) | +| **Deployment name exists** | Name conflict | Auto-incremented name generation | +| **Authentication failed** | Not logged in | Run `az login` | +| **Permission denied** | Insufficient permissions | Assign Cognitive Services Contributor role | +| **Capacity query fails** | API/permissions/network error | **DEPLOYMENT BLOCKED**: Will not proceed without valid quota data | + +### Troubleshooting Commands + +```bash +# Check deployment status +az cognitiveservices account deployment show --name --resource-group --deployment-name + +# List all deployments +az cognitiveservices account deployment list --name --resource-group -o table + +# Check quota usage for the account +az cognitiveservices account list-usage --name --resource-group + +# Delete failed deployment +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +--- + +## Selection Guides & Advanced Topics + +> For SKU comparison tables, PTU sizing formulas, and advanced option details, load [references/customize-guides.md](references/customize-guides.md).
+ +**SKU selection:** GlobalStandard (production/HA) → Standard (dev/test) → ProvisionedManaged (high-volume/guaranteed throughput) → DataZoneStandard (data residency). + +**Capacity:** TPM-based SKUs range from 1K (dev) to 100K+ (large production). PTU-based use formula: `(Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)`. + +**Advanced options:** Dynamic quota (GlobalStandard only), priority processing (PTU only, extra cost), spillover (overflow to backup deployment). + +--- + +## Related Skills + +- **preset** - Quick deployment to best region with automatic configuration +- **microsoft-foundry** - Parent skill for all Azure AI Foundry operations +- **[quota](../../../quota/quota.md)** — For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance +- **rbac** - Manage permissions and access control + +--- + +## Notes + +- Set `PROJECT_RESOURCE_ID` environment variable to skip prompt +- Not all SKUs available in all regions; capacity varies by subscription/region/model +- Custom RAI policies can be configured in Azure Portal +- Automatic version upgrades occur during maintenance windows +- Use Azure Monitor and Application Insights for production deployments \ No newline at end of file diff --git a/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md b/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md new file mode 100644 index 00000000..1009adb4 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md @@ -0,0 +1,126 @@ +# Customize Guides — Selection Guides & Advanced Topics + +> Reference for: `models/deploy-model/customize/SKILL.md` + +**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics) + +## Selection Guides + +### How to Choose SKU + +| SKU | Best For | Cost | Availability | +|-----|----------|------|--------------| +| 
**GlobalStandard** | Production, high availability | Medium | Multi-region | +| **Standard** | Development, testing | Low | Single region | +| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity | +| **DataZoneStandard** | Data residency requirements | Medium | Specific zones | + +**Decision Tree:** +``` +Do you need guaranteed throughput? +├─ Yes → ProvisionedManaged (PTU) +└─ No → Do you need high availability? + ├─ Yes → GlobalStandard + └─ No → Standard +``` + +### How to Choose Capacity + +**For TPM-based SKUs (GlobalStandard, Standard):** + +| Workload | Recommended Capacity | +|----------|---------------------| +| Development/Testing | 1K - 5K TPM | +| Small Production | 5K - 20K TPM | +| Medium Production | 20K - 100K TPM | +| Large Production | 100K+ TPM | + +**For PTU-based SKUs (ProvisionedManaged):** + +Use the PTU calculator based on: +- Input tokens per minute +- Output tokens per minute +- Requests per minute + +**Capacity Planning Tips:** +- Start with recommended capacity +- Monitor usage and adjust +- Enable dynamic quota for flexibility +- Consider spillover for peak loads + +### How to Choose RAI Policy + +| Policy | Filtering Level | Use Case | +|--------|----------------|----------| +| **Microsoft.DefaultV2** | Balanced | Most applications | +| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps | +| **Custom** | Configurable | Specific requirements | + +**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs. 
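The SKU decision tree and the PTU estimate that this guide refers to can be sketched in Python. The function names are hypothetical, and the coefficients (0.001 per input TPM, 0.002 per output TPM, 0.1 per request per minute) are the ones this skill quotes for its PTU estimate; treat the result as a rough starting point, not an official sizing.

```python
def choose_sku(needs_guaranteed_throughput: bool, needs_high_availability: bool) -> str:
    """Walk the two-question decision tree from the selection guide."""
    if needs_guaranteed_throughput:
        return "ProvisionedManaged"
    if needs_high_availability:
        return "GlobalStandard"
    return "Standard"


def estimate_ptu(input_tpm: float, output_tpm: float, requests_per_min: float) -> float:
    """Apply the skill's PTU estimate formula to a workload profile."""
    return input_tpm * 0.001 + output_tpm * 0.002 + requests_per_min * 0.1


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input TPM, 5,000 output TPM, 100 requests/min.
    print(choose_sku(needs_guaranteed_throughput=False, needs_high_availability=True))
    print(round(estimate_ptu(10_000, 5_000, 100), 2))
```

The decision tree only covers three SKUs, so a data-residency requirement (DataZoneStandard in the table above) would need to be checked before calling `choose_sku`.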
+ +--- + +## Advanced Topics + +### PTU (Provisioned Throughput Units) Deployments + +**What is PTU?** +- Reserved capacity with guaranteed throughput +- Measured in PTU units, not TPM +- Fixed cost regardless of usage +- Best for high-volume, predictable workloads + +**PTU Calculator:** + +``` +Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1) + +Example: +- Input: 10,000 tokens/min +- Output: 5,000 tokens/min +- Requests: 100/min + +PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1) + = 10 + 10 + 10 + = 30 PTU +``` + +**PTU Deployment:** +```bash +az cognitiveservices account deployment create \ + --name \ + --resource-group \ + --deployment-name \ + --model-name \ + --model-version \ + --model-format "OpenAI" \ + --sku-name "ProvisionedManaged" \ + --sku-capacity 100 # PTU units +``` + +### Spillover Configuration + +**Spillover Workflow:** +1. Primary deployment receives requests +2. When capacity reached, requests overflow to spillover target +3. Spillover target must be same model or compatible +4. 
Configure via deployment properties + +**Best Practices:** +- Use spillover for peak load handling +- Spillover target should have sufficient capacity +- Monitor both deployments +- Test failover behavior + +### Priority Processing + +**What is Priority Processing?** +- Prioritizes your requests during high load +- Available for ProvisionedManaged SKU +- Additional charges apply +- Ensures consistent performance + +**When to Use:** +- Mission-critical applications +- SLA requirements +- High-concurrency scenarios diff --git a/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md b/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md new file mode 100644 index 00000000..750ae56e --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md @@ -0,0 +1,410 @@ +# Customize Workflow — Detailed Phase Instructions + +> Reference for: `models/deploy-model/customize/SKILL.md` + +## Phase 1: Verify Authentication + +```bash +az account show --query "{Subscription:name, User:user.name}" -o table +``` + +If not logged in: `az login` + +Set subscription if needed: +```bash +az account list --query "[].[name,id,state]" -o table +az account set --subscription +``` + +--- + +## Phase 2: Get Project Resource ID + +Check `PROJECT_RESOURCE_ID` env var. If not set, prompt user. 
+ +**Format:** `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}` + +--- + +## Phase 3: Parse and Verify Project + +Parse ARM resource ID to extract components: + +```powershell +$SUBSCRIPTION_ID = ($PROJECT_RESOURCE_ID -split '/')[2] +$RESOURCE_GROUP = ($PROJECT_RESOURCE_ID -split '/')[4] +$ACCOUNT_NAME = ($PROJECT_RESOURCE_ID -split '/')[8] +$PROJECT_NAME = ($PROJECT_RESOURCE_ID -split '/')[10] +``` + +Verify project exists and get region: +```bash +az account set --subscription $SUBSCRIPTION_ID +az cognitiveservices account show \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query location -o tsv +``` + +--- + +## Phase 4: Get Model Name + +List available models if not provided: +```bash +az cognitiveservices account list-models \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query "[].name" -o json +``` + +Present sorted unique list. Allow custom model name entry. + +**Detect model format:** + +```bash +# Get model format (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere) +MODEL_FORMAT=$(az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1) + +MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"} +echo "Model format: $MODEL_FORMAT" +``` + +> 💡 **Model format determines the deployment path:** +> - `OpenAI` — Standard CLI, TPM-based capacity, RAI policies, version upgrade policies +> - `Anthropic` — REST API with `modelProviderData`, capacity=1, no RAI, no version upgrade +> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) 
— Standard CLI, capacity=1 (MaaS), no RAI, no version upgrade + +--- + +## Phase 5: List and Select Model Version + +```bash +az cognitiveservices account list-models \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query "[?name=='$MODEL_NAME'].version" -o json +``` + +Recommend latest version (first in list). Default to `"latest"` if no versions found. + +--- + +## Phase 6: List and Select SKU + +> ⚠️ **Warning:** Never hardcode SKU lists — always query live data. + +**Step A — Query model-supported SKUs:** +```bash +az cognitiveservices model list \ + --location $PROJECT_REGION \ + --subscription $SUBSCRIPTION_ID -o json +``` + +Filter: `model.name == $MODEL_NAME && model.version == $MODEL_VERSION`, extract `model.skus[].name`. + +**Step B — Check subscription quota per SKU:** +```bash +az cognitiveservices usage list \ + --location $PROJECT_REGION \ + --subscription $SUBSCRIPTION_ID -o json +``` + +Quota key pattern: `OpenAI..`. Calculate `available = limit - currentValue`. + +**Step C — Present only deployable SKUs** (available > 0). If no SKUs have quota, direct user to the [quota skill](../../../../quota/quota.md). + +--- + +## Phase 7: Configure Capacity + +> ⚠️ **Non-OpenAI models (MaaS):** If `MODEL_FORMAT != "OpenAI"`, capacity is always `1` (pay-per-token billing). Skip capacity configuration and set `DEPLOY_CAPACITY=1`. Proceed to Phase 7c (Anthropic) or Phase 8. + +**For OpenAI models only — query capacity via REST API:** +```bash +# Current region capacity +az rest --method GET --url \ + "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" +``` + +Filter result for `properties.skuName == $SELECTED_SKU`. Read `properties.availableCapacity`. 
+ +**Capacity defaults by SKU (OpenAI only):** + +| SKU | Unit | Min | Max | Step | Default | +|-----|------|-----|-----|------|---------| +| ProvisionedManaged | PTU | 50 | 1000 | 50 | 100 | +| Others (TPM-based) | TPM | 1000 | min(available, 300000) | 1000 | min(10000, available/2) | + +Validate user input: must be >= min, <= max, multiple of step. On invalid input, explain constraints. + +### Phase 7b: Cross-Region Fallback + +If no capacity in current region, query ALL regions: +```bash +az rest --method GET --url \ + "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" +``` + +Filter: `properties.skuName == $SELECTED_SKU && properties.availableCapacity > 0`. Sort descending by capacity. + +Present available regions. After user selects region, find existing projects there: +```bash +az cognitiveservices account list \ + --query "[?kind=='AIProject' && location=='$PROJECT_REGION'].{Name:name, ResourceGroup:resourceGroup}" \ + -o json +``` + +If projects exist, let user select one and update `$ACCOUNT_NAME`, `$RESOURCE_GROUP`. If none, direct to project/create skill. + +Re-run capacity configuration with new region's available capacity. + +If no region has capacity: fail with guidance to request quota increase, check existing deployments, or try different model/SKU. + +--- + +## Phase 7c: Anthropic Model Provider Data (Anthropic models only) + +> ⚠️ **Only execute this phase if `MODEL_FORMAT == "Anthropic"`.** For OpenAI and other models, skip to Phase 8. + +Anthropic models require `modelProviderData` in the deployment payload. Collect this before deployment. + +**Step 1: Prompt user to select industry** + +Present the following list and ask the user to choose one: + +``` + 1. None (API value: none) + 2. Biotechnology (API value: biotechnology) + 3. Consulting (API value: consulting) + 4. 
Education (API value: education) + 5. Finance (API value: finance) + 6. Food & Beverage (API value: food_and_beverage) + 7. Government (API value: government) + 8. Healthcare (API value: healthcare) + 9. Insurance (API value: insurance) +10. Law (API value: law) +11. Manufacturing (API value: manufacturing) +12. Media (API value: media) +13. Nonprofit (API value: nonprofit) +14. Technology (API value: technology) +15. Telecommunications (API value: telecommunications) +16. Sport & Recreation (API value: sport_and_recreation) +17. Real Estate (API value: real_estate) +18. Retail (API value: retail) +19. Other (API value: other) +``` + +> ⚠️ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static — there is no REST API that provides it. + +Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`). + +**Step 2: Fetch tenant info (country code and organization name)** + +```bash +TENANT_INFO=$(az rest --method GET \ + --url "https://management.azure.com/tenants?api-version=2024-11-01" \ + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json) + +COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode') +ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName') +``` + +*PowerShell version:* +```powershell +$tenantInfo = az rest --method GET ` + --url "https://management.azure.com/tenants?api-version=2024-11-01" ` + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json + +$countryCode = $tenantInfo.countryCode +$orgName = $tenantInfo.displayName +``` + +Store `COUNTRY_CODE` and `ORG_NAME` for use in Phase 13. + +--- + +## Phase 8: Select RAI Policy (Content Filter) + +> ⚠️ **Note:** RAI policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"` (Anthropic, Meta-Llama, Mistral, Cohere, etc. do not use RAI policies). + +Present options: +1. 
`Microsoft.DefaultV2` — Balanced filtering (recommended). Filters hate, violence, sexual, self-harm. +2. `Microsoft.Prompt-Shield` — Enhanced prompt injection/jailbreak protection. +3. Custom policies — Organization-specific (configured in Azure Portal). + +Default: `Microsoft.DefaultV2`. + +--- + +## Phase 9: Configure Advanced Options + +Options are SKU-dependent: + +**A. Dynamic Quota** (GlobalStandard only) +- Auto-scales beyond base allocation when capacity available +- Default: enabled + +**B. Priority Processing** (ProvisionedManaged only) +- Prioritizes requests during high load; additional charges apply +- Default: disabled + +**C. Spillover** (any SKU) +- Redirects requests to backup deployment at capacity +- Requires existing deployment; list with: +```bash +az cognitiveservices account deployment list \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query "[].name" -o json +``` +- Default: disabled + +--- + +## Phase 10: Configure Version Upgrade Policy + +> ⚠️ **Note:** Version upgrade policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"`. + +| Policy | Description | +|--------|-------------| +| `OnceNewDefaultVersionAvailable` | Auto-upgrade to new default (Recommended) | +| `OnceCurrentVersionExpired` | Upgrade only when current expires | +| `NoAutoUpgrade` | Manual upgrade only | + +Default: `OnceNewDefaultVersionAvailable`. + +--- + +## Phase 11: Generate Deployment Name + +List existing deployments to avoid conflicts: +```bash +az cognitiveservices account deployment list \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query "[].name" -o json +``` + +Auto-generate: use model name as base, append `-2`, `-3` etc. if taken. Allow custom override. Validate: `^[\w.-]{2,64}$`. 
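The naming rule in Phase 11 can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not the skill's actual script: `next_deployment_name` and `is_valid_deployment_name` are hypothetical helpers, the existing deployment names are passed in as arguments (for instance from the `deployment list` query above), and `\w` is read as ASCII letters, digits, and underscore.

```bash
# Hypothetical helper: print the first free name among base, base-2, base-3, ...
# $1 = base name (the model name); remaining args = existing deployment names.
next_deployment_name() {
  local base candidate n taken name
  base="$1"; shift
  candidate="$base"
  n=2
  while :; do
    taken=0
    for name in "$@"; do
      if [ "$name" = "$candidate" ]; then taken=1; break; fi
    done
    if [ "$taken" -eq 0 ]; then break; fi
    candidate="${base}-${n}"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Hypothetical helper: succeed only if $1 matches ^[\w.-]{2,64}$
# (assumption: \w means ASCII letters, digits, and underscore).
is_valid_deployment_name() {
  case "$1" in
    *[!A-Za-z0-9_.-]*) return 1 ;;   # contains a character outside the allowed set
  esac
  [ "${#1}" -ge 2 ] && [ "${#1}" -le 64 ]
}
```

For example, `next_deployment_name gpt-4o gpt-4o gpt-4o-2` prints `gpt-4o-3`. A custom name supplied by the user would go through `is_valid_deployment_name` before use.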
+ +--- + +## Phase 12: Review Configuration + +Display summary of all selections for user confirmation before proceeding: +- Model, version, deployment name +- SKU, capacity (with unit), region +- RAI policy, version upgrade policy +- Advanced options (dynamic quota, priority, spillover) +- Account, resource group, project + +User confirms or cancels. + +--- + +## Phase 13: Execute Deployment + +> 💡 `MODEL_FORMAT` was already detected in Phase 4. Use the stored value here. + +### Standard CLI deployment (non-Anthropic models): + +**Create deployment:** +```bash +az cognitiveservices account deployment create \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --deployment-name $DEPLOYMENT_NAME \ + --model-name $MODEL_NAME \ + --model-version $MODEL_VERSION \ + --model-format "$MODEL_FORMAT" \ + --sku-name $SELECTED_SKU \ + --sku-capacity $DEPLOY_CAPACITY +``` + +> 💡 **Note:** For non-OpenAI MaaS models, `$DEPLOY_CAPACITY` is `1` (set in Phase 7). + +### Anthropic model deployment (requires modelProviderData): + +The Azure CLI does not support `--model-provider-data`. Use the ARM REST API directly. + +> ⚠️ Industry, country code, and organization name should have been collected in Phase 7c. + +```bash +echo "Creating Anthropic model deployment via REST API..." 
+ +az rest --method PUT \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \ + --body "{ + \"sku\": { + \"name\": \"$SELECTED_SKU\", + \"capacity\": 1 + }, + \"properties\": { + \"model\": { + \"format\": \"Anthropic\", + \"name\": \"$MODEL_NAME\", + \"version\": \"$MODEL_VERSION\" + }, + \"modelProviderData\": { + \"industry\": \"$SELECTED_INDUSTRY\", + \"countryCode\": \"$COUNTRY_CODE\", + \"organizationName\": \"$ORG_NAME\" + } + } + }" +``` + +*PowerShell version:* +```powershell +Write-Host "Creating Anthropic model deployment via REST API..." + +$body = @{ + sku = @{ + name = $SELECTED_SKU + capacity = 1 + } + properties = @{ + model = @{ + format = "Anthropic" + name = $MODEL_NAME + version = $MODEL_VERSION + } + modelProviderData = @{ + industry = $SELECTED_INDUSTRY + countryCode = $countryCode + organizationName = $orgName + } + } +} | ConvertTo-Json -Depth 5 + +az rest --method PUT ` + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" ` + --body $body +``` + +> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. RAI policy is not applicable for Anthropic models. + +### Monitor deployment status: +```bash +az cognitiveservices account deployment show \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --deployment-name $DEPLOYMENT_NAME \ + --query "properties.provisioningState" -o tsv +``` + +Poll until `Succeeded` or `Failed`. Timeout after 5 minutes. 
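The polling step can be sketched as a small shell loop. This is a minimal sketch under stated assumptions: `poll_provisioning_state` is a hypothetical helper, the 10-second interval in the usage comment is an arbitrary choice, and the status command is passed in as arguments so the loop itself does not depend on the Azure CLI. Elapsed time is approximated by summing the sleep intervals.

```bash
# Hypothetical helper: run a status command until it prints Succeeded or Failed.
# $1 = timeout in seconds, $2 = poll interval in seconds (>= 1), rest = status command.
# Prints the final state; returns 0 on Succeeded, 1 on Failed, 2 on timeout.
poll_provisioning_state() {
  local timeout_s interval_s elapsed state
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # A failing status command is treated as "not ready yet".
    state=$("$@" 2>/dev/null) || state=""
    case "$state" in
      Succeeded) echo "Succeeded"; return 0 ;;
      Failed)    echo "Failed"; return 1 ;;
    esac
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "TimedOut"
  return 2
}

# Usage with the status command from this phase (5-minute timeout):
# poll_provisioning_state 300 10 az cognitiveservices account deployment show \
#   --name "$ACCOUNT_NAME" --resource-group "$RESOURCE_GROUP" \
#   --deployment-name "$DEPLOYMENT_NAME" \
#   --query "properties.provisioningState" -o tsv
```

Passing the status command as arguments keeps the loop easy to exercise with a stub before pointing it at a real deployment.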
+ +**Get endpoint:** +```bash +az cognitiveservices account show \ + --name $ACCOUNT_NAME \ + --resource-group $RESOURCE_GROUP \ + --query "properties.endpoint" -o tsv +``` + +On success, display deployment name, model, version, SKU, capacity, region, RAI policy, rate limits, endpoint, and Azure AI Foundry portal link. diff --git a/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md b/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md new file mode 100644 index 00000000..0a97a6d6 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md @@ -0,0 +1,68 @@ +# Examples: preset + +## Example 1: Fast Path — Current Region Has Capacity + +**Scenario:** Deploy gpt-4o to project in East US, which has capacity. +**Result:** Deployed in ~45s. No region selection needed. 100K TPM default, GlobalStandard SKU. + +## Example 2: Alternative Region — No Capacity in Current Region + +**Scenario:** Deploy gpt-4-turbo to dev project in West US 2 (no capacity). +**Result:** Queried all regions → user selected East US 2 (120K available) → deployed in ~2 min. + +## Example 3: Create New Project in Optimal Region + +**Scenario:** Deploy gpt-4o-mini in Europe for data residency; no existing European project. +**Result:** Created AI Services hub + project in Sweden Central → deployed in ~4 min with 150K TPM. + +## Example 4: Insufficient Quota Everywhere + +**Scenario:** Deploy gpt-4 but all regions have exhausted quota. +**Result:** Graceful failure with actionable guidance: +1. Request quota increase via the [quota skill](../../../quota/quota.md) +2. List existing deployments consuming quota +3. Suggest alternative models (gpt-4o, gpt-4o-mini) + +## Example 5: First-Time User — No Project + +**Scenario:** Deploy gpt-4o with no existing AI Foundry project. +**Result:** Full onboarding in ~5 min — created resource group, AI Services hub, project, then deployed. 
+ +## Example 6: Deployment Name Conflict + +**Scenario:** Auto-generated deployment name already exists. +**Result:** Appended the next numeric suffix (e.g., `-2`) and retried automatically. + +## Example 7: Multi-Version Model Selection + +**Scenario:** Deploy "latest gpt-4o" when multiple versions exist. +**Result:** Latest stable version auto-selected. Capacity aggregated across versions. + +## Example 8: Anthropic Model (claude-sonnet-4-6) + +**Scenario:** Deploy claude-sonnet-4-6 (Anthropic model requiring modelProviderData). +**Result:** User prompted for industry selection → tenant country code and org name fetched automatically → deployed via ARM REST API with `modelProviderData` payload in ~2 min. Capacity set to 1 (MaaS billing). + +--- + +## Summary of Scenarios + +| Scenario | Duration | Key Features | +|----------|----------|--------------| +| **1: Fast Path** | ~45s | Current region has capacity, direct deploy | +| **2: Alt Region** | ~2m | Region selection, project switch | +| **3: New Project** | ~4m | Project creation in optimal region | +| **4: No Quota** | N/A | Graceful failure, actionable guidance | +| **5: First-Time** | ~5m | Complete onboarding | +| **6: Name Conflict** | ~1m | Auto-retry with numeric suffix | +| **7: Multi-Version** | ~1m | Latest version auto-selected | +| **8: Anthropic** | ~2m | Industry prompt, tenant info, REST API deploy | + +## Common Patterns + +``` +A: Quick Deploy Auth → Get Project → Check Region (✓) → Deploy +B: Region Select Auth → Get Project → Region (✗) → Query All → Select → Deploy +C: Full Onboarding Auth → No Projects → Create Project → Deploy +D: Error Recovery Deploy (✗) → Analyze → Fix → Retry +``` diff --git a/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md b/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md new file mode 100644 index 00000000..09fcc94c --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md @@ -0,0 +1,103 @@ +--- +name: preset +description: 
"Intelligently deploys Azure OpenAI models to optimal regions by analyzing capacity across all available regions. Automatically checks current region first and shows alternatives if needed. USE FOR: quick deployment, optimal region, best region, automatic region selection, fast setup, multi-region capacity check, high availability deployment, deploy to best location. DO NOT USE FOR: custom SKU selection (use customize), specific version selection (use customize), custom capacity configuration (use customize), PTU deployments (use customize)." +license: MIT +metadata: + author: Microsoft + version: "1.0.1" +--- + +# Deploy Model to Optimal Region + +Automates intelligent Azure OpenAI model deployment by checking capacity across regions and deploying to the best available option. + +## What This Skill Does + +1. Verifies Azure authentication and project scope +2. Checks capacity in current project's region +3. If no capacity: analyzes all regions and shows available alternatives +4. Filters projects by selected region +5. Supports creating new projects if needed +6. Deploys model with GlobalStandard SKU +7. Monitors deployment progress + +## Prerequisites + +- Azure CLI installed and configured +- Active Azure subscription with Cognitive Services read/create permissions +- Azure AI Foundry project resource ID (`PROJECT_RESOURCE_ID` env var or provided interactively) + - Format: `/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}` + - Found in: Azure AI Foundry portal → Project → Overview → Resource ID + +## Quick Workflow + +### Fast Path (Current Region Has Capacity) +``` +1. Check authentication → 2. Get project → 3. Check current region capacity +→ 4. Deploy immediately +``` + +### Alternative Region Path (No Capacity) +``` +1. Check authentication → 2. Get project → 3. Check current region (no capacity) +→ 4. Query all regions → 5. Show alternatives → 6. Select region + project +→ 7. 
Deploy +``` + +--- + +## Deployment Phases + +| Phase | Action | Key Commands | +|-------|--------|-------------| +| 1. Verify Auth | Check Azure CLI login and subscription | `az account show`, `az login` | +| 2. Get Project | Parse `PROJECT_RESOURCE_ID` ARM ID, verify it exists | `az cognitiveservices account show` | +| 3. Get Model | List available models, user selects model + version | `az cognitiveservices account list-models` | +| 4. Check Current Region | Query capacity using GlobalStandard SKU | `az rest --method GET .../modelCapacities` | +| 5. Multi-Region Query | If no local capacity, query all regions | Same capacity API without location filter | +| 6. Select Region + Project | User picks region; find or create project | `az cognitiveservices account list`, `az cognitiveservices account create` | +| 7. Deploy | Generate unique name, calculate capacity (50% of available, min 50 TPM), create deployment | `az cognitiveservices account deployment create` | + +For detailed step-by-step instructions, see the [workflow reference](references/preset-workflow.md). 
+ +--- + +## Error Handling + +| Error | Symptom | Resolution | +|-------|---------|------------| +| Auth failure | `az account show` returns error | Run `az login` then `az account set --subscription ` | +| No quota | All regions show 0 capacity | Defer to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting; check existing deployments; try alternative models | +| Model not found | Empty capacity list | Verify model name with `az cognitiveservices account list-models`; check case sensitivity | +| Name conflict | "deployment already exists" | Append suffix to deployment name (handled automatically by `generate_deployment_name` script) | +| Region unavailable | Region doesn't support model | Select a different region from the available list | +| Permission denied | "Forbidden" or "Unauthorized" | Verify Cognitive Services Contributor role: `az role assignment list --assignee ` | + +--- + +## Advanced Usage + +```bash +# Custom capacity +az cognitiveservices account deployment create ... 
--sku-capacity + +# Check deployment status +az cognitiveservices account deployment show --name --resource-group --deployment-name --query "{Status:properties.provisioningState}" + +# Delete deployment +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +## Notes + +- **SKU:** GlobalStandard only — **API Version:** 2024-10-01 (GA stable) + +--- + +## Related Skills + +- **microsoft-foundry** - Parent skill for Azure AI Foundry operations +- **[quota](../../../quota/quota.md)** — For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill +- **azure-quick-review** - Review Azure resources for compliance +- **azure-cost-estimation** - Estimate costs for Azure deployments +- **azure-validate** - Validate Azure infrastructure before deployment diff --git a/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md b/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md new file mode 100644 index 00000000..2598904e --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md @@ -0,0 +1,694 @@ +# Preset Deployment Workflow - Detailed Implementation + +This file contains the full step-by-step bash/PowerShell scripts for preset (optimal region) model deployment. Referenced from the main [SKILL.md](../SKILL.md). 
+ +--- + +## Phase 1: Verify Authentication + +Check if user is logged into Azure CLI: + +```bash +az account show --query "{Subscription:name, User:user.name}" -o table +``` + +**If not logged in:** +```bash +az login +``` + +**Verify subscription is correct:** +```bash +# List all subscriptions +az account list --query "[].[name,id,state]" -o table + +# Set active subscription if needed +az account set --subscription +``` + +--- + +## Phase 2: Get Current Project + +**Check for PROJECT_RESOURCE_ID environment variable first:** + +```bash +if [ -n "$PROJECT_RESOURCE_ID" ]; then + echo "Using project resource ID from environment: $PROJECT_RESOURCE_ID" +else + echo "PROJECT_RESOURCE_ID not set. Please provide your Azure AI Foundry project resource ID." + echo "" + echo "You can find this in:" + echo " • Azure AI Foundry portal → Project → Overview → Resource ID" + echo " • Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}" + echo "" + echo "Example: /subscriptions/abc123.../resourceGroups/rg-prod/providers/Microsoft.CognitiveServices/accounts/my-account/projects/my-project" + echo "" + read -p "Enter project resource ID: " PROJECT_RESOURCE_ID +fi +``` + +**Parse the ARM resource ID to extract components:** + +```bash +# Extract components from ARM resource ID +# Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project} + +SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p') +RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p') +ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p') +PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p') + +if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$ACCOUNT_NAME" ] || [ -z "$PROJECT_NAME" ]; then + echo "❌ Invalid 
project resource ID format" + echo "Expected format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}" + exit 1 +fi + +echo "Parsed project details:" +echo " Subscription: $SUBSCRIPTION_ID" +echo " Resource Group: $RESOURCE_GROUP" +echo " Account: $ACCOUNT_NAME" +echo " Project: $PROJECT_NAME" +``` + +**Verify the project exists and get its region:** + +```bash +# Set active subscription +az account set --subscription "$SUBSCRIPTION_ID" + +# Look up the project's parent account to verify it exists and extract the region +PROJECT_REGION=$(az cognitiveservices account show \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query location -o tsv 2>/dev/null) + +if [ -z "$PROJECT_REGION" ]; then + echo "❌ Account '$ACCOUNT_NAME' (project '$PROJECT_NAME') not found in resource group '$RESOURCE_GROUP'" + echo "" + echo "Please verify the resource ID is correct." + echo "" + echo "List available projects:" + echo " az cognitiveservices account list --query \"[?kind=='AIProject'].{Name:name, Location:location, ResourceGroup:resourceGroup}\" -o table" + exit 1 +fi + +echo "✓ Project found" +echo " Region: $PROJECT_REGION" +``` + +--- + +## Phase 3: Get Model Name + +**If model name provided as skill parameter, skip this phase.** + +Ask user which model to deploy. **Fetch available models dynamically** from the account rather than using a hardcoded list: + +```bash +# List available models in the account +az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[].name" -o tsv | sort -u +``` + +Present the results to the user and let them choose, or enter a custom model name. 
+ +**Store model:** +```bash +MODEL_NAME="" +``` + +**Get model version (latest stable):** +```bash +# List available models and versions in the account +az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \ + -o table +``` + +**Use latest version or let user specify:** +```bash +MODEL_VERSION="" +``` + +**Detect model format:** + +```bash +# Get model format from model catalog (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere) +MODEL_FORMAT=$(az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1) + +# Default to OpenAI if not found +MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"} + +echo "Model format: $MODEL_FORMAT" +``` + +> 💡 **Model format determines the deployment path:** +> - `OpenAI` — Standard CLI deployment, TPM-based capacity, RAI policies apply +> - `Anthropic` — REST API deployment with `modelProviderData`, capacity=1, no RAI +> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) 
— Standard CLI deployment, capacity=1 (MaaS), no RAI + +--- + +## Phase 4: Check Current Region Capacity + +Before checking other regions, see if the current project's region has capacity: + +```bash +# Query capacity for current region +CAPACITY_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") + +# Extract available capacity for GlobalStandard SKU +CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity') +``` + +**Check result:** +```bash +if [ -n "$CURRENT_CAPACITY" ] && [ "$CURRENT_CAPACITY" -gt 0 ]; then + echo "✓ Current region ($PROJECT_REGION) has capacity: $CURRENT_CAPACITY TPM" + echo "Proceeding with deployment..." + # Skip to Phase 7 (Deploy) +else + echo "⚠ Current region ($PROJECT_REGION) has no available capacity" + echo "Checking alternative regions..." + # Continue to Phase 5 +fi +``` + +--- + +## Phase 5: Query Multi-Region Capacity (If Needed) + +Only execute this phase if current region has no capacity. 
+ +**Query capacity across all regions:** +```bash +# Get capacity for all regions in subscription +ALL_REGIONS_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") + +# Save to file for processing +echo "$ALL_REGIONS_JSON" > /tmp/capacity_check.json +``` + +**Parse and categorize regions:** +```bash +# Extract available regions (capacity > 0) +AVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"' /tmp/capacity_check.json) + +# Extract unavailable regions (capacity = 0 or undefined) +UNAVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"' /tmp/capacity_check.json) +``` + +**Format and display regions:** +```bash +# Format capacity (e.g., 120000 -> 120K) +format_capacity() { + local capacity=$1 + if [ "$capacity" -ge 1000000 ]; then + echo "$(awk "BEGIN {printf \"%.1f\", $capacity/1000000}")M TPM" + elif [ "$capacity" -ge 1000 ]; then + echo "$(awk "BEGIN {printf \"%.0f\", $capacity/1000}")K TPM" + else + echo "$capacity TPM" + fi +} + +echo "" +echo "⚠ No Capacity in Current Region" +echo "" +echo "The current project's region ($PROJECT_REGION) does not have available capacity for $MODEL_NAME." 
+echo "" +echo "Available Regions (with capacity):" +echo "" + +# Display available regions with formatted capacity +echo "$AVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do + formatted_capacity=$(format_capacity "$capacity") + # Get region display name (capitalize and format) + region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g') + echo " • $region_display - $formatted_capacity" +done + +echo "" +echo "Unavailable Regions:" +echo "" + +# Display unavailable regions +echo "$UNAVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do + region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g') + if [ "$capacity" = "0" ]; then + echo " ✗ $region_display (Insufficient quota - 0 TPM available)" + else + echo " ✗ $region_display (Model not supported)" + fi +done +``` + +**Handle no capacity anywhere:** +```bash +if [ -z "$AVAILABLE_REGIONS" ]; then + echo "" + echo "❌ No Available Capacity in Any Region" + echo "" + echo "No regions have available capacity for $MODEL_NAME with GlobalStandard SKU." + echo "" + echo "Next Steps:" + echo "1. Request quota increase — use the quota skill (../../../quota/quota.md)" + echo "" + echo "2. Check existing deployments (may be using quota):" + echo " az cognitiveservices account deployment list \\" + echo " --name $PROJECT_NAME \\" + echo " --resource-group $RESOURCE_GROUP" + echo "" + echo "3. 
Consider alternative models with lower capacity requirements:" + echo " • gpt-4o-mini (cost-effective, lower capacity requirements)" + echo " List available models: az cognitiveservices account list-models --name \$PROJECT_NAME --resource-group \$RESOURCE_GROUP --output table" + exit 1 +fi +``` + +--- + +## Phase 6: Select Region and Project + +**Ask user to select region from available options.** + +Example using AskUserQuestion: +- Present available regions as options +- Show capacity for each +- User selects preferred region + +**Store selection:** +```bash +SELECTED_REGION="" # e.g., "eastus2" +``` + +**Find projects in selected region:** +```bash +PROJECTS_IN_REGION=$(az cognitiveservices account list \ + --query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \ + --output json) + +PROJECT_COUNT=$(echo "$PROJECTS_IN_REGION" | jq '. | length') + +if [ "$PROJECT_COUNT" -eq 0 ]; then + echo "No projects found in $SELECTED_REGION" + echo "Would you like to create a new project? (yes/no)" + # If yes, continue to project creation + # If no, exit or select different region +else + echo "Projects in $SELECTED_REGION:" + echo "$PROJECTS_IN_REGION" | jq -r '.[] | " • \(.Name) (\(.ResourceGroup))"' + echo "" + echo "Select a project or create new project" +fi +``` + +**Option A: Use existing project** +```bash +PROJECT_NAME="" +RESOURCE_GROUP="" +``` + +**Option B: Create new project** +```bash +# Generate project name +USER_ALIAS=$(az account show --query user.name -o tsv | cut -d'@' -f1 | tr '.' '-') +RANDOM_SUFFIX=$(openssl rand -hex 2) +NEW_PROJECT_NAME="${USER_ALIAS}-aiproject-${RANDOM_SUFFIX}" + +# Prompt for resource group +echo "Resource group for new project:" +echo " 1. Use existing resource group: $RESOURCE_GROUP" +echo " 2. 
Create new resource group" + +# If existing resource group +NEW_RESOURCE_GROUP="$RESOURCE_GROUP" + +# Create AI Services account (hub) +HUB_NAME="${NEW_PROJECT_NAME}-hub" + +echo "Creating AI Services hub: $HUB_NAME in $SELECTED_REGION..." + +az cognitiveservices account create \ + --name "$HUB_NAME" \ + --resource-group "$NEW_RESOURCE_GROUP" \ + --location "$SELECTED_REGION" \ + --kind "AIServices" \ + --sku "S0" \ + --yes + +# Create AI Foundry project +echo "Creating AI Foundry project: $NEW_PROJECT_NAME..." + +az cognitiveservices account create \ + --name "$NEW_PROJECT_NAME" \ + --resource-group "$NEW_RESOURCE_GROUP" \ + --location "$SELECTED_REGION" \ + --kind "AIProject" \ + --sku "S0" \ + --yes + +echo "✓ Project created successfully" +PROJECT_NAME="$NEW_PROJECT_NAME" +RESOURCE_GROUP="$NEW_RESOURCE_GROUP" +``` + +--- + +## Phase 7: Deploy Model + +**Generate unique deployment name:** + +The deployment name should match the model name (e.g., "gpt-4o"), but if a deployment with that name already exists, append a numeric suffix (e.g., "gpt-4o-2", "gpt-4o-3"). This follows the same UX pattern as Azure AI Foundry portal. + +Use the `generate_deployment_name` script to check existing deployments and generate a unique name: + +*Bash version:* +```bash +DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh \ + "$ACCOUNT_NAME" \ + "$RESOURCE_GROUP" \ + "$MODEL_NAME") + +echo "Generated deployment name: $DEPLOYMENT_NAME" +``` + +*PowerShell version:* +```powershell +$DEPLOYMENT_NAME = & .\scripts\generate_deployment_name.ps1 ` + -AccountName $ACCOUNT_NAME ` + -ResourceGroup $RESOURCE_GROUP ` + -ModelName $MODEL_NAME + +Write-Host "Generated deployment name: $DEPLOYMENT_NAME" +``` + +**Calculate deployment capacity:** + +Follow UX capacity calculation logic. For OpenAI models, use 50% of available capacity (minimum 50 TPM). 
For all other models (MaaS), capacity is always 1: + +```bash +if [ "$MODEL_FORMAT" = "OpenAI" ]; then + # OpenAI models: TPM-based capacity (50% of available, minimum 50) + SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity") + + if [ "$SELECTED_CAPACITY" -gt 50 ]; then + DEPLOY_CAPACITY=$((SELECTED_CAPACITY / 2)) + if [ "$DEPLOY_CAPACITY" -lt 50 ]; then + DEPLOY_CAPACITY=50 + fi + else + DEPLOY_CAPACITY=$SELECTED_CAPACITY + fi + + echo "Deploying with capacity: $DEPLOY_CAPACITY TPM (50% of available: $SELECTED_CAPACITY TPM)" +else + # Non-OpenAI models (MaaS): capacity is always 1 + DEPLOY_CAPACITY=1 + echo "MaaS model — deploying with capacity: 1 (pay-per-token billing)" +fi +``` + +### If MODEL_FORMAT is NOT "Anthropic" — Standard CLI Deployment + +> 💡 **Note:** The Azure CLI supports all non-Anthropic model formats directly. + +*Bash version:* +```bash +echo "Creating deployment..." + +az cognitiveservices account deployment create \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --deployment-name "$DEPLOYMENT_NAME" \ + --model-name "$MODEL_NAME" \ + --model-version "$MODEL_VERSION" \ + --model-format "$MODEL_FORMAT" \ + --sku-name "GlobalStandard" \ + --sku-capacity "$DEPLOY_CAPACITY" +``` + +*PowerShell version:* +```powershell +Write-Host "Creating deployment..." + +az cognitiveservices account deployment create ` + --name $ACCOUNT_NAME ` + --resource-group $RESOURCE_GROUP ` + --deployment-name $DEPLOYMENT_NAME ` + --model-name $MODEL_NAME ` + --model-version $MODEL_VERSION ` + --model-format $MODEL_FORMAT ` + --sku-name "GlobalStandard" ` + --sku-capacity $DEPLOY_CAPACITY +``` + +> 💡 **Note:** For non-OpenAI MaaS models (Meta-Llama, Mistral, Cohere, etc.), `$DEPLOY_CAPACITY` is `1` (set in capacity calculation above). 
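
The capacity rule described above can be sketched as a pure function, which makes the edge cases easier to check before any deployment is attempted. This is a minimal sketch: `calc_deploy_capacity` is a hypothetical helper name that is not part of the skill's scripts, and it assumes the available capacity arrives as a plain non-negative integer.

```bash
# calc_deploy_capacity (hypothetical helper, not part of the skill's scripts)
#   $1 = model format, e.g. "OpenAI" or "Meta-Llama"
#   $2 = available capacity reported for the selected region
calc_deploy_capacity() {
  local model_format=$1 available=$2 half

  # Non-OpenAI (MaaS) models always deploy with capacity 1.
  if [ "$model_format" != "OpenAI" ]; then
    echo 1
    return 0
  fi

  # Reject empty, "null", or non-numeric input instead of guessing a value.
  case "$available" in
    ''|null|*[!0-9]*)
      echo "invalid capacity: '$available'" >&2
      return 1
      ;;
  esac

  if [ "$available" -gt 50 ]; then
    # Half of what is available, but never below the 50 floor.
    half=$((available / 2))
    [ "$half" -lt 50 ] && half=50
    echo "$half"
  else
    # At or below 50: use everything that is available.
    echo "$available"
  fi
}

calc_deploy_capacity OpenAI 120      # half of 120
calc_deploy_capacity OpenAI 80       # half is 40, raised to the 50 floor
calc_deploy_capacity OpenAI 30       # at or below 50, so all 30
calc_deploy_capacity Meta-Llama 500  # MaaS, always 1
```

Keeping the rule in one function means the Bash and REST deployment paths can share the same value for `DEPLOY_CAPACITY`.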
+ +### If MODEL_FORMAT is "Anthropic" — REST API Deployment with modelProviderData + +The Azure CLI does not support `--model-provider-data`. You must use the ARM REST API directly. + +**Step 1: Prompt user to select industry** + +Present the following list and ask the user to choose one: + +``` + 1. None (API value: none) + 2. Biotechnology (API value: biotechnology) + 3. Consulting (API value: consulting) + 4. Education (API value: education) + 5. Finance (API value: finance) + 6. Food & Beverage (API value: food_and_beverage) + 7. Government (API value: government) + 8. Healthcare (API value: healthcare) + 9. Insurance (API value: insurance) +10. Law (API value: law) +11. Manufacturing (API value: manufacturing) +12. Media (API value: media) +13. Nonprofit (API value: nonprofit) +14. Technology (API value: technology) +15. Telecommunications (API value: telecommunications) +16. Sport & Recreation (API value: sport_and_recreation) +17. Real Estate (API value: real_estate) +18. Retail (API value: retail) +19. Other (API value: other) +``` + +> ⚠️ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static — there is no REST API that provides it. + +Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`). 
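
Because the industry list is static and the value must come from the user, it may help to validate the answer before it reaches the REST call. The sketch below assumes the API values are exactly those in the list above; `is_valid_industry` is a hypothetical helper name, and it deliberately has no default value.

```bash
# is_valid_industry (hypothetical helper): succeeds only for API values
# that appear in the static industry list above.
is_valid_industry() {
  case "$1" in
    none|biotechnology|consulting|education|finance|food_and_beverage|\
government|healthcare|insurance|law|manufacturing|media|nonprofit|\
technology|telecommunications|sport_and_recreation|real_estate|retail|other)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

# Demonstration with sample answers; a real run would use the user's reply.
for answer in technology "Food & Beverage" food_and_beverage ""; do
  if is_valid_industry "$answer"; then
    echo "accepted: $answer"
  else
    echo "rejected: '$answer' (ask the user again)"
  fi
done
```

An empty or display-name answer is rejected rather than replaced, so the workflow has to go back to the user.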
+ +**Step 2: Fetch tenant info (country code and organization name)** + +```bash +TENANT_INFO=$(az rest --method GET \ + --url "https://management.azure.com/tenants?api-version=2024-11-01" \ + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json) + +COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode') +ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName') +``` + +*PowerShell version:* +```powershell +$tenantInfo = az rest --method GET ` + --url "https://management.azure.com/tenants?api-version=2024-11-01" ` + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json + +$countryCode = $tenantInfo.countryCode +$orgName = $tenantInfo.displayName +``` + +**Step 3: Deploy via ARM REST API** + +*Bash version:* +```bash +echo "Creating Anthropic model deployment via REST API..." + +az rest --method PUT \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \ + --body "{ + \"sku\": { + \"name\": \"GlobalStandard\", + \"capacity\": 1 + }, + \"properties\": { + \"model\": { + \"format\": \"Anthropic\", + \"name\": \"$MODEL_NAME\", + \"version\": \"$MODEL_VERSION\" + }, + \"modelProviderData\": { + \"industry\": \"$SELECTED_INDUSTRY\", + \"countryCode\": \"$COUNTRY_CODE\", + \"organizationName\": \"$ORG_NAME\" + } + } + }" +``` + +*PowerShell version:* +```powershell +Write-Host "Creating Anthropic model deployment via REST API..." 
+ +$body = @{ + sku = @{ + name = "GlobalStandard" + capacity = 1 + } + properties = @{ + model = @{ + format = "Anthropic" + name = $MODEL_NAME + version = $MODEL_VERSION + } + modelProviderData = @{ + industry = $SELECTED_INDUSTRY + countryCode = $countryCode + organizationName = $orgName + } + } +} | ConvertTo-Json -Depth 5 + +az rest --method PUT ` + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" ` + --body $body +``` + +> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. + +**Monitor deployment progress:** +```bash +echo "Monitoring deployment status..." + +MAX_WAIT=300 # 5 minutes +ELAPSED=0 +INTERVAL=10 + +while [ $ELAPSED -lt $MAX_WAIT ]; do + STATUS=$(az cognitiveservices account deployment show \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --deployment-name "$DEPLOYMENT_NAME" \ + --query "properties.provisioningState" -o tsv 2>/dev/null) + + case "$STATUS" in + "Succeeded") + echo "✓ Deployment successful!" + break + ;; + "Failed") + echo "❌ Deployment failed" + # Get error details + az cognitiveservices account deployment show \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --deployment-name "$DEPLOYMENT_NAME" \ + --query "properties" + exit 1 + ;; + "Creating"|"Accepted"|"Running") + echo "Status: $STATUS... 
(${ELAPSED}s elapsed)" + sleep $INTERVAL + ELAPSED=$((ELAPSED + INTERVAL)) + ;; + *) + echo "Unknown status: $STATUS" + sleep $INTERVAL + ELAPSED=$((ELAPSED + INTERVAL)) + ;; + esac +done + +if [ $ELAPSED -ge $MAX_WAIT ]; then + echo "⚠ Deployment timeout after ${MAX_WAIT}s" + echo "Check status manually:" + echo " az cognitiveservices account deployment show \\" + echo " --name $ACCOUNT_NAME \\" + echo " --resource-group $RESOURCE_GROUP \\" + echo " --deployment-name $DEPLOYMENT_NAME" + exit 1 +fi +``` + +--- + +## Phase 8: Display Deployment Details + +**Show deployment information:** +```bash +echo "" +echo "═══════════════════════════════════════════" +echo "✓ Deployment Successful!" +echo "═══════════════════════════════════════════" +echo "" + +# Get endpoint information +ENDPOINT=$(az cognitiveservices account show \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "properties.endpoint" -o tsv) + +# Get deployment details +DEPLOYMENT_INFO=$(az cognitiveservices account deployment show \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --deployment-name "$DEPLOYMENT_NAME" \ + --query "properties.model") + +echo "Deployment Name: $DEPLOYMENT_NAME" +echo "Model: $MODEL_NAME" +echo "Version: $MODEL_VERSION" +echo "Region: $SELECTED_REGION" +echo "SKU: GlobalStandard" +echo "Capacity: $(format_capacity $DEPLOY_CAPACITY)" +echo "Endpoint: $ENDPOINT" +echo "" + +# Generate direct link to deployment in Azure AI Foundry portal +DEPLOYMENT_URL=$(bash "$(dirname "$0")/scripts/generate_deployment_url.sh" \ + --subscription "$SUBSCRIPTION_ID" \ + --resource-group "$RESOURCE_GROUP" \ + --foundry-resource "$ACCOUNT_NAME" \ + --project "$PROJECT_NAME" \ + --deployment "$DEPLOYMENT_NAME") + +echo "🔗 View in Azure AI Foundry Portal:" +echo "" +echo "$DEPLOYMENT_URL" +echo "" +echo "═══════════════════════════════════════════" +echo "" + +echo "Test your deployment:" +echo "" +echo "# View deployment details" +echo "az 
cognitiveservices account deployment show \\" +echo " --name $ACCOUNT_NAME \\" +echo " --resource-group $RESOURCE_GROUP \\" +echo " --deployment-name $DEPLOYMENT_NAME" +echo "" +echo "# List all deployments" +echo "az cognitiveservices account deployment list \\" +echo " --name $ACCOUNT_NAME \\" +echo " --resource-group $RESOURCE_GROUP \\" +echo " --output table" +echo "" + +echo "Next steps:" +echo "• Click the link above to test in Azure AI Foundry playground" +echo "• Integrate into your application" +echo "• Set up monitoring and alerts" +``` diff --git a/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md b/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md new file mode 100644 index 00000000..109b2fc6 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md @@ -0,0 +1,174 @@ +# Preset Deployment Workflow — Step-by-Step + +Condensed implementation reference for preset (optimal region) model deployment. See [SKILL.md](../SKILL.md) for overview. + +**Table of Contents:** [Phase 1: Verify Authentication](#phase-1-verify-authentication) · [Phase 2: Get Current Project](#phase-2-get-current-project) · [Phase 3: Get Model Name](#phase-3-get-model-name) · [Phase 4: Check Current Region Capacity](#phase-4-check-current-region-capacity) · [Phase 5: Query Multi-Region Capacity](#phase-5-query-multi-region-capacity) · [Phase 6: Select Region and Project](#phase-6-select-region-and-project) · [Phase 7: Deploy Model](#phase-7-deploy-model) + +--- + +## Phase 1: Verify Authentication + +```bash +az account show --query "{Subscription:name, User:user.name}" -o table +``` + +If not logged in: `az login` + +Switch subscription: + +```bash +az account list --query "[].[name,id,state]" -o table +az account set --subscription +``` + +--- + +## Phase 2: Get Current Project + +Read `PROJECT_RESOURCE_ID` from env or prompt user. 
Format: +`/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}` + +Parse ARM ID components: + +```bash +SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p') +RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p') +ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p') +PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p') +``` + +Verify project exists and get region: + +```bash +az account set --subscription "$SUBSCRIPTION_ID" + +PROJECT_REGION=$(az cognitiveservices account show \ + --name "$PROJECT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query location -o tsv) +``` + +--- + +## Phase 3: Get Model Name + +If model not provided as parameter, list available models: + +```bash +az cognitiveservices account list-models \ + --name "$PROJECT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[].name" -o tsv | sort -u +``` + +Get versions for selected model: + +```bash +az cognitiveservices account list-models \ + --name "$PROJECT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \ + -o table +``` + +--- + +## Phase 4: Check Current Region Capacity + +```bash +CAPACITY_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") + +CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity') +``` + +If `CURRENT_CAPACITY > 0` → skip to Phase 7. Otherwise continue to Phase 5. 
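
The Phase 4 decision can be made explicit with a small guard. `jq -r` prints nothing when no `GlobalStandard` entry matches and prints `null` when the field is missing, so a bare numeric comparison on `CURRENT_CAPACITY` can fail. The sketch below is one way to handle that; `has_capacity` and `next_phase` are hypothetical helper names.

```bash
# has_capacity (hypothetical helper): true only for a positive integer.
# Empty output, "null", and any non-numeric text count as no capacity.
has_capacity() {
  case "$1" in
    ''|null|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# next_phase (hypothetical helper): maps the capacity value to the phase
# the workflow continues with.
next_phase() {
  if has_capacity "$1"; then
    echo "Phase 7"
  else
    echo "Phase 5"
  fi
}

next_phase 120000   # capacity in the current region
next_phase 0        # no capacity left
next_phase null     # field missing in the response
next_phase ""       # no GlobalStandard entry returned
```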
+ +--- + +## Phase 5: Query Multi-Region Capacity + +```bash +ALL_REGIONS_JSON=$(az rest --method GET \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") +``` + +Extract available regions (capacity > 0): + +```bash +AVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"') +``` + +Extract unavailable regions: + +```bash +UNAVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"') +``` + +If no regions have capacity, defer to the [quota skill](../../../../quota/quota.md) for increase requests. Suggest checking existing deployments or trying alternative models like `gpt-4o-mini`. + +--- + +## Phase 6: Select Region and Project + +Present available regions to user. Store selection as `SELECTED_REGION`. 
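
Before presenting the options, the `region|capacity` lines from Phase 5 can be ordered so the regions with the most headroom appear first. This is only a sketch: `rank_regions` is a hypothetical helper name, the sample data is made up, and the final choice still belongs to the user.

```bash
# rank_regions (hypothetical helper): reads "region|capacity" lines on stdin
# and prints them sorted by capacity, highest first.
rank_regions() {
  sort -t'|' -k2,2nr
}

# Made-up sample in the same format as AVAILABLE_REGIONS.
SAMPLE_REGIONS='eastus2|120000
swedencentral|450000
westus3|80000'

# Print a numbered menu for the user to choose from.
printf '%s\n' "$SAMPLE_REGIONS" | rank_regions |
  awk -F'|' '{ printf "%d) %s (%s available)\n", NR, $1, $2 }'
```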
+ +Find projects in selected region: + +```bash +PROJECTS_IN_REGION=$(az cognitiveservices account list \ + --query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \ + --output json) +``` + +**If no projects exist, create new:** + +```bash +az cognitiveservices account create \ + --name "$HUB_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --location "$SELECTED_REGION" \ + --kind "AIServices" \ + --sku "S0" --yes + +az cognitiveservices account create \ + --name "$NEW_PROJECT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --location "$SELECTED_REGION" \ + --kind "AIProject" \ + --sku "S0" --yes +``` + +--- + +## Phase 7: Deploy Model + +Generate unique deployment name using `scripts/generate_deployment_name.sh`: + +```bash +DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh "$ACCOUNT_NAME" "$RESOURCE_GROUP" "$MODEL_NAME") +``` + +Calculate capacity (50% of available, minimum 50 TPM, never more than is available): + +```bash +SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity") +DEPLOY_CAPACITY=$(( SELECTED_CAPACITY / 2 )) +[ "$DEPLOY_CAPACITY" -lt 50 ] && DEPLOY_CAPACITY=50 +# Never request more than the region has available +[ "$DEPLOY_CAPACITY" -gt "$SELECTED_CAPACITY" ] && DEPLOY_CAPACITY=$SELECTED_CAPACITY +``` + +Create deployment: + +```bash +az cognitiveservices account deployment create \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --deployment-name "$DEPLOYMENT_NAME" \ + --model-name "$MODEL_NAME" \ + --model-version "$MODEL_VERSION" \ + --model-format "OpenAI" \ + --sku-name "GlobalStandard" \ + --sku-capacity "$DEPLOY_CAPACITY" +``` + +Monitor with `az cognitiveservices account deployment show ... --query "properties.provisioningState"` until `Succeeded` or `Failed`.
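
The monitoring step can be sketched as a bounded polling loop. This is a minimal sketch rather than the skill's own script: `wait_for_deployment` and `get_state` are hypothetical names, the 300 second limit and 10 second interval are assumed defaults, and the state command is passed in so the loop can be exercised without calling Azure.

```bash
# wait_for_deployment (hypothetical helper)
#   $1 = name of a command or function that prints the provisioning state
#   $2 = maximum wait in seconds (assumed default: 300)
#   $3 = polling interval in seconds (assumed default: 10, must be > 0)
# Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
wait_for_deployment() {
  local get_state_cmd=$1 max_wait=${2:-300} interval=${3:-10}
  local elapsed=0 state

  while [ "$elapsed" -lt "$max_wait" ]; do
    # A failing status call is treated as "not ready yet".
    state=$($get_state_cmd 2>/dev/null) || state=""
    case "$state" in
      Succeeded) echo "Deployment succeeded"; return 0 ;;
      Failed)    echo "Deployment failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "Timed out after ${max_wait}s" >&2
  return 2
}

# State command built from the CLI call named above; it is only defined
# here, not executed.
get_state() {
  az cognitiveservices account deployment show \
    --name "$ACCOUNT_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --deployment-name "$DEPLOYMENT_NAME" \
    --query "properties.provisioningState" -o tsv
}

# Usage (commented out so the sketch has no side effects):
# wait_for_deployment get_state 300 10
```

Returning distinct codes for failure and timeout lets a caller decide whether to show error details or ask the user to check the status manually.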
diff --git a/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.ps1 b/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.ps1 new file mode 100644 index 00000000..668949c9 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.ps1 @@ -0,0 +1,73 @@ +# Generate Azure AI Foundry portal URL for a model deployment +# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal +# +# NOTE: The encoding scheme for the subscription ID portion is proprietary to Azure AI Foundry. +# This script uses a GUID byte encoding approach, but may need adjustment based on the actual encoding used. + +param( + [Parameter(Mandatory=$true)] + [string]$SubscriptionId, + + [Parameter(Mandatory=$true)] + [string]$ResourceGroup, + + [Parameter(Mandatory=$true)] + [string]$FoundryResource, + + [Parameter(Mandatory=$true)] + [string]$ProjectName, + + [Parameter(Mandatory=$true)] + [string]$DeploymentName +) + +function Get-SubscriptionIdEncoded { + param([string]$SubscriptionId) + + # Parse GUID and convert to bytes in string order (big-endian) + # Not using ToByteArray() because it uses little-endian format + $guidString = $SubscriptionId.Replace('-', '') + $bytes = New-Object byte[] 16 + for ($i = 0; $i -lt 16; $i++) { + $bytes[$i] = [Convert]::ToByte($guidString.Substring($i * 2, 2), 16) + } + + # Encode as base64url + $base64 = [Convert]::ToBase64String($bytes) + $urlSafe = $base64.Replace('+', '-').Replace('/', '_').TrimEnd('=') + return $urlSafe +} + +function Get-FoundryDeploymentUrl { + param( + [string]$SubscriptionId, + [string]$ResourceGroup, + [string]$FoundryResource, + [string]$ProjectName, + [string]$DeploymentName + ) + + # Encode subscription ID + $encodedSubId = Get-SubscriptionIdEncoded -SubscriptionId $SubscriptionId + + # Build the encoded resource path + # Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name} + # Note: Two 
commas between resource-group and foundry-resource + $encodedPath = "$encodedSubId,$ResourceGroup,,$FoundryResource,$ProjectName" + + # Build the full URL + $baseUrl = "https://ai.azure.com/nextgen/r/" + $deploymentPath = "/build/models/deployments/$DeploymentName/details" + + return "$baseUrl$encodedPath$deploymentPath" +} + +# Generate and output the URL +$url = Get-FoundryDeploymentUrl ` + -SubscriptionId $SubscriptionId ` + -ResourceGroup $ResourceGroup ` + -FoundryResource $FoundryResource ` + -ProjectName $ProjectName ` + -DeploymentName $DeploymentName + +Write-Output $url diff --git a/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.sh b/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.sh new file mode 100644 index 00000000..3d01ee10 --- /dev/null +++ b/skills/microsoft-foundry/models/deploy-model/scripts/generate_deployment_url.sh @@ -0,0 +1,90 @@ +#!/bin/bash +# Generate Azure AI Foundry portal URL for a model deployment +# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal + +set -e + +# Function to display usage +usage() { + cat << EOF +Usage: $0 --subscription SUBSCRIPTION_ID --resource-group RESOURCE_GROUP \\ + --foundry-resource FOUNDRY_RESOURCE --project PROJECT_NAME \\ + --deployment DEPLOYMENT_NAME + +Generate Azure AI Foundry deployment URL + +Required arguments: + --subscription Azure subscription ID (GUID) + --resource-group Resource group name + --foundry-resource Foundry resource (account) name + --project Project name + --deployment Deployment name + +Example: + $0 --subscription d5320f9a-73da-4a74-b639-83efebc7bb6f \\ + --resource-group bani-host \\ + --foundry-resource banide-host-resource \\ + --project banide-host \\ + --deployment text-embedding-ada-002 +EOF + exit 1 +} + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --subscription) + SUBSCRIPTION_ID="$2" + shift 2 + ;; + --resource-group) + RESOURCE_GROUP="$2" 
+ shift 2 + ;; + --foundry-resource) + FOUNDRY_RESOURCE="$2" + shift 2 + ;; + --project) + PROJECT_NAME="$2" + shift 2 + ;; + --deployment) + DEPLOYMENT_NAME="$2" + shift 2 + ;; + -h|--help) + usage + ;; + *) + echo "Unknown option: $1" + usage + ;; + esac +done + +# Validate required arguments +if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$FOUNDRY_RESOURCE" ] || \ + [ -z "$PROJECT_NAME" ] || [ -z "$DEPLOYMENT_NAME" ]; then + echo "Error: Missing required arguments" + usage +fi + +# Convert subscription GUID to bytes (big-endian/string order) and encode as base64url +# Remove hyphens from GUID +GUID_HEX=$(echo "$SUBSCRIPTION_ID" | tr -d '-') + +# Convert hex string to bytes and base64 encode +# Using xxd to convert hex to binary, then base64 encode +ENCODED_SUB=$(echo "$GUID_HEX" | xxd -r -p | base64 | tr '+' '-' | tr '/' '_' | tr -d '=') + +# Build the encoded resource path +# Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name} +# Note: Two commas between resource-group and foundry-resource +ENCODED_PATH="${ENCODED_SUB},${RESOURCE_GROUP},,${FOUNDRY_RESOURCE},${PROJECT_NAME}" + +# Build the full URL +BASE_URL="https://ai.azure.com/nextgen/r/" +DEPLOYMENT_PATH="/build/models/deployments/${DEPLOYMENT_NAME}/details" + +echo "${BASE_URL}${ENCODED_PATH}${DEPLOYMENT_PATH}" diff --git a/skills/microsoft-foundry/project/connections.md b/skills/microsoft-foundry/project/connections.md new file mode 100644 index 00000000..d4f78be6 --- /dev/null +++ b/skills/microsoft-foundry/project/connections.md @@ -0,0 +1,58 @@ +# Foundry Project Connections + +Connections authenticate and link external resources to a Foundry project. Many agent tools (Azure AI Search, Bing Grounding, MCP) require a project connection before use. + +## Managing Connections via MCP + +Use the Foundry MCP server for all connection operations. The MCP tools handle authentication, validation, and project scoping automatically. 
+ +| Operation | MCP Tool | Description | +|-----------|----------|-------------| +| List all connections | `foundry_connections_list` | Lists all connections in the current project | +| Get connection details | `foundry_connections_get` | Retrieves a specific connection by name, including its ID | +| Create a connection | `foundry_connections_create` | Creates a new connection to an external resource | +| Delete a connection | `foundry_connections_delete` | Removes a connection from the project | + +> 💡 **Tip:** The `connection_id` returned by `foundry_connections_get` is the value you pass as `project_connection_id` when configuring agent tools. + +## Create Connection via Portal + +1. Open [Microsoft Foundry portal](https://ai.azure.com) +2. Navigate to **Operate** → **Admin** → select your project +3. Select **Add connection** → choose service type +4. Browse for resource, select auth method, click **Add connection** + +## Connection ID Format + +For REST and TypeScript samples, the full connection ID format is: + +``` +/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}/connections/{connectionName} +``` + +Python and C# SDKs resolve this automatically from the connection name. 
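
For REST or TypeScript callers that need the full ID, the format above can be assembled from its parts. The sketch below only does string formatting; `build_connection_id` is a hypothetical helper name and the sample values are placeholders.

```bash
# build_connection_id (hypothetical helper): assembles the full connection ID
# from subscription, resource group, account, project, and connection name.
build_connection_id() {
  local sub_id=$1 rg=$2 account=$3 project=$4 connection=$5
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.CognitiveServices/accounts/%s/projects/%s/connections/%s\n' \
    "$sub_id" "$rg" "$account" "$project" "$connection"
}

# Placeholder values for illustration only.
build_connection_id \
  "00000000-0000-0000-0000-000000000000" \
  "my-rg" "my-account" "my-project" "my-search-connection"
```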
+ +## Common Connection Types + +| Type | Resource | Used By | +|------|----------|---------| +| `azure_ai_search` | Azure AI Search | AI Search tool | +| `bing` | Grounding with Bing Search | Bing grounding tool | +| `bing_custom_search` | Grounding with Bing Custom Search | Bing Custom Search tool | +| `api_key` | Any API-key resource | MCP servers, custom tools | +| `azure_openai` | Azure OpenAI | Model access | + +## RBAC for Connection Management + +| Role | Scope | Permission | +|------|-------|------------| +| **Azure AI Project Manager** | Project | Create/manage project connections | +| **Contributor** or **Owner** | Subscription/RG | Create Bing/Search resources, get keys | + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `Connection not found` | Name mismatch or wrong project | Use `foundry_connections_list` to find correct name | +| `Unauthorized` creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project | +| `Invalid connection ID format` | Using name instead of full resource ID | Use `foundry_connections_get` to resolve the full ID | diff --git a/skills/microsoft-foundry/project/create/create-foundry-project.md b/skills/microsoft-foundry/project/create/create-foundry-project.md new file mode 100644 index 00000000..dcf61d26 --- /dev/null +++ b/skills/microsoft-foundry/project/create/create-foundry-project.md @@ -0,0 +1,134 @@ +--- +name: foundry-create-project +description: | + Create a new Azure AI Foundry project using Azure Developer CLI (azd) to provision infrastructure for hosting AI agents and models. + USE FOR: create Foundry project, new AI Foundry project, set up Foundry, azd init Foundry, provision Foundry infrastructure, onboard to Foundry, create Azure AI project, set up AI project. 
+ DO NOT USE FOR: deploying agents to existing projects (use agent/deploy), creating agent code (use agent/create), deploying AI models from catalog (use microsoft-foundry main skill), Azure Functions (use azure-functions). +allowed-tools: Read, Write, Bash, AskUserQuestion +--- + +# Create Azure AI Foundry Project + +Create a new Azure AI Foundry project using azd. Provisions: Foundry account, project, Application Insights, managed identity, and RBAC permissions. Optionally enables hosted agents (capability host + Container Registry). + +**Table of Contents:** [Prerequisites](#prerequisites) · [Workflow](#workflow) · [Best Practices](#best-practices) · [Troubleshooting](#troubleshooting) · [Related Skills](#related-skills) · [Resources](#resources) + +## Prerequisites + +Run checks in order. STOP on any failure and resolve before proceeding. + +**1. Azure CLI** — `az version` → expects version output. If missing: https://aka.ms/installazurecli + +**2. Azure login & subscription:** + +```bash +az account show --query "{Name:name, SubscriptionId:id, State:state}" -o table +``` + +If not logged in, run `az login`. If no active subscription: https://azure.microsoft.com/free/ — STOP. + +If multiple subscriptions, ask which to use, then `az account set --subscription ""`. + +**3. Role permissions:** + +```bash +az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --query "[?contains(roleDefinitionName, 'Owner') || contains(roleDefinitionName, 'Contributor') || contains(roleDefinitionName, 'Azure AI')].{Role:roleDefinitionName, Scope:scope}" -o table +``` + +Requires Owner, Contributor, or Azure AI Owner. If insufficient — STOP, request elevated access from admin. + +**4. Azure Developer CLI** — `azd version`. If missing: https://aka.ms/azure-dev/install + +## Workflow + +### Step 1: Verify azd login + +```bash +azd auth login --check-status +``` + +If not logged in, run `azd auth login` and complete browser auth. 
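
The two CLI prerequisites (checks 1 and 4) can be sketched as a preflight function that stops at the first missing tool. `require_cmd` and `preflight` are hypothetical helper names. The sketch only checks that the commands exist and does not cover login state or role assignments.

```bash
# require_cmd (hypothetical helper): succeeds if the command is on PATH,
# otherwise prints the install hint and fails.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing prerequisite: $1 (install: $2)" >&2
  return 1
}

# preflight (hypothetical helper): stops at the first missing CLI.
preflight() {
  require_cmd az  "https://aka.ms/installazurecli"   || return 1
  require_cmd azd "https://aka.ms/azure-dev/install" || return 1
  echo "CLI prerequisites found"
}

preflight || echo "Resolve the missing prerequisite before continuing"
```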
+ +### Step 2: Ask User for Project Details + +Use AskUserQuestion for: + +1. **Project name** — used as azd environment name and resource group (`rg-`). Must contain only alphanumeric characters and hyphens. Examples: `my-ai-project`, `dev-agents` +2. **Azure location** (optional) — defaults to North Central US (required for hosted agents preview) +3. **Enable hosted agents?** (yes/no) — provisions a capability host and Container Registry for deploying hosted agents. Defaults to no. + +### Step 3: Create Directory and Initialize + +```bash +mkdir "" && cd "" +azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic -e --no-prompt +``` + +- `-t` — Azure AI starter template (Foundry infrastructure) +- `-e` — environment name +- `--no-prompt` — non-interactive, use defaults +- **IMPORTANT:** `azd init` requires an empty directory + +If user specified a non-default location: + +```bash +azd config set defaults.location +``` + +If user chose to enable hosted agents: + +```bash +azd env set ENABLE_HOSTED_AGENTS true +``` + +This provisions a capability host (`capabilityHosts/agents`) on the Foundry account and auto-adds an Azure Container Registry for hosted agent deployments. + +### Step 4: Provision Infrastructure + +```bash +azd provision --no-prompt +``` + +Takes 5–10 minutes. Creates resource group, Foundry account/project, Application Insights, managed identity, and RBAC roles. If hosted agents enabled, also creates Container Registry and capability host. + +### Step 5: Retrieve Project Details + +```bash +azd env get-values +``` + +Capture `AZURE_AI_PROJECT_ID`, `AZURE_AI_PROJECT_ENDPOINT`, and `AZURE_RESOURCE_GROUP`. Direct user to verify at https://ai.azure.com. 
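
Capturing the three values can be scripted by filtering the command output. The sketch below assumes `azd env get-values` prints one `KEY="value"` pair per line; `get_env_value` is a hypothetical helper name and the sample text is made up so the parsing can be tried without an azd environment.

```bash
# get_env_value (hypothetical helper): reads KEY="value" lines on stdin and
# prints the value for the requested key with surrounding quotes removed.
# Fails if the key is not present.
get_env_value() {
  local key=$1 line value
  while IFS= read -r line; do
    case "$line" in
      "$key="*)
        value=${line#"$key="}
        value=${value#\"}
        value=${value%\"}
        printf '%s\n' "$value"
        return 0
        ;;
    esac
  done
  return 1
}

# Made-up sample in the assumed output format.
SAMPLE_ENV='AZURE_AI_PROJECT_ENDPOINT="https://example.invalid/api/projects/demo"
AZURE_RESOURCE_GROUP="rg-demo"'

printf '%s\n' "$SAMPLE_ENV" | get_env_value AZURE_RESOURCE_GROUP

# Real usage (commented out so the sketch has no side effects):
# AZURE_AI_PROJECT_ID=$(azd env get-values | get_env_value AZURE_AI_PROJECT_ID)
```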
+ +### Step 6: Next Steps + +- Deploy an agent → `agent/deploy` skill +- Browse models → `foundry_models_list` MCP tool +- Manage project → https://ai.azure.com + +## Best Practices + +- Use North Central US for hosted agents (preview requirement) +- Name must be alphanumeric + hyphens only — no spaces, underscores, or special characters +- Delete unused projects with `azd down` to avoid ongoing costs +- `azd down` deletes ALL resources — Foundry account, agents, models, Container Registry, and Application Insights data +- `azd provision` is safe to re-run on failure + +## Troubleshooting + +| Problem | Solution | +|---------|----------| +| `azd: command not found` | Install from https://aka.ms/azure-dev/install | +| `ERROR: Failed to authenticate` | Run `azd auth login`; verify subscription with `az account list` | +| `environment name '' is invalid` | Name must be alphanumeric + hyphens only | +| `ERROR: Insufficient permissions` | Request Contributor or Azure AI Owner role from admin | +| Region not supported for hosted agents | Use `azd config set defaults.location northcentralus` | +| Provisioning timeout | Check region availability, verify connectivity, retry `azd provision` | + +## Related Skills + +- **agent/deploy** — Deploy agents to the created project +- **agent/create** — Create a new agent for deployment + +## Resources + +- [Azure Developer CLI](https://aka.ms/azure-dev/install) · [AI Foundry Portal](https://ai.azure.com) · [Foundry Docs](https://learn.microsoft.com/azure/ai-foundry/) · [azd-ai-starter-basic template](https://github.com/Azure-Samples/azd-ai-starter-basic) diff --git a/skills/microsoft-foundry/quota/quota.md b/skills/microsoft-foundry/quota/quota.md new file mode 100644 index 00000000..57a8580f --- /dev/null +++ b/skills/microsoft-foundry/quota/quota.md @@ -0,0 +1,186 @@ +# Microsoft Foundry Quota Management + +Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level. 
+ +> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack. + +> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method. MCP tools are optional convenience wrappers around the same control plane APIs. + +## Quota Types + +| Type | Description | +|------|-------------| +| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits | +| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits | +| **Region** | Max capacity per region, shared across subscription | +| **Slots** | 10-20 deployment slots per resource | + +**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective. + +--- + +Use this sub-skill when the user needs to: + +- **View quota usage** — check current TPM/PTU allocation and available capacity +- **Check quota limits** — show quota limits for a subscription, region, or model +- **Find optimal regions** — compare quota availability across regions for deployment +- **Plan deployments** — verify sufficient quota before deploying models +- **Request quota increases** — navigate quota increase process through Azure Portal +- **Troubleshoot deployment failures** — diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors +- **Optimize allocation** — monitor and consolidate quota across deployments +- **Monitor quota across deployments** — track capacity by model and region +- **Explain quota concepts** — explain TPM, PTU, capacity units, regional quotas +- **Free up quota** — identify and delete unused deployments + +**Key Points:** +1. 
Isolated by region (East US ≠ West US) +2. Regional capacity varies by model +3. Multi-region enables failover and load distribution +4. Quota requests specify target region + +See [detailed guide](./references/workflows.md#regional-quota). + +--- + +## Core Workflows + +### 1. Check Regional Quota + +```bash +subId=$(az account show --query id -o tsv) +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table +``` + +**Output interpretation:** +- **Used**: Current TPM consumed (10000 = 10K TPM) +- **Limit**: Maximum TPM quota (15000 = 15K TPM) +- **Available**: Limit - Used (5K TPM available) + +Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`. + +--- + +### 2. Find Best Region for Deployment + +Check specific regions for available quota: + +```bash +subId=$(az account show --query id -o tsv) +region="eastus" +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv | awk -F'\t' '{ printf "%s: used=%s limit=%s available=%s\n", $1, $2, $3, $3 - $2 }' # JMESPath cannot subtract, so awk computes available +``` + +See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison. + +--- + +### 3. 
Check Quota Before Deployment + +Verify available quota for your target model: + +```bash +subId=$(az account show --query id -o tsv) +region="eastus" +model="OpenAI.Standard.gpt-4o" + +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='$model'].[name.value, currentValue, limit]" -o tsv | awk -F'\t' '{ printf "%s: used=%s limit=%s available=%s\n", $1, $2, $3, $3 - $2 }' # JMESPath cannot subtract, so awk computes available +``` + +- **Available > 0**: Quota remains; deploy if it covers the capacity you need +- **Available = 0**: Delete unused deployments or try a different region + +--- + +### 4. Monitor Quota by Model + +Show quota allocation grouped by model: + +```bash +subId=$(az account show --query id -o tsv) +region="eastus" +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv | awk -F'\t' '{ printf "%s: used=%s limit=%s available=%s\n", $1, $2, $3, $3 - $2 }' # JMESPath cannot subtract, so awk computes available +``` + +Shows aggregate usage across ALL deployments by model type. + +**Optional:** List individual deployments: +```bash +az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table + +az cognitiveservices account deployment list --name --resource-group \ + --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table +``` + +--- + +### 5. Delete Deployment (Free Quota) + +```bash +az cognitiveservices account deployment delete --name --resource-group \ + --deployment-name +``` + +Quota freed **immediately**. Re-run Workflow #1 to verify. + +--- + +### 6. Request Quota Increase + +**Azure Portal Process:** +1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource +2. Select **Quotas** in left navigation +3. Click **Request quota increase** +4. 
Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification** +5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota)) + +**Justification template:** +``` +Production [workload type] using [model] in [region]. +Expected traffic: [X requests/day] with [Y tokens/request]. +Requires [Z TPM] capacity. Current [N TPM] insufficient. +Request increase to [M TPM]. Deployment target: [date]. +``` + +See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps. + +--- + +## Quick Troubleshooting + +| Error | Quick Fix | Detailed Guide | +|-------|-----------|----------------| +| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) | +| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) | +| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) | +| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) | + +--- + +## References + +**Detailed Guides:** +- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits +- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands +- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs +- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations +- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks +- [PTU Guide](./references/ptu-guide.md) - 
Provisioned throughput capacity planning + +**Official Microsoft Documentation:** +- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates +- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates +- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions +- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures +- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details + +**Calculators:** +- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator +- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing diff --git a/skills/microsoft-foundry/quota/references/capacity-planning.md b/skills/microsoft-foundry/quota/references/capacity-planning.md new file mode 100644 index 00000000..029702a3 --- /dev/null +++ b/skills/microsoft-foundry/quota/references/capacity-planning.md @@ -0,0 +1,126 @@ +# Capacity Planning Guide + +Comprehensive guide for planning Azure AI Foundry capacity, including cost analysis, model selection, and workload calculations. 
+ +**Table of Contents:** [Cost Comparison: TPM vs PTU](#cost-comparison-tpm-vs-ptu) · [Production Workload Examples](#production-workload-examples) · [Model Selection and Deployment Type Guidance](#model-selection-and-deployment-type-guidance) + +## Cost Comparison: TPM vs PTU + +> **Official Pricing Sources:** +> - [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates +> - [PTU Costs and Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates and capacity planning + +**TPM (Standard) Pricing:** +- Pay-per-token for input/output +- No upfront commitment +- **Rates**: See [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) + - GPT-4o: ~$0.0025-$0.01/1K tokens + - GPT-4 Turbo: ~$0.01-$0.03/1K + - GPT-3.5 Turbo: ~$0.0005-$0.0015/1K +- **Best for**: Variable workloads, unpredictable traffic + +**PTU (Provisioned) Pricing:** +- Hourly billing: `$/PTU/hr × PTUs × 730 hrs/month` +- Monthly commitment with Reservations discounts +- **Rates**: See [PTU Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) +- Use PTU calculator to determine requirements (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) +- **Best for**: High-volume (>1M tokens/day), predictable traffic, guaranteed throughput + +**Cost Decision Framework** (Analytical Guidance): + +``` +Step 1: Calculate monthly TPM cost + Monthly TPM cost = (Daily tokens × 30 days × $price per 1K tokens) / 1000 + +Step 2: Calculate monthly PTU cost + Monthly PTU cost = Required PTUs × 730 hours/month × $PTU-hour rate + (Get Required PTUs from Azure AI Foundry portal: Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) + +Step 3: Compare + Use PTU when: Monthly PTU cost < (Monthly TPM cost × 0.7) + 
(Use 70% threshold to account for commitment risk) +``` + +**Example Calculation** (Analytical): + +Scenario: 1M requests/day, average 1,000 tokens per request + +- **Daily tokens**: 1,000,000 × 1,000 = 1B tokens/day +- **TPM Cost** (using GPT-4o at $0.005/1K avg): (1B × 30 × $0.005) / 1000 = ~$150,000/month +- **PTU Cost** (estimated 100 PTU at ~$5/PTU-hour): 100 PTU × 730 hours × $5 = ~$365,000/month +- **Decision**: Use TPM. PTU would have to cost less than $150,000 × 0.7 = $105,000/month to pass the 70% threshold, and ~$365,000 does not. + +> **Important**: Always use the official [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) and Azure AI Foundry portal PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) for exact pricing by model, region, and workload. Prices vary by region and are subject to change. + +--- + +## Production Workload Examples + +Real-world production scenarios with capacity calculations for gpt-4, version 0613 (from Azure Foundry Portal calculator): + +| Workload Type | Calls/Min | Prompt Tokens | Response Tokens | Cache Hit % | Total Tokens/Min | PTU Required | TPM Equivalent | +|---------------|-----------|---------------|-----------------|-------------|------------------|--------------|----------------| +| **RAG Chat** | 10 | 3,500 | 300 | 20% | 38,000 | 100 | 38K TPM | +| **Basic Chat** | 10 | 500 | 100 | 20% | 6,000 | 100 | 6K TPM | +| **Summarization** | 10 | 5,000 | 300 | 20% | 53,000 | 100 | 53K TPM | +| **Classification** | 10 | 3,800 | 10 | 20% | 38,100 | 100 | 38K TPM | + +**How to Calculate Your Needs:** + +1. **Determine your peak calls per minute**: Monitor or estimate maximum concurrent requests +2. **Measure token usage**: Average prompt size + response size +3. **Account for cache hits**: Prompt caching can reduce the effective prompt token count by 20-50% +4. **Calculate total tokens/min**: Calls/min × (Prompt tokens × (1 - Cache %) + Response tokens). The Total Tokens/Min column above is the uncached figure, Calls/min × (Prompt tokens + Response tokens). +5. 
**Choose deployment type**: + - **TPM (Standard)**: Allocate 1.5-2× your calculated tokens/min for headroom + - **PTU (Provisioned)**: Use Azure AI Foundry portal PTU calculator for exact PTU count (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) + +**Example Calculation (RAG Chat Production):** +- Peak: 10 calls/min +- Prompt: 3,500 tokens (context + question) +- Response: 300 tokens (answer) +- Cache: 20% hit rate (reduces prompt tokens by 20%) +- **Total TPM needed**: (10 × (3,500 × 0.8 + 300)) = 31,000 TPM +- **With 50% headroom**: 46,500 TPM → Round to **50K TPM deployment** + +**PTU Recommendation:** +For the combined workload (40 calls/min, 135K tokens/min total), use **200 PTU** (from calculator above). + +--- + +## Model Selection and Deployment Type Guidance + +> **Official Documentation:** +> - [Choose the Right AI Model for Your Workload](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/choose-ai-model) - Microsoft Architecture Center +> - [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities, regions, and quotas +> - [Understanding Deployment Types](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types) - Standard vs Provisioned guidance + +**Model Characteristics** (from [official Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)): + +| Model | Key Characteristics | Best For | +|-------|---------------------|----------| +| **GPT-4o** | Matches GPT-4 Turbo performance in English text/coding, superior in non-English and vision tasks. Cheaper and faster than GPT-4 Turbo. 
| Multimodal tasks, cost-effective general purpose, high-volume production workloads | +| **GPT-4 Turbo** | Superior reasoning capabilities, larger context window (128K tokens) | Complex reasoning tasks, long-context analysis | +| **GPT-3.5 Turbo** | Most cost-effective, optimized for chat and completions, fast response time | Simple tasks, customer service, high-volume low-cost scenarios | +| **GPT-4o mini** | Fastest response time, low latency | Latency-sensitive applications requiring immediate responses | +| **text-embedding-3-large** | Purpose-built for vector embeddings | RAG applications, semantic search, document similarity | + +**Deployment Type Selection** (from [official deployment types guide](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types)): + +| Traffic Pattern | Recommended Deployment Type | Reason | +|-----------------|---------------------------|---------| +| **Variable, bursty traffic** | Standard or Global Standard (pay-per-token) | No commitment, pay only for usage | +| **Consistent high volume** | Provisioned types (PTU) | Reserved capacity, predictable costs | +| **Large batch jobs (non-time-sensitive)** | Global Batch or DataZone Batch | 50% cost savings vs Standard | +| **Low latency variance required** | Provisioned types | Guaranteed throughput, no rate limits | +| **No regional restrictions** | Global Standard or Global Provisioned | Access to best available capacity | + +**Capacity Planning Approach** (from [PTU onboarding guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)): + +1. **Understand your TPM requirements**: Calculate expected tokens per minute based on workload +2. **Use the built-in capacity planner**: Available in Azure AI Foundry portal (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) +3. **Input your metrics**: Enter input TPM and output TPM based on your workload characteristics +4. 
**Get PTU recommendation**: The calculator provides a PTU allocation recommendation +5. **Compare costs**: Evaluate Standard (TPM) vs Provisioned (PTU) using the official pricing calculator + +> **Note**: Microsoft does not publish specific "X requests/day = Y TPM" recommendations as capacity requirements vary significantly based on prompt size, response length, cache hit rates, and model choice. Use the built-in capacity planner with your actual workload characteristics. diff --git a/skills/microsoft-foundry/quota/references/error-resolution.md b/skills/microsoft-foundry/quota/references/error-resolution.md new file mode 100644 index 00000000..3ecdef85 --- /dev/null +++ b/skills/microsoft-foundry/quota/references/error-resolution.md @@ -0,0 +1,145 @@ +# Error Resolution Workflows + +**Table of Contents:** [Workflow 7: Quota Exhausted Recovery](#workflow-7-quota-exhausted-recovery) · [Workflow 8: Resolve 429 Rate Limit Errors](#workflow-8-resolve-429-rate-limit-errors) · [Workflow 9: Resolve DeploymentLimitReached](#workflow-9-resolve-deploymentlimitreached) · [Workflow 10: Resolve InsufficientQuota](#workflow-10-resolve-insufficientquota) · [Workflow 11: Resolve QuotaExceeded](#workflow-11-resolve-quotaexceeded) + +## Workflow 7: Quota Exhausted Recovery + +**A. Deploy to Different Region** +```bash +subId=$(az account show --query id -o tsv) +for region in eastus westus eastus2 westus2 swedencentral uksouth; do + az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv | awk -F'\t' -v region="$region" '{ printf "%s %s: used=%s limit=%s available=%s\n", region, $1, $2, $3, $3 - $2 }' & # JMESPath cannot subtract; region prefix keeps parallel output readable +done; wait +``` + +**B. Delete Unused Deployments** +```bash +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +**C. Request Quota Increase (3-5 business days)** + +**D. 
Migrate to PTU** - See capacity-planning.md + +--- + +## Workflow 8: Resolve 429 Rate Limit Errors + +**Identify Deployment:** +```bash +az cognitiveservices account deployment list --name --resource-group \ + --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table # JMESPath cannot multiply; Standard capacity is in units of 1K TPM (50 = 50K TPM) +``` + +**Solutions:** + +**A. Increase Capacity** +```bash +az cognitiveservices account deployment update --name --resource-group --deployment-name --sku-capacity 100 +``` + +**B. Add Retry Logic** - Exponential backoff in code + +**C. Load Balance** +```bash +az cognitiveservices account deployment create --name --resource-group --deployment-name gpt-4o-2 \ + --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 100 +``` + +**D. Migrate to PTU** - No rate limits + +--- + +## Workflow 9: Resolve DeploymentLimitReached + +**Root Cause:** 10-20 slots per resource. + +**Check Count:** +```bash +deployment_count=$(az cognitiveservices account deployment list --name --resource-group --query "length(@)") +echo "Deployments: $deployment_count / ~20 slots" +``` + +**Find Test Deployments:** +```bash +az cognitiveservices account deployment list --name --resource-group \ + --query "[?contains(name,'test') || contains(name,'demo')].{Name:name}" -o table +``` + +**Delete:** +```bash +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +**Or Create New Resource (fresh 10-20 slots):** +```bash +az cognitiveservices account create --name "my-foundry-2" --resource-group --location eastus --kind AIServices --sku S0 --yes +``` + +--- + +## Workflow 10: Resolve InsufficientQuota + +**Root Cause:** Requested capacity exceeds available quota. 
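As a rough pre-check before deploying, the comparison behind this error can be sketched in shell. `quota_fits` is a hypothetical helper, not an Azure CLI command, and it assumes that used, limit, and requested capacity are all expressed in the same unit.

```bash
# Hypothetical helper (not part of the Azure CLI).
# Succeeds (exit 0) when requested <= limit - used, fails (exit 1) otherwise.
# Arguments: used limit requested, all in the same capacity unit.
quota_fits() {
  awk -v u="$1" -v l="$2" -v r="$3" 'BEGIN { exit !(r <= l - u) }'
}

# Example values only: 10 used out of a limit of 15, asking for 20 more.
if quota_fits 10 15 20; then
  echo "Requested capacity fits within available quota"
else
  echo "Requested capacity exceeds available quota: reduce capacity, free quota, or switch region"
fi
```

Running a check like this first points to the likely remedy before a deployment attempt fails.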
+ +**Check Quota:** +```bash +subId=$(az account show --query id -o tsv) +az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv | awk -F'\t' '{ printf "%s: used=%s limit=%s available=%s\n", $1, $2, $3, $3 - $2 }' # JMESPath cannot subtract, so awk computes available +``` + +**Solutions:** + +**A. Reduce Capacity** +```bash +az cognitiveservices account deployment create --name --resource-group --deployment-name gpt-4o \ + --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 20 +``` + +**B. Delete Unused Deployments** +```bash +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +**C. Different Region** - Check quota with multi-region script (Workflow 7) + +**D. Request Increase (3-5 days)** + +--- + +## Workflow 11: Resolve QuotaExceeded + +**Root Cause:** Deployment exceeds regional quota. + +**Check Quota:** +```bash +subId=$(az account show --query id -o tsv) +az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')]" -o table +``` + +**Multi-Region Check:** (Use Workflow 7 script) + +**Solutions:** + +**A. Delete Unused Deployments** +```bash +az cognitiveservices account deployment delete --name --resource-group --deployment-name +``` + +**B. Different Region** +```bash +az cognitiveservices account deployment create --name --resource-group --deployment-name gpt-4o \ + --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 50 +``` + +**C. Request Increase (3-5 days)** + +**D. 
Reduce Capacity** + +**Decision:** Available < 10% → Different region; 10-50% → Delete/reduce; > 50% → Delete one deployment + +--- + diff --git a/skills/microsoft-foundry/quota/references/optimization.md b/skills/microsoft-foundry/quota/references/optimization.md new file mode 100644 index 00000000..ea4dbd12 --- /dev/null +++ b/skills/microsoft-foundry/quota/references/optimization.md @@ -0,0 +1,168 @@ +# Quota Optimization Strategies + +Comprehensive strategies for optimizing Azure AI Foundry quota allocation and reducing costs. + +**Table of Contents:** [1. Identify and Delete Unused Deployments](#1-identify-and-delete-unused-deployments) · [2. Right-Size Over-Provisioned Deployments](#2-right-size-over-provisioned-deployments) · [3. Consolidate Multiple Small Deployments](#3-consolidate-multiple-small-deployments) · [4. Cost Optimization Strategies](#4-cost-optimization-strategies) · [5. Regional Quota Rebalancing](#5-regional-quota-rebalancing) + +## 1. Identify and Delete Unused Deployments + +**Step 1: Discovery with Quota Context** + +Get quota limits FIRST to understand how close you are to capacity: + +```bash +# Check current quota usage vs limits (run this FIRST) +subId=$(az account show --query id -o tsv) +region="eastus" # Change to your region +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv | awk -F'\t' '{ printf "%s: used=%s limit=%s available=%s\n", $1, $2, $3, $3 - $2 }' # JMESPath cannot subtract, so awk computes available +``` + +**Step 2: Parallel Deployment Enumeration** + +List all deployments across resources efficiently: + +```bash +# Get all Foundry resources +resources=$(az cognitiveservices account list --query "[?kind=='AIServices'].{name:name,rg:resourceGroup}" -o json) + +# Parallel deployment enumeration (faster than sequential) +echo "$resources" | jq -r '.[] | "\(.name) \(.rg)"' | while read name rg; do + echo 
"=== $name ($rg) ===" + az cognitiveservices account deployment list --name "$name" --resource-group "$rg" \ + --query "[].{Deployment:name,Model:properties.model.name,Capacity:sku.capacity,Created:systemData.createdAt}" -o table & +done +wait # Wait for all background jobs to complete +``` + +**Step 3: Identify Stale Deployments** + +Criteria for deletion candidates: + +- **Test/temporary naming**: Contains "test", "demo", "temp", "dev" in deployment name +- **Old timestamps**: Created >90 days ago with timestamp-based naming (e.g., "gpt4-20231015") +- **High capacity consumers**: Deployments with >100K TPM capacity that haven't been referenced in recent logs +- **Duplicate models**: Multiple deployments of same model/version in same region + +**Example pattern matching for stale deployments:** +```bash +# Find deployments with test/temp naming +az cognitiveservices account deployment list --name --resource-group \ + --query "[?contains(name,'test') || contains(name,'demo') || contains(name,'temp')].{Name:name,Capacity:sku.capacity}" -o table +``` + +**Step 4: Delete and Verify Quota Recovery** + +```bash +# Delete unused deployment (quota freed IMMEDIATELY) +az cognitiveservices account deployment delete --name --resource-group --deployment-name + +# Verify quota freed (re-run Step 1 quota check) +# You should see "Used" decrease by the deployment's capacity +``` + +**Cost Impact Analysis:** + +| Deployment Type | Capacity (TPM) | Quota Freed | Cost Impact (TPM) | Cost Impact (PTU) | +|-----------------|----------------|-------------|-------------------|-------------------| +| Test deployment | 10K TPM | 10K TPM | $0 (pay-per-use) | N/A | +| Unused production | 100K TPM | 100K TPM | $0 (pay-per-use) | N/A | +| Abandoned PTU deployment | 100 PTU | ~40K TPM equivalent | $0 TPM | **$3,650/month saved** (100 PTU × 730h × $0.05/h) | +| High-capacity test | 450K TPM | 450K TPM | $0 (pay-per-use) | N/A | + +**Key Insight:** For TPM (Standard) deployments, deletion frees 
quota but has no direct cost impact (you pay per token used). For PTU (Provisioned) deployments, deletion **immediately stops hourly charges** and can save thousands per month. + +--- + +## 2. Right-Size Over-Provisioned Deployments + +**Identify over-provisioned deployments:** +- Check Azure Monitor metrics for actual token usage +- Compare allocated TPM vs. peak usage +- Look for deployments with <50% utilization + +**Right-sizing example:** +```bash +# Update deployment to lower capacity +az cognitiveservices account deployment update --name --resource-group \ + --deployment-name --sku-capacity 30 # Reduce from 50K to 30K TPM +``` + +**Cost Optimization:** +- **TPM (Standard)**: Reduces regional quota consumption (no direct cost savings, pay-per-token) +- **PTU (Provisioned)**: Direct cost reduction (40% capacity reduction = 40% cost reduction) + +--- + +## 3. Consolidate Multiple Small Deployments + +**Pattern:** Multiple 10K TPM deployments → One 30-50K TPM deployment + +**Benefits:** +- Fewer deployment slots consumed +- Simpler management +- Same total capacity, better utilization + +**Example:** +- **Before**: 3 deployments @ 10K TPM each = 30K TPM total, 3 slots used +- **After**: 1 deployment @ 30K TPM = 30K TPM total, 1 slot used +- **Savings**: 2 deployment slots freed for other models + +--- + +## 4. Cost Optimization Strategies + +> **Official Documentation**: [Plan to manage costs for Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs) and [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management) + +**A. Use Fine-Tuned Smaller Models** (from [Microsoft Transparency Note](https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/openai/transparency-note)): + +You can reduce costs or latency by swapping a fine-tuned version of a smaller/faster model (e.g., fine-tuned GPT-3.5-Turbo) for a more general-purpose model (e.g., GPT-4). 
+ +```bash +# Deploy fine-tuned GPT-3.5 Turbo as cost-effective alternative to GPT-4 +az cognitiveservices account deployment create --name --resource-group \ + --deployment-name gpt-35-tuned --model-name \ + --model-format OpenAI --sku-name Standard --sku-capacity 10 +``` + +**B. Remove Unused Fine-Tuned Deployments** (from [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)): + +Fine-tuned model deployments incur **hourly hosting costs** even when not in use. Remove unused deployments promptly to control costs. + +- Inactive deployments unused for **15 consecutive days** are automatically deleted +- Proactively delete unused fine-tuned deployments to avoid hourly charges + +```bash +# Delete unused fine-tuned deployment +az cognitiveservices account deployment delete --name --resource-group \ + --deployment-name +``` + +**C. Batch Multiple Requests** (from [Cost optimization Q&A](https://learn.microsoft.com/en-us/answers/questions/1689253/how-to-optimize-costs-per-request-azure-openai-gpt)): + +Batch multiple requests together to reduce the total number of API calls and lower overall costs. + +**D. Use Commitment Tiers for Predictable Costs** (from [Managing costs guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs)): + +- **Pay-as-you-go**: Bills according to usage (variable costs) +- **Commitment tiers**: Commit to using service features for a fixed fee (predictable costs, potential savings for consistent usage) + +--- + +## 5. 
Regional Quota Rebalancing + +If you have quota spread across multiple regions but only use some: + +```bash +# Check quota across regions +for region in eastus westus uksouth; do + echo "=== $region ===" + subId=$(az account show --query id -o tsv) + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table +done +``` + +**Optimization:** Concentrate deployments in fewer regions to maximize quota utilization per region. diff --git a/skills/microsoft-foundry/quota/references/ptu-guide.md b/skills/microsoft-foundry/quota/references/ptu-guide.md new file mode 100644 index 00000000..6dec8b8d --- /dev/null +++ b/skills/microsoft-foundry/quota/references/ptu-guide.md @@ -0,0 +1,152 @@ +# Provisioned Throughput Units (PTU) Guide + +**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources) + +## Understanding PTU vs Standard TPM + +Microsoft Foundry offers two quota types: + +### Standard TPM (Tokens Per Minute) +- Pay-as-you-go model, charged per token +- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM) +- Total regional quota shared across all deployments +- Subject to rate limiting during high demand (429 errors possible) +- Best for: Variable workloads, development, testing, bursty traffic + +### Provisioned Throughput Units (PTU) +- Monthly commitment for guaranteed throughput +- No rate limiting, consistent latency +- Measured in PTU units (not TPM) +- Best for: 
Predictable, high-volume production workloads +- More cost-effective when consistent token usage justifies monthly commitment + +## When to Use PTU + +| Factor | Standard (TPM) | Provisioned (PTU) | +|--------|----------------|-------------------| +| **Best For** | Variable workloads, development, testing | Predictable production workloads | +| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) | +| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) | +| **Latency** | Variable | Consistent | +| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage | +| **Flexibility** | Scale up/down instantly | Requires planning and commitment | +| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs | + +**Use PTU when:** +- Consistent, predictable token usage where monthly commitment is cost-effective +- Need guaranteed throughput (no 429 rate limit errors) +- Require consistent latency with performance SLA +- High-volume production workloads with stable traffic patterns + +**Decision Guidance:** +Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment. + +## PTU Capacity Planning + +### Official Calculation Methods + +> **Agent Instruction:** Only present official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or reference deprecated calculators (oai.azure.com/portal/calculator). + +Calculate PTU requirements using these official methods: + +**Method 1: Microsoft Foundry Portal** +1. Navigate to Microsoft Foundry portal +2. Go to **Operate** → **Quota** +3. Select **Provisioned throughput unit** tab +4. Click **Capacity calculator** button +5. Enter workload parameters (model, tokens/call, RPM, latency target) +6. 
Calculator returns exact PTU count needed + +**Method 2: Using Azure REST API** +```bash +# Calculate required PTU capacity +curl -X POST "https://management.azure.com/subscriptions//providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "model": { + "format": "OpenAI", + "name": "gpt-4o", + "version": "2024-05-13" + }, + "workload": { + "requestPerMin": 100, + "tokensPerMin": 50000, + "peakRequestsPerMin": 150 + } + }' +``` + +## Deploy Model with PTU + +### Step 1: Calculate PTU Requirements + +Use the official capacity calculator methods above to determine required PTU capacity. + +### Step 2: Deploy with PTU + +```bash +# Deploy model with calculated PTU capacity +az cognitiveservices account deployment create \ + --name \ + --resource-group \ + --deployment-name gpt-4o-ptu-deployment \ + --model-name gpt-4o \ + --model-version "2024-05-13" \ + --model-format OpenAI \ + --sku-name ProvisionedManaged \ + --sku-capacity 100 + +# Check PTU deployment status +az cognitiveservices account deployment show \ + --name \ + --resource-group \ + --deployment-name gpt-4o-ptu-deployment +``` + +**Key Differences from Standard TPM:** +- SKU name: `ProvisionedManaged` (not `Standard`) +- Capacity: Measured in PTU units (not K TPM) +- Billing: Monthly commitment regardless of usage +- No rate limiting (guaranteed throughput) + +## Request PTU Quota Increase + +PTU quota is separate from TPM quota and requires specific justification: + +1. Navigate to Azure Portal → Foundry resource → **Quotas** +2. Select **Provisioned throughput unit** tab +3. Identify model needing PTU increase (e.g., "GPT-4o PTU") +4. Click **Request quota increase** +5. Fill form: + - Model name + - Requested PTU quota + - Include capacity calculator results in business justification + - Explain workload characteristics (volume, latency requirements) +6. 
Submit and monitor status + +**Processing Time:** Typically 3-5 business days (longer than standard quota requests) +**Note:** PTU quota requests typically require stronger business justification due to commitment nature + +**Alternative:** Deploy to different region with available PTU quota + +## Understanding Region and Deployment Quotas + +### Region Quota +- Maximum PTU capacity available in an Azure region +- Varies by model type (GPT-4, GPT-4o, etc.) +- Shared across subscription resources in same region +- Separate from TPM quota (you have both TPM and PTU quotas) + +### Deployment Slots +- Number of concurrent model deployments allowed +- Typically 10-20 slots per resource +- Each PTU deployment uses one slot (same as TPM deployments) +- Deployment count limit is independent of capacity + +## External Resources + +- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) +- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput) +- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP) +- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput) diff --git a/skills/microsoft-foundry/quota/references/troubleshooting.md b/skills/microsoft-foundry/quota/references/troubleshooting.md new file mode 100644 index 00000000..8f88752c --- /dev/null +++ b/skills/microsoft-foundry/quota/references/troubleshooting.md @@ -0,0 +1,211 @@ +# Troubleshooting Quota Errors + +**Table of Contents:** [Common Quota Errors](#common-quota-errors) · [Detailed Error Resolution](#detailed-error-resolution) · [Request Quota Increase Process](#request-quota-increase-process) · [Diagnostic Commands](#diagnostic-commands) · [External Resources](#external-resources) + +## 
Common Quota Errors + +| Error | Cause | Quick Fix | +|-------|-------|-----------| +| `QuotaExceeded` | Regional quota consumed (TPM or PTU) | Delete unused deployments or request increase | +| `InsufficientQuota` | Not enough available for requested capacity | Reduce deployment capacity or free quota | +| `DeploymentLimitReached` | Too many deployment slots used | Delete unused deployments to free slots | +| `429 Rate Limit` | TPM capacity too low for traffic (Standard only) | Increase TPM capacity or migrate to PTU | +| `PTU capacity unavailable` | No PTU quota in region | Request PTU quota or try different region | +| `SKU not supported` | PTU not available for model/region | Check model availability or use Standard TPM | + +## Detailed Error Resolution + +### QuotaExceeded Error + +All available TPM or PTU quota consumed in the region. + +**Resolution:** + +1. **Check current quota usage:** + ```bash + subId=$(az account show --query id -o tsv) + region="eastus" + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table + ``` + +2. **Choose resolution:** + - **Option A**: Delete unused deployments to free quota + - **Option B**: Reduce requested deployment capacity + - **Option C**: Deploy to different region with available quota + - **Option D**: Request quota increase through Azure Portal + +### InsufficientQuota Error + +Available quota less than requested capacity. + +**Resolution:** + +1. 
**Check available quota:** + ```bash + # Available = Limit - Used (az CLI JMESPath has no arithmetic, so subtract the two columns yourself) + subId=$(az account show --query id -o tsv) + region="eastus" + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit}" -o table + ``` + +2. **Options:** + - Reduce deployment capacity to fit available quota + - Delete existing deployments to free capacity + - Try different region with more available quota + - Request quota increase + +### DeploymentLimitReached Error + +Resource reached maximum deployment slot limit (10-20 slots). + +**Resolution:** + +1. **List existing deployments:** + ```bash + az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \ + --output table + ``` + +2. **Delete unused deployments:** + ```bash + az cognitiveservices account deployment delete \ + --name \ + --resource-group \ + --deployment-name + ``` + +3. **Verify slot freed:** + ```bash + az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query 'length([])' + ``` + +### 429 Rate Limit Errors + +TPM capacity insufficient for traffic volume (Standard TPM only). + +**Resolution:** + +1. **Check deployment capacity:** + ```bash + az cognitiveservices account deployment show \ + --name \ + --resource-group \ + --deployment-name \ + --query '{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' + ``` + +2. 
**Options:** + - **Option A**: Increase TPM capacity on existing deployment + ```bash + az cognitiveservices account deployment update \ + --name \ + --resource-group \ + --deployment-name \ + --sku-capacity + ``` + - **Option B**: Migrate to PTU for guaranteed throughput (no rate limits) + - **Option C**: Implement retry logic with exponential backoff in application + +### PTU Capacity Unavailable Error + +No PTU quota allocated in region, or PTU not available for model/region. + +**Resolution:** + +1. **Check PTU quota:** + ```bash + subId=$(az account show --query id -o tsv) + region="eastus" + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'ProvisionedManaged')].{Model:name.value, Used:currentValue, Limit:limit}" -o table + ``` + +2. **Options:** + - Request PTU quota increase through Azure Portal (include capacity calculator results) + - Try different region where PTU is available + - Use Standard TPM instead + +### SKU Not Supported Error + +PTU not available for specific model or region combination. + +**Resolution:** + +1. **Check model availability:** + - Review [PTU model availability by region](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#provisioned-deployment-model-availability) + +2. **Options:** + - Deploy with Standard TPM SKU instead + - Choose different region where PTU is supported + - Use alternative model that supports PTU in your region + +## Request Quota Increase Process + +### For Standard TPM Quota + +1. Navigate to Azure Portal → Your Foundry resource → **Quotas** +2. Identify model needing increase (e.g., "GPT-4o Standard") +3. Click **Request quota increase** +4. Fill form: + - Model name + - Requested quota (in TPM) + - Business justification (required) +5. 
Submit and monitor status + +**Processing Time:** Typically 1-2 business days + +### For PTU Quota + +1. Navigate to Azure Portal → Your Foundry resource → **Quotas** +2. Select **Provisioned throughput unit** tab +3. Identify model needing PTU increase +4. Click **Request quota increase** +5. Fill form: + - Model name + - Requested PTU quota + - Include capacity calculator results + - Detailed business justification (workload characteristics) +6. Submit and monitor status + +**Processing Time:** Typically 3-5 business days (requires stronger justification) + +## Diagnostic Commands + +```bash +# Check deployment status +az cognitiveservices account deployment show \ + --name \ + --resource-group \ + --deployment-name + +# Verify available quota (Available = Limit - Used; az CLI JMESPath has no arithmetic) +subId=$(az account show --query id -o tsv) +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" \ + --output table + +# List all deployments +az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' \ + --output table +``` + +## External Resources + +- [Quota Management Documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) +- [Rate Limits Documentation](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) +- [Troubleshooting Guide](https://learn.microsoft.com/azure/ai-services/openai/troubleshooting) diff --git a/skills/microsoft-foundry/quota/references/workflows.md b/skills/microsoft-foundry/quota/references/workflows.md new file mode 100644 index 00000000..74ef6319 --- /dev/null +++ b/skills/microsoft-foundry/quota/references/workflows.md @@ -0,0 +1,176 @@ +# Detailed Workflows: Quota Management + +**Table of 
Contents:** [Workflow 1: View Current Quota Usage](#workflow-1-view-current-quota-usage---detailed-steps) · [Workflow 2: Find Best Region for Model Deployment](#workflow-2-find-best-region-for-model-deployment---detailed-steps) · [Workflow 3: Check Quota Before Deployment](#workflow-3-check-quota-before-deployment---detailed-steps) · [Workflow 4: Monitor Quota Across Deployments](#workflow-4-monitor-quota-across-deployments---detailed-steps) · [Quick Command Reference](#quick-command-reference) · [MCP Tools Reference](#mcp-tools-reference-optional-wrappers) + +## Workflow 1: View Current Quota Usage - Detailed Steps + +### Step 1: Show Regional Quota Summary (REQUIRED APPROACH) + +> **CRITICAL AGENT INSTRUCTION:** +> - When showing quota: Query REGIONAL quota summary, NOT individual resources +> - DO NOT run `az cognitiveservices account list` for quota queries +> - DO NOT filter resources by username or name patterns +> - ONLY check specific resource deployments if user provides resource name +> - Quotas are managed at SUBSCRIPTION + REGION level, NOT per-resource + +**Show Regional Quota Summary:** + +```bash +# Get subscription ID +subId=$(az account show --query id -o tsv) + +# Check quota for key regions (Available = Limit - Used; az CLI JMESPath has no arithmetic) +regions=("eastus" "eastus2" "westus" "westus2") +for region in "${regions[@]}"; do + echo "=== Region: $region ===" + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI.Standard')].{Model:name.value, Used:currentValue, Limit:limit}" \ + --output table + echo "" +done +``` + +### Step 2: If User Asks for Specific Resource (ONLY IF EXPLICITLY REQUESTED) + +```bash +# User must provide resource name +az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, 
SKU:sku.name}' \ + --output table +``` + +**Alternative - Use MCP Tools (Optional Wrappers):** +``` +foundry_models_deployments_list( + resource-group="", + azure-ai-services="" +) +``` +*Note: MCP tools are convenience wrappers around the same control plane APIs shown above.* + +**Interpreting Results:** +- `Used` (currentValue): Currently allocated quota +- `Limit`: Maximum quota available in region +- `Available`: Not returned by the API; compute it as `limit - currentValue` + +## Workflow 2: Find Best Region for Model Deployment - Detailed Steps + +### Step 1: Check Single Region + +```bash +# Get subscription ID +subId=$(az account show --query id -o tsv) + +# Check quota for GPT-4o Standard in a specific region (Available = Limit - Used) +region="eastus" # Change to your target region +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit}" \ + -o table +``` + +### Step 2: Check Multiple Regions (Common Regions) + +Check these regions in sequence by changing the `region` variable: +- `eastus`, `eastus2` - US East Coast +- `westus`, `westus2`, `westus3` - US West Coast +- `swedencentral` - Europe (Sweden) +- `canadacentral` - Canada +- `uksouth` - UK +- `japaneast` - Asia Pacific + +**Alternative - Use MCP Tool:** +``` +model_quota_list(region="eastus") +``` +Repeat for each target region. + +**Key Points:** +- Query returns `currentValue` (used) and `limit` (max); available quota is `limit - currentValue` +- Standard SKU format: `OpenAI.Standard.` +- For PTU: `OpenAI.ProvisionedManaged.` +- Focus on 2-3 regions relevant to your location rather than checking all regions + +## Workflow 3: Check Quota Before Deployment - Detailed Steps + +**Steps:** +1. Check current usage (workflow #1) +2. Calculate available: `limit - currentValue` +3. 
Compare: `available >= required_capacity` +4. If insufficient: Use workflow #2 to find region with capacity, or request increase + +## Workflow 4: Monitor Quota Across Deployments - Detailed Steps + +**Recommended Approach - Regional Quota Overview:** + +Show quota by region (better than listing all resources). Available quota is `Limit - Used`; the az CLI's JMESPath has no arithmetic, so the query returns only the two inputs: + +```bash +subId=$(az account show --query id -o tsv) +regions=("eastus" "eastus2" "westus" "westus2" "swedencentral") + +for region in "${regions[@]}"; do + echo "=== Region: $region ===" + az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" \ + --output table + echo "" +done +``` + +**Alternative - Check Specific Resource:** + +If user wants to monitor a specific resource, ask for resource name first: + +```bash +# List deployments for specific resource +az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \ + --output table +``` + +> **Note:** Don't automatically iterate through all resources in the subscription. Show regional quota summary or ask for specific resource name. 
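
The available-quota arithmetic used in Workflows 3 and 4 (`limit - currentValue`, then `available >= required_capacity`) can also be done client-side on the JSON returned by the usages endpoint. The following is a minimal Python sketch, assuming the response was saved with `-o json` and carries the `value[].name.value`, `currentValue`, and `limit` fields that the queries above project. The function names `summarize_usages` and `has_capacity` and the sample numbers are illustrative only; they are not part of any Azure SDK.

```python
from __future__ import annotations

from typing import Any


def summarize_usages(usages: dict[str, Any], name_contains: str = "OpenAI") -> list[dict[str, Any]]:
    """Return one row per matching quota entry, adding Available = limit - currentValue."""
    rows = []
    for entry in usages.get("value", []):
        name = entry["name"]["value"]
        if name_contains not in name:
            continue  # same filter idea as contains(name.value, 'OpenAI')
        used = entry["currentValue"]
        limit = entry["limit"]
        rows.append({"Model": name, "Used": used, "Limit": limit, "Available": limit - used})
    return rows


def has_capacity(rows: list[dict[str, Any]], quota_name: str, required: float) -> bool:
    """Workflow 3 check: is available >= required_capacity for one quota name?"""
    for row in rows:
        if row["Model"] == quota_name:
            return row["Available"] >= required
    return False  # quota name not listed for this region


if __name__ == "__main__":
    # Illustrative payload shaped like the usages response; not real subscription data.
    sample = {
        "value": [
            {"name": {"value": "OpenAI.Standard.gpt-4o"}, "currentValue": 60, "limit": 100},
            {"name": {"value": "OpenAI.ProvisionedManaged.gpt-4o"}, "currentValue": 0, "limit": 0},
        ]
    }
    rows = summarize_usages(sample)
    for row in rows:
        print(row)
    print("fits 30 units:", has_capacity(rows, "OpenAI.Standard.gpt-4o", 30))
```

In practice the `sample` dictionary would be replaced by `json.load(...)` on the saved `az rest` output for each region.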
+ +## Quick Command Reference + +```bash +# View quota for specific model using REST API (Available = Limit - Used) +subId=$(az account show --query id -o tsv) +region="eastus" # Change to your region +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'gpt-4')].{Name:name.value, Used:currentValue, Limit:limit}" \ + --output table + +# List all deployments with capacity +az cognitiveservices account deployment list \ + --name \ + --resource-group \ + --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \ + --output table + +# Delete deployment to free quota +az cognitiveservices account deployment delete \ + --name \ + --resource-group \ + --deployment-name +``` + +## MCP Tools Reference (Optional Wrappers) + +**Note:** All quota operations are control plane (management) operations. MCP tools are optional convenience wrappers around Azure CLI commands. + +| Tool | Purpose | Equivalent Azure CLI | +|------|---------|---------------------| +| `foundry_models_deployments_list` | List all deployments with capacity | `az cognitiveservices account deployment list` | +| `model_quota_list` | List quota and usage across regions | `az rest` (Management API) | +| `model_catalog_list` | List available models from catalog | `az rest` (Management API) | +| `foundry_resource_get` | Get resource details and endpoint | `az cognitiveservices account show` | + +**Recommended:** Use Azure CLI commands directly for control plane operations. 
diff --git a/skills/microsoft-foundry/rbac/rbac.md b/skills/microsoft-foundry/rbac/rbac.md new file mode 100644 index 00000000..9d0930eb --- /dev/null +++ b/skills/microsoft-foundry/rbac/rbac.md @@ -0,0 +1,156 @@ +# Microsoft Foundry RBAC Management + +Reference for managing RBAC for Microsoft Foundry resources: user permissions, managed identity configuration, and service principal setup for CI/CD. + +## Quick Reference + +| Property | Value | +|----------|-------| +| **CLI Commands** | `az role assignment`, `az ad sp` | +| **Resource Type** | `Microsoft.CognitiveServices/accounts` | +| **Best For** | Permission management, access auditing, CI/CD setup | + +## When to Use + +- Grant user access to Foundry resources or projects +- Set up developer permissions (Project Manager, Owner roles) +- Audit role assignments or validate permissions +- Configure managed identity roles for connected resources +- Create service principals for CI/CD pipeline automation +- Troubleshoot permission errors + +## Azure AI Foundry Built-in Roles + +| Role | Create Projects | Data Actions | Role Assignments | +|------|-----------------|--------------|------------------| +| Azure AI User | No | Yes | No | +| Azure AI Project Manager | Yes | Yes | Yes (AI User only) | +| Azure AI Account Owner | Yes | No | Yes (AI User only) | +| Azure AI Owner | Yes | Yes | Yes | + +> ⚠️ **Warning:** Azure AI User is auto-assigned via Portal but NOT via SDK/CLI. Automation must explicitly assign roles. + +## Workflows + +All scopes follow the pattern: `/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/` + +For project-level scoping, append `/projects/`. + +### 1. Assign User Permissions + +```bash +az role assignment create --role "Azure AI User" --assignee "" --scope "" +``` + +### 2. 
Assign Developer Permissions + +```bash +# Project Manager (create projects, assign AI User roles) +az role assignment create --role "Azure AI Project Manager" --assignee "" --scope "" + +# Full ownership including data actions +az role assignment create --role "Azure AI Owner" --assignee "" --scope "" +``` + +### 3. Audit Role Assignments + +```bash +# List all assignments +az role assignment list --scope "" --output table + +# Detailed with principal names +az role assignment list --scope "" --query "[].{Principal:principalName, PrincipalType:principalType, Role:roleDefinitionName}" --output table + +# Azure AI roles only +az role assignment list --scope "" --query "[?contains(roleDefinitionName, 'Azure AI')].{Principal:principalName, Role:roleDefinitionName}" --output table +``` + +### 4. Validate Permissions + +```bash +# Current user's roles on resource +az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --scope "" --query "[].roleDefinitionName" --output tsv + +# Check actions available to a role +az role definition list --name "Azure AI User" --query "[].permissions[].actions" --output json +``` + +**Permission Requirements by Action:** + +| Action | Required Role(s) | +|--------|------------------| +| Deploy models | Azure AI User, Azure AI Project Manager, Azure AI Owner | +| Create projects | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner | +| Assign Azure AI User role | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner | +| Full data access | Azure AI User, Azure AI Project Manager, Azure AI Owner | + +### 5. 
Configure Managed Identity Roles + +```bash +# Get managed identity principal ID +PRINCIPAL_ID=$(az cognitiveservices account show --name --resource-group --query identity.principalId --output tsv) + +# Assign roles to connected resources (repeat pattern for each) +az role assignment create --role "" --assignee "$PRINCIPAL_ID" --scope "" +``` + +**Common Managed Identity Role Assignments:** + +| Connected Resource | Role | Purpose | +|--------------------|------|---------| +| Azure Storage | Storage Blob Data Reader | Read files/documents | +| Azure Storage | Storage Blob Data Contributor | Read/write files | +| Azure Key Vault | Key Vault Secrets User | Read secrets | +| Azure AI Search | Search Index Data Reader | Query indexes | +| Azure AI Search | Search Index Data Contributor | Query and modify indexes | +| Azure Cosmos DB | Cosmos DB Account Reader | Read data | + +### 6. Create Service Principal for CI/CD + +```bash +# Create SP with minimal role +az ad sp create-for-rbac --name "foundry-cicd-sp" --role "Azure AI User" --scopes "" --output json +# Output contains: appId, password, tenant — store securely + +# For project management permissions +az ad sp create-for-rbac --name "foundry-cicd-admin-sp" --role "Azure AI Project Manager" --scopes "" --output json + +# Add Contributor for resource provisioning +SP_APP_ID=$(az ad sp list --display-name "foundry-cicd-sp" --query "[0].appId" -o tsv) +az role assignment create --role "Contributor" --assignee "$SP_APP_ID" --scope "/subscriptions//resourceGroups/" +``` + +> 💡 **Tip:** Use least privilege — start with `Azure AI User` and add roles as needed. 
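
Every command in this section takes a `--scope` value that follows the pattern stated under Workflows (subscription, resource group, `Microsoft.CognitiveServices/accounts`, then an optional `/projects/` suffix). The sketch below shows one way to assemble that string and the matching `az role assignment create` argument list in Python. The helper names `foundry_scope` and `role_assignment_args`, the parameter names, and the sample identifiers are hypothetical; only the path layout and the `--role`, `--assignee`, and `--scope` flags come from the commands above, and nothing here is executed against Azure.

```python
from __future__ import annotations


def foundry_scope(subscription_id: str, resource_group: str, account_name: str,
                  project_name: str | None = None) -> str:
    """Build a resource-level scope, or a project-level scope when project_name is given."""
    for label, value in (("subscription_id", subscription_id),
                         ("resource_group", resource_group),
                         ("account_name", account_name)):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    )
    if project_name:
        scope += f"/projects/{project_name}"  # narrower, project-level scope
    return scope


def role_assignment_args(role: str, assignee: str, scope: str) -> list[str]:
    """Argument vector for `az role assignment create`; the command is not run here."""
    return ["az", "role", "assignment", "create",
            "--role", role, "--assignee", assignee, "--scope", scope]


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    scope = foundry_scope("00000000-0000-0000-0000-000000000000", "my-rg", "my-foundry", "my-project")
    print(" ".join(role_assignment_args("Azure AI User", "user@example.com", scope)))
```

Building the scope in one place makes it easier to keep assignments at the narrowest level that works, which is the point of the least-privilege tip.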
+ +| CI/CD Scenario | Recommended Role | Additional Roles | +|----------------|------------------|------------------| +| Deploy models only | Azure AI User | None | +| Manage projects | Azure AI Project Manager | None | +| Full provisioning | Azure AI Owner | Contributor (on RG) | +| Read-only monitoring | Reader | Azure AI User (for data) | + +**CI/CD Pipeline Login:** + +```bash +az login --service-principal --username "" --password "" --tenant "" +az account set --subscription "" +``` + +## Error Handling + +| Issue | Cause | Resolution | +|-------|-------|------------| +| "Authorization failed" when deploying | Missing Azure AI User role | Assign Azure AI User role at resource scope | +| Cannot create projects | Missing Project Manager or Owner role | Assign Azure AI Project Manager role | +| "Access denied" on connected resources | Managed identity missing roles | Assign appropriate roles to MI on each resource | +| Portal works but CLI fails | Portal auto-assigns roles, CLI doesn't | Explicitly assign Azure AI User via CLI | +| Service principal cannot access data | Wrong role or scope | Verify Azure AI User is assigned at correct scope | +| "Principal does not exist" | User/SP not found in directory | Verify the assignee email or object ID is correct | +| Role assignment already exists | Duplicate assignment attempt | Use `az role assignment list` to verify existing assignments | + +## Additional Resources + +- [Azure AI Foundry RBAC Documentation](https://learn.microsoft.com/azure/ai-foundry/concepts/rbac-ai-foundry) +- [Azure Built-in Roles](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles) +- [Managed Identities Overview](https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview) +- [Service Principal Authentication](https://learn.microsoft.com/azure/developer/github/connect-from-azure) diff --git a/skills/microsoft-foundry/references/auth-best-practices.md 
b/skills/microsoft-foundry/references/auth-best-practices.md new file mode 100644 index 00000000..a2ca1976 --- /dev/null +++ b/skills/microsoft-foundry/references/auth-best-practices.md @@ -0,0 +1,130 @@ +# Azure Authentication Best Practices + +> Source: [Microsoft — Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/). + +**Table of Contents:** [Golden Rule](#golden-rule) · [Authentication by Environment](#authentication-by-environment) · [Why Not DefaultAzureCredential in Production?](#why-not-defaultazurecredential-in-production) · [Production Patterns](#production-patterns) · [Local Development Setup](#local-development-setup) · [Environment-Aware Pattern](#environment-aware-pattern) · [Security Checklist](#security-checklist) · [Further Reading](#further-reading) + +## Golden Rule + +Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**. + +## Authentication by Environment + +| Environment | Recommended Credential | Why | +|---|---|---| +| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure | +| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead | +| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity | +| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience | + +## Why Not `DefaultAzureCredential` in Production? + +1. **Unpredictable fallback chain** — walks through multiple credential types, adding latency and making failures harder to diagnose. +2. 
**Broad surface area** — checks environment variables, CLI tokens, and other sources that should not exist in production. +3. **Non-deterministic** — which credential actually authenticates depends on the environment, making behavior inconsistent across deployments. +4. **Performance** — each failed credential attempt adds network round-trips before falling back to the next. + +## Production Patterns + +### .NET + +```csharp +using Azure.Identity; + +var credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development" + ? new DefaultAzureCredential() // local dev — uses CLI/VS credentials + : new ManagedIdentityCredential(); // production — deterministic, no fallback chain +// For user-assigned identity: new ManagedIdentityCredential("") +``` + +### TypeScript / JavaScript + +```typescript +import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity"; + +const credential = process.env.NODE_ENV === "development" + ? new DefaultAzureCredential() // local dev — uses CLI/VS credentials + : new ManagedIdentityCredential(); // production — deterministic, no fallback chain +// For user-assigned identity: new ManagedIdentityCredential("") +``` + +### Python + +```python +import os +from azure.identity import DefaultAzureCredential, ManagedIdentityCredential + +credential = ( + DefaultAzureCredential() # local dev — uses CLI/VS credentials + if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development" + else ManagedIdentityCredential() # production — deterministic, no fallback chain +) +# For user-assigned identity: ManagedIdentityCredential(client_id="") +``` + +### Java + +```java +import com.azure.identity.DefaultAzureCredentialBuilder; +import com.azure.identity.ManagedIdentityCredentialBuilder; + +var credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT")) + ? 
new DefaultAzureCredentialBuilder().build() // local dev — uses CLI/VS credentials + : new ManagedIdentityCredentialBuilder().build(); // production — deterministic, no fallback chain +// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("").build() +``` + +## Local Development Setup + +`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools: + +1. **Azure CLI** — `az login` +2. **Azure Developer CLI** — `azd auth login` +3. **Azure PowerShell** — `Connect-AzAccount` +4. **Visual Studio / VS Code** — sign in via Azure extension + +```typescript +import { DefaultAzureCredential } from "@azure/identity"; + +// Local development only — uses CLI/PowerShell/VS Code credentials +const credential = new DefaultAzureCredential(); +``` + +## Environment-Aware Pattern + +Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production. + +> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`). + +```typescript +import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity"; + +function getCredential() { + if (process.env.NODE_ENV === "development") { + return new DefaultAzureCredential(); // picks up az login / VS Code creds + } + return process.env.AZURE_CLIENT_ID + ? 
new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID) // user-assigned + : new ManagedIdentityCredential(); // system-assigned +} +``` + +## Security Checklist + +- [ ] Use managed identity for all Azure-hosted apps +- [ ] Never hardcode credentials, connection strings, or keys +- [ ] Apply least-privilege RBAC roles at the narrowest scope +- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production +- [ ] Store any required secrets in Azure Key Vault +- [ ] Rotate secrets and certificates on a schedule +- [ ] Enable Microsoft Defender for Cloud on production resources + +## Further Reading + +- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) +- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview) +- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview) +- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/) +- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme) +- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme) +- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme) diff --git a/skills/microsoft-foundry/references/private-network-standard-agent-setup.md b/skills/microsoft-foundry/references/private-network-standard-agent-setup.md new file mode 100644 index 00000000..9f77f225 --- /dev/null +++ b/skills/microsoft-foundry/references/private-network-standard-agent-setup.md @@ -0,0 +1,40 @@ +# Private Network Standard Agent Setup + +> **MANDATORY:** Read [Standard Agent Setup with Network Isolation docs](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project) before proceeding. 
It covers RBAC requirements, resource provider registration, and role assignments. + +## Overview + +Extends [standard agent setup](standard-agent-setup.md) with full VNet isolation using private endpoints and subnet delegation. All resources communicate over private network only. + +## Networking Constraints + +Two subnets required: + +| Subnet | CIDR | Purpose | Delegation | +|--------|------|---------|------------| +| Agent Subnet | /24 (e.g., 192.168.0.0/24) | Agent workloads | `Microsoft.App/environments` (exclusive) | +| Private Endpoint Subnet | /24 (e.g., 192.168.1.0/24) | Private endpoints | None | + +- All Foundry resources **must be in the same region as the VNet**. +- Agent subnet must be exclusive to one Foundry account. +- VNet address space must not overlap with existing networks or reserved ranges. + +> ⚠️ **Warning:** If providing an existing VNet, ensure both subnets exist before deployment. Otherwise the template creates a new VNet with default address spaces. + +## Deployment + +**Always use the official Bicep template:** +[Private Network Standard Agent Setup Bicep](https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup) + +> ⚠️ **Warning:** Capability host provisioning is **asynchronous** (10–20 minutes). Poll deployment status until success before proceeding. + +## Post-Deployment + +1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). Fall back to `Standard` SKU if `GlobalStandard` quota is exhausted. +2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK. 
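
The subnet rules under Networking Constraints (both subnets inside the VNet, no overlap between them, no overlap with existing networks) can be sanity-checked before deployment. The following is a minimal sketch using only Python's standard `ipaddress` module; the function name `check_subnets` and the `/16` VNet range in the usage line are illustrative assumptions, not something the Bicep template provides.

```python
import ipaddress


def check_subnets(vnet_cidr, agent_cidr, pe_cidr, existing_cidrs=()):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    vnet = ipaddress.ip_network(vnet_cidr)
    agent = ipaddress.ip_network(agent_cidr)
    pe = ipaddress.ip_network(pe_cidr)
    problems = []

    # Both subnets must sit inside the VNet address space.
    for label, subnet in (("agent", agent), ("private endpoint", pe)):
        if not subnet.subnet_of(vnet):
            problems.append(f"{label} subnet {subnet} is outside VNet {vnet}")

    # The two subnets must not share addresses.
    if agent.overlaps(pe):
        problems.append(f"agent subnet {agent} overlaps private endpoint subnet {pe}")

    # The VNet must not overlap with networks that already exist.
    for other in existing_cidrs:
        if vnet.overlaps(ipaddress.ip_network(other)):
            problems.append(f"VNet {vnet} overlaps existing network {other}")

    return problems


# Hypothetical layout based on the example CIDRs in the table above.
print(check_subnets("192.168.0.0/16", "192.168.0.0/24", "192.168.1.0/24"))
```

This only checks address arithmetic. Subnet delegation and region placement still have to be verified against the deployed resources.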
+ +## References + +- [Azure AI Foundry Networking](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project) +- [Azure AI Foundry RBAC](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/rbac-azure-ai-foundry?pivots=fdp-project) +- [Standard Agent Setup (public network)](standard-agent-setup.md) diff --git a/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md b/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md new file mode 100644 index 00000000..2a54721e --- /dev/null +++ b/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md @@ -0,0 +1,265 @@ +# Microsoft Foundry - Python SDK Guide + +Python-specific implementations for working with Microsoft Foundry. + +**Table of Contents:** [Prerequisites](#prerequisites) · [Model Discovery and Deployment](#model-discovery-and-deployment-mcp) · [RAG Agent with Azure AI Search](#rag-agent-with-azure-ai-search) · [Creating Agents](#creating-agents) · [Agent Evaluation](#agent-evaluation) · [Knowledge Index Operations](#knowledge-index-operations-mcp) · [Best Practices](#best-practices) · [Error Handling](#error-handling) + +## Prerequisites + +```bash +pip install azure-ai-projects azure-identity azure-ai-inference openai azure-ai-evaluation python-dotenv +``` + +### Environment Variables + +```bash +PROJECT_ENDPOINT=https://.services.ai.azure.com/api/projects/ +MODEL_DEPLOYMENT_NAME=gpt-4o +AZURE_AI_SEARCH_CONNECTION_NAME=my-search-connection +AI_SEARCH_INDEX_NAME=my-index +AZURE_OPENAI_ENDPOINT=https://.openai.azure.com +AZURE_OPENAI_DEPLOYMENT=gpt-4o +``` + +## Model Discovery and Deployment (MCP) + +```python +foundry_models_list() # All models +foundry_models_list(publisher="OpenAI") # Filter by publisher +foundry_models_list(search_for_free_playground=True) # Free playground models + +foundry_models_deploy( + resource_group="my-rg", deployment="gpt-4o-deployment", + model_name="gpt-4o", model_format="OpenAI", + 
azure_ai_services="my-foundry-resource", + model_version="2024-05-13", sku_capacity=10, scale_type="Standard" +) +``` + +## RAG Agent with Azure AI Search + +> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns. + +```python +import os +from azure.ai.projects import AIProjectClient +from azure.identity import DefaultAzureCredential +from azure.ai.agents.models import ( + AzureAISearchToolDefinition, AzureAISearchToolResource, + AISearchIndexResource, AzureAISearchQueryType, +) + +project_client = AIProjectClient( + endpoint=os.environ["PROJECT_ENDPOINT"], + credential=DefaultAzureCredential(), +) + +azs_connection = project_client.connections.get( + os.environ["AZURE_AI_SEARCH_CONNECTION_NAME"] +) + +agent = project_client.agents.create_agent( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + name="RAGAgent", + instructions="You are a helpful assistant. Use the knowledge base to answer. 
" + "Provide citations as: `[message_idx:search_idx†source]`.", + tools=[AzureAISearchToolDefinition( + azure_ai_search=AzureAISearchToolResource(indexes=[ + AISearchIndexResource( + index_connection_id=azs_connection.id, + index_name=os.environ["AI_SEARCH_INDEX_NAME"], + query_type=AzureAISearchQueryType.HYBRID, + ), + ]) + )], +) +``` + +### Querying a RAG Agent (Streaming) + +```python +openai_client = project_client.get_openai_client() + +stream = openai_client.responses.create( + stream=True, tool_choice="required", input="Your question here", + extra_body={"agent": {"name": agent.name, "type": "agent_reference"}}, +) +for event in stream: + if event.type == "response.output_text.delta": + print(event.delta, end="", flush=True) + elif event.type == "response.output_item.done": + if event.item.type == "message" and event.item.content[-1].type == "output_text": + for ann in event.item.content[-1].annotations: + if ann.type == "url_citation": + print(f"\nCitation: {ann.url}") +``` + +## Creating Agents + +### Basic Agent + +```python +agent = project_client.agents.create_agent( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + name="my-agent", + instructions="You are a helpful assistant.", +) +``` + +### Agent with Custom Function Tools + +```python +from azure.ai.agents.models import FunctionTool, ToolSet + +def get_weather(location: str, unit: str = "celsius") -> str: + """Get the current weather for a location.""" + return f"Sunny and 22°{unit[0].upper()} in {location}" + +functions = FunctionTool([get_weather]) +toolset = ToolSet() +toolset.add(functions) + +agent = project_client.agents.create_agent( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + name="function-agent", + instructions="You are a helpful assistant with tool access.", + toolset=toolset, +) +``` + +### Agent with Web Search + +```python +from azure.ai.projects.models import ( + PromptAgentDefinition, WebSearchPreviewTool, ApproximateLocation, +) + +agent = project_client.agents.create_version( + 
agent_name="WebSearchAgent", + definition=PromptAgentDefinition( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + instructions="Search the web for current information. Provide sources.", + tools=[ + WebSearchPreviewTool( + user_location=ApproximateLocation( + country="US", city="Seattle", region="Washington" + ) + ) + ], + ), +) +``` + +> 💡 **Tip:** `WebSearchPreviewTool` requires no external resource or connection. For Bing Grounding (which requires a dedicated Bing resource and project connection), see [Bing Grounding reference](../../foundry-agent/create/references/tool-bing-grounding.md). + +### Interacting with Agents + +```python +from azure.ai.agents.models import ListSortOrder + +thread = project_client.agents.threads.create() +project_client.agents.messages.create(thread_id=thread.id, role="user", content="Hello") + +run = project_client.agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id) +if run.status == "failed": + print(f"Run failed: {run.last_error}") + +messages = project_client.agents.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING) +for msg in messages: + if msg.text_messages: + print(f"{msg.role}: {msg.text_messages[-1].text.value}") + +project_client.agents.delete_agent(agent.id) +``` + +## Agent Evaluation + +### Single Response Evaluation (MCP) + +```python +foundry_agents_query_and_evaluate( + agent_id="", query="What's the weather?", + endpoint="https://my-foundry.services.ai.azure.com/api/projects/my-project", + azure_openai_endpoint="https://my-openai.openai.azure.com", + azure_openai_deployment="gpt-4o", + evaluators="intent_resolution,task_adherence,tool_call_accuracy" +) + +foundry_agents_evaluate( + query="What's the weather?", response="Sunny and 22°C.", + evaluator="intent_resolution", + azure_openai_endpoint="https://my-openai.openai.azure.com", + azure_openai_deployment="gpt-4o" +) +``` + +### Batch Evaluation + +```python +from azure.ai.evaluation import AIAgentConverter, IntentResolutionEvaluator, 
evaluate + +converter = AIAgentConverter(project_client) +converter.prepare_evaluation_data(thread_ids=["t1", "t2", "t3"], filename="eval_data.jsonl") + +result = evaluate( + data="eval_data.jsonl", + evaluators={ + "intent_resolution": IntentResolutionEvaluator( + azure_openai_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], + azure_openai_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"] + ), + }, + output_path="./eval_results" +) +print(f"Results: {result['studio_url']}") +``` + +> 💡 **Tip:** Continuous evaluation requires project managed identity with **Azure AI User** role and Application Insights connected to the project. + +## Knowledge Index Operations (MCP) + +```python +foundry_knowledge_index_list(endpoint="") +foundry_knowledge_index_schema(endpoint="", index="my-index") +``` + +## Best Practices + +1. **Never hardcode credentials** — use environment variables and `python-dotenv` +2. **Check `run.status`** and handle `HttpResponseError` exceptions +3. **Reuse `AIProjectClient`** instances — don't create new ones per request +4. **Use type hints** in custom functions for better tool integration +5. **Use context managers** for agent cleanup + +## Error Handling + +```python +from azure.core.exceptions import HttpResponseError + +try: + agent = project_client.agents.create_agent( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + name="my-agent", instructions="You are helpful." 
+ ) +except HttpResponseError as e: + if e.status_code == 429: + print("Rate limited — wait and retry with exponential backoff.") + elif e.status_code == 401: + print("Authentication failed — check credentials.") + else: + print(f"Error: {e.message}") +``` + +### Context Manager for Agent Cleanup + +```python +from contextlib import contextmanager + +@contextmanager +def temporary_agent(project_client, **kwargs): + agent = project_client.agents.create_agent(**kwargs) + try: + yield agent + finally: + project_client.agents.delete_agent(agent.id) +``` diff --git a/skills/microsoft-foundry/references/standard-agent-setup.md b/skills/microsoft-foundry/references/standard-agent-setup.md new file mode 100644 index 00000000..3bf849e1 --- /dev/null +++ b/skills/microsoft-foundry/references/standard-agent-setup.md @@ -0,0 +1,51 @@ +# Standard Agent Setup + +> **MANDATORY:** Read [Standard Agent Setup docs](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) before proceeding with standard setup. + +## Overview + +Azure AI Foundry supports two agent setup configurations: + +| Setup | Capability Host | Description | +|-------|----------------|-------------| +| **Basic** | None | Default setup. All resources are Microsoft-managed. No additional connections required. | +| **Standard** | Azure AI Services | Advanced setup. Bring-your-own storage and search connections for full control over data residency and scaling. 
| + +## Standard Setup Connections + +| Connection | Service | Required | Purpose | +|------------|---------|----------|---------| +| Thread storage | Azure Cosmos DB | ✅ Yes | Store conversation threads in your own Cosmos DB instance | +| File storage | Azure Storage | ✅ Yes | Store uploaded files in your own Azure Storage account | +| Vector store | Azure AI Search | ✅ Yes | Use your own Azure AI Search instance for vector/knowledge retrieval | +| Azure AI Services | Azure AI Services | ❌ Optional | Use OpenAI models from a different AI Services resource | + +> 💡 **Tip:** Standard setup is recommended for production workloads that require control over data storage, custom vector search, or integration with models from a separate AI Services resource. + +## Prerequisites + +Before starting deployment, confirm the following with the user: + +1. **RBAC role on the resource group:** The user must have **Owner** or **User Access Administrator** role on the target resource group. The Bicep template assigns RBAC roles (Storage Blob Data Contributor, Cosmos DB Operator, AI Search roles) to the project's managed identity — this will fail without `Microsoft.Authorization/roleAssignments/write` permission. +2. **Subscription quota:** Verify the target region has available quota for AI Services. If quota is exhausted, try an alternate region (e.g., `swedencentral`, `eastus`, `westus3`). +3. **Azure Policy compliance:** Some subscriptions enforce policies (e.g., storage accounts must disable public network access). If the Bicep template fails due to policy violations, patch the template to comply (e.g., set `publicNetworkAccess: 'Disabled'` and `defaultAction: 'Deny'` on the storage account). + +## Deployment + +- Standard setup always creates a **new Foundry resource and a new project**. Do not ask the user for a project endpoint — one will be provisioned as part of the deployment. 
+- **Always use the official Bicep template:** + [Standard Agent Setup Bicep Template](https://github.com/azure-ai-foundry/foundry-samples/blob/main/infrastructure/infrastructure-setup-bicep/43-standard-agent-setup-with-customization/main.bicep) + +> ⚠️ **Warning:** Capability host provisioning is **asynchronous** and can take 10–20 minutes. After deploying the Bicep template, you **must poll** the deployment status until it succeeds. Do not assume the setup is complete immediately. + +## Post-Deployment: Model & Agent + +After infrastructure provisioning succeeds: + +1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). If `GlobalStandard` SKU quota is exhausted, fall back to `Standard` SKU. +2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK (`client.agents.create_version`). See [SDK Operations](../foundry-agent/create/references/sdk-operations.md) for details. + +## References + +- [Capability Hosts — Agent Setup Types](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/capability-hosts?view=foundry) +- [Standard Agent Setup](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) diff --git a/skills/microsoft-foundry/resource/create/create-foundry-resource.md b/skills/microsoft-foundry/resource/create/create-foundry-resource.md new file mode 100644 index 00000000..c143149d --- /dev/null +++ b/skills/microsoft-foundry/resource/create/create-foundry-resource.md @@ -0,0 +1,152 @@ +--- +name: microsoft-foundry:resource/create +description: | + Create Azure AI Services multi-service resource (Foundry resource) using Azure CLI. + USE FOR: create Foundry resource, new AI Services resource, create multi-service resource, provision Azure AI Services, AIServices kind resource, register resource provider, enable Cognitive Services, setup AI Services account, create resource group for Foundry. 
+ DO NOT USE FOR: creating ML workspace hubs (use microsoft-foundry:project/create), deploying models (use microsoft-foundry:models/deploy), managing permissions (use microsoft-foundry:rbac), monitoring resource usage (use microsoft-foundry:quota). +compatibility: + required: + - azure-cli: ">=2.0" + optional: + - powershell: ">=7.0" + - azure-portal: "any" +--- + +# Create Foundry Resource + +This sub-skill orchestrates creation of Azure AI Services multi-service resources using Azure CLI. + +> **Important:** All resource creation operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method. + +> **Note:** For monitoring resource usage and quotas, use the `microsoft-foundry:quota` skill. + +**Table of Contents:** [Quick Reference](#quick-reference) · [When to Use](#when-to-use) · [Prerequisites](#prerequisites) · [Core Workflows](#core-workflows) · [Important Notes](#important-notes) · [Additional Resources](#additional-resources) + +## Quick Reference + +| Property | Value | +|----------|-------| +| **Classification** | WORKFLOW SKILL | +| **Operation Type** | Control Plane (Management) | +| **Primary Method** | Azure CLI: `az cognitiveservices account create` | +| **Resource Type** | `Microsoft.CognitiveServices/accounts` (kind: `AIServices`) | +| **Resource Kind** | `AIServices` (multi-service) | + +## When to Use + +Use this sub-skill when you need to: + +- **Create Foundry resource** - Provision new Azure AI Services multi-service account +- **Create resource group** - Set up resource group before creating resources +- **Register resource provider** - Enable Microsoft.CognitiveServices provider +- **Manual resource creation** - CLI-based resource provisioning + +**Do NOT use for:** +- Creating ML workspace hubs/projects (use `microsoft-foundry:project/create`) +- Deploying AI models (use `microsoft-foundry:models/deploy`) +- Managing RBAC permissions (use `microsoft-foundry:rbac`) +- Monitoring resource usage (use 
`microsoft-foundry:quota`) + +## Prerequisites + +- **Azure subscription** - Active subscription ([create free account](https://azure.microsoft.com/pricing/purchase-options/azure-account)) +- **Azure CLI** - Version 2.0 or later installed +- **Authentication** - Run `az login` before commands +- **RBAC roles** - One of: + - Contributor + - Owner + - Custom role with `Microsoft.CognitiveServices/accounts/write` +- **Resource provider** - `Microsoft.CognitiveServices` must be registered in your subscription + - If not registered, see [Workflow #3: Register Resource Provider](#3-register-resource-provider) + - If you lack permissions, ask a subscription Owner/Contributor to register it or grant you `/register/action` privilege + +> **Need RBAC help?** See [microsoft-foundry:rbac](../../rbac/rbac.md) for permission management. + +## Core Workflows + +### 1. Create Resource Group + +**Command Pattern:** "Create a resource group for my Foundry resources" + +#### Steps + +1. **Ask user preference**: Use existing or create new resource group +2. **If using existing**: List and let user select from available groups (0-4: show all, 5+: show 5 most recent with "Other" option) +3. **If creating new**: Ask user to choose region, then create + +```bash +# List existing resource groups +az group list --query "[-5:].{Name:name, Location:location}" --out table + +# Or create new +az group create --name --location +az group show --name --query "{Name:name, Location:location, State:properties.provisioningState}" +``` + +See [Detailed Workflow Steps](./references/workflows.md) for complete instructions. + +--- + +### 2. Create Foundry Resource + +**Command Pattern:** "Create a new Azure AI Services resource" + +#### Steps + +1. **Verify prerequisites**: Check Azure CLI, authentication, and provider registration +2. **Choose location**: Always ask user to select region (don't assume resource group location) +3. 
**Create resource**: Use `--kind AIServices` and `--sku S0` (only supported tier) +4. **Verify and get keys** + +```bash +# Create Foundry resource +az cognitiveservices account create \ + --name \ + --resource-group \ + --kind AIServices \ + --sku S0 \ + --location \ + --yes + +# Verify and get keys +az cognitiveservices account show --name --resource-group +az cognitiveservices account keys list --name --resource-group +``` + +**Important:** S0 (Standard) is the only supported SKU - F0 free tier not available for AIServices. + +See [Detailed Workflow Steps](./references/workflows.md) for complete instructions. + +--- + +### 3. Register Resource Provider + +**Command Pattern:** "Register Cognitive Services provider" + +Required when first creating Cognitive Services in subscription or if you get `ResourceProviderNotRegistered` error. + +```bash +# Register provider (requires Owner/Contributor role) +az provider register --namespace Microsoft.CognitiveServices +az provider show --namespace Microsoft.CognitiveServices --query "registrationState" +``` + +If you lack permissions, ask a subscription Owner/Contributor to register it or use `microsoft-foundry:rbac` skill. + +See [Detailed Workflow Steps](./references/workflows.md) for complete instructions. 
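
When the create step in Workflow 2 is scripted rather than typed, it can help to assemble the argument list in one place so the fixed values (`--kind AIServices`, `--sku S0`) cannot drift and the location is never defaulted. A minimal sketch, assuming a hypothetical helper named `build_create_command`; the flags are the ones shown in the workflow above, and the resource names are placeholders.

```python
import shlex


def build_create_command(name, resource_group, location):
    """Build the argument list for creating a Foundry (AIServices) resource."""
    if not location:
        # The workflow requires the user to pick a region explicitly.
        raise ValueError("location must be chosen explicitly by the user")
    return [
        "az", "cognitiveservices", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--kind", "AIServices",  # required kind for a multi-service resource
        "--sku", "S0",           # only supported tier per this skill
        "--location", location,
        "--yes",
    ]


cmd = build_create_command("my-foundry-resource", "rg-ai-services", "westus2")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with resource names.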
+ +--- + +## Important Notes + +- **Resource kind must be `AIServices`** for multi-service Foundry resources +- **SKU must be S0** (Standard) - F0 free tier not available for AIServices +- Always ask user to choose location - different regions may have varying availability + +--- + +## Additional Resources + +- [Common Patterns](./references/patterns.md) - Quick setup patterns and command reference +- [Troubleshooting](./references/troubleshooting.md) - Common errors and solutions +- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli) diff --git a/skills/microsoft-foundry/resource/create/references/patterns.md b/skills/microsoft-foundry/resource/create/references/patterns.md new file mode 100644 index 00000000..5c7622f7 --- /dev/null +++ b/skills/microsoft-foundry/resource/create/references/patterns.md @@ -0,0 +1,136 @@ +# Common Patterns: Create Foundry Resource + +**Table of Contents:** [Pattern A: Quick Setup](#pattern-a-quick-setup) · [Pattern B: Multi-Region Setup](#pattern-b-multi-region-setup) · [Quick Commands Reference](#quick-commands-reference) + +## Pattern A: Quick Setup + +Complete setup in one go: + +```bash +# Ask user: "Use existing resource group or create new?" 
+ +# ==== If user chooses "Use existing" ==== +# Count and list existing resource groups +TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv) +az group list --query "[-5:].{Name:name, Location:location}" --out table + +# Based on count: show appropriate list and options +# User selects resource group +RG="" + +# Fetch details to verify +az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}" +# Then skip to creating Foundry resource below + +# ==== If user chooses "Create new" ==== +# List regions and ask user to choose +az account list-locations --query "[].{Region:name}" --out table + +# Variables +RG="rg-ai-services" # New resource group name +LOCATION="westus2" # User's chosen location +RESOURCE_NAME="my-foundry-resource" + +# Create new resource group +az group create --name $RG --location $LOCATION + +# Verify creation +az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}" + +# Create Foundry resource in user's chosen location +az cognitiveservices account create \ + --name $RESOURCE_NAME \ + --resource-group $RG \ + --kind AIServices \ + --sku S0 \ + --location $LOCATION \ + --yes + +# Get endpoint and keys +echo "Resource created successfully!" +az cognitiveservices account show \ + --name $RESOURCE_NAME \ + --resource-group $RG \ + --query "{Endpoint:properties.endpoint, Location:location}" + +az cognitiveservices account keys list \ + --name $RESOURCE_NAME \ + --resource-group $RG +``` + +## Pattern B: Multi-Region Setup + +Create resources in multiple regions: + +```bash +# Variables +RG="rg-ai-services" +REGIONS=("eastus" "westus2" "westeurope") + +# Create resource group +az group create --name $RG --location eastus + +# Create resources in each region +for REGION in "${REGIONS[@]}"; do + RESOURCE_NAME="foundry-${REGION}" + echo "Creating resource in $REGION..." 
+ + az cognitiveservices account create \ + --name $RESOURCE_NAME \ + --resource-group $RG \ + --kind AIServices \ + --sku S0 \ + --location $REGION \ + --yes + + echo "Resource $RESOURCE_NAME created in $REGION" +done + +# List all resources +az cognitiveservices account list --resource-group $RG --output table +``` + +## Quick Commands Reference + +```bash +# Count total resource groups to determine which scenario applies +az group list --query "length([])" -o tsv + +# Check existing resource groups (up to 5 most recent) +# 0 → create new | 1-4 → select or create | 5+ → select/other/create +az group list --query "[-5:].{Name:name, Location:location}" --out table + +# If 5+ resource groups exist and user selects "Other", show all +az group list --query "[].{Name:name, Location:location}" --out table + +# If user selects existing resource group, fetch details to verify and get location +az group show --name --query "{Name:name, Location:location, State:properties.provisioningState}" + +# List available regions (for creating new resource group) +az account list-locations --query "[].{Region:name}" --out table + +# Create resource group (if needed) +az group create --name rg-ai-services --location westus2 + +# Create Foundry resource +az cognitiveservices account create \ + --name my-foundry-resource \ + --resource-group rg-ai-services \ + --kind AIServices \ + --sku S0 \ + --location westus2 \ + --yes + +# List resources in group +az cognitiveservices account list --resource-group rg-ai-services + +# Get resource details +az cognitiveservices account show \ + --name my-foundry-resource \ + --resource-group rg-ai-services + +# Delete resource +az cognitiveservices account delete \ + --name my-foundry-resource \ + --resource-group rg-ai-services +``` diff --git a/skills/microsoft-foundry/resource/create/references/troubleshooting.md b/skills/microsoft-foundry/resource/create/references/troubleshooting.md new file mode 100644 index 00000000..c4cd1e67 --- /dev/null +++ 
b/skills/microsoft-foundry/resource/create/references/troubleshooting.md @@ -0,0 +1,92 @@ +# Troubleshooting: Create Foundry Resource + +## Resource Creation Failures + +### ResourceProviderNotRegistered + +**Solution:** +1. If you have Owner/Contributor role, register the provider: + ```bash + az provider register --namespace Microsoft.CognitiveServices + ``` +2. If you lack permissions, ask a subscription Owner or Contributor to register it +3. Alternatively, ask them to grant you the `/register/action` privilege + +### InsufficientPermissions + +**Solution:** +```bash +# Check your role assignments +az role assignment list --assignee --subscription + +# You need: Contributor, Owner, or custom role with Microsoft.CognitiveServices/accounts/write +``` + +Use `microsoft-foundry:rbac` skill to manage permissions. + +### LocationNotAvailableForResourceType + +**Solution:** +```bash +# List available regions for Cognitive Services +az provider show --namespace Microsoft.CognitiveServices \ + --query "resourceTypes[?resourceType=='accounts'].locations" --out table + +# Choose different region from the list +``` + +### ResourceNameNotAvailable + +Resource name must be globally unique. 
Try adding a unique suffix: + +```bash +UNIQUE_SUFFIX=$(date +%s) +az cognitiveservices account create \ + --name "foundry-${UNIQUE_SUFFIX}" \ + --resource-group \ + --kind AIServices \ + --sku S0 \ + --location \ + --yes +``` + +## Resource Shows as Failed + +**Check provisioning state:** +```bash +az cognitiveservices account show \ + --name \ + --resource-group \ + --query "properties.provisioningState" +``` + +If `Failed`, delete and recreate: +```bash +# Delete failed resource +az cognitiveservices account delete \ + --name \ + --resource-group + +# Recreate +az cognitiveservices account create \ + --name \ + --resource-group \ + --kind AIServices \ + --sku S0 \ + --location \ + --yes +``` + +## Cannot Access Keys + +**Error:** `AuthorizationFailed` when listing keys + +**Solution:** You need `Cognitive Services User` or higher role on the resource. + +Use `microsoft-foundry:rbac` skill to grant appropriate permissions. + +## External Resources + +- [Create multi-service resource](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli) +- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/) +- [Azure regions with AI Services](https://azure.microsoft.com/global-infrastructure/services/?products=cognitive-services) diff --git a/skills/microsoft-foundry/resource/create/references/workflows.md b/skills/microsoft-foundry/resource/create/references/workflows.md new file mode 100644 index 00000000..a3cd8c52 --- /dev/null +++ b/skills/microsoft-foundry/resource/create/references/workflows.md @@ -0,0 +1,237 @@ +# Detailed Workflows: Create Foundry Resource + +**Table of Contents:** [Workflow 1: Create Resource Group](#workflow-1-create-resource-group---detailed-steps) · [Workflow 2: Create Foundry Resource](#workflow-2-create-foundry-resource---detailed-steps) · [Workflow 3: Register Resource Provider](#workflow-3-register-resource-provider---detailed-steps) + +## Workflow 1: Create Resource Group - 
Detailed Steps + +### Step 1: Ask user preference + +Ask the user which option they prefer: +1. Use an existing resource group +2. Create a new resource group + +### Step 2a: If user chooses "Use existing resource group" + +Count and list existing resource groups: + +```bash +# Count total resource groups +TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv) + +# List up to 5 resource groups (the slice takes the last 5 entries returned) +az group list --query "[-5:].{Name:name, Location:location}" --out table +``` + +**Handle based on count:** + +**If 0 resource groups found:** +- Inform user: "No existing resource groups found" +- Ask if they want to create a new one, then proceed to Step 2b + +**If 1-4 resource groups found:** +- Display all of the resource groups to the user +- Let user select from the list +- Fetch the selected resource group details: + ```bash + az group show --name --query "{Name:name, Location:location, State:properties.provisioningState}" + ``` +- Display details to user, then proceed to create Foundry resource + +**If 5+ resource groups found:** +- Display the 5 listed resource groups +- Present options: + 1. Select from the 5 displayed + 2. Other (see all resource groups) +- If user selects a resource group, fetch details: + ```bash + az group show --name --query "{Name:name, Location:location, State:properties.provisioningState}" + ``` +- If user chooses "Other", show all: + ```bash + az group list --query "[].{Name:name, Location:location}" --out table + ``` + Then let user select, and fetch details as above +- Display details to user, then proceed to create Foundry resource + +### Step 2b: If user chooses "Create new resource group" + +1. 
List available Azure regions: + +```bash +az account list-locations --query "[].{Region:name}" --out table +``` + +Common regions: +- `eastus`, `eastus2` - US East Coast +- `westus`, `westus2`, `westus3` - US West Coast +- `centralus` - US Central +- `westeurope`, `northeurope` - Europe +- `southeastasia`, `eastasia` - Asia Pacific + +2. Ask user to choose a region from the list above + +3. Create resource group in the chosen region: + +```bash +az group create \ + --name \ + --location +``` + +4. Verify creation: + +```bash +az group show --name --query "{Name:name, Location:location, State:properties.provisioningState}" +``` + +Expected output: `State: "Succeeded"` + +## Workflow 2: Create Foundry Resource - Detailed Steps + +### Step 1: Verify prerequisites + +```bash +# Check Azure CLI version (need 2.0+) +az --version + +# Verify authentication +az account show + +# Check resource provider registration status +az provider show --namespace Microsoft.CognitiveServices --query "registrationState" +``` + +If provider not registered, see Workflow #3: Register Resource Provider. + +### Step 2: Choose location + +**Always ask the user to choose a location.** List available regions and let the user select: + +```bash +# List available regions for Cognitive Services +az account list-locations --query "[].{Region:name, DisplayName:displayName}" --out table +``` + +Common regions for AI Services: +- `eastus`, `eastus2` - US East Coast +- `westus`, `westus2`, `westus3` - US West Coast +- `centralus` - US Central +- `westeurope`, `northeurope` - Europe +- `southeastasia`, `eastasia` - Asia Pacific + +> **Important:** Do not automatically use the resource group's location. Always ask the user which region they prefer. 
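
The rule in Step 2 (always ask, never fall back to the resource group's location) can be made explicit in automation. The sketch below assumes the region list was captured as JSON from the `az account list-locations` query shown above (with JSON output instead of `--out table`); `parse_regions` and `resolve_location` are hypothetical helper names, and the sample JSON is illustrative, not real command output.

```python
import json


def parse_regions(az_json):
    """Extract region names from the JSON produced by the list-locations query."""
    return [row["Region"] for row in json.loads(az_json)]


def resolve_location(user_choice, regions):
    """Return the user's region, refusing to guess when none was given."""
    if not user_choice:
        raise ValueError("no region chosen: ask the user instead of defaulting")
    if user_choice not in regions:
        raise ValueError(f"{user_choice!r} is not in the list of available regions")
    return user_choice


# Illustrative sample in the shape of the query result.
sample = '[{"Region": "eastus", "DisplayName": "East US"}, {"Region": "westus2", "DisplayName": "West US 2"}]'
regions = parse_regions(sample)
print(resolve_location("westus2", regions))
```

Raising on a missing choice, rather than silently reusing the resource group's region, keeps the script aligned with the workflow's intent.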
+
+### Step 3: Create Foundry resource
+
+```bash
+az cognitiveservices account create \
+  --name <resource-name> \
+  --resource-group <resource-group-name> \
+  --kind AIServices \
+  --sku S0 \
+  --location <location> \
+  --yes
+```
+
+**Parameters:**
+- `--name`: Unique resource name (globally unique across Azure)
+- `--resource-group`: Existing resource group name
+- `--kind`: **Must be `AIServices`** for multi-service resource
+- `--sku`: Must be **S0** (Standard - the only supported tier for AIServices)
+- `--location`: Azure region (**always ask user to choose** from available regions)
+- `--yes`: Auto-accept terms without prompting
+
+### Step 4: Verify resource creation
+
+```bash
+# Check resource details to verify creation
+az cognitiveservices account show \
+  --name <resource-name> \
+  --resource-group <resource-group-name>
+
+# View endpoint and configuration
+az cognitiveservices account show \
+  --name <resource-name> \
+  --resource-group <resource-group-name> \
+  --query "{Name:name, Endpoint:properties.endpoint, Location:location, Kind:kind, SKU:sku.name}"
+```
+
+Expected output:
+- `provisioningState: "Succeeded"`
+- Endpoint URL
+- SKU: S0
+- Kind: AIServices
+
+### Step 5: Get access keys
+
+```bash
+az cognitiveservices account keys list \
+  --name <resource-name> \
+  --resource-group <resource-group-name>
+```
+
+This returns `key1` and `key2` for API authentication.
+
+## Workflow 3: Register Resource Provider - Detailed Steps
+
+### When Needed
+
+Required when:
+- First time creating Cognitive Services in subscription
+- Error: `ResourceProviderNotRegistered`
+- Insufficient permissions during resource creation
+
+### Steps
+
+**Step 1: Check registration status**
+
+```bash
+az provider show \
+  --namespace Microsoft.CognitiveServices \
+  --query "registrationState"
+```
+
+Possible states:
+- `Registered`: Ready to use
+- `NotRegistered`: Needs registration
+- `Registering`: Registration in progress
+
+**Step 2: Register provider**
+
+```bash
+az provider register --namespace Microsoft.CognitiveServices
+```
+
+**Step 3: Wait for registration**
+
+Registration typically takes 1-2 minutes. Check status:
+
+```bash
+az provider show \
+  --namespace Microsoft.CognitiveServices \
+  --query "registrationState"
+```
+
+Wait until state is `Registered`.
+
+**Step 4: Verify registration**
+
+```bash
+az provider list --query "[?namespace=='Microsoft.CognitiveServices']"
+```
+
+### Required Permissions
+
+To register a resource provider, you need one of:
+- **Subscription Owner** role
+- **Contributor** role
+- **Custom role** with `Microsoft.*/register/action` permission
+
+**If you are not the subscription owner:**
+1. Ask someone with the **Owner** or **Contributor** role to register the provider for you
+2. Alternatively, ask them to grant you the `/register/action` privilege so you can register it yourself
+
+**Alternative registration methods:**
+- **Azure CLI** (recommended): `az provider register --namespace Microsoft.CognitiveServices`
+- **Azure Portal**: Navigate to Subscriptions → Resource providers → Microsoft.CognitiveServices → Register
+- **PowerShell**: `Register-AzResourceProvider -ProviderNamespace Microsoft.CognitiveServices`
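
The "Wait for registration" step in Workflow 3 amounts to re-running the status check until it reports `Registered`. Below is a minimal polling sketch under stated assumptions: `wait_for_registered` and `provider_state` are hypothetical helper names introduced here for illustration, and the attempt count and delay are arbitrary defaults rather than values from the workflow.

```shell
# Hypothetical helper (not an Azure CLI command).
# $1 = name of a command or function that prints the current state
# $2 = maximum number of checks (default 12)
# $3 = seconds to sleep between checks (default 10)
wait_for_registered() {
  check_cmd="$1"
  max_attempts="${2:-12}"
  delay="${3:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$check_cmd")
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  # Still not registered after the allowed number of checks
  return 1
}

# Wraps the status command from Step 1; -o tsv prints the bare state string.
provider_state() {
  az provider show \
    --namespace Microsoft.CognitiveServices \
    --query "registrationState" -o tsv
}

# Usage sketch (assumes a logged-in Azure CLI; not executed here):
#   wait_for_registered provider_state 12 10 || echo "Provider still not registered"
```

Passing the status check in as a command name keeps the loop independent of the Azure CLI, so it can be exercised with a stub before being pointed at a real subscription.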