diff --git a/agent-reference.html b/agent-reference.html new file mode 100644 index 0000000..0bd94d2 --- /dev/null +++ b/agent-reference.html @@ -0,0 +1,2109 @@ + + + +
+ + + + + + + +Everything an AI agent needs to know to build with Venice.ai. Privacy-first, uncensored, OpenAI-compatible.
+Base URL: https://api.venice.ai/api/v1
+Auth: Authorization: Bearer VENICE_API_KEY
+Swagger: swagger.yaml ·
+Docs: docs.venice.ai ·
+LLMs.txt: llms.txt
Venice has excellent capabilities but several default behaviors that trip up agents. Read these before making your first call.
+ +GET /models returns ~68 text models. Image, video, audio, and embedding models are hidden unless you pass a ?type= filter. This is the #1 reason agents fail to discover non-text capabilities.
# ❌ Returns text models only
+GET /api/v1/models
+
+# ✅ Filter by type to discover other models
+GET /api/v1/models?type=image # 26+ image models
+GET /api/v1/models?type=video # 65+ video models
+GET /api/v1/models?type=tts # 3 TTS models
+GET /api/v1/models?type=embedding # 1 embedding model
+GET /api/v1/models?type=music # music generation models
+The TTS model type is tts, NOT audio. Using ?type=audio returns an empty list.
# ❌ Returns 0 models
+GET /api/v1/models?type=audio
+
+# ✅ Correct
+GET /api/v1/models?type=tts
+Speech-to-text models (nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3) do NOT appear in any /models query. You must know the model names in advance. They are hardcoded here for your convenience.
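Since every non-text capability hides behind a ?type= filter, a small discovery helper avoids the pitfalls above. A minimal sketch — URL construction only, so you can plug in any HTTP client:

```python
# Build one /models query per type filter; a plain GET /models returns
# text models only, and speech-to-text models never appear at all.
BASE = "https://api.venice.ai/api/v1"
MODEL_TYPES = ["text", "image", "video", "tts", "embedding", "music"]

def discovery_urls(base: str = BASE) -> list[str]:
    # Note: the speech filter is "tts" — "audio" returns an empty list.
    return [f"{base}/models?type={t}" for t in MODEL_TYPES]

for url in discovery_urls():
    print(url)
```

Speech-to-text models still won't show up in any of these queries; hardcode the two STT model names listed above.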
Venice's image endpoints use different paths and response formats than OpenAI:
+| | OpenAI | Venice |
|---|---|---|
| Endpoint | /images/generations | /image/generate |
| Response | data[0].b64_json | images[0] (raw base64) |
| SDK | client.images.generate() | Use requests.post() or fetch() directly |
Agents using the OpenAI SDK's client.images.generate() will fail. Use raw HTTP requests for images.
Unlike text/image/TTS (which return results immediately), video and music use an async pattern:
+1. POST /video/queue → get queue_id
2. POST /video/retrieve with queue_id → poll until Content-Type: video/mp4

There is no synchronous video/music endpoint. See the Video and Audio sections for complete polling examples.
+| Type | Returns | Count |
|---|---|---|
text (default) | Chat/completion LLMs | ~68 |
image | Image generation models | ~26 |
video | Text-to-video and image-to-video | ~65 |
tts | Text-to-speech models | 3 |
embedding | Vector embedding models | 1 |
music | Music generation models | ~6 |
| (none / omitted) | Text models only | ~68 |
Speech-to-text: use nvidia/parakeet-tdt-0.6b-v3 or openai/whisper-large-v3 directly — they work but won't appear in any /models query.
+GET /models
+ +Returns models filtered by type. Defaults to text-only.
+ +# Discover image models
+curl "https://api.venice.ai/api/v1/models?type=image" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover TTS models (NOT ?type=audio!)
+curl "https://api.venice.ai/api/v1/models?type=tts" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover video models
+curl "https://api.venice.ai/api/v1/models?type=video" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+{
+ "id": "zai-org-glm-4.7",
+ "type": "text",
+ "object": "model",
+ "owned_by": "venice.ai",
+ "model_spec": {
+ "availableContextTokens": 198000,
+ "capabilities": {
+ "supportsFunctionCalling": true,
+ "supportsResponseSchema": true,
+ "supportsWebSearch": true,
+ "supportsReasoning": true,
+ "supportsReasoningEffort": true
+ }
+ }
+}
+
+| Type | Recommended Model | Why |
|---|---|---|
| Text (general) | zai-org-glm-4.7 | Best balance of cost, speed, and capability. Private. |
| Text (uncensored) | venice-uncensored | No content filtering. Private. |
| Text (cheap/fast) | zai-org-glm-4.7-flash | $0.13/M input. Great for classification. |
| Text (vision) | qwen3-vl-235b-a22b | Image understanding + text. |
| Image | venice-sd35 | $0.01/image, Private, works with all features. |
| Image (quality) | recraft-v4-pro | $0.29/image, highest quality. |
| TTS | tts-kokoro | 50+ voices, cheapest ($3.50/1M chars). |
| STT | nvidia/parakeet-tdt-0.6b-v3 | Fast, accurate. NOT in /models API. |
| Embedding | text-embedding-bge-m3 | Only option. 1024 dimensions. |
| Video | wan-2.6-text-to-video | Good quality, reasonable price. |
| Music | ace-step-15 | $0.03-0.08 per song. Cheapest. |
Venice is a privacy-first, uncensored AI API platform offering text generation, image creation, audio synthesis, video generation, music, and embeddings — all with zero data retention and full OpenAI SDK compatibility.
+ +Venice provides permissionless access to AI models with no content filtering, making it ideal for developers building applications that require uncensored outputs, privacy guarantees, and full control over AI interactions.
+ +| Tier | How It Works |
|---|---|
| Anonymized | Third-party models (Claude, GPT, Gemini, Grok) with all identifying metadata stripped before forwarding. |
| Private | Zero data retention. Self-hosted open-source models. No logs, no storage. |
| TEE | Models running inside hardware-secured enclaves (Intel TDX / NVIDIA CC). Venice cannot access the computation. |
| E2EE | End-to-end encrypted. Prompts encrypted client-side before sending. Only the TEE can decrypt them. |
- Uncensored models (e.g. venice-uncensored)
- OpenAI-compatible: change the base_url and you're done
- Venice-specific extensions via venice_parameters

Generate an API key at venice.ai/settings/api.
+ +export VENICE_API_KEY='your-api-key-here'
+
+# Python
+pip install openai
+
+# Node.js
+npm install openai
+
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="your-api-key",
+ base_url="https://api.venice.ai/api/v1"
+)
+
+response = client.chat.completions.create(
+ model="venice-uncensored",
+ messages=[{"role": "user", "content": "Hello World!"}]
+)
+
+print(response.choices[0].message.content)
+
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ apiKey: process.env.VENICE_API_KEY,
+ baseURL: "https://api.venice.ai/api/v1",
+});
+
+const completion = await client.chat.completions.create({
+ model: "venice-uncensored",
+ messages: [{ role: "user", content: "Hello World!" }],
+});
+
+console.log(completion.choices[0].message.content);
+
+curl https://api.venice.ai/api/v1/chat/completions \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "venice-uncensored",
+ "messages": [{"role": "user", "content": "Hello World!"}]
+ }'
+
+
+All requests require HTTP Bearer authentication:
+Authorization: Bearer VENICE_API_KEY
+
+API keys are managed at venice.ai/settings/api. Keep your key secret — never expose it in client-side code.
+ +You can also manage keys programmatically via the /api/v1/api_keys endpoints.
Venice implements the OpenAI API specification. Any OpenAI client library works — just change the base URL:
+ +| Language | Config Change |
|---|---|
| Python | base_url="https://api.venice.ai/api/v1" |
| JavaScript | baseURL: "https://api.venice.ai/api/v1" |
| Go | client.BaseURL = "https://api.venice.ai/api/v1" |
| cURL | Replace https://api.openai.com/v1 with https://api.venice.ai/api/v1 |
| PHP / C# / Java / Swift | Set base URL to https://api.venice.ai/api/v1 |
- venice_parameters — additional config for web search, characters, reasoning control, etc.
- Venice injects a default system prompt (disable with include_venice_system_prompt: false).
- Use Venice model IDs (e.g. zai-org-glm-4.7), not OpenAI's.
- Pass ?type=image, ?type=tts, etc. to discover non-text models. See Agent Pitfalls.
- Image endpoints differ in path (/image/generate vs /images/generations) and response format (images[] vs data[].b64_json). Do not use client.images.generate().

| Endpoint | Compatible? | Notes |
|---|---|---|
/chat/completions | ✅ Yes | Full drop-in. Tools, streaming, structured output all work. |
/audio/speech | ✅ Yes | Same request/response format as OpenAI TTS. |
/audio/transcriptions | ✅ Yes | Same multipart format. Use Venice model names. |
/embeddings | ✅ Yes | Same request/response format. |
/models | ⚠️ Partial | Same format but defaults to text-only. Must filter by type. |
/image/generate | ❌ No | Different path AND response format. Use raw HTTP. |
/video/queue | ❌ No | Venice-specific async pattern. |
POST /chat/completions
+ +The primary text generation endpoint. Supports text, vision, tool calling, streaming, and multimodal inputs (images, audio, video).
+ +| Field | Type | Description |
|---|---|---|
model | string (required) | Model ID, e.g. zai-org-glm-4.7 |
messages | array (required) | Array of message objects with role and content |
temperature | number | Sampling temperature (0-2). Default: model-specific |
max_tokens | integer | Max tokens in the response |
top_p | number | Nucleus sampling (0-1) |
frequency_penalty | number | Penalize repeated tokens (-2 to 2) |
presence_penalty | number | Penalize new topic tokens (-2 to 2) |
stream | boolean | Enable SSE streaming |
tools | array | Function definitions for tool calling |
response_format | object | Structured output schema (JSON mode) |
reasoning_effort | string | Control reasoning depth: none, low, medium, high, xhigh, max |
venice_parameters | object | Venice-specific extensions (see below) |
| Role | Purpose |
|---|---|
system | Instructions for model behavior |
user | Prompts or questions |
assistant | Previous model responses (multi-turn) |
tool | Function calling results |
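The roles above compose into a running conversation history: each turn appends the assistant's reply before the next user message so the model sees full context. A minimal sketch:

```python
# Multi-turn history: system prompt first, then alternating user/assistant.
def add_turn(messages: list, assistant_reply: str, next_user_prompt: str) -> list:
    messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": next_user_prompt})
    return messages

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name a canal city."},
]
# In practice the reply comes from response.choices[0].message.content.
add_turn(history, "Venice.", "What country is it in?")
```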
The venice_parameters object extends the OpenAI spec with Venice-specific features. Pass it as a top-level field in your request body.
| Parameter | Type | Default | Description |
|---|---|---|---|
enable_web_search | "off" / "on" / "auto" | "off" | Enable real-time web search. Additional pricing applies ($10/1K requests). |
enable_web_scraping | boolean | false | Scrape up to 5 URLs detected in user message. $10/1K URLs. |
enable_x_search | boolean | false | Enable xAI native search (web + X/Twitter) for Grok models. |
enable_web_citations | boolean | false | Include [REF]0[/REF] citations in web search results. |
include_search_results_in_stream | boolean | false | Experimental: emit search results as first stream chunk. |
return_search_results_as_documents | boolean | false | Return results as venice_web_search_documents tool call (LangChain compatible). |
include_venice_system_prompt | boolean | true | Include Venice's default system prompts alongside yours. |
strip_thinking_response | boolean | false | Strip <think> blocks from response (legacy tag format). |
disable_thinking | boolean | false | Disable reasoning and strip thinking blocks entirely. |
character_slug | string | — | Use a specific AI character persona. |
Venice parameters can also be appended directly to the model name as URL-style suffixes. Useful for SDKs that don't support extra body parameters:
+"model": "zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true"
+
+| SDK | Syntax |
|---|---|
| cURL / raw JSON | "venice_parameters": { ... } at top level |
| Python OpenAI | extra_body={"venice_parameters": { ... }} |
| JavaScript OpenAI | venice_parameters: { ... } at top level (TypeScript: // @ts-ignore) |
| Go / PHP / C# / Java | Use model suffix syntax |
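For SDKs that only accept a model string, the suffix form can be generated mechanically. A hedged sketch — the helper below is a convenience, not part of any SDK:

```python
# Fold venice_parameters into the model name as key=value pairs, lowering
# booleans to "true"/"false" to match the URL-style suffix syntax.
def with_suffix(model: str, params: dict) -> str:
    suffix = "&".join(
        f"{k}={str(v).lower() if isinstance(v, bool) else v}"
        for k, v in params.items()
    )
    return f"{model}:{suffix}" if suffix else model

params = {"enable_web_search": "auto", "enable_web_citations": True}
print(with_suffix("zai-org-glm-4.7", params))
# → zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true
```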
Set stream: true for Server-Sent Events (SSE) streaming:
# Python
+stream = client.chat.completions.create(
+ model="venice-uncensored",
+ messages=[{"role": "user", "content": "Write a story"}],
+ stream=True
+)
+for chunk in stream:
+ if chunk.choices and chunk.choices[0].delta.content is not None:
+ print(chunk.choices[0].delta.content, end="")
+
+// JavaScript
+const stream = await client.chat.completions.create({
+ model: "venice-uncensored",
+ messages: [{ role: "user", content: "Write a story" }],
+ stream: true
+});
+for await (const chunk of stream) {
+ if (chunk.choices?.[0]?.delta?.content) {
+ process.stdout.write(chunk.choices[0].delta.content);
+ }
+}
+
+
+Force the model to output JSON matching a specific schema using response_format:
{
+ "model": "venice-uncensored",
+ "messages": [
+ {"role": "system", "content": "You are a helpful math tutor."},
+ {"role": "user", "content": "solve 8x + 31 = 2"}
+ ],
+ "response_format": {
+ "type": "json_schema",
+ "json_schema": {
+ "name": "math_response",
+ "strict": true,
+ "schema": {
+ "type": "object",
+ "properties": {
+ "steps": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "explanation": {"type": "string"},
+ "output": {"type": "string"}
+ },
+ "required": ["explanation", "output"],
+ "additionalProperties": false
+ }
+ },
+ "final_answer": {"type": "string"}
+ },
+ "required": ["steps", "final_answer"],
+ "additionalProperties": false
+ }
+ }
+ }
+}
+
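With response_format set, message.content arrives as a JSON string matching the schema. A minimal parsing sketch, where raw stands in for response.choices[0].message.content:

```python
import json

# Simulated structured-output content for the math_response schema above.
raw = ('{"steps": [{"explanation": "Subtract 31 from both sides", '
       '"output": "8x = -29"}], "final_answer": "x = -29/8"}')

result = json.loads(raw)  # content is guaranteed to match the schema
for step in result["steps"]:
    print(step["explanation"], "->", step["output"])
print("Answer:", result["final_answer"])
```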
Requirements:
- strict must be true
- additionalProperties must be false at every level
- Every property must be listed in the required tag. Use "type": ["string", "null"] for optional fields.
- Check for supportsResponseSchema: true in /v1/models

Some models produce visible chain-of-thought reasoning. Thinking appears in a separate reasoning_content field, keeping content clean.
response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=[{"role": "user", "content": "What is 15% of 240?"}]
+)
+thinking = response.choices[0].message.reasoning_content
+answer = response.choices[0].message.content
+
+| Value | Description |
|---|---|
none | Disables reasoning |
minimal | Basic reasoning |
low | Light reasoning for simple problems |
medium | Balanced (recommended default) |
high | Deep reasoning for complex problems |
xhigh | Extra-high depth |
max | Maximum capability |
# Pass via reasoning object
+extra_body={"reasoning": {"effort": "high"}}
+
+# Or flat format
+extra_body={"reasoning_effort": "high"}
+
+| Model | Supported Values |
|---|---|
| GPT-5.2 | none, low, medium, high, xhigh |
| Claude Opus 4.6 | low, medium, high, max |
| Claude Opus 4.5, Sonnet 4.5/4.6 | low, medium, high |
| Gemini 3 Pro | low, high |
| Gemini 3.1 Pro | low, medium, high |
| GLM 4.7, Qwen 3 Thinking, Kimi K2.5, MiniMax | low, medium, high |
| Grok models | Not supported |
| DeepSeek R1 | Built-in only, not configurable |
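A small guard can catch unsupported effort values before a request is sent. Hedged sketch — the dict mirrors one row of the table above, and only zai-org-glm-4.7 is a model ID confirmed elsewhere in this document; fill in the actual IDs you use:

```python
# Allowed reasoning_effort values per model (illustrative subset; extend
# from the support table above with real model IDs).
SUPPORTED_EFFORT = {
    "zai-org-glm-4.7": {"low", "medium", "high"},
}

def check_effort(model: str, effort: str) -> bool:
    allowed = SUPPORTED_EFFORT.get(model)
    return allowed is not None and effort in allowed

print(check_effort("zai-org-glm-4.7", "high"))   # True
print(check_effort("zai-org-glm-4.7", "xhigh"))  # False
```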
# Recommended: Venice-level toggle
+extra_body={"reasoning": {"enabled": False}}
+
+# Alternative: provider-level (only some models)
+extra_body={"reasoning": {"effort": "none"}}
+
+
+Pass images alongside text using vision-capable models. Images can be URLs or base64 data URIs.
+ +response = client.chat.completions.create(
+ model="qwen3-vl-235b-a22b",
+ messages=[{
+ "role": "user",
+ "content": [
+ {"type": "text", "text": "What is in this image?"},
+ {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
+ ]
+ }]
+)
+
+Vision models: qwen3-vl-235b-a22b, mistral-small-3-2-24b-instruct (with suffix), and E2EE variants.
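For local files, encode the image as a base64 data URI and pass it in the same image_url slot. A minimal sketch:

```python
import base64

def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw image bytes as a data URI usable in an image_url part."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"

# In practice: image_bytes = open("photo.jpg", "rb").read()
part = {
    "type": "image_url",
    "image_url": {"url": to_data_uri(b"\xff\xd8\xff", "image/jpeg")},
}
print(part["image_url"]["url"][:23])
```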
Define tools for models to call external APIs. Works with the OpenAI function calling spec. Below is a complete round-trip showing the full loop.
+ +import json
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+tools = [{
+ "type": "function",
+ "function": {
+ "name": "get_weather",
+ "description": "Get current weather for a location",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "location": {"type": "string", "description": "City name"}
+ },
+ "required": ["location"]
+ }
+ }
+}]
+
+messages = [{"role": "user", "content": "What's the weather in NYC?"}]
+
+response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=messages,
+ tools=tools,
+ tool_choice="auto" # "auto" | "required" | {"type":"function","function":{"name":"get_weather"}}
+)
+
+# The response.choices[0].message looks like:
+# {
+# "role": "assistant",
+# "content": null,
+# "tool_calls": [
+# {
+# "id": "call_abc123",
+# "type": "function",
+# "function": {
+# "name": "get_weather",
+# "arguments": "{\"location\": \"New York\"}"
+# }
+# }
+# ]
+# }
+
+assistant_message = response.choices[0].message
+tool_call = assistant_message.tool_calls[0]
+function_args = json.loads(tool_call.function.arguments)
+
+# Execute your actual function
+weather_result = {"temperature": 72, "condition": "sunny", "humidity": 45}
+
+# Append the assistant's tool call message + the tool result
+messages.append(assistant_message)
+messages.append({
+ "role": "tool",
+ "tool_call_id": tool_call.id, # must match the tool_call id
+ "content": json.dumps(weather_result) # result as a JSON string
+})
+
+# Get the final response
+final_response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=messages,
+ tools=tools
+)
+
+print(final_response.choices[0].message.content)
+# "The current weather in New York is 72°F and sunny with 45% humidity."
+
+| Value | Behavior |
|---|---|
"auto" | Model decides whether to call a tool (default) |
"required" | Model must call at least one tool |
"none" | Model must not call any tools |
{"type": "function", "function": {"name": "get_weather"}} | Force a specific tool |
role: "tool" message for each tool_call_id. Parallel tool calls are not compatible with structured response_format.
+Models with function calling: zai-org-glm-4.7, zai-org-glm-5, qwen3-4b, mistral-small-3-2-24b-instruct, llama-3.2-3b, and all Claude / GPT / Gemini / Grok models. Check supportsFunctionCalling in the /models endpoint.
Enable real-time web search on any text model:
+ +# Via venice_parameters
+{
+ "model": "zai-org-glm-4.7",
+ "messages": [{"role": "user", "content": "Latest AI news"}],
+ "venice_parameters": {
+ "enable_web_search": "auto",
+ "enable_web_citations": true
+ }
+}
+
+# Via model suffix
+{
+ "model": "zai-org-glm-4.7:enable_web_search=on&enable_web_citations=true",
+ "messages": [{"role": "user", "content": "Latest AI news"}]
+}
+
+Automatically scrapes up to 5 URLs detected in the user message:
+"venice_parameters": {
+ "enable_web_scraping": true
+}
+
+For Grok models, enables xAI's native search across web + X/Twitter:
+"venice_parameters": {
+ "enable_x_search": true
+}
+
+| Feature | Price |
|---|---|
| Web Search | $10.00 / 1K requests |
| Web Scraping | $10.00 / 1K URLs |
| X Search | $10.00 / 1K results |
POST /image/generate
+ +/image/generate (not /images/generations) and returns images[0] (not data[0].b64_json). Do NOT use the OpenAI SDK's client.images.generate() — use raw HTTP requests instead.
+curl https://api.venice.ai/api/v1/image/generate \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "venice-sd35",
+ "prompt": "A cyberpunk city with neon lights and rain",
+ "width": 1024,
+ "height": 1024,
+ "format": "webp"
+ }'
+
+| Field | Type | Default | Description |
|---|---|---|---|
model | string (required) | — | Image model ID |
prompt | string (required) | — | Image description (max 7500 chars) |
width | integer | 1024 | Width in pixels (max 1280) |
height | integer | 1024 | Height in pixels (max 1280) |
format | jpeg / png / webp | webp | Output format |
negative_prompt | string | — | What to exclude from the image |
cfg_scale | number | 7.5 | Prompt adherence (0-20) |
seed | integer | random | Reproducibility seed |
variants | integer | 1 | Number of images (1-4, requires return_binary: false) |
style_preset | string | — | e.g. "3D Model", "Anime", etc. |
aspect_ratio | string | — | e.g. "1:1", "16:9" (certain models) |
resolution | string | — | "1K", "2K", "4K" (certain models like Nano Banana) |
safe_mode | boolean | true | Blur adult content |
return_binary | boolean | false | Return raw image bytes instead of base64 JSON |
embed_exif_metadata | boolean | false | Embed prompt info in EXIF |
hide_watermark | boolean | false | Hide Venice watermark |
{
+ "id": "generate-image-1234567890",
+ "images": [
+ "/9j/4AAQSkZJRgABAQ..." // base64-encoded image data
+ ],
+ "timing": {
+ "total": 3200,
+ "inferenceDuration": 2800,
+ "inferencePreprocessingTime": 150,
+ "inferenceQueueTime": 250
+ }
+}
+
+import base64, requests, os
+
+resp = requests.post("https://api.venice.ai/api/v1/image/generate",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"model": "venice-sd35", "prompt": "A sunset over Venice", "width": 1024, "height": 1024}
+).json()
+
+with open("output.webp", "wb") as f:
+ f.write(base64.b64decode(resp["images"][0]))
+
+Returns raw image bytes with Content-Type: image/webp (or image/png, image/jpeg depending on format). Save directly:
curl https://api.venice.ai/api/v1/image/generate \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"venice-sd35","prompt":"A sunset","return_binary":true}' \
+ -o output.webp
+
+| Model | ID | Price | Privacy |
|---|---|---|---|
| Recraft V4 Pro | recraft-v4-pro | $0.29/img | Anonymized |
| GPT Image 1.5 | gpt-image-1-5 | $0.26/img | Anonymized |
| Nano Banana Pro | nano-banana-pro | $0.18-$0.35 | Anonymized |
| Qwen Image 2 Pro | qwen-image-2-pro | $0.10/img | Anonymized |
| Flux 2 Max | flux-2-max | $0.09/img | Anonymized |
| Venice SD35 | venice-sd35 | $0.01/img | Private |
| Qwen Image | qwen-image | $0.01/img | Private |
| Chroma | chroma | $0.01/img | Private |
| Z-Image Turbo | z-image-turbo | $0.01/img | Private |
POST /image/edit
+AI-powered inpainting. Send base64-encoded image + text prompt. Returns edited image as raw binary (Content-Type: image/png).
# Python — edit an image and save result
+import base64, requests, os
+
+with open("photo.jpg", "rb") as f:
+ image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/edit",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"prompt": "Make it look like a watercolor painting", "image": image_b64}
+)
+
+with open("edited.png", "wb") as f:
+ f.write(resp.content)
+
+# cURL
+curl -X POST https://api.venice.ai/api/v1/image/edit \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"prompt": "Colorize", "image": "'$(base64 -i photo.jpg)'"}' \
+ -o edited.png
+
+Edit models: qwen-image (default), flux-2-max-edit, gpt-image-1-5-edit, qwen-image-2-edit, nano-banana-pro-edit, seedream-v4-edit, grok-imagine-edit.
POST /image/multi-edit
+Combine and edit up to 3 images with layered inputs. Send an array of base64 images with per-layer prompts. Returns binary image data.
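The exact request fields for /image/multi-edit are not documented above, so the payload below is a hypothetical sketch — prompt and images are assumed field names; confirm them against the Swagger spec before use:

```python
import base64

def multi_edit_payload(image_bytes_list: list[bytes], prompt: str) -> dict:
    """Assemble a multi-edit body. Field names here are assumptions."""
    if not 1 <= len(image_bytes_list) <= 3:
        raise ValueError("multi-edit accepts 1-3 images")
    return {
        "prompt": prompt,  # assumed field name
        "images": [base64.b64encode(b).decode() for b in image_bytes_list],
    }

payload = multi_edit_payload([b"layer-1", b"layer-2"], "Blend into a collage")
```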
+ +POST /image/upscale
+Returns upscaled image as raw binary (Content-Type: image/png).
# Python
+import base64, requests, os
+
+with open("small.jpg", "rb") as f:
+ image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/upscale",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"image": image_b64, "scale": 4} # 2 or 4
+)
+
+with open("upscaled.png", "wb") as f:
+ f.write(resp.content)
+
+# cURL
+curl -X POST https://api.venice.ai/api/v1/image/upscale \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image": "'$(base64 -i small.jpg)'", "scale": 2}' \
+ -o upscaled.png
+
+Pricing: 2x = $0.02, 4x = $0.08. Model: upscaler.
POST /image/background-remove
+Returns a PNG with transparent background. Accepts base64, file upload (multipart), or URL.
+ +# Via URL (easiest)
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image_url": "https://example.com/photo.jpg"}' \
+ -o no-bg.png
+
+# Via base64
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image": "'$(base64 -i photo.jpg)'"}' \
+ -o no-bg.png
+
+Model: bria-bg-remover. $0.03/image. Max file size: 25MB.
POST /audio/speech
+ +{
+ "input": "Hello, welcome to Venice Voice.",
+ "model": "tts-kokoro",
+ "voice": "af_sky",
+ "response_format": "mp3",
+ "speed": 1.0
+}
+
+| Field | Type | Description |
|---|---|---|
input | string (required) | Text to speak (max 4096 chars) |
model | string | tts-kokoro, tts-qwen3-0-6b, or tts-qwen3-1-7b |
voice | string | Voice ID (60+ options). Default: af_sky |
response_format | string | mp3, opus, aac, flac, wav, pcm |
speed | number | 0.25 to 4.0 (default 1.0) |
streaming | boolean | Stream sentence by sentence |
language | string | Qwen 3 TTS only: Auto, English, Chinese, Spanish, French, etc. |
prompt | string | Qwen 3 TTS only: emotion/style prompt (max 500 chars) |
af_sky, af_nova, af_bella, af_heart, am_adam, am_echo, am_liam, am_michael, bf_emma, bf_lily, bm_george, zf_xiaobei, jm_kumo, ff_siwis, pf_dora, and many more.
Vivian, Serena, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee, Uncle_Fu.
Content-Type matches your response_format (e.g. audio/mpeg for mp3). Save directly to file.
+# cURL — save TTS output
+curl -X POST https://api.venice.ai/api/v1/audio/speech \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"input":"Hello world","model":"tts-kokoro","voice":"af_sky"}' \
+ -o speech.mp3
+
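The same call from Python. A stdlib-only sketch (urllib instead of requests, so there are no extra dependencies) — not an official client:

```python
import json, os, urllib.request

def speech_body(text: str, model: str = "tts-kokoro",
                voice: str = "af_sky", fmt: str = "mp3") -> dict:
    """Request body for POST /audio/speech (OpenAI-compatible shape)."""
    return {"input": text, "model": model, "voice": voice,
            "response_format": fmt}

def save_speech(text: str, path: str = "speech.mp3") -> None:
    req = urllib.request.Request(
        "https://api.venice.ai/api/v1/audio/speech",
        data=json.dumps(speech_body(text)).encode(),
        headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY', '')}",
                 "Content-Type": "application/json"},
    )
    # Response is raw audio bytes; write them straight to disk.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        f.write(resp.read())
```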
+POST /audio/transcriptions
+Multipart form upload. Supports WAV, FLAC, MP3, M4A, AAC, MP4.
+ +# cURL
+curl -X POST https://api.venice.ai/api/v1/audio/transcriptions \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -F file=@recording.mp3 \
+ -F model=nvidia/parakeet-tdt-0.6b-v3 \
+ -F response_format=json
+
+# Python
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+with open("recording.mp3", "rb") as f:
+ transcript = client.audio.transcriptions.create(
+ model="nvidia/parakeet-tdt-0.6b-v3",
+ file=f,
+ response_format="json"
+ )
+print(transcript.text)
+
+Models: nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3. Add timestamps=true for word-level timing.
POST /audio/queue → poll /audio/retrieve
+ +Same async pattern as video. Queue a job, get a queue_id, poll until audio bytes are returned.
# Step 1: Get a price quote (optional)
+curl -X POST https://api.venice.ai/api/v1/audio/quote \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","duration_seconds":60}'
+# → {"quote": 0.03}
+
+# Step 2: Queue generation
+curl -X POST https://api.venice.ai/api/v1/audio/queue \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","prompt":"An upbeat electronic track with synth leads"}'
+# → {"model":"ace-step-15","queue_id":"abc-123","status":"QUEUED"}
+
+# Step 3: Poll until complete (returns audio bytes when done)
+curl -X POST https://api.venice.ai/api/v1/audio/retrieve \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","queue_id":"abc-123"}' \
+ -o music.mp3
+# While processing: → {"status":"PROCESSING","average_execution_time":20000,"execution_duration":5200}
+# When complete: → raw audio bytes (Content-Type: audio/mpeg)
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Music model ID |
prompt | string (required) | Description of the audio to generate |
lyrics_prompt | string | Lyrics for lyric-capable models |
duration_seconds | integer | Duration hint in seconds |
force_instrumental | boolean | Force instrumental (no vocals) |
voice | string | Voice selection for voice-enabled models |
Models: ace-step-15, elevenlabs-music, minimax-music-v2, stable-audio-25. Sound effects: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio.
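The queue/retrieve loop above can be wrapped in Python; completion is signalled by the Content-Type switching from JSON to audio bytes. A stdlib-only sketch with simple fixed-interval polling (add timeouts and error handling for production use):

```python
import json, os, time, urllib.request

API = "https://api.venice.ai/api/v1"

def is_ready(content_type: str) -> bool:
    """True once /audio/retrieve returns media instead of a status JSON."""
    return content_type.startswith("audio/")

def _post(path: str, body: dict):
    req = urllib.request.Request(
        f"{API}{path}", data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY', '')}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()

def generate_music(prompt: str, model: str = "ace-step-15",
                   out: str = "music.mp3") -> str:
    _, raw = _post("/audio/queue", {"model": model, "prompt": prompt})
    queue_id = json.loads(raw)["queue_id"]
    while True:
        ctype, raw = _post("/audio/retrieve",
                           {"model": model, "queue_id": queue_id})
        if is_ready(ctype):               # done: raw audio bytes
            with open(out, "wb") as f:
                f.write(raw)
            return out
        time.sleep(10)                    # still QUEUED / PROCESSING
```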
POST /video/queue → poll /video/retrieve
+ +Asynchronous: queue a job, get a queue_id, poll until complete.
{
+ "model": "wan-2.5-preview-text-to-video",
+ "prompt": "Commerce in Venice, Italy",
+ "duration": "10s",
+ "resolution": "720p",
+ "aspect_ratio": "16:9"
+}
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Video model ID |
prompt | string (required) | Max 2500 chars |
duration | string (required) | 5s or 10s |
resolution | string | 480p, 720p, 1080p |
aspect_ratio | string | e.g. 16:9 |
image_url | string | Reference image for image-to-video models |
negative_prompt | string | What to avoid |
audio | boolean | Generate audio (if model supports it) |
# Step 1: Get a price quote (optional but recommended)
+curl -X POST https://api.venice.ai/api/v1/video/quote \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"wan-2.6-text-to-video","duration":"10s","resolution":"720p"}'
+# → {"quote": 0.35}
+
+# Step 2: Queue the video generation
+curl -X POST https://api.venice.ai/api/v1/video/queue \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "wan-2.6-text-to-video",
+ "prompt": "A drone shot over Venice canals at golden hour",
+ "duration": "10s",
+ "resolution": "720p",
+ "aspect_ratio": "16:9"
+ }'
+# Response:
+# {"model": "wan-2.6-text-to-video", "queue_id": "550e8400-e29b-41d4-a716-446655440000"}
+
+# Step 3: Poll /video/retrieve until complete
+# While processing → returns JSON:
+# {"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200}
+#
+# When complete → returns raw MP4 bytes (Content-Type: video/mp4)
+
+curl -X POST https://api.venice.ai/api/v1/video/retrieve \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "wan-2.6-text-to-video",
+ "queue_id": "550e8400-e29b-41d4-a716-446655440000",
+ "delete_media_on_completion": true
+ }' \
+ -o output.mp4
+
+import requests, time, os
+
+API = "https://api.venice.ai/api/v1"
+HEADERS = {"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"}
+
+# Queue
+q = requests.post(f"{API}/video/queue", headers=HEADERS, json={
+ "model": "wan-2.6-text-to-video",
+ "prompt": "A drone shot over Venice canals at golden hour",
+ "duration": "10s", "resolution": "720p"
+}).json()
+
+queue_id = q["queue_id"]
+print(f"Queued: {queue_id}")
+
+# Poll
+while True:
+ r = requests.post(f"{API}/video/retrieve", headers=HEADERS, json={
+ "model": "wan-2.6-text-to-video",
+ "queue_id": queue_id,
+ "delete_media_on_completion": True
+ })
+ if r.headers.get("Content-Type", "").startswith("video/"):
+ with open("output.mp4", "wb") as f:
+ f.write(r.content)
+ print("Done! Saved output.mp4")
+ break
+ else:
+ status = r.json()
+ elapsed = status.get("execution_duration", 0) / 1000
+ eta = status.get("average_execution_time", 0) / 1000
+ print(f"Processing... {elapsed:.0f}s / ~{eta:.0f}s est.")
+ time.sleep(10)
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Same model used in queue |
queue_id | string (required) | ID returned by /video/queue |
delete_media_on_completion | boolean | Delete from storage after download (default: false) |
| Model | ID (Text-to-Video) | Privacy |
|---|---|---|
| Veo 3.1 | veo3.1-full-text-to-video | Anon |
| Sora 2 Pro | sora-2-pro-text-to-video | Anon |
| Kling V3 Pro | kling-v3-pro-text-to-video | Anon |
| Wan 2.6 | wan-2.6-text-to-video | Anon |
| Longcat | longcat-text-to-video | Private |
| LTX 2.0 | ltx-2-full-text-to-video | Anon |
POST /embeddings
+ +{
+ "model": "text-embedding-bge-m3",
+ "input": "Privacy-first AI infrastructure",
+ "encoding_format": "float"
+}
+
+Model: text-embedding-bge-m3. Input: $0.15/1M tokens, Output: $0.60/1M tokens. Privacy: Private. Max input: 8192 tokens. Batch: up to 2048 inputs per request.
{
+ "object": "list",
+ "model": "text-embedding-bge-m3",
+ "data": [
+ {
+ "object": "embedding",
+ "index": 0,
+ "embedding": [0.0023064255, -0.009327292, 0.015797377, ...] // 1024-dimensional float vector
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 8,
+ "total_tokens": 8
+ }
+}
+
+Vector dimensions: text-embedding-bge-m3 produces 1024-dimensional vectors by default. You can optionally reduce dimensions via the dimensions parameter.
Encoding formats: "float" (array of numbers) or "base64" (compact binary).
Batch input: Pass an array of strings to embed multiple inputs in one request.
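A batch /embeddings call plus cosine similarity is the usual building block for semantic search. The similarity function is shown on toy vectors below; in practice a and b come from data[i].embedding:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical direction → 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```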
+ + +GET /characters
+ +List and filter AI character personas. Use a character's slug in venice_parameters.character_slug for chat completions.
| Param | Description |
|---|---|
| search | Search by name, description, or tags |
| categories | Filter by category (e.g. roleplay, philosophy) |
| tags | Filter by tags |
| modelId | Filter by model |
| isAdult | true or false |
| sortBy | featured, highestRating, mostRecent, imports, etc. |
| limit / offset | Pagination (max 100) |
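A minimal stdlib sketch of a filtered listing using the params above. The `character_query` helper is illustrative, and the assumption that results live under a `data` key is ours; the live call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import urllib.parse
import urllib.request

def character_query(**params):
    """Build a /characters URL from the filter params above."""
    return "https://api.venice.ai/api/v1/characters?" + urllib.parse.urlencode(params)

url = character_query(search="philosophy", limit=10)

if os.environ.get("VENICE_API_KEY"):
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"})
    with urllib.request.urlopen(req) as r:
        # Assumed response shape: {"data": [{"slug": ...}, ...]}
        for c in json.load(r).get("data", []):
            print(c.get("slug"))
```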
{
+ "model": "venice-uncensored",
+ "messages": [{"role": "user", "content": "What is the meaning of life?"}],
+ "venice_parameters": {
+ "character_slug": "alan-watts"
+ }
+}
+
+
The only way to achieve reasonable user privacy is to avoid collecting information in the first place.
Models with tee-* prefix run inside hardware-secured enclaves (Intel TDX, NVIDIA CC). Venice cannot access the computation.
response = client.chat.completions.create(
+ model="tee-qwen3-5-122b-a10b",
+ messages=[{"role": "user", "content": "Explain quantum computing"}]
+)
+
+GET /tee/attestation?model=...&nonce=...
+Returns cryptographic proof the model runs in a genuine TEE. Fields: verified, nonce, tee_provider, intel_quote, nvidia_payload, signing_key, signing_address.
GET /tee/signature?model=...&request_id=...
+Proves a response came from the attested enclave.
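A sketch of the attestation round-trip: generate a fresh nonce, fetch the attestation, and sanity-check the response. The `check_attestation` helper is illustrative, and the network call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import secrets
import urllib.request

def check_attestation(att, expected_nonce):
    """Basic sanity checks on a /tee/attestation response body."""
    return bool(att.get("verified")) and att.get("nonce") == expected_nonce

# A fresh random nonce proves the quote was generated for this request.
nonce = secrets.token_hex(16)

if os.environ.get("VENICE_API_KEY"):
    url = ("https://api.venice.ai/api/v1/tee/attestation"
           f"?model=tee-qwen3-5-122b-a10b&nonce={nonce}")
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"})
    with urllib.request.urlopen(req) as r:
        att = json.load(r)
    assert check_attestation(att, nonce), "attestation failed or nonce mismatch"
    print("attested signing key:", att["signing_key"])
```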
+ +Models with e2ee-* prefix add client-side encryption on top of TEE. Your prompts are encrypted before leaving your device.
Crypto stack: ECDH on secp256k1 → HKDF-SHA256 → AES-256-GCM.
1. Fetch attestation: GET /tee/attestation?model=e2ee-glm-4-7-p&nonce=<random-hex>. Response includes signing_key (model's public key).
2. Verify it: check verified: true, nonce match, and optionally parse the Intel TDX quote.
3. Send requests with these headers:
   - X-Venice-TEE-Client-Pub-Key — your public key (hex)
   - X-Venice-TEE-Model-Pub-Key — model's public key from attestation (hex)
   - X-Venice-TEE-Signing-Algo — "secp256k1"

| Model | ID | Context |
|---|---|---|
| GLM 4.7 | e2ee-glm-4-7-p | 128K |
| GLM 4.7 Flash | e2ee-glm-4-7-flash-p | 198K |
| GLM 5 | e2ee-glm-5 | 198K |
| Gemma 3 27B | e2ee-gemma-3-27b-p | 40K |
| GPT OSS 120B | e2ee-gpt-oss-120b-p | 128K |
| Qwen3 30B | e2ee-qwen3-30b-a3b-p | 256K |
| Qwen3 VL 30B | e2ee-qwen3-vl-30b-a3b-p | 128K |
| Qwen3.5 122B | e2ee-qwen3-5-122b-a10b | 128K |
| Venice Uncensored | e2ee-venice-uncensored-24b-p | 32K |
| Qwen 2.5 7B | e2ee-qwen-2-5-7b-p | 32K |
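The crypto stack above can be sketched with the `cryptography` package. This is a sketch under stated assumptions: the HKDF salt/info values and message framing here are guesses, not Venice's documented wire format, and the "model" keypair stands in for the attestation's `signing_key` (which you would parse via `EllipticCurvePublicKey.from_encoded_point`).

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_aes_key(private_key, peer_public_key):
    """ECDH on secp256k1, then HKDF-SHA256 down to a 256-bit AES-GCM key."""
    shared = private_key.exchange(ec.ECDH(), peer_public_key)
    # salt/info are illustrative guesses; check Venice's client code for real values
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"").derive(shared)

# Stand-in for the enclave's keypair (normally only its public key is known to you)
model_priv = ec.generate_private_key(ec.SECP256K1())
client_priv = ec.generate_private_key(ec.SECP256K1())

key = derive_aes_key(client_priv, model_priv.public_key())
nonce = os.urandom(12)  # 96-bit AES-GCM nonce, unique per message
ciphertext = AESGCM(key).encrypt(nonce, b"Explain quantum computing", None)

# The enclave reaches the same key from the other side of the exchange:
peer_key = derive_aes_key(model_priv, client_priv.public_key())
assert AESGCM(peer_key).decrypt(nonce, ciphertext, None) == b"Explain quantum computing"
```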
Reduces latency (up to 80%) and costs (up to 90%) by reusing processed input tokens on prefix-matched requests.
+ +Routing hint for cache affinity. Same key → same server → higher hit rate.
+{
+ "model": "claude-opus-45",
+ "prompt_cache_key": "session-abc-123",
+ "messages": [...]
+}
+
+{
+ "usage": {
+ "prompt_tokens": 5500,
+ "prompt_tokens_details": {
+ "cached_tokens": 5000,
+ "cache_creation_input_tokens": 0
+ }
+ }
+}
+
+| Provider | Min Tokens | Lifetime | Read Discount |
|---|---|---|---|
| Anthropic (Claude) | ~4,000 | 5 min | 90% |
| OpenAI (GPT) | 1,024 | 5-10 min | 90% |
| Google (Gemini) | ~1,024 | 1 hour | 75-90% |
| xAI (Grok) | ~1,024 | 5 min | 75-88% |
| DeepSeek | ~1,024 | 5 min | 50% |
| MiniMax | ~1,024 | 5 min | 90% |
| Moonshot (Kimi) | ~1,024 | 5 min | 50% |
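Given a usage payload like the sample shown earlier in this section, a tiny helper (the name is ours) makes hit rates easy to log:

```python
def cache_hit_rate(usage):
    """Fraction of prompt tokens served from the provider's prefix cache."""
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    total = usage.get("prompt_tokens", 0)
    return cached / total if total else 0.0

usage = {"prompt_tokens": 5500,
         "prompt_tokens_details": {"cached_tokens": 5000, "cache_creation_input_tokens": 0}}
print(f"cache hit rate: {cache_hit_rate(usage):.0%}")  # → cache hit rate: 91%
```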
Discover models programmatically: GET /models
+ +Key response fields per model: id, type, model_spec.availableContextTokens, model_spec.capabilities (supportsFunctionCalling, supportsResponseSchema, supportsWebSearch, supportsReasoning, supportsReasoningEffort).
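Capability-based discovery can be sketched as below. The `models_with` helper is illustrative; the field paths follow the spec above, and the live call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import urllib.request

def models_with(models_json, capability):
    """IDs of models whose model_spec.capabilities flags the given capability."""
    return [
        m["id"]
        for m in models_json.get("data", [])
        if m.get("model_spec", {}).get("capabilities", {}).get(capability)
    ]

if os.environ.get("VENICE_API_KEY"):
    req = urllib.request.Request(
        "https://api.venice.ai/api/v1/models",
        headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as r:
        print(models_with(json.load(r), "supportsFunctionCalling"))
```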
| Use Case | Model ID | Context |
|---|---|---|
| Flagship / complex tasks | zai-org-glm-4.7 | 198K |
| Flagship v2 | zai-org-glm-5 | 198K |
| Balanced general use | llama-3.3-70b | 128K |
| Fast / cost-efficient | qwen3-4b | 40K |
| Uncensored | venice-uncensored | 32K |
| Vision | qwen3-vl-235b-a22b | 256K |
| Long context coding | qwen3-coder-480b-a35b-instruct | 256K |
| Deep reasoning | qwen3-235b-a22b-thinking-2507 | 128K |
| GPT (via Venice) | openai-gpt-52 | 256K |
| Claude (via Venice) | claude-opus-4-6 | 1000K |
| Gemini (via Venice) | gemini-3-1-pro-preview | 1000K |
| Grok (via Venice) | grok-4-20-beta | 2000K |
| Image generation (budget) | venice-sd35 | — |
| Image generation (quality) | recraft-v4-pro | — |
| TTS | tts-kokoro | — |
| Embeddings | text-embedding-bge-m3 | — |
Prices per 1M tokens unless noted. All USD. Full pricing: docs.venice.ai/overview/pricing
+ +| Model | ID | Input | Output | Cache Read | Context | Privacy |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | $6.00 | $30.00 | $0.60 | 1000K | Anon |
| GPT-5.2 | openai-gpt-52 | $2.19 | $17.50 | $0.22 | 256K | Anon |
| Grok 4.20 | grok-4-20-beta | $2.50 | $7.50 | $0.25 | 2000K | Anon |
| GLM 4.7 | zai-org-glm-4.7 | $0.55 | $2.65 | $0.11 | 198K | Private |
| GLM 5 | zai-org-glm-5 | $1.00 | $3.20 | $0.20 | 198K | Private |
| DeepSeek V3.2 | deepseek-v3.2 | $0.33 | $0.48 | $0.16 | 160K | Private |
| Llama 3.3 70B | llama-3.3-70b | $0.70 | $2.80 | — | 128K | Private |
| Venice Uncensored | venice-uncensored | $0.20 | $0.90 | — | 32K | Private |
| Qwen 3 Coder 480B | qwen3-coder-480b-a35b-instruct | $0.75 | $3.00 | — | 256K | Private |
| GLM 4.7 Flash | zai-org-glm-4.7-flash | $0.13 | $0.50 | — | 128K | Private |
| Qwen 3.5 9B | qwen3-5-9b | $0.05 | $0.15 | — | 256K | Private |
| Method | Details |
|---|---|
| USD | Credit card. Credits never expire. |
| Crypto | Cryptocurrency. Same rates as USD. |
| DIEM Staking | Each DIEM = $1/day of credits that refresh daily. |
| Pro Subscription | One-time $10 API credit when upgrading to Pro. |
X402 is an open standard for internet-native payments using the HTTP 402 Payment Required status code. Venice supports X402 to let agents authenticate with a crypto wallet and pay for inference automatically — no API key required.
| Endpoint | Auth Header | Purpose |
|---|---|---|
| POST /chat/completions | X-Sign-In-With-X | Paid inference (only endpoint currently supported) |
| GET /x402/balance/{address} | X-Sign-In-With-X | Check spendable balance |
| GET /x402/transactions/{address} | X-Sign-In-With-X | View transaction history |
| POST /x402/top-up | X-402-Payment | Add USDC balance (different header!) |
Venice X402 uses a SIWE (Sign In With Ethereum) message, signed with EIP-191, then base64-encoded:
+ +import { Wallet } from 'ethers'
+import { SiweMessage, generateNonce } from 'siwe'
+
+const signer = new Wallet(process.env.PRIVATE_KEY)
+
+const siwe = new SiweMessage({
+ domain: 'outerface.venice.ai',
+ address: signer.address,
+ statement: 'Sign in to Venice AI',
+ uri: 'https://outerface.venice.ai/api/v1/chat/completions', // must match the route you're calling
+ version: '1',
+ chainId: 8453, // Base
+ nonce: generateNonce(),
+ issuedAt: new Date().toISOString(),
+ expirationTime: new Date(Date.now() + 10 * 60 * 1000).toISOString(), // 10 min
+})
+
+const message = siwe.prepareMessage()
+const signature = await signer.signMessage(message)
+
+const headerValue = Buffer.from(JSON.stringify({
+ address: signer.address.toLowerCase(),
+ message,
+ signature,
+ chainId: 8453,
+ timestamp: Date.now(),
+}), 'utf8').toString('base64')
+
+uri in the SIWE message must match the Venice route you're calling. Generate a fresh header for each different endpoint. Headers expire after 10 minutes.
+curl -X GET "https://api.venice.ai/api/v1/x402/balance/0xYOUR_ADDRESS" \
+ -H "X-Sign-In-With-X: $BALANCE_AUTH"
+
+# Response:
+{
+ "success": true,
+ "data": {
+ "walletAddress": "0xyour_wallet_address",
+ "balanceUsd": 12.5,
+ "canConsume": true,
+ "minimumTopUpUsd": 5,
+ "suggestedTopUpUsd": 10,
+ "diemBalanceUsd": 5.25
+ }
+}
+
+curl -X POST https://api.venice.ai/api/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -H "X-Sign-In-With-X: $CHAT_AUTH" \
+ -d '{
+ "model": "kimi-k2-5",
+ "messages": [{"role": "user", "content": "Hello from an x402-authenticated wallet."}]
+ }'
+
+# First call without payment header to get requirements:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up
+# → 402 response with paymentInfo:
+{
+ "error": "PAYMENT_REQUIRED",
+ "message": "Send x402 payment via X-402-Payment header",
+ "paymentInfo": {
+ "receiverWallet": "0xRECEIVER_WALLET",
+ "network": "eip155:8453",
+ "token": "USDC",
+ "tokenAddress": "0xUSDC_TOKEN_ADDRESS",
+ "minimumAmountUsd": 5,
+ "suggestedAmountUsd": 10
+ }
+}
+
+# Then retry with signed X-402-Payment header:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up \
+ -H "X-402-Payment: $X402_PAYMENT"
+# → {"success":true,"data":{"amountCredited":10,"newBalance":22.5}}
+
+curl -X GET "https://api.venice.ai/api/v1/x402/transactions/0xYOUR_ADDRESS?limit=10&offset=0" \
+ -H "X-Sign-In-With-X: $TX_AUTH"
+Returns entries like TOP_UP, CHARGE, REFUND with requestId and modelId for spend correlation.
npm install @venice-ai/x402-client
+
+import { VeniceClient } from '@venice-ai/x402-client'
+
+const venice = new VeniceClient(process.env.WALLET_KEY)
+
+const response = await venice.chat({
+ model: 'kimi-k2-5',
+ messages: [{ role: 'user', content: 'Hello!' }]
+})
+
+| Error | HTTP | Cause / Fix |
|---|---|---|
| Authentication failed | 401 | Regenerate SIWE header. Check: valid base64, signature matches wallet, chain is 8453, not expired. |
| Unauthorized | 403 | Wallet in header doesn't match {address} path parameter. |
| Insufficient balance | 402 | Wallet auth worked but balance too low. Check /x402/balance, top up if needed. |
| PAYMENT_REQUIRED | 402 | Expected on first top-up call. Use returned paymentInfo to build X-402-Payment. |
| X402_INVALID_PAYMENT | 400 | Malformed payment header. Rebuild from scratch. |
If you're sending an agent to use Venice via X402, give it this context:
+Use the Venice API with your wallet.
+
+Auth: Build a SIWE message (domain: outerface.venice.ai, chain: 8453, URI: the Venice route
+you're calling), sign with EIP-191, base64-encode as JSON, send as X-Sign-In-With-X header.
+Headers expire after 10 minutes — regenerate per request.
+
+Inference: POST https://api.venice.ai/api/v1/chat/completions
+ Header: X-Sign-In-With-X: <base64>
+ Body: {"model":"kimi-k2-5","messages":[{"role":"user","content":"Hello"}]}
+
+Balance: GET https://api.venice.ai/api/v1/x402/balance/<address>
+Transactions: GET https://api.venice.ai/api/v1/x402/transactions/<address>
+DIEM staked on the wallet is used automatically. No DIEM = top up with USDC on Base (min $5).
+Guide: https://docs.venice.ai/overview/guides/x402-venice-api
+
+Full documentation: docs.venice.ai/overview/guides/x402-venice-api | Client library: veniceai/x402-client
+ + +Check your limits: GET /api_keys/rate_limits
+ +| Tier | Requests/min | Tokens/min | Example Models |
|---|---|---|---|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, grok-41-fast, qwen3-coder-480b |
| Type | Requests/min |
|---|---|
| Image | 20 |
| Audio | 60 |
| Embedding | 500 |
| Video (queue) | 40 |
| Video (retrieve) | 120 |
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Max requests in current window |
| x-ratelimit-remaining-requests | Requests remaining |
| x-ratelimit-reset-requests | Unix timestamp when window resets |
| x-ratelimit-limit-tokens | Max tokens per minute |
| x-ratelimit-remaining-tokens | Tokens remaining |
| x-ratelimit-reset-tokens | Seconds until token limit resets |
Abuse Protection: 20+ failed requests in 30 seconds → blocked for 30 seconds.
+ +Partner Tier: Significantly higher limits available. Contact api@venice.ai.
+ + +| Code | HTTP | Meaning |
|---|---|---|
| AUTHENTICATION_FAILED | 401 | Invalid or missing API key |
| AUTHENTICATION_FAILED_INACTIVE_KEY | 401 | Pro subscription inactive |
| INVALID_API_KEY | 401 | API key format invalid |
| INSUFFICIENT_BALANCE | 402 | No USD or DIEM balance remaining |
| UNAUTHORIZED | 403 | No access to this resource |
| INVALID_REQUEST | 400 | Bad parameters |
| INVALID_MODEL | 400 | Model doesn't exist |
| CHARACTER_NOT_FOUND | 404 | Character slug not found |
| MODEL_NOT_FOUND | 404 | Model not found |
| INVALID_CONTENT_TYPE | 415 | Wrong Content-Type header |
| INVALID_FILE_SIZE | 413 | File too large |
| INVALID_IMAGE_FORMAT | 400 | Unsupported image format |
| CORRUPTED_IMAGE | 400 | Image file unreadable |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests |
| INFERENCE_FAILED | 500 | Model inference error |
| UPSCALE_FAILED | 500 | Upscaling error |
| UNKNOWN_ERROR | 500 | Unexpected server error |
// Standard error (most endpoints)
+{"error": "Rate limit exceeded"}
+
+// Detailed error (validation failures — 400)
+{
+ "error": "Invalid request parameters",
+ "details": {
+ "model": {"_errors": ["Invalid model specified"]},
+ "prompt": {"_errors": ["Field is required"]}
+ }
+}
+
+// Content violation (422 — video/audio)
+{
+ "error": "Your prompt violates the content policy",
+ "suggested_prompt": "A cinematic instrumental track inspired by stormy weather"
+}
+
+Retry strategy: use exponential backoff for 429, 500, 503 errors. Check x-ratelimit-reset-requests header for 429.
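That retry strategy can be sketched with the standard library; the helper names are illustrative.

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, status, reset_header=None, now=None):
    """Seconds to wait: honor x-ratelimit-reset-requests on 429, else 1s, 2s, 4s... capped at 60s."""
    if status == 429 and reset_header:
        now = time.time() if now is None else now
        return min(max(float(reset_header) - now, 0.0), 60.0)
    return float(min(2 ** attempt, 60))

def request_with_backoff(req, max_tries=5):
    """Retry 429/500/503 responses; re-raise anything else or the final failure."""
    for attempt in range(max_tries):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code not in (429, 500, 503) or attempt == max_tries - 1:
                raise
            time.sleep(backoff_delay(attempt, e.code,
                                     e.headers.get("x-ratelimit-reset-requests")))
```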
| Header | Purpose |
|---|---|
| CF-RAY | Unique request ID (log for support) |
| x-venice-version | API version/revision |
| x-venice-model-id | Model used for inference |
| x-venice-model-name | Friendly model name |
| x-venice-model-deprecation-warning | Deprecation notice |
| x-venice-model-deprecation-date | When model will be removed |
| x-venice-balance-usd | USD balance before request |
| x-venice-balance-diem | DIEM balance before request |
| x-venice-is-blurred | Image was blurred (Safe Venice) |
| x-venice-is-content-violation | Content policy violation |
| x-ratelimit-* | Rate limiting info (see above) |
| x-pagination-* | Pagination metadata |
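A small helper (the name is ours) pulls out the signals worth monitoring from each response:

```python
def venice_signals(headers):
    """Extract monitoring-worthy Venice response headers (case-insensitive lookup)."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "request_id": h.get("cf-ray"),
        "balance_usd": float(h["x-venice-balance-usd"]) if "x-venice-balance-usd" in h else None,
        "balance_diem": float(h["x-venice-balance-diem"]) if "x-venice-balance-diem" in h else None,
        "deprecation": h.get("x-venice-model-deprecation-warning"),
    }

sig = venice_signals({"CF-RAY": "8abc123", "x-venice-balance-usd": "12.50"})
print(sig["request_id"], sig["balance_usd"])
```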
Venice is a drop-in replacement for OpenAI. Same SDK, same code — just change two values:
+# Python
+client = OpenAI(
+ api_key="your-venice-api-key", # ← Change 1
+ base_url="https://api.venice.ai/api/v1" # ← Change 2
+)
+
+# Node.js
+const client = new OpenAI({
+ apiKey: 'your-venice-api-key',
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+# Environment variables (many libraries auto-read these)
+OPENAI_API_KEY=your-venice-api-key
+OPENAI_BASE_URL=https://api.venice.ai/api/v1
+
+| OpenAI Model | Venice Equivalent | Type | Pricing (In/Out per 1M) |
|---|---|---|---|
| gpt-4o | zai-org-glm-4.7 Private | Text | $0.55 / $2.65 |
| gpt-4o | openai-gpt-52 Anon | Text | $2.19 / $17.50 |
| gpt-4o-mini | qwen3-4b | Text | $0.05 / $0.15 |
| o1 / o3 | qwen3-235b-a22b-thinking-2507 | Reasoning | $0.45 / $3.50 |
| gpt-4-vision | qwen3-vl-235b-a22b | Vision | $0.25 / $1.50 |
| text-embedding-3-small | text-embedding-bge-m3 | Embeddings | $0.15 / $0.60 |
| dall-e-3 | qwen-image Private | Image | $0.01/img |
| whisper | nvidia/parakeet-tdt-0.6b-v3 | STT | $0.0001/sec |
| tts-1 | tts-kokoro | TTS | $3.50/1M chars |
| Framework | Change Required |
|---|---|
| LangChain | base_url in ChatOpenAI |
| Vercel AI SDK | baseURL in createOpenAI |
| CrewAI | OPENAI_API_BASE env var |
| LlamaIndex | api_base in OpenAI |
| AutoGen | base_url in config |
| Haystack | api_base_url in OpenAIGenerator |
| Claude Code | Use claude-code-router |
| Cursor | Custom API endpoint in settings |
| Continue.dev | apiBase in config.json |
Full migration guide: docs.venice.ai/overview/guides/openai-migration
+ +pip install langchain langchain-openai openai
+
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+ model="venice-uncensored",
+ api_key="your-venice-api-key",
+ base_url="https://api.venice.ai/api/v1",
+ temperature=0.7,
+)
+
+response = llm.invoke("Explain privacy-preserving AI.")
+print(response.content)
+
+from langchain_openai import OpenAIEmbeddings
+
+embeddings = OpenAIEmbeddings(
+ model="text-embedding-bge-m3",
+ api_key="your-venice-api-key",
+ base_url="https://api.venice.ai/api/v1",
+ check_embedding_ctx_length=False, # Required for Venice
+)
+
+from langchain_community.vectorstores import FAISS
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+
+documents = ["Venice offers Private and Anonymized privacy levels."]  # your corpus here
+
+def format_docs(docs):
+    return "\n\n".join(doc.page_content for doc in docs)
+
+vectorstore = FAISS.from_texts(documents, embeddings)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
+
+rag_prompt = ChatPromptTemplate.from_messages([
+    ("system", "Answer based only on this context:\n\n{context}"),
+    ("user", "{question}"),
+])
+
+rag_chain = (
+    {"context": retriever | format_docs, "question": RunnablePassthrough()}
+    | rag_prompt | llm | StrOutputParser()
+)
+
+answer = rag_chain.invoke("What privacy levels does Venice offer?")
+
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.tools import tool
+from langchain.agents import create_tool_calling_agent, AgentExecutor
+
+llm = ChatOpenAI(model="zai-org-glm-4.7", api_key="...", base_url="https://api.venice.ai/api/v1")
+
+@tool
+def get_price(model_id: str) -> str:
+    """Get pricing for a Venice AI model."""
+    prices = {"venice-uncensored": "$0.20/$0.90", "zai-org-glm-4.7": "$0.55/$2.65"}
+    return prices.get(model_id, "Not found")
+
+# Tool-calling agents require an agent_scratchpad placeholder in the prompt
+prompt = ChatPromptTemplate.from_messages([
+    ("system", "You are a helpful assistant."),
+    ("user", "{input}"),
+    ("placeholder", "{agent_scratchpad}"),
+])
+
+agent = create_tool_calling_agent(llm, [get_price], prompt)
+executor = AgentExecutor(agent=agent, tools=[get_price])
+result = executor.invoke({"input": "What's the cheapest model?"})
+
+llm_with_search = ChatOpenAI(
+ model="venice-uncensored",
+ api_key="...",
+ base_url="https://api.venice.ai/api/v1",
+ extra_body={"venice_parameters": {"enable_web_search": "auto"}}
+)
+
+Full guide: docs.venice.ai/overview/guides/langchain
+ +npm install ai @ai-sdk/openai
+
+// lib/venice.ts
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+ apiKey: process.env.VENICE_API_KEY!,
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+// Use .chat() to ensure compatibility with Venice's chat completions endpoint
+export const venice = (modelId: string) => openai.chat(modelId);
+
+Always use .chat() — the default openai('model') syntax may use newer OpenAI endpoints Venice doesn't support yet.
+// app/api/chat/route.ts
+import { streamText } from 'ai';
+import { venice } from '@/lib/venice';
+
+export async function POST(req: Request) {
+ const { messages } = await req.json();
+ const result = streamText({
+ model: venice('venice-uncensored'),
+ system: 'You are a helpful, privacy-respecting AI assistant.',
+ messages,
+ });
+ return result.toDataStreamResponse();
+}
+
+import { streamText, tool } from 'ai';
+import { z } from 'zod';
+
+const result = streamText({
+ model: venice('zai-org-glm-4.7'),
+ messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
+ tools: {
+ getWeather: tool({
+ description: 'Get current weather',
+ parameters: z.object({ location: z.string() }),
+ execute: async ({ location }) => ({ temperature: 22, condition: 'Sunny', location }),
+ }),
+ },
+});
+
+import { generateObject } from 'ai';
+import { z } from 'zod';
+
+const { object } = await generateObject({
+ model: venice('venice-uncensored'),
+ schema: z.object({
+ recipe: z.object({
+ name: z.string(),
+ ingredients: z.array(z.string()),
+ steps: z.array(z.string()),
+ }),
+ }),
+ prompt: 'Generate a recipe for chocolate chip cookies.',
+});
+
+import { embed } from 'ai';
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+ apiKey: process.env.VENICE_API_KEY!,
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+const { embedding } = await embed({
+ model: openai.textEmbeddingModel('text-embedding-bge-m3'),
+ value: 'Privacy-first AI infrastructure',
+});
+
+Full guide: docs.venice.ai/overview/guides/vercel-ai-sdk
+ +OpenClaw is an open-source AI gateway connecting messaging platforms (WhatsApp, Telegram, Discord, Slack, iMessage) to AI models. Venice is a built-in provider.
+ +# Install
+curl -fsSL https://openclaw.ai/install.sh | bash
+# Or: npm install -g openclaw@latest
+
+# Onboard (select Venice as provider, paste API key)
+openclaw onboard
+
+# Set model
+openclaw models set venice/zai-org-glm-5
+
+# Start
+openclaw tui # Terminal UI
+openclaw dashboard # Web dashboard
+openclaw gateway # Messaging channels
+
+| Use Case | Model | Privacy |
|---|---|---|
| General | venice/zai-org-glm-5 | Private |
| Reasoning | venice/kimi-k2-5 | Private |
| Coding | venice/claude-opus-4-6 | Anon |
| Vision | venice/qwen3-vl-235b-a22b | Private |
| Uncensored | venice/venice-uncensored | Private |
# Install image/video generation skill
+openclaw skills install nhannah/venice-ai-media
+
+Full guide: docs.venice.ai/overview/guides/openclaw-bot | OpenClaw Venice provider docs
+ +| Integration | Type | Setup |
|---|---|---|
| Eliza (ai16z) | Agent framework | Set modelProvider: "venice" in character config. Configure VENICE_API_KEY + model env vars. |
| Coinbase AgentKit | Agent framework | Native Venice support. |
| Cursor IDE | Coding | Custom API endpoint in settings. |
| Cline (VS Code) | Coding | Set Venice base URL + API key. |
| ROO Code (VS Code) | Coding | Set Venice base URL + API key. |
| VOID IDE | Coding | Set Venice base URL + API key. |
| Brave Leo | Browser | Venice as Leo AI backend. |
| Aider | Coding | AI pair programming in terminal. |
| Open WebUI | Assistant | Self-hosted chat UI with Venice. |
| LibreChat | Assistant | Multi-provider chat with Venice. |
git clone https://github.com/ai16z/eliza.git
+cd eliza
+cp .env.example .env
+# Edit .env: set VENICE_API_KEY, SMALL_VENICE_MODEL, MEDIUM_VENICE_MODEL, LARGE_VENICE_MODEL
+# Create character in /characters/your_char.character.json with modelProvider: "venice"
+pnpm i && pnpm build && pnpm start
+pnpm start --characters="characters/your_char.character.json"
+
+
+Use Claude Code CLI with Venice for pay-per-token access to Claude Opus/Sonnet models.
+ +The claude-code-router is an open-source local proxy that intercepts Claude Code requests and redirects them to Venice.
+ +# Install Claude Code + Router
+npm install -g @anthropic-ai/claude-code
+npm install -g @musistudio/claude-code-router
+
+# Create config
+mkdir -p ~/.claude-code-router
+
+Create ~/.claude-code-router/config.json:
{
+ "APIKEY": "",
+ "LOG": true,
+ "LOG_LEVEL": "info",
+ "API_TIMEOUT_MS": 600000,
+ "HOST": "127.0.0.1",
+ "Providers": [{
+ "name": "venice",
+ "api_base_url": "https://api.venice.ai/api/v1/chat/completions",
+ "api_key": "your-venice-api-key-here",
+ "models": ["claude-opus-45", "claude-sonnet-45", "claude-opus-4-6", "claude-sonnet-4-6"],
+ "transformer": {"use": ["anthropic"]}
+ }],
+ "Router": {
+ "default": "venice,claude-opus-45",
+ "think": "venice,claude-opus-45",
+ "background": "venice,claude-opus-45",
+ "longContext": "venice,claude-opus-45",
+ "longContextThreshold": 100000
+ }
+}
+
+# Launch
+ccr start
+ccr code
+# Or: eval "$(ccr activate)" && claude
+
+| Model | Venice ID | Best For |
|---|---|---|
| Claude Opus 4.5 | claude-opus-45 | Complex reasoning, large refactors |
| Claude Sonnet 4.5 | claude-sonnet-45 | Fast iteration, everyday coding |
| Claude Opus 4.6 | claude-opus-4-6 | Complex reasoning, large refactors |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | Fast iteration, everyday coding |
- Switch models with /model venice,claude-sonnet-45 inside Claude Code
- Run ccr ui for a browser-based config editor
- Router slots: default, think, background, longContext

Official MCP (Model Context Protocol) server for Claude Code, Cline, and AI agents: veniceai/venice-mcp-server
+ +Allows AI agents to interact with Venice API endpoints as MCP tools.
Best practices:
- Watch x-ratelimit-remaining-* headers. Implement exponential backoff.
- Monitor x-venice-balance-usd / x-venice-balance-diem to avoid service interruptions.
- Log CF-RAY header values for troubleshooting with Venice support.
- Handle x-venice-model-deprecation-warning headers proactively.
- Disable Venice's default system prompt when supplying your own (include_venice_system_prompt: false).
- Use a stable prompt_cache_key for conversations.
- For structured outputs, set strict: true and additionalProperties: false.

Model picks:
- Uncensored / private: venice-uncensored (Private, zero data retention)
- Budget: zai-org-glm-4.7-flash ($0.13 input) or qwen3-5-9b ($0.05 input)
- Frontier quality: claude-opus-4-6 or openai-gpt-52
- Long context: grok-4-20-beta (2M tokens) or gemini-3-1-pro-preview (1M tokens)
- Maximum privacy: e2ee-* models for end-to-end encryption
- Vision: qwen3-vl-235b-a22b
- Tool calling: zai-org-glm-4.7, any Claude, GPT, or Grok model

| Endpoint | Method | Purpose |
|---|---|---|
| /chat/completions | POST | Text generation, vision, tools, streaming |
| /image/generate | POST | Text-to-image generation |
| /image/edit | POST | AI image editing / inpainting |
| /image/multi-edit | POST | Multi-image layered editing |
| /image/upscale | POST | Image upscaling (2x or 4x) |
| /image/background-remove | POST | Remove image backgrounds |
| /audio/speech | POST | Text-to-speech (50+ voices) |
| /audio/transcriptions | POST | Speech-to-text |
| /audio/queue | POST | Music generation (async) |
| /audio/retrieve | GET | Retrieve generated audio |
| /audio/quote | POST | Get audio generation price quote |
| /video/queue | POST | Video generation (async) |
| /video/retrieve | POST | Retrieve generated video |
| /video/quote | POST | Get video generation price quote |
| /embeddings | POST | Generate vector embeddings |
| /models | GET | List all available models |
| /characters | GET | List AI character personas |
| /api_keys | GET/POST | Manage API keys |
| /api_keys/rate_limits | GET | Check rate limit status |
| /tee/attestation | GET | Verify TEE attestation |
| /tee/signature | GET | Verify TEE response signature |
+ Generated from docs.venice.ai on April 3, 2026.
+ Venice API version: 20260403 · This page is designed to be consumed by AI agents.
+