diff --git a/agent-reference.html b/agent-reference.html new file mode 100644 index 0000000..0bd94d2 --- /dev/null +++ b/agent-reference.html @@ -0,0 +1,2109 @@ + + + + + + + + + + + +Venice AI API — Complete Agent Reference + + + + +

Venice AI API — Complete Agent Reference

+

Everything an AI agent needs to know to build with Venice.ai. Privacy-first, uncensored, OpenAI-compatible.

+

Base URL: https://api.venice.ai/api/v1
+Auth: Authorization: Bearer VENICE_API_KEY
+Swagger: swagger.yaml · +Docs: docs.venice.ai · +LLMs.txt: llms.txt

+ + + + +

⚠ Agent Pitfalls — Read This First

+ +

Venice has excellent capabilities but several default behaviors that trip up agents. Read these before making your first call.

+ +
+

1. /models returns TEXT ONLY by default

+

GET /models returns ~68 text models. Image, video, audio, and embedding models are hidden unless you pass a ?type= filter. This is the #1 reason agents fail to discover non-text capabilities.

+
# ❌ Returns text models only
+GET /api/v1/models
+
+# ✅ Filter by type to discover other models
+GET /api/v1/models?type=image       # 26+ image models
+GET /api/v1/models?type=video       # 65+ video models
+GET /api/v1/models?type=tts         # 3 TTS models
+GET /api/v1/models?type=embedding   # 1 embedding model
+GET /api/v1/models?type=music       # music generation models
+
+ +
+

2. ?type=audio returns ZERO models

+

The TTS model type is tts, NOT audio. Using ?type=audio returns an empty list.

+
# ❌ Returns 0 models
+GET /api/v1/models?type=audio
+
+# ✅ Correct
+GET /api/v1/models?type=tts
+
+ +
+

3. STT models are INVISIBLE in the /models API

+

Speech-to-text models (nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3) do NOT appear in any /models query. You must know the model names in advance; they are listed throughout this reference for convenience.
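Because these IDs can never be discovered at runtime, a practical pattern is to pin them as constants in agent code — a minimal sketch (the helper name is illustrative):

```python
# Venice STT model IDs, hardcoded because /models never lists them.
STT_MODELS = {
    "parakeet": "nvidia/parakeet-tdt-0.6b-v3",  # recommended default
    "whisper": "openai/whisper-large-v3",
}

def stt_model(name: str = "parakeet") -> str:
    """Return a known STT model ID instead of querying /models."""
    return STT_MODELS[name]
```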

+
+ +
+

4. Image API is NOT OpenAI-compatible

+

Venice's image endpoints use different paths and response formats than OpenAI:

+ + + + + +
OpenAIVenice
Endpoint/images/generations/image/generate
Responsedata[0].b64_jsonimages[0] (raw base64)
SDKclient.images.generate()Use requests.post() or fetch() directly
+

Agents using the OpenAI SDK's client.images.generate() will fail. Use raw HTTP requests for images.
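A small decoding helper illustrates the shape difference (the function name is illustrative; the images field is as documented in the Image Generation section below):

```python
import base64

def extract_image_bytes(venice_response: dict) -> bytes:
    # Venice returns {"images": ["<base64>", ...]} — not OpenAI's data[0].b64_json.
    return base64.b64decode(venice_response["images"][0])

# Usage with a raw HTTP response:
# png_bytes = extract_image_bytes(requests.post(".../image/generate", ...).json())
```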

+
+ +
+

5. Video and Music are async-only (queue + poll)

+

Unlike text/image/TTS (which return results immediately), video and music use an async pattern:

+
    +
  1. POST /video/queue → get queue_id
  2. POST /video/retrieve with queue_id → poll until Content-Type: video/mp4
+

There is no synchronous video/music endpoint. See the Video and Audio sections for complete polling examples.

+
+ +

Valid /models ?type= Values

+ + + + + + + + + +
TypeReturnsCount
text (default)Chat/completion LLMs~68
imageImage generation models~26
videoText-to-video and image-to-video~65
ttsText-to-speech models3
embeddingVector embedding models1
musicMusic generation models~6
(none / omitted)Text models only~68
+ +
+STT models not in API: Use nvidia/parakeet-tdt-0.6b-v3 or openai/whisper-large-v3 directly — they work but won't appear in any /models query. +
+ + +

Model Discovery

+ +

GET /models

+ +

Returns models filtered by type. Defaults to text-only.

+ +
# Discover image models
+curl "https://api.venice.ai/api/v1/models?type=image" \
+  -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover TTS models (NOT ?type=audio!)
+curl "https://api.venice.ai/api/v1/models?type=tts" \
+  -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover video models
+curl "https://api.venice.ai/api/v1/models?type=video" \
+  -H "Authorization: Bearer $VENICE_API_KEY"
+ +

Key Response Fields Per Model

+
{
+  "id": "zai-org-glm-4.7",
+  "type": "text",
+  "object": "model",
+  "owned_by": "venice.ai",
+  "model_spec": {
+    "availableContextTokens": 198000,
+    "capabilities": {
+      "supportsFunctionCalling": true,
+      "supportsResponseSchema": true,
+      "supportsWebSearch": true,
+      "supportsReasoning": true,
+      "supportsReasoningEffort": true
+    }
+  }
+}
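Agents can gate behavior on these flags — for example, only sending tools to models that advertise support. A sketch reading model_spec.capabilities from a /models entry (the helper name is illustrative):

```python
def has_capability(model_entry: dict, flag: str) -> bool:
    """Check a boolean capability flag on a /models entry."""
    caps = model_entry.get("model_spec", {}).get("capabilities", {})
    return bool(caps.get(flag, False))

# Example with an entry shaped like the response above:
entry = {"id": "zai-org-glm-4.7",
         "model_spec": {"capabilities": {"supportsFunctionCalling": True}}}
```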
+ +

Recommended Default Models by Type

+ + + + + + + + + + + + + +
TypeRecommended ModelWhy
Text (general)zai-org-glm-4.7Best balance of cost, speed, and capability. Private.
Text (uncensored)venice-uncensoredNo content filtering. Private.
Text (cheap/fast)zai-org-glm-4.7-flash$0.13/M input. Great for classification.
Text (vision)qwen3-vl-235b-a22bImage understanding + text.
Imagevenice-sd35$0.01/image, Private, works with all features.
Image (quality)recraft-v4-pro$0.29/image, highest quality.
TTStts-kokoro50+ voices, cheapest ($3.50/1M chars).
STTnvidia/parakeet-tdt-0.6b-v3Fast, accurate. NOT in /models API.
Embeddingtext-embedding-bge-m3Only option. 1024 dimensions.
Videowan-2.6-text-to-videoGood quality, reasonable price.
Musicace-step-15$0.03-0.08 per song. Cheapest.
+ + +

Overview & Philosophy

+ +

Venice is a privacy-first, uncensored AI API platform offering text generation, image creation, audio synthesis, video generation, music, and embeddings — all with zero data retention and OpenAI SDK compatibility for chat, audio, and embeddings (see OpenAI Compatibility for exceptions).

+ +

Venice provides permissionless access to AI models with no content filtering, making it ideal for developers building applications that require uncensored outputs, privacy guarantees, and full control over AI interactions.

+ +

Four Privacy Tiers

+ + + + + + +
TierHow It Works
AnonymizedThird-party models (Claude, GPT, Gemini, Grok) with all identifying metadata stripped before forwarding.
PrivateZero data retention. Self-hosted open-source models. No logs, no storage.
TEEModels running inside hardware-secured enclaves (Intel TDX / NVIDIA CC). Venice cannot access the computation.
E2EEEnd-to-end encrypted. Prompts encrypted client-side before sending. Only the TEE can decrypt them.
+ +

Key Differentiators

+ + + +

Quickstart

+ +

1. Get an API Key

+

Generate at venice.ai/settings/api.

+ +

2. Set Environment Variable

+
export VENICE_API_KEY='your-api-key-here'
+ +

3. Install SDK (optional — any OpenAI SDK works)

+
# Python
+pip install openai
+
+# Node.js
+npm install openai
+ +

4. Make Your First Request

+ +

Python

+
from openai import OpenAI
+
+client = OpenAI(
+    api_key="your-api-key",
+    base_url="https://api.venice.ai/api/v1"
+)
+
+response = client.chat.completions.create(
+    model="venice-uncensored",
+    messages=[{"role": "user", "content": "Hello World!"}]
+)
+
+print(response.choices[0].message.content)
+ +

JavaScript / TypeScript

+
import OpenAI from "openai";
+
+const client = new OpenAI({
+  apiKey: process.env.VENICE_API_KEY,
+  baseURL: "https://api.venice.ai/api/v1",
+});
+
+const completion = await client.chat.completions.create({
+  model: "venice-uncensored",
+  messages: [{ role: "user", content: "Hello World!" }],
+});
+
+console.log(completion.choices[0].message.content);
+ +

cURL

+
curl https://api.venice.ai/api/v1/chat/completions \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "venice-uncensored",
+    "messages": [{"role": "user", "content": "Hello World!"}]
+  }'
+ + +

Authentication

+ +

All requests require HTTP Bearer authentication:

+
Authorization: Bearer VENICE_API_KEY
+ +

API keys are managed at venice.ai/settings/api. Keep your key secret — never expose it in client-side code.

+ +

You can also manage keys programmatically via the /api/v1/api_keys endpoints.

+ + +

OpenAI Compatibility

+ +

Venice implements the OpenAI API specification. Any OpenAI client library works — just change the base URL:

+ + + + + + + + +
LanguageConfig Change
Pythonbase_url="https://api.venice.ai/api/v1"
JavaScriptbaseURL: "https://api.venice.ai/api/v1"
Goclient.BaseURL = "https://api.venice.ai/api/v1"
cURLReplace https://api.openai.com/v1 with https://api.venice.ai/api/v1
PHP / C# / Java / SwiftSet base URL to https://api.venice.ai/api/v1
+ +

Key Differences from OpenAI

+
    +
  1. venice_parameters — additional config for web search, characters, reasoning control, etc.
  2. System Prompts — Venice appends defaults that optimize uncensored responses (disable with include_venice_system_prompt: false).
  3. Model IDs — Use Venice model IDs (e.g. zai-org-glm-4.7), not OpenAI's.
  4. Response Headers — Unique headers for balance tracking, model deprecation, content safety.
  5. Content Policies — More permissive, with dedicated uncensored models.
  6. /models returns text-only by default — Must use ?type=image, ?type=tts, etc. to discover non-text models. See Agent Pitfalls.
  7. Image API is NOT compatible — Different endpoint path (/image/generate vs /images/generations) and response format (images[] vs data[].b64_json). Do not use client.images.generate().
  8. Video/Music are async-only — Queue + poll pattern, not synchronous.
+ +

What IS OpenAI-Compatible (drop-in)

+ + + + + + + + + +
EndpointCompatible?Notes
/chat/completions✅ YesFull drop-in. Tools, streaming, structured output all work.
/audio/speech✅ YesSame request/response format as OpenAI TTS.
/audio/transcriptions✅ YesSame multipart format. Use Venice model names.
/embeddings✅ YesSame request/response format.
/models⚠️ PartialSame format but defaults to text-only. Must filter by type.
/image/generate❌ NoDifferent path AND response format. Use raw HTTP.
/video/queue❌ NoVenice-specific async pattern.
+ + +

Chat Completions

+ +

POST /chat/completions

+ +

The primary text generation endpoint. Supports text, vision, tool calling, streaming, and multimodal inputs (images, audio, video).

+ +

Request Body

+ + + + + + + + + + + + + + +
FieldTypeDescription
modelstring (required)Model ID, e.g. zai-org-glm-4.7
messagesarray (required)Array of message objects with role and content
temperaturenumberSampling temperature (0-2). Default: model-specific
max_tokensintegerMax tokens in the response
top_pnumberNucleus sampling (0-1)
frequency_penaltynumberPenalize repeated tokens (-2 to 2)
presence_penaltynumberPenalize new topic tokens (-2 to 2)
streambooleanEnable SSE streaming
toolsarrayFunction definitions for tool calling
response_formatobjectStructured output schema (JSON mode)
reasoning_effortstringControl reasoning depth: none, minimal, low, medium, high, xhigh, max
venice_parametersobjectVenice-specific extensions (see below)
+ +

Message Roles

+ + + + + + +
RolePurpose
systemInstructions for model behavior
userPrompts or questions
assistantPrevious model responses (multi-turn)
toolFunction calling results
+ + +

Venice Parameters

+ +

The venice_parameters object extends the OpenAI spec with Venice-specific features. Pass it as a top-level field in your request body.

+ + + + + + + + + + + + + +
ParameterTypeDefaultDescription
enable_web_search"off" | "on" | "auto""off"Enable real-time web search. Additional pricing applies ($10/1K requests).
enable_web_scrapingbooleanfalseScrape up to 5 URLs detected in user message. $10/1K URLs.
enable_x_searchbooleanfalseEnable xAI native search (web + X/Twitter) for Grok models.
enable_web_citationsbooleanfalseInclude [REF]0[/REF] citations in web search results.
include_search_results_in_streambooleanfalseExperimental: emit search results as first stream chunk.
return_search_results_as_documentsbooleanfalseReturn results as venice_web_search_documents tool call (LangChain compatible).
include_venice_system_promptbooleantrueInclude Venice's default system prompts alongside yours.
strip_thinking_responsebooleanfalseStrip <think> blocks from response (legacy tag format).
disable_thinkingbooleanfalseDisable reasoning and strip thinking blocks entirely.
character_slugstringUse a specific AI character persona.
+ +

Model Suffix Syntax

+

Venice parameters can also be appended directly to the model name as URL-style suffixes. Useful for SDKs that don't support extra body parameters:

+
"model": "zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true"
+ +

Passing Venice Parameters by SDK

+ + + + + + +
SDKSyntax
cURL / raw JSON"venice_parameters": { ... } at top level
Python OpenAIextra_body={"venice_parameters": { ... }}
JavaScript OpenAIvenice_parameters: { ... } at top level (TypeScript: // @ts-ignore)
Go / PHP / C# / JavaUse model suffix syntax
+ + +

Streaming

+ +

Set stream: true for Server-Sent Events (SSE) streaming:

+ +
# Python
+stream = client.chat.completions.create(
+    model="venice-uncensored",
+    messages=[{"role": "user", "content": "Write a story"}],
+    stream=True
+)
+for chunk in stream:
+    if chunk.choices and chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="")
+ +
// JavaScript
+const stream = await client.chat.completions.create({
+    model: "venice-uncensored",
+    messages: [{ role: "user", content: "Write a story" }],
+    stream: true
+});
+for await (const chunk of stream) {
+    if (chunk.choices?.[0]?.delta?.content) {
+        process.stdout.write(chunk.choices[0].delta.content);
+    }
+}
+ + +

Structured Responses (JSON Schema)

+ +

Force the model to output JSON matching a specific schema using response_format:

+ +
{
+  "model": "venice-uncensored",
+  "messages": [
+    {"role": "system", "content": "You are a helpful math tutor."},
+    {"role": "user", "content": "solve 8x + 31 = 2"}
+  ],
+  "response_format": {
+    "type": "json_schema",
+    "json_schema": {
+      "name": "math_response",
+      "strict": true,
+      "schema": {
+        "type": "object",
+        "properties": {
+          "steps": {
+            "type": "array",
+            "items": {
+              "type": "object",
+              "properties": {
+                "explanation": {"type": "string"},
+                "output": {"type": "string"}
+              },
+              "required": ["explanation", "output"],
+              "additionalProperties": false
+            }
+          },
+          "final_answer": {"type": "string"}
+        },
+        "required": ["steps", "final_answer"],
+        "additionalProperties": false
+      }
+    }
+  }
+}
+ +
+Requirements: "strict": true, "additionalProperties": false on every object, and every property listed in its object's required array — as in the example above. Not all models support this; check supportsResponseSchema in /models.
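With a schema-constrained response, response.choices[0].message.content should be a JSON string matching the schema; parsing is then a plain json.loads. A sketch using a sample string shaped like the schema above:

```python
import json

# In practice: content = response.choices[0].message.content
content = ('{"steps": [{"explanation": "Subtract 31 from both sides", '
           '"output": "8x = -29"}], "final_answer": "x = -29/8"}')

result = json.loads(content)
final = result["final_answer"]
```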
+ + +

Reasoning Models

+ +

Some models produce visible chain-of-thought reasoning. Thinking appears in a separate reasoning_content field, keeping content clean.

+ +
response = client.chat.completions.create(
+    model="zai-org-glm-4.7",
+    messages=[{"role": "user", "content": "What is 15% of 240?"}]
+)
+thinking = response.choices[0].message.reasoning_content
+answer = response.choices[0].message.content
+ +

Reasoning Effort Levels

+ + + + + + + + + +
ValueDescription
noneDisables reasoning
minimalBasic reasoning
lowLight reasoning for simple problems
mediumBalanced (recommended default)
highDeep reasoning for complex problems
xhighExtra-high depth
maxMaximum capability
+ +
# Pass via reasoning object
+extra_body={"reasoning": {"effort": "high"}}
+
+# Or flat format
+extra_body={"reasoning_effort": "high"}
+ +

Model Support for Reasoning Effort

+ + + + + + + + + + +
ModelSupported Values
GPT-5.2none, low, medium, high, xhigh
Claude Opus 4.6low, medium, high, max
Claude Opus 4.5, Sonnet 4.5/4.6low, medium, high
Gemini 3 Prolow, high
Gemini 3.1 Prolow, medium, high
GLM 4.7, Qwen 3 Thinking, Kimi K2.5, MiniMaxlow, medium, high
Grok modelsNot supported
DeepSeek R1Built-in only, not configurable
+ +

Disabling Reasoning

+
# Recommended: Venice-level toggle
+extra_body={"reasoning": {"enabled": False}}
+
+# Alternative: provider-level (only some models)
+extra_body={"reasoning": {"effort": "none"}}
+ + +

Vision / Multimodal

+ +

Pass images alongside text using vision-capable models. Images can be URLs or base64 data URIs.

+ +
response = client.chat.completions.create(
+    model="qwen3-vl-235b-a22b",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What is in this image?"},
+            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
+        ]
+    }]
+)
+ +

Vision models: qwen3-vl-235b-a22b, mistral-small-3-2-24b-instruct (with suffix), and E2EE variants.
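For local files, the base64 data-URI form mentioned above can be built with a small helper (the function name is illustrative):

```python
import base64

def image_data_uri(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local image file as a data URI for an image_url content part."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{encoded}"

# Then pass it exactly like a URL:
# {"type": "image_url", "image_url": {"url": image_data_uri("photo.jpg")}}
```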

+ + +

Function / Tool Calling

+ +

Define tools for models to call external APIs. Works with the OpenAI function calling spec. Below is a complete round-trip showing the full loop.

+ +

Step 1: Send request with tool definitions

+
import json
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get current weather for a location",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {"type": "string", "description": "City name"}
+            },
+            "required": ["location"]
+        }
+    }
+}]
+
+messages = [{"role": "user", "content": "What's the weather in NYC?"}]
+
+response = client.chat.completions.create(
+    model="zai-org-glm-4.7",
+    messages=messages,
+    tools=tools,
+    tool_choice="auto"  # "auto" | "required" | {"type":"function","function":{"name":"get_weather"}}
+)
+ +

Step 2: Model returns a tool_calls array

+
# The response.choices[0].message looks like:
+# {
+#   "role": "assistant",
+#   "content": null,
+#   "tool_calls": [
+#     {
+#       "id": "call_abc123",
+#       "type": "function",
+#       "function": {
+#         "name": "get_weather",
+#         "arguments": "{\"location\": \"New York\"}"
+#       }
+#     }
+#   ]
+# }
+
+assistant_message = response.choices[0].message
+tool_call = assistant_message.tool_calls[0]
+function_args = json.loads(tool_call.function.arguments)
+ +

Step 3: Execute the function and feed the result back

+
# Execute your actual function
+weather_result = {"temperature": 72, "condition": "sunny", "humidity": 45}
+
+# Append the assistant's tool call message + the tool result
+messages.append(assistant_message)
+messages.append({
+    "role": "tool",
+    "tool_call_id": tool_call.id,        # must match the tool_call id
+    "content": json.dumps(weather_result)  # result as a JSON string
+})
+
+# Get the final response
+final_response = client.chat.completions.create(
+    model="zai-org-glm-4.7",
+    messages=messages,
+    tools=tools
+)
+
+print(final_response.choices[0].message.content)
+# "The current weather in New York is 72°F and sunny with 45% humidity."
+ +

tool_choice Options

+ + + + + + +
ValueBehavior
"auto"Model decides whether to call a tool (default)
"required"Model must call at least one tool
"none"Model must not call any tools
{"type": "function", "function": {"name": "get_weather"}}Force a specific tool
+ +
+Note: Some models may return multiple tool calls in a single response (parallel tool calling). Process each one and return a separate role: "tool" message for each tool_call_id. Parallel tool calls are not compatible with structured response_format. +
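A sketch of handling parallel tool calls — one role: "tool" message per call, keyed by tool_call_id (the execute callback and helper name are illustrative):

```python
import json

def tool_messages(tool_calls, execute):
    """Return one role:'tool' message per tool call, matching each tool_call_id."""
    out = []
    for tc in tool_calls:
        args = json.loads(tc["function"]["arguments"])
        result = execute(tc["function"]["name"], args)
        out.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            "content": json.dumps(result),
        })
    return out
```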
+ +

Models with function calling: zai-org-glm-4.7, zai-org-glm-5, qwen3-4b, mistral-small-3-2-24b-instruct, llama-3.2-3b, and all Claude / GPT / Gemini / Grok models. Check supportsFunctionCalling in the /models endpoint.

+ + + + +

Enable real-time web search on any text model:

+ +
# Via venice_parameters
+{
+  "model": "zai-org-glm-4.7",
+  "messages": [{"role": "user", "content": "Latest AI news"}],
+  "venice_parameters": {
+    "enable_web_search": "auto",
+    "enable_web_citations": true
+  }
+}
+
+# Via model suffix
+{
+  "model": "zai-org-glm-4.7:enable_web_search=on&enable_web_citations=true",
+  "messages": [{"role": "user", "content": "Latest AI news"}]
+}
+ +

Web Scraping

+

Automatically scrapes up to 5 URLs detected in the user message:

+
"venice_parameters": {
+  "enable_web_scraping": true
+}
+ +

X Search (xAI)

+

For Grok models, enables xAI's native search across web + X/Twitter:

+
"venice_parameters": {
+  "enable_x_search": true
+}
+ +

Pricing

+ + + + + +
FeaturePrice
Web Search$10.00 / 1K requests
Web Scraping$10.00 / 1K URLs
X Search$10.00 / 1K results
+ + +

Image Generation

+ +

POST /image/generate

+ +
+NOT OpenAI-compatible. Venice uses /image/generate (not /images/generations) and returns images[0] (not data[0].b64_json). Do NOT use the OpenAI SDK's client.images.generate() — use raw HTTP requests instead. +
+ +
curl https://api.venice.ai/api/v1/image/generate \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "venice-sd35",
+    "prompt": "A cyberpunk city with neon lights and rain",
+    "width": 1024,
+    "height": 1024,
+    "format": "webp"
+  }'
+ +

Request Parameters

+ + + + + + + + + + + + + + + + + + +
FieldTypeDefaultDescription
modelstring (required)Image model ID
promptstring (required)Image description (max 7500 chars)
widthinteger1024Width in pixels (max 1280)
heightinteger1024Height in pixels (max 1280)
formatjpeg|png|webpwebpOutput format
negative_promptstringWhat to exclude from the image
cfg_scalenumber7.5Prompt adherence (0-20)
seedintegerrandomReproducibility seed
variantsinteger1Number of images (1-4, requires return_binary: false)
style_presetstringe.g. "3D Model", "Anime", etc.
aspect_ratiostringe.g. "1:1", "16:9" (certain models)
resolutionstring"1K", "2K", "4K" (certain models like Nano Banana)
safe_modebooleantrueBlur adult content
return_binarybooleanfalseReturn raw image bytes instead of base64 JSON
embed_exif_metadatabooleanfalseEmbed prompt info in EXIF
hide_watermarkbooleanfalseHide Venice watermark
+ +

Response (return_binary: false — default)

+
{
+  "id": "generate-image-1234567890",
+  "images": [
+    "/9j/4AAQSkZJRgABAQ..."   // base64-encoded image data
+  ],
+  "timing": {
+    "total": 3200,
+    "inferenceDuration": 2800,
+    "inferencePreprocessingTime": 150,
+    "inferenceQueueTime": 250
+  }
+}
+ +

Save to file (Python)

+
import base64, requests, os
+
+resp = requests.post("https://api.venice.ai/api/v1/image/generate",
+    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+             "Content-Type": "application/json"},
+    json={"model": "venice-sd35", "prompt": "A sunset over Venice", "width": 1024, "height": 1024}
+).json()
+
+with open("output.webp", "wb") as f:
+    f.write(base64.b64decode(resp["images"][0]))
+ +

Response (return_binary: true)

+

Returns raw image bytes with Content-Type: image/webp (or image/png, image/jpeg depending on format). Save directly:

+
curl https://api.venice.ai/api/v1/image/generate \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"venice-sd35","prompt":"A sunset","return_binary":true}' \
+  -o output.webp
+ +

Image Models (Selection)

+ + + + + + + + + + + +
ModelIDPricePrivacy
Recraft V4 Prorecraft-v4-pro$0.29/imgAnonymized
GPT Image 1.5gpt-image-1-5$0.26/imgAnonymized
Nano Banana Pronano-banana-pro$0.18-$0.35Anonymized
Qwen Image 2 Proqwen-image-2-pro$0.10/imgAnonymized
Flux 2 Maxflux-2-max$0.09/imgAnonymized
Venice SD35venice-sd35$0.01/imgPrivate
Qwen Imageqwen-image$0.01/imgPrivate
Chromachroma$0.01/imgPrivate
Z-Image Turboz-image-turbo$0.01/imgPrivate
+ + +

Image Editing & Upscaling

+ +

Image Editing

+

POST /image/edit

+

AI-powered inpainting. Send base64-encoded image + text prompt. Returns edited image as raw binary (Content-Type: image/png).

+ +
# Python — edit an image and save result
+import base64, requests, os
+
+with open("photo.jpg", "rb") as f:
+    image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/edit",
+    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+             "Content-Type": "application/json"},
+    json={"prompt": "Make it look like a watercolor painting", "image": image_b64}
+)
+
+with open("edited.png", "wb") as f:
+    f.write(resp.content)
+ +
# cURL
+curl -X POST https://api.venice.ai/api/v1/image/edit \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Colorize", "image": "'$(base64 -i photo.jpg)'"}' \
+  -o edited.png
+ +

Edit models: qwen-image (default), flux-2-max-edit, gpt-image-1-5-edit, qwen-image-2-edit, nano-banana-pro-edit, seedream-v4-edit, grok-imagine-edit.

+ +

Multi-Edit

+

POST /image/multi-edit

+

Combine and edit up to 3 images with layered inputs. Send an array of base64 images with per-layer prompts. Returns binary image data.

+ +

Image Upscaling

+

POST /image/upscale

+

Returns upscaled image as raw binary (Content-Type: image/png).

+ +
# Python
+import base64, requests, os
+
+with open("small.jpg", "rb") as f:
+    image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/upscale",
+    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+             "Content-Type": "application/json"},
+    json={"image": image_b64, "scale": 4}   # 2 or 4
+)
+
+with open("upscaled.png", "wb") as f:
+    f.write(resp.content)
+ +
# cURL
+curl -X POST https://api.venice.ai/api/v1/image/upscale \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"image": "'$(base64 -i small.jpg)'", "scale": 2}' \
+  -o upscaled.png
+ +

Pricing: 2x = $0.02, 4x = $0.08. Model: upscaler.

+ +

Background Removal

+

POST /image/background-remove

+

Returns a PNG with transparent background. Accepts base64, file upload (multipart), or URL.

+ +
# Via URL (easiest)
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"image_url": "https://example.com/photo.jpg"}' \
+  -o no-bg.png
+
+# Via base64
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"image": "'$(base64 -i photo.jpg)'"}' \
+  -o no-bg.png
+ +

Model: bria-bg-remover. $0.03/image. Max file size: 25MB.

+ + +

Audio (TTS, STT, Music)

+ +

Text-to-Speech

+

POST /audio/speech

+ +
{
+  "input": "Hello, welcome to Venice Voice.",
+  "model": "tts-kokoro",
+  "voice": "af_sky",
+  "response_format": "mp3",
+  "speed": 1.0
+}
+ + + + + + + + + + + +
FieldTypeDescription
inputstring (required)Text to speak (max 4096 chars)
modelstringtts-kokoro, tts-qwen3-0-6b, or tts-qwen3-1-7b
voicestringVoice ID (60+ options). Default: af_sky
response_formatstringmp3, opus, aac, flac, wav, pcm
speednumber0.25 to 4.0 (default 1.0)
streamingbooleanStream sentence by sentence
languagestringQwen 3 TTS only: Auto, English, Chinese, Spanish, French, etc.
promptstringQwen 3 TTS only: emotion/style prompt (max 500 chars)
+ +

Kokoro Voices (selection)

+

af_sky, af_nova, af_bella, af_heart, am_adam, am_echo, am_liam, am_michael, bf_emma, bf_lily, bm_george, zf_xiaobei, jm_kumo, ff_siwis, pf_dora, and many more.

+ +

Qwen 3 TTS Voices

+

Vivian, Serena, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee, Uncle_Fu.

+ +
+TTS Response: Returns raw audio bytes (not JSON). The Content-Type matches your response_format (e.g. audio/mpeg for mp3). Save directly to file. +
+ +
# cURL — save TTS output
+curl -X POST https://api.venice.ai/api/v1/audio/speech \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"input":"Hello world","model":"tts-kokoro","voice":"af_sky"}' \
+  -o speech.mp3
+ +

Speech-to-Text

+

POST /audio/transcriptions

+

Multipart form upload. Supports WAV, FLAC, MP3, M4A, AAC, MP4.

+ +
# cURL
+curl -X POST https://api.venice.ai/api/v1/audio/transcriptions \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -F file=@recording.mp3 \
+  -F model=nvidia/parakeet-tdt-0.6b-v3 \
+  -F response_format=json
+ +
# Python
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+with open("recording.mp3", "rb") as f:
+    transcript = client.audio.transcriptions.create(
+        model="nvidia/parakeet-tdt-0.6b-v3",
+        file=f,
+        response_format="json"
+    )
+print(transcript.text)
+ +

Models: nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3. Add timestamps=true for word-level timing.

+ +

Music Generation (Async)

+

POST /audio/queue → poll /audio/retrieve

+ +

Same async pattern as video. Queue a job, get a queue_id, poll until audio bytes are returned.

+ +
# Step 1: Get a price quote (optional)
+curl -X POST https://api.venice.ai/api/v1/audio/quote \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"ace-step-15","duration_seconds":60}'
+# → {"quote": 0.03}
+
+# Step 2: Queue generation
+curl -X POST https://api.venice.ai/api/v1/audio/queue \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"ace-step-15","prompt":"An upbeat electronic track with synth leads"}'
+# → {"model":"ace-step-15","queue_id":"abc-123","status":"QUEUED"}
+
+# Step 3: Poll until complete (returns audio bytes when done)
+curl -X POST https://api.venice.ai/api/v1/audio/retrieve \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"ace-step-15","queue_id":"abc-123"}' \
+  -o music.mp3
+# While processing: → {"status":"PROCESSING","average_execution_time":20000,"execution_duration":5200}
+# When complete: → raw audio bytes (Content-Type: audio/mpeg)
+ +

Audio Queue Request Fields

+ + + + + + + + +
FieldTypeDescription
modelstring (required)Music model ID
promptstring (required)Description of the audio to generate
lyrics_promptstringLyrics for lyric-capable models
duration_secondsintegerDuration hint in seconds
force_instrumentalbooleanForce instrumental (no vocals)
voicestringVoice selection for voice-enabled models
+ +

Models: ace-step-15, elevenlabs-music, minimax-music-v2, stable-audio-25. Sound effects: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio.
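The queue/retrieve flow can be driven from Python, mirroring the polling loop in the Video section — a sketch, keying completion off the Content-Type as documented above (function names are illustrative):

```python
import os
import time

API = "https://api.venice.ai/api/v1"

def is_ready(content_type: str) -> bool:
    """Retrieve returns JSON while PROCESSING, raw audio bytes when complete."""
    return content_type.startswith("audio/")

def poll_music(queue_id: str, model: str = "ace-step-15",
               path: str = "music.mp3") -> str:
    import requests  # lazy import; the doc's other Python examples use requests
    headers = {"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
               "Content-Type": "application/json"}
    while True:
        r = requests.post(f"{API}/audio/retrieve", headers=headers,
                          json={"model": model, "queue_id": queue_id})
        if is_ready(r.headers.get("Content-Type", "")):
            with open(path, "wb") as f:
                f.write(r.content)
            return path
        time.sleep(10)  # still {"status": "PROCESSING", ...}
```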

+ + +

Video Generation

+ +

POST /video/queue → poll /video/retrieve

+ +

Asynchronous: queue a job, get a queue_id, poll until complete.

+ +
{
+  "model": "wan-2.5-preview-text-to-video",
+  "prompt": "Commerce in Venice, Italy",
+  "duration": "10s",
+  "resolution": "720p",
+  "aspect_ratio": "16:9"
+}
+ +

Request Fields

+ + + + + + + + + + +
FieldTypeDescription
modelstring (required)Video model ID
promptstring (required)Max 2500 chars
durationstring (required)5s or 10s
resolutionstring480p, 720p, 1080p
aspect_ratiostringe.g. 16:9
image_urlstringReference image for image-to-video models
negative_promptstringWhat to avoid
audiobooleanGenerate audio (if model supports it)
+ +

Complete Video Generation Flow

+ +
# Step 1: Get a price quote (optional but recommended)
+curl -X POST https://api.venice.ai/api/v1/video/quote \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"wan-2.6-text-to-video","duration":"10s","resolution":"720p"}'
+# → {"quote": 0.35}
+ +
# Step 2: Queue the video generation
+curl -X POST https://api.venice.ai/api/v1/video/queue \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "wan-2.6-text-to-video",
+    "prompt": "A drone shot over Venice canals at golden hour",
+    "duration": "10s",
+    "resolution": "720p",
+    "aspect_ratio": "16:9"
+  }'
+# Response:
+# {"model": "wan-2.6-text-to-video", "queue_id": "550e8400-e29b-41d4-a716-446655440000"}
+ +
# Step 3: Poll /video/retrieve until complete
+# While processing → returns JSON:
+# {"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200}
+#
+# When complete → returns raw MP4 bytes (Content-Type: video/mp4)
+
+curl -X POST https://api.venice.ai/api/v1/video/retrieve \
+  -H "Authorization: Bearer $VENICE_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "wan-2.6-text-to-video",
+    "queue_id": "550e8400-e29b-41d4-a716-446655440000",
+    "delete_media_on_completion": true
+  }' \
+  -o output.mp4
+ +

Python Polling Loop

+
import requests, time, os
+
+API = "https://api.venice.ai/api/v1"
+HEADERS = {"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+           "Content-Type": "application/json"}
+
+# Queue
+q = requests.post(f"{API}/video/queue", headers=HEADERS, json={
+    "model": "wan-2.6-text-to-video",
+    "prompt": "A drone shot over Venice canals at golden hour",
+    "duration": "10s", "resolution": "720p"
+}).json()
+
+queue_id = q["queue_id"]
+print(f"Queued: {queue_id}")
+
+# Poll
+while True:
+    r = requests.post(f"{API}/video/retrieve", headers=HEADERS, json={
+        "model": "wan-2.6-text-to-video",
+        "queue_id": queue_id,
+        "delete_media_on_completion": True
+    })
+    if r.headers.get("Content-Type", "").startswith("video/"):
+        with open("output.mp4", "wb") as f:
+            f.write(r.content)
+        print("Done! Saved output.mp4")
+        break
+    else:
+        status = r.json()
+        elapsed = status.get("execution_duration", 0) / 1000
+        eta = status.get("average_execution_time", 0) / 1000
+        print(f"Processing... {elapsed:.0f}s / ~{eta:.0f}s est.")
+        time.sleep(10)
+ +

Video Retrieve Request

+ + + + + +
FieldTypeDescription
modelstring (required)Same model used in queue
queue_idstring (required)ID returned by /video/queue
delete_media_on_completionbooleanDelete from storage after download (default: false)
+ +

Video Models (Selection)

+ + + + + + + + +
ModelID (Text-to-Video)Privacy
Veo 3.1veo3.1-full-text-to-videoAnon
Sora 2 Prosora-2-pro-text-to-videoAnon
Kling V3 Prokling-v3-pro-text-to-videoAnon
Wan 2.6wan-2.6-text-to-videoAnon
Longcatlongcat-text-to-videoPrivate
LTX 2.0ltx-2-full-text-to-videoAnon
+ + +

Embeddings

+ +

POST /embeddings

+ +
{
+  "model": "text-embedding-bge-m3",
+  "input": "Privacy-first AI infrastructure",
+  "encoding_format": "float"
+}
+ +

Model: text-embedding-bge-m3. Input: $0.15/1M tokens, Output: $0.60/1M tokens. Privacy: Private. Max input: 8192 tokens. Batch: up to 2048 inputs per request.

+ +

Response

+
{
+  "object": "list",
+  "model": "text-embedding-bge-m3",
+  "data": [
+    {
+      "object": "embedding",
+      "index": 0,
+      "embedding": [0.0023064255, -0.009327292, 0.015797377, ...]  // 1024-dimensional float vector
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 8,
+    "total_tokens": 8
+  }
+}
+ +

Vector dimensions: text-embedding-bge-m3 produces 1024-dimensional vectors by default. You can optionally reduce dimensions via the dimensions parameter.

+

Encoding formats: "float" (array of numbers) or "base64" (compact binary).

+

Batch input: Pass an array of strings to embed multiple inputs in one request.
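A sketch of batch usage plus the usual downstream step, cosine similarity. The request body follows the fields shown above; the vectors here are illustrative placeholders, not real model output:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Batch request body: `input` is a list of strings (up to 2048 per request)
payload = {
    "model": "text-embedding-bge-m3",
    "input": ["Privacy-first AI infrastructure", "Zero data retention"],
    "encoding_format": "float",
}

# After POSTing `payload` to /embeddings, data[i].embedding pairs with input[i]:
v1 = [0.1, 0.3, 0.5]    # placeholder vectors for illustration
v2 = [0.1, 0.29, 0.52]
print(f"similarity: {cosine(v1, v2):.4f}")
```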

+ + +

Characters API

+ +

GET /characters

+ +

List and filter AI character personas. Use a character's slug in venice_parameters.character_slug for chat completions.

+ +

Query Parameters

+ + + + + + + + + +
ParamDescription
searchSearch by name, description, or tags
categoriesFilter by category (e.g. roleplay, philosophy)
tagsFilter by tags
modelIdFilter by model
isAdulttrue or false
sortByfeatured, highestRating, mostRecent, imports, etc.
limit / offsetPagination (max 100)
+ +
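The query parameters above compose as a standard URL query string. A sketch using parameter names from the table (values illustrative):

```python
from urllib.parse import urlencode

BASE = "https://api.venice.ai/api/v1"

params = {
    "search": "philosophy",
    "isAdult": "false",
    "sortBy": "highestRating",
    "limit": 10,
    "offset": 0,
}
url = f"{BASE}/characters?{urlencode(params)}"
print(url)
# GET this URL with the usual Authorization header, then read each
# character's `slug` for use in venice_parameters.character_slug.
```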

Using a Character in Chat

+
{
+  "model": "venice-uncensored",
+  "messages": [{"role": "user", "content": "What is the meaning of life?"}],
+  "venice_parameters": {
+    "character_slug": "alan-watts"
+  }
+}
+ + +

Privacy Architecture

+ +
The only way to achieve reasonable user privacy is to avoid collecting information in the first place.
+ + + + +

TEE & E2EE Models

+ +

TEE (Trusted Execution Environment)

+

Models with tee-* prefix run inside hardware-secured enclaves (Intel TDX, NVIDIA CC). Venice cannot access the computation.

+ +
response = client.chat.completions.create(
+    model="tee-qwen3-5-122b-a10b",
+    messages=[{"role": "user", "content": "Explain quantum computing"}]
+)
+ +

Verify Attestation

+

GET /tee/attestation?model=...&nonce=...

+

Returns cryptographic proof the model runs in a genuine TEE. Fields: verified, nonce, tee_provider, intel_quote, nvidia_payload, signing_key, signing_address.

+ +

Verify Response Signature

+

GET /tee/signature?model=...&request_id=...

+

Proves a response came from the attested enclave.

+ +

E2EE (End-to-End Encryption)

+

Models with e2ee-* prefix add client-side encryption on top of TEE. Your prompts are encrypted before leaving your device.

+ +

Crypto stack: ECDH on secp256k1 → HKDF-SHA256 → AES-256-GCM.

+ +

E2EE Implementation Steps

+
    +
  1. Generate ephemeral key pair — secp256k1 (same curve as Ethereum/Bitcoin). Create a fresh pair per session.
  2. +
  3. Fetch TEE attestationGET /tee/attestation?model=e2ee-glm-4-7-p&nonce=<random-hex>. Response includes signing_key (model's public key).
  4. +
  5. Verify attestation — Check verified: true, nonce match, and optionally parse Intel TDX quote.
  6. +
  7. Derive shared secret — ECDH(your_private_key, model_public_key) → shared_secret.
  8. +
  9. Derive encryption key — HKDF-SHA256(shared_secret) → 256-bit AES key.
  10. +
  11. Encrypt messages — AES-256-GCM encrypt each message content. Replace plaintext with ciphertext in the request body.
  12. +
  13. Send request with headers: + +
  14. +
  15. Decrypt response — TEE encrypts response chunks with the shared secret. Decrypt each chunk with your AES key.
  16. +
+ +
+E2EE requires non-trivial client-side crypto. For a complete reference implementation with working code (Node.js ESM and Python), see the full guide: docs.venice.ai/overview/guides/tee-e2ee-models +
+ +
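To make step 5 concrete: HKDF-SHA256 needs only the standard library. This sketch assumes you already hold the ECDH shared secret from step 4 — the secp256k1 ECDH and AES-256-GCM steps themselves require a crypto library (see the full guide):

```python
import hashlib
import hmac

def hkdf_sha256(shared_secret: bytes, length: int = 32,
                salt: bytes = b"", info: bytes = b"") -> bytes:
    # RFC 5869 extract-then-expand with SHA-256
    prk = hmac.new(salt or b"\x00" * 32, shared_secret, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Step 4 output (illustrative bytes, not a real ECDH result):
shared_secret = bytes(range(32))
aes_key = hkdf_sha256(shared_secret)  # 256-bit key for AES-GCM in steps 6-8
print(len(aes_key))  # → 32
```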

Available E2EE Models

+ + + + + + + + + + + + +
ModelIDContext
GLM 4.7e2ee-glm-4-7-p128K
GLM 4.7 Flashe2ee-glm-4-7-flash-p198K
GLM 5e2ee-glm-5198K
Gemma 3 27Be2ee-gemma-3-27b-p40K
GPT OSS 120Be2ee-gpt-oss-120b-p128K
Qwen3 30Be2ee-qwen3-30b-a3b-p256K
Qwen3 VL 30Be2ee-qwen3-vl-30b-a3b-p128K
Qwen3.5 122Be2ee-qwen3-5-122b-a10b128K
Venice Uncensorede2ee-venice-uncensored-24b-p32K
Qwen 2.5 7Be2ee-qwen-2-5-7b-p32K
+ + +

Prompt Caching

+ +

Reduces latency (up to 80%) and costs (up to 90%) by reusing processed input tokens on prefix-matched requests.

+ +
+Automatic for most models. Just structure prompts with static content first, dynamic content last. +
+ +

Key Rules

+ + +

prompt_cache_key

+

Routing hint for cache affinity. Same key → same server → higher hit rate.

+
{
+  "model": "claude-opus-45",
+  "prompt_cache_key": "session-abc-123",
+  "messages": [...]
+}
+ +

Cache Stats in Response

+
{
+  "usage": {
+    "prompt_tokens": 5500,
+    "prompt_tokens_details": {
+      "cached_tokens": 5000,
+      "cache_creation_input_tokens": 0
+    }
+  }
+}
+ +
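To monitor caching in production, compute the hit rate from that usage block — a minimal sketch assuming the response shape above:

```python
def cache_hit_rate(usage: dict) -> float:
    # Fraction of prompt tokens served from cache (0.0 when details are absent)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    total = usage.get("prompt_tokens", 0)
    return cached / total if total else 0.0

usage = {
    "prompt_tokens": 5500,
    "prompt_tokens_details": {"cached_tokens": 5000, "cache_creation_input_tokens": 0},
}
print(f"cache hit rate: {cache_hit_rate(usage):.0%}")  # → cache hit rate: 91%
```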

Provider Cache Behavior

+ + + + + + + + + +
ProviderMin TokensLifetimeRead Discount
Anthropic (Claude)~4,0005 min90%
OpenAI (GPT)1,0245-10 min90%
Google (Gemini)~1,0241 hour75-90%
xAI (Grok)~1,0245 min75-88%
DeepSeek~1,0245 min50%
MiniMax~1,0245 min90%
Moonshot (Kimi)~1,0245 min50%
+ + +

Models Reference

+ +

Discover models programmatically: GET /models

+ +

Key response fields per model: id, type, model_spec.availableContextTokens, model_spec.capabilities (supportsFunctionCalling, supportsResponseSchema, supportsWebSearch, supportsReasoning, supportsReasoningEffort).

+ +
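Those capability flags support programmatic model selection. A sketch that filters for function-calling support; the sample entries are illustrative stand-ins for GET /models data, with the field paths described above:

```python
def supports(model: dict, capability: str) -> bool:
    # Capability flags live under model_spec.capabilities
    return bool(model.get("model_spec", {}).get("capabilities", {}).get(capability))

models = [  # stand-ins for entries returned by GET /models
    {"id": "zai-org-glm-4.7", "type": "text",
     "model_spec": {"availableContextTokens": 198000,
                    "capabilities": {"supportsFunctionCalling": True}}},
    {"id": "venice-uncensored", "type": "text",
     "model_spec": {"availableContextTokens": 32000,
                    "capabilities": {"supportsFunctionCalling": False}}},
]

tool_models = [m["id"] for m in models if supports(m, "supportsFunctionCalling")]
print(tool_models)  # → ['zai-org-glm-4.7']
```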

Recommended Models by Use Case

+ + + + + + + + + + + + + + + + + + +
Use CaseModel IDContext
Flagship / complex taskszai-org-glm-4.7198K
Flagship v2zai-org-glm-5198K
Balanced general usellama-3.3-70b128K
Fast / cost-efficientqwen3-4b40K
Uncensoredvenice-uncensored32K
Visionqwen3-vl-235b-a22b256K
Long context codingqwen3-coder-480b-a35b-instruct256K
Deep reasoningqwen3-235b-a22b-thinking-2507128K
GPT (via Venice)openai-gpt-52256K
Claude (via Venice)claude-opus-4-61000K
Gemini (via Venice)gemini-3-1-pro-preview1000K
Grok (via Venice)grok-4-20-beta2000K
Image generation (budget)venice-sd35
Image generation (quality)recraft-v4-pro
TTStts-kokoro
Embeddingstext-embedding-bge-m3
+ + +

Pricing

+ +

Prices per 1M tokens unless noted. All USD. Full pricing: docs.venice.ai/overview/pricing

+ +

Text Models (Selection)

+ + + + + + + + + + + + + +
ModelIDInputOutputCache ReadContextPrivacy
Claude Opus 4.6claude-opus-4-6$6.00$30.00$0.601000KAnon
GPT-5.2openai-gpt-52$2.19$17.50$0.22256KAnon
Grok 4.20grok-4-20-beta$2.50$7.50$0.252000KAnon
GLM 4.7zai-org-glm-4.7$0.55$2.65$0.11198KPrivate
GLM 5zai-org-glm-5$1.00$3.20$0.20198KPrivate
DeepSeek V3.2deepseek-v3.2$0.33$0.48$0.16160KPrivate
Llama 3.3 70Bllama-3.3-70b$0.70$2.80128KPrivate
Venice Uncensoredvenice-uncensored$0.20$0.9032KPrivate
Qwen 3 Coder 480Bqwen3-coder-480b-a35b-instruct$0.75$3.00256KPrivate
GLM 4.7 Flashzai-org-glm-4.7-flash$0.13$0.50128KPrivate
Qwen 3.5 9Bqwen3-5-9b$0.05$0.15256KPrivate
+ +

Payment Options

+ + + + + + +
MethodDetails
USDCredit card. Credits never expire.
CryptoCryptocurrency. Same rates as USD.
DIEM StakingEach DIEM = $1/day of credits that refresh daily.
Pro SubscriptionOne-time $10 API credit when upgrading to Pro.
+ + +

X402 Wallet Payments (Agent-Friendly)

+ +

X402 is an open standard for internet-native payments using the HTTP 402 Payment Required status code. Venice supports X402 to let agents authenticate with a crypto wallet and pay for inference automatically — no API key required.

+ +

What X402 Enables

+ + +

Prerequisites

+ + +
+Use a limited-purpose wallet — not your main treasury. The private key will be used for signing. +
+ +

Supported Endpoints

+ + + + + + +
EndpointAuth HeaderPurpose
POST /chat/completionsX-Sign-In-With-XPaid inference (only endpoint currently supported)
GET /x402/balance/{address}X-Sign-In-With-XCheck spendable balance
GET /x402/transactions/{address}X-Sign-In-With-XView transaction history
POST /x402/top-upX-402-PaymentAdd USDC balance (different header!)
+ +

Auth Flow: Building X-Sign-In-With-X

+ +

Venice X402 uses a SIWE (Sign In With Ethereum) message, signed with EIP-191, then base64-encoded:

+ +
import { Wallet } from 'ethers'
+import { SiweMessage, generateNonce } from 'siwe'
+
+const signer = new Wallet(process.env.PRIVATE_KEY)
+
+const siwe = new SiweMessage({
+  domain: 'outerface.venice.ai',
+  address: signer.address,
+  statement: 'Sign in to Venice AI',
+  uri: 'https://outerface.venice.ai/api/v1/chat/completions',  // must match the route you're calling
+  version: '1',
+  chainId: 8453,   // Base
+  nonce: generateNonce(),
+  issuedAt: new Date().toISOString(),
+  expirationTime: new Date(Date.now() + 10 * 60 * 1000).toISOString(),  // 10 min
+})
+
+const message = siwe.prepareMessage()
+const signature = await signer.signMessage(message)
+
+const headerValue = Buffer.from(JSON.stringify({
+  address: signer.address.toLowerCase(),
+  message,
+  signature,
+  chainId: 8453,
+  timestamp: Date.now(),
+}), 'utf8').toString('base64')
+ +
+The uri in the SIWE message must match the Venice route you're calling. Generate a fresh header for each different endpoint. Headers expire after 10 minutes. +
+ +
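The same header can be built in Python with no dependencies beyond the signature itself. This sketch formats the EIP-4361 (SIWE) message by hand, mirroring the Node.js example; the signature is a placeholder — in practice, sign `message` with EIP-191 (e.g. `eth_account`'s `Account.sign_message`):

```python
import base64
import json
import secrets
import time
from datetime import datetime, timedelta, timezone

address = "0x" + "ab" * 20  # your wallet address (placeholder)

def iso(dt):
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")

now = datetime.now(timezone.utc)

# EIP-4361 plaintext message — field values mirror the Node.js example above
message = (
    "outerface.venice.ai wants you to sign in with your Ethereum account:\n"
    f"{address}\n\nSign in to Venice AI\n\n"
    "URI: https://outerface.venice.ai/api/v1/chat/completions\n"
    "Version: 1\nChain ID: 8453\n"
    f"Nonce: {secrets.token_hex(8)}\n"
    f"Issued At: {iso(now)}\n"
    f"Expiration Time: {iso(now + timedelta(minutes=10))}"
)

signature = "0x..."  # EIP-191 signature of `message` (e.g. via eth_account)

header_value = base64.b64encode(json.dumps({
    "address": address.lower(),
    "message": message,
    "signature": signature,
    "chainId": 8453,
    "timestamp": int(time.time() * 1000),
}).encode()).decode()
```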

Complete Agent Flow

+ +

Step 1: Check balance

+
curl -X GET "https://api.venice.ai/api/v1/x402/balance/0xYOUR_ADDRESS" \
+  -H "X-Sign-In-With-X: $BALANCE_AUTH"
+
+# Response:
+{
+  "success": true,
+  "data": {
+    "walletAddress": "0xyour_wallet_address",
+    "balanceUsd": 12.5,
+    "canConsume": true,
+    "minimumTopUpUsd": 5,
+    "suggestedTopUpUsd": 10,
+    "diemBalanceUsd": 5.25
+  }
+}
+ +
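An agent can branch on that response directly. A sketch, assuming the response shape above:

```python
def next_action(balance_response: dict) -> str:
    # Decide between inference and top-up from GET /x402/balance/{address}
    data = balance_response.get("data", {})
    if data.get("canConsume"):
        return "chat"
    amount = data.get("suggestedTopUpUsd") or data.get("minimumTopUpUsd", 5)
    return f"top-up ${amount}"

resp = {"success": True,
        "data": {"balanceUsd": 12.5, "canConsume": True,
                 "minimumTopUpUsd": 5, "suggestedTopUpUsd": 10}}
print(next_action(resp))  # → chat
```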

Step 2: If canConsume is true, call inference

+
curl -X POST https://api.venice.ai/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "X-Sign-In-With-X: $CHAT_AUTH" \
+  -d '{
+    "model": "kimi-k2-5",
+    "messages": [{"role": "user", "content": "Hello from an x402-authenticated wallet."}]
+  }'
+ +

Step 3: If balance is low, top up with USDC

+
# First call without payment header to get requirements:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up
+# → 402 response with paymentInfo:
+{
+  "error": "PAYMENT_REQUIRED",
+  "message": "Send x402 payment via X-402-Payment header",
+  "paymentInfo": {
+    "receiverWallet": "0xRECEIVER_WALLET",
+    "network": "eip155:8453",
+    "token": "USDC",
+    "tokenAddress": "0xUSDC_TOKEN_ADDRESS",
+    "minimumAmountUsd": 5,
+    "suggestedAmountUsd": 10
+  }
+}
+
+# Then retry with signed X-402-Payment header:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up \
+  -H "X-402-Payment: $X402_PAYMENT"
+# → {"success":true,"data":{"amountCredited":10,"newBalance":22.5}}
+ +

Step 4: Check transaction history

+
curl -X GET "https://api.venice.ai/api/v1/x402/transactions/0xYOUR_ADDRESS?limit=10&offset=0" \
+  -H "X-Sign-In-With-X: $TX_AUTH"
+

Returns entries like TOP_UP, CHARGE, REFUND with requestId and modelId for spend correlation.

+ +

Venice X402 Client Library

+
npm install @venice-ai/x402-client
+ +
import { VeniceClient } from '@venice-ai/x402-client'
+
+const venice = new VeniceClient(process.env.WALLET_KEY)
+
+const response = await venice.chat({
+  model: 'kimi-k2-5',
+  messages: [{ role: 'user', content: 'Hello!' }]
+})
+ +

X402 Error Reference

+ + + + + + + +
ErrorHTTPCause / Fix
Authentication failed401Regenerate SIWE header. Check: valid base64, signature matches wallet, chain is 8453, not expired.
Unauthorized403Wallet in header doesn't match {address} path parameter.
Insufficient balance402Wallet auth worked but balance too low. Check /x402/balance, top up if needed.
PAYMENT_REQUIRED402Expected on first top-up call. Use returned paymentInfo to build X-402-Payment.
X402_INVALID_PAYMENT400Malformed payment header. Rebuild from scratch.
+ +

Prompt for Autonomous Agents

+

If you're sending an agent to use Venice via X402, give it this context:

+
Use the Venice API with your wallet.
+
+Auth: Build a SIWE message (domain: outerface.venice.ai, chain: 8453, URI: the Venice route
+you're calling), sign with EIP-191, base64-encode as JSON, send as X-Sign-In-With-X header.
+Headers expire after 10 minutes — regenerate per request.
+
+Inference: POST https://api.venice.ai/api/v1/chat/completions
+  Header: X-Sign-In-With-X: <base64>
+  Body: {"model":"kimi-k2-5","messages":[{"role":"user","content":"Hello"}]}
+
+Balance: GET https://api.venice.ai/api/v1/x402/balance/<address>
+Transactions: GET https://api.venice.ai/api/v1/x402/transactions/<address>
+DIEM staked on the wallet is used automatically. No DIEM = top up with USDC on Base (min $5).
+Guide: https://docs.venice.ai/overview/guides/x402-venice-api
+ +

Full documentation: docs.venice.ai/overview/guides/x402-venice-api | Client library: veniceai/x402-client

+ + +

Rate Limits

+ +

Check your limits: GET /api_keys/rate_limits

+ +

Default Limits by Model Tier

+ + + + + + +
TierRequests/minTokens/minExample Models
XS5001,000,000qwen3-4b, llama-3.2-3b
S75750,000mistral-31-24b, venice-uncensored
M50750,000llama-3.3-70b, qwen3-next-80b
L20500,000zai-org-glm-4.7, grok-41-fast, qwen3-coder-480b
+ +

Non-Text Limits

+ + + + + + + +
TypeRequests/min
Image20
Audio60
Embedding500
Video (queue)40
Video (retrieve)120
+ +

Rate Limit Response Headers

+ + + + + + + + +
HeaderDescription
x-ratelimit-limit-requestsMax requests in current window
x-ratelimit-remaining-requestsRequests remaining
x-ratelimit-reset-requestsUnix timestamp when window resets
x-ratelimit-limit-tokensMax tokens per minute
x-ratelimit-remaining-tokensTokens remaining
x-ratelimit-reset-tokensSeconds until token limit resets
+ +

Abuse Protection: 20+ failed requests in 30 seconds → blocked for 30 seconds.

+ +

Partner Tier: Significantly higher limits available. Contact api@venice.ai.

+ + +

Error Codes

+ + + + + + + + + + + + + + + + + + + + +
CodeHTTPMeaning
AUTHENTICATION_FAILED401Invalid or missing API key
AUTHENTICATION_FAILED_INACTIVE_KEY401Pro subscription inactive
INVALID_API_KEY401API key format invalid
INSUFFICIENT_BALANCE402No USD or DIEM balance remaining
UNAUTHORIZED403No access to this resource
INVALID_REQUEST400Bad parameters
INVALID_MODEL400Model doesn't exist
CHARACTER_NOT_FOUND404Character slug not found
MODEL_NOT_FOUND404Model not found
INVALID_CONTENT_TYPE415Wrong Content-Type header
INVALID_FILE_SIZE413File too large
INVALID_IMAGE_FORMAT400Unsupported image format
CORRUPTED_IMAGE400Image file unreadable
RATE_LIMIT_EXCEEDED429Too many requests
INFERENCE_FAILED500Model inference error
UPSCALE_FAILED500Upscaling error
UNKNOWN_ERROR500Unexpected server error
+ +

Error Response Format

+
// Standard error (most endpoints)
+{"error": "Rate limit exceeded"}
+
+// Detailed error (validation failures — 400)
+{
+  "error": "Invalid request parameters",
+  "details": {
+    "model": {"_errors": ["Invalid model specified"]},
+    "prompt": {"_errors": ["Field is required"]}
+  }
+}
+
+// Content violation (422 — video/audio)
+{
+  "error": "Your prompt violates the content policy",
+  "suggested_prompt": "A cinematic instrumental track inspired by stormy weather"
+}
+ +

Retry strategy: use exponential backoff for 429, 500, 503 errors. Check x-ratelimit-reset-requests header for 429.
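That strategy sketched — exponential backoff with jitter, preferring the server's reset timestamp when present (header names from the table above; the commented loop shows intended use):

```python
import random
import time

RETRYABLE = {429, 500, 503}

def retry_delay(attempt: int, headers: dict) -> float:
    # Prefer the server-provided reset time for 429s; otherwise 1s, 2s, 4s... + jitter
    reset = headers.get("x-ratelimit-reset-requests")
    if reset is not None:
        return max(float(reset) - time.time(), 1.0)
    return min(2 ** attempt, 60) + random.uniform(0, 1)

# e.g. inside a request loop:
# if resp.status_code in RETRYABLE:
#     time.sleep(retry_delay(attempt, resp.headers))
```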

+ + +

Response Headers

+ + + + + + + + + + + + + + + +
HeaderPurpose
CF-RAYUnique request ID (log for support)
x-venice-versionAPI version/revision
x-venice-model-idModel used for inference
x-venice-model-nameFriendly model name
x-venice-model-deprecation-warningDeprecation notice
x-venice-model-deprecation-dateWhen model will be removed
x-venice-balance-usdUSD balance before request
x-venice-balance-diemDIEM balance before request
x-venice-is-blurredImage was blurred (Safe Venice)
x-venice-is-content-violationContent policy violation
x-ratelimit-*Rate limiting info (see above)
x-pagination-*Pagination metadata
+ + +

Agent Frameworks & Integrations

+ +

Migration from OpenAI — The 2-Line Change

+

Venice is a drop-in replacement for OpenAI. Same SDK, same code — just change two values:

+
# Python
+client = OpenAI(
+    api_key="your-venice-api-key",           # ← Change 1
+    base_url="https://api.venice.ai/api/v1"  # ← Change 2
+)
+
+# Node.js
+const client = new OpenAI({
+  apiKey: 'your-venice-api-key',
+  baseURL: 'https://api.venice.ai/api/v1',
+});
+
+# Environment variables (many libraries auto-read these)
+OPENAI_API_KEY=your-venice-api-key
+OPENAI_BASE_URL=https://api.venice.ai/api/v1
+ +

OpenAI → Venice Model Mapping

+ + + + + + + + + + + +
OpenAI ModelVenice EquivalentTypePricing (In/Out per 1M)
gpt-4ozai-org-glm-4.7 PrivateText$0.55 / $2.65
gpt-4oopenai-gpt-52 AnonText$2.19 / $17.50
gpt-4o-miniqwen3-4bText$0.05 / $0.15
o1 / o3qwen3-235b-a22b-thinking-2507Reasoning$0.45 / $3.50
gpt-4-visionqwen3-vl-235b-a22bVision$0.25 / $1.50
text-embedding-3-smalltext-embedding-bge-m3Embeddings$0.15 / $0.60
dall-e-3qwen-image PrivateImage$0.01/img
whispernvidia/parakeet-tdt-0.6b-v3STT$0.0001/sec
tts-1tts-kokoroTTS$3.50/1M chars
+ +

Framework Migration Quick Reference

+ + + + + + + + + + + +
FrameworkChange Required
LangChainbase_url in ChatOpenAI
Vercel AI SDKbaseURL in createOpenAI
CrewAIOPENAI_API_BASE env var
LlamaIndexapi_base in OpenAI
AutoGenbase_url in config
Haystackapi_base_url in OpenAIGenerator
Claude CodeUse claude-code-router
CursorCustom API endpoint in settings
Continue.devapiBase in config.json
+ +

Full migration guide: docs.venice.ai/overview/guides/openai-migration

+ +
+ +

LangChain

+ +
pip install langchain langchain-openai openai
+ +

Chat Model

+
from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+    model="venice-uncensored",
+    api_key="your-venice-api-key",
+    base_url="https://api.venice.ai/api/v1",
+    temperature=0.7,
+)
+
+response = llm.invoke("Explain privacy-preserving AI.")
+print(response.content)
+ +

Embeddings

+
from langchain_openai import OpenAIEmbeddings
+
+embeddings = OpenAIEmbeddings(
+    model="text-embedding-bge-m3",
+    api_key="your-venice-api-key",
+    base_url="https://api.venice.ai/api/v1",
+    check_embedding_ctx_length=False,  # Required for Venice
+)
+ +

RAG Pipeline

+
from langchain_community.vectorstores import FAISS
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+
+def format_docs(docs):
+    """Join retrieved documents into a single context string."""
+    return "\n\n".join(d.page_content for d in docs)
+
+# `documents` is your list of source strings; `embeddings` is defined above
+vectorstore = FAISS.from_texts(documents, embeddings)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
+
+rag_prompt = ChatPromptTemplate.from_messages([
+    ("system", "Answer based only on this context:\n\n{context}"),
+    ("user", "{question}"),
+])
+
+rag_chain = (
+    {"context": retriever | format_docs, "question": RunnablePassthrough()}
+    | rag_prompt | llm | StrOutputParser()
+)
+
+answer = rag_chain.invoke("What privacy levels does Venice offer?")
+ +

Function Calling Agent

+
from langchain_core.tools import tool
+from langchain.agents import create_tool_calling_agent, AgentExecutor
+
+llm = ChatOpenAI(model="zai-org-glm-4.7", api_key="...", base_url="https://api.venice.ai/api/v1")
+
+@tool
+def get_price(model_id: str) -> str:
+    """Get pricing for a Venice AI model."""
+    prices = {"venice-uncensored": "$0.20/$0.90", "zai-org-glm-4.7": "$0.55/$2.65"}
+    return prices.get(model_id, "Not found")
+
+agent = create_tool_calling_agent(llm, [get_price], prompt)
+executor = AgentExecutor(agent=agent, tools=[get_price])
+result = executor.invoke({"input": "What's the cheapest model?"})
+ +

Web Search

+
llm_with_search = ChatOpenAI(
+    model="venice-uncensored",
+    api_key="...",
+    base_url="https://api.venice.ai/api/v1",
+    extra_body={"venice_parameters": {"enable_web_search": "auto"}}
+)
+ +

Full guide: docs.venice.ai/overview/guides/langchain

+ +
+ +

Vercel AI SDK

+ +
npm install ai @ai-sdk/openai
+ +

Provider Setup

+
// lib/venice.ts
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+  apiKey: process.env.VENICE_API_KEY!,
+  baseURL: 'https://api.venice.ai/api/v1',
+});
+
+// Use .chat() to ensure compatibility with Venice's chat completions endpoint
+export const venice = (modelId: string) => openai.chat(modelId);
+ +
+Use .chat() — the default openai('model') syntax may use newer OpenAI endpoints Venice doesn't support yet. +
+ +

Streaming Chat (Next.js App Router)

+
// app/api/chat/route.ts
+import { streamText } from 'ai';
+import { venice } from '@/lib/venice';
+
+export async function POST(req: Request) {
+  const { messages } = await req.json();
+  const result = streamText({
+    model: venice('venice-uncensored'),
+    system: 'You are a helpful, privacy-respecting AI assistant.',
+    messages,
+  });
+  return result.toDataStreamResponse();
+}
+ +

Tool Calling

+
import { streamText, tool } from 'ai';
+import { z } from 'zod';
+
+const result = streamText({
+  model: venice('zai-org-glm-4.7'),
+  messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
+  tools: {
+    getWeather: tool({
+      description: 'Get current weather',
+      parameters: z.object({ location: z.string() }),
+      execute: async ({ location }) => ({ temperature: 22, condition: 'Sunny', location }),
+    }),
+  },
+});
+ +

Structured Output

+
import { generateObject } from 'ai';
+import { z } from 'zod';
+
+const { object } = await generateObject({
+  model: venice('venice-uncensored'),
+  schema: z.object({
+    recipe: z.object({
+      name: z.string(),
+      ingredients: z.array(z.string()),
+      steps: z.array(z.string()),
+    }),
+  }),
+  prompt: 'Generate a recipe for chocolate chip cookies.',
+});
+ +

Embeddings

+
import { embed } from 'ai';
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+  apiKey: process.env.VENICE_API_KEY!,
+  baseURL: 'https://api.venice.ai/api/v1',
+});
+
+const { embedding } = await embed({
+  model: openai.textEmbeddingModel('text-embedding-bge-m3'),
+  value: 'Privacy-first AI infrastructure',
+});
+ +

Full guide: docs.venice.ai/overview/guides/vercel-ai-sdk

+ +
+ +

OpenClaw

+ +

OpenClaw is an open-source AI gateway connecting messaging platforms (WhatsApp, Telegram, Discord, Slack, iMessage) to AI models. Venice is a built-in provider.

+ +
# Install
+curl -fsSL https://openclaw.ai/install.sh | bash
+# Or: npm install -g openclaw@latest
+
+# Onboard (select Venice as provider, paste API key)
+openclaw onboard
+
+# Set model
+openclaw models set venice/zai-org-glm-5
+
+# Start
+openclaw tui          # Terminal UI
+openclaw dashboard    # Web dashboard
+openclaw gateway      # Messaging channels
+ +

Recommended OpenClaw Models

+ + + + + + + +
Use CaseModelPrivacy
Generalvenice/zai-org-glm-5Private
Reasoningvenice/kimi-k2-5Private
Codingvenice/claude-opus-4-6Anon
Visionvenice/qwen3-vl-235b-a22bPrivate
Uncensoredvenice/venice-uncensoredPrivate
+ +
# Install image/video generation skill
+openclaw skills install nhannah/venice-ai-media
+ +

Full guide: docs.venice.ai/overview/guides/openclaw-bot | OpenClaw Venice provider docs

+ +
+ +

Other Integrations

+ + + + + + + + + + + + +
IntegrationTypeSetup
Eliza (ai16z)Agent frameworkSet modelProvider: "venice" in character config. Configure VENICE_API_KEY + model env vars.
Coinbase AgentKitAgent frameworkNative Venice support.
Cursor IDECodingCustom API endpoint in settings.
Cline (VS Code)CodingSet Venice base URL + API key.
ROO Code (VS Code)CodingSet Venice base URL + API key.
VOID IDECodingSet Venice base URL + API key.
Brave LeoBrowserVenice as Leo AI backend.
AiderCodingAI pair programming in terminal.
Open WebUIAssistantSelf-hosted chat UI with Venice.
LibreChatAssistantMulti-provider chat with Venice.
+ +

Eliza Setup

+
git clone https://github.com/ai16z/eliza.git
+cd eliza
+cp .env.example .env
+# Edit .env: set VENICE_API_KEY, SMALL_VENICE_MODEL, MEDIUM_VENICE_MODEL, LARGE_VENICE_MODEL
+# Create character in /characters/your_char.character.json with modelProvider: "venice"
+pnpm i && pnpm build && pnpm start
+pnpm start --characters="characters/your_char.character.json"
+ + +

Claude Code Router

+ +

Use Claude Code CLI with Venice for pay-per-token access to Claude Opus/Sonnet models.

+ +

The claude-code-router is an open-source local proxy that intercepts Claude Code requests and redirects them to Venice.

+ +

Setup

+
# Install Claude Code + Router
+npm install -g @anthropic-ai/claude-code
+npm install -g @musistudio/claude-code-router
+
+# Create config
+mkdir -p ~/.claude-code-router
+ +

Create ~/.claude-code-router/config.json:

+
{
+  "APIKEY": "",
+  "LOG": true,
+  "LOG_LEVEL": "info",
+  "API_TIMEOUT_MS": 600000,
+  "HOST": "127.0.0.1",
+  "Providers": [{
+    "name": "venice",
+    "api_base_url": "https://api.venice.ai/api/v1/chat/completions",
+    "api_key": "your-venice-api-key-here",
+    "models": ["claude-opus-45", "claude-sonnet-45", "claude-opus-4-6", "claude-sonnet-4-6"],
+    "transformer": {"use": ["anthropic"]}
+  }],
+  "Router": {
+    "default": "venice,claude-opus-45",
+    "think": "venice,claude-opus-45",
+    "background": "venice,claude-opus-45",
+    "longContext": "venice,claude-opus-45",
+    "longContextThreshold": 100000
+  }
+}
+ +
# Launch
+ccr start
+ccr code
+# Or: eval "$(ccr activate)" && claude
+ +

Supported Models via Router

+ + + + + + +
ModelVenice IDBest For
Claude Opus 4.5claude-opus-45Complex reasoning, large refactors
Claude Sonnet 4.5claude-sonnet-45Fast iteration, everyday coding
Claude Opus 4.6claude-opus-4-6Complex reasoning, large refactors
Claude Sonnet 4.6claude-sonnet-4-6Fast iteration, everyday coding
+ +

Router Features

+ + + +

Venice MCP Server

+ +

Official MCP (Model Context Protocol) server for Claude Code, Cline, and AI agents: veniceai/venice-mcp-server

+ +

Allows AI agents to interact with Venice API endpoints as MCP tools.

+ + +

Best Practices

+ +

Production Checklist

+
    +
  1. Rate Limiting: Monitor x-ratelimit-remaining-* headers. Implement exponential backoff.
  2. +
  3. Balance Monitoring: Track x-venice-balance-usd / x-venice-balance-diem to avoid service interruptions.
  4. +
  5. Request Logging: Log CF-RAY header values for troubleshooting with Venice support.
  6. +
  7. Model Deprecation: Check x-venice-model-deprecation-warning headers proactively.
  8. +
  9. System Prompts: Test with and without Venice's system prompts (include_venice_system_prompt: false).
  10. +
  11. API Keys: Keep keys secure, rotate regularly, never expose in client-side code.
  12. +
  13. Prompt Caching: Put static content first. Use prompt_cache_key for conversations.
  14. +
  15. Structured Output: Always set strict: true and additionalProperties: false.
  16. +
+ +
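Items 1-4 of the checklist reduce to logging a handful of headers per response. A sketch (sample values illustrative; real HTTP clients should match header names case-insensitively):

```python
def monitoring_info(headers: dict) -> dict:
    # Collect the headers worth logging on every Venice response
    keys = ["CF-RAY", "x-venice-balance-usd", "x-venice-balance-diem",
            "x-venice-model-deprecation-warning",
            "x-ratelimit-remaining-requests", "x-ratelimit-remaining-tokens"]
    return {k: headers[k] for k in keys if k in headers}

headers = {"CF-RAY": "8a1b2c3d4e5f6789-IAD",
           "x-venice-balance-usd": "12.50",
           "x-ratelimit-remaining-requests": "48"}
print(monitoring_info(headers))
```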

Model Selection Guide

+ + +

Endpoint Quick Reference

+ + + + + + + + + + + + + + + + + + + + + + + +
EndpointMethodPurpose
/chat/completionsPOSTText generation, vision, tools, streaming
/image/generatePOSTText-to-image generation
/image/editPOSTAI image editing / inpainting
/image/multi-editPOSTMulti-image layered editing
/image/upscalePOSTImage upscaling (2x or 4x)
/image/background-removePOSTRemove image backgrounds
/audio/speechPOSTText-to-speech (50+ voices)
/audio/transcriptionsPOSTSpeech-to-text
/audio/queuePOSTMusic generation (async)
/audio/retrieveGETRetrieve generated audio
/audio/quotePOSTGet audio generation price quote
/video/queuePOSTVideo generation (async)
/video/retrievePOSTRetrieve generated video
/video/quotePOSTGet video generation price quote
/embeddingsPOSTGenerate vector embeddings
/modelsGETList all available models
/charactersGETList AI character personas
/api_keysGET/POSTManage API keys
/api_keys/rate_limitsGETCheck rate limit status
/tee/attestationGETVerify TEE attestation
/tee/signatureGETVerify TEE response signature
+ +
+ +

Resources

+ + +

+ Generated from docs.venice.ai on April 3, 2026.
+ Venice API version: 20260403 · This page is designed to be consumed by AI agents. +

+ + + diff --git a/docs.json b/docs.json index f36d3ae..c499309 100644 --- a/docs.json +++ b/docs.json @@ -8,6 +8,10 @@ "dark": "#DD3300" }, "favicon": "/favicon.png", + "banner": { + "content": "**AI Agent?** Get the complete, single-page API reference at [docs.venice.ai/agent-reference.html](https://docs.venice.ai/agent-reference.html) — token-optimized, every endpoint, zero crawling needed.", + "dismissible": true + }, "styling": { "codeblocks": "system" }, @@ -20,6 +24,11 @@ "group": "Overview", "pages": [ "overview/about-venice", + { + "title": "Agent API Reference", + "href": "https://docs.venice.ai/agent-reference.html", + "icon": "robot" + }, "overview/getting-started", "overview/privacy", "overview/pricing", @@ -238,7 +247,9 @@ "twitter:image": "https://venice.ai/images/venice_social_preview_x.png", "twitter:site": "@AskVenice", "og:image:width": "1200", - "og:image:height": "630" + "og:image:height": "630", + "ai-agent-reference": "https://docs.venice.ai/agent-reference.html", + "ai-agent-reference-description": "Complete, token-optimized single-page Venice API reference for AI agents" }, "indexing": "navigable" }, diff --git a/llms.txt b/llms.txt index d1b3580..cfbf9a0 100644 --- a/llms.txt +++ b/llms.txt @@ -1,5 +1,7 @@ # Venice API +> **AGENT NOTICE:** A complete, token-optimized single-page API reference is available at https://docs.venice.ai/agent-reference.html — it covers every endpoint, every model type, every gotcha, and every code example in one page. Use it instead of crawling these docs. + > Venice is a privacy-first, uncensored AI API platform offering text generation, image generation, audio synthesis, video generation, music, and embeddings with zero data retention and OpenAI SDK compatibility. Venice provides permissionless access to AI models with no content filtering, making it ideal for developers building applications that require uncensored outputs, privacy guarantees, and full control over AI interactions. 
The API is fully compatible with OpenAI's SDK—just change the base URL to `https://api.venice.ai/api/v1`. diff --git a/overview/about-venice.mdx b/overview/about-venice.mdx index 12f7dfe..f2bac92 100644 --- a/overview/about-venice.mdx +++ b/overview/about-venice.mdx @@ -3,6 +3,10 @@ title: Venice API "og:title": "Venice API Docs" --- + +**AI Agent Reference:** If you are an AI agent or building with an agent framework (OpenClaw, Eliza, LangChain, CrewAI, etc.), see the [Complete Agent API Reference](https://docs.venice.ai/agent-reference.html) — a single-page, token-optimized resource covering every Venice API endpoint, model, and integration pattern. + + Build AI with no data retention, permissionless access, and compute you permanently own. diff --git a/overview/guides/ai-agents.mdx b/overview/guides/ai-agents.mdx index 0fb07b1..108bf8a 100644 --- a/overview/guides/ai-agents.mdx +++ b/overview/guides/ai-agents.mdx @@ -5,6 +5,15 @@ description: "Venice is supported with the following AI Agent communities." "og:description": "Venice is supported with the following AI Agent communities" --- +## Agent API Reference + +For a complete, single-page API reference optimized for AI agents, see: +**[docs.venice.ai/agent-reference.html](https://docs.venice.ai/agent-reference.html)** + +This page covers every endpoint, model type, code example, and common pitfall in a single token-efficient document. + +## Agent Communities + * [Coinbase Agentkit](https://www.coinbase.com/developer-platform/discover/launches/introducing-agentkit) * [Eliza](https://github.com/ai16z/eliza) - Venice support introduced via this [PR](https://github.com/ai16z/eliza/pull/1008).