diff --git a/agent-reference.html b/agent-reference.html new file mode 100644 index 0000000..0bd94d2 --- /dev/null +++ b/agent-reference.html @@ -0,0 +1,2109 @@ + + + +
+ + + + + + + +Everything an AI agent needs to know to build with Venice.ai. Privacy-first, uncensored, OpenAI-compatible.
+Base URL: https://api.venice.ai/api/v1
+Auth: Authorization: Bearer VENICE_API_KEY
+Swagger: swagger.yaml ·
+Docs: docs.venice.ai ·
+LLMs.txt: llms.txt
Venice has excellent capabilities but several default behaviors that trip up agents. Read these before making your first call.
+ +GET /models returns ~68 text models. Image, video, audio, and embedding models are hidden unless you pass a ?type= filter. This is the #1 reason agents fail to discover non-text capabilities.
# ❌ Returns text models only
+GET /api/v1/models
+
+# ✅ Filter by type to discover other models
+GET /api/v1/models?type=image # 26+ image models
+GET /api/v1/models?type=video # 65+ video models
+GET /api/v1/models?type=tts # 3 TTS models
+GET /api/v1/models?type=embedding # 1 embedding model
+GET /api/v1/models?type=music # music generation models
+The TTS model type is tts, NOT audio. Using ?type=audio returns an empty list.
# ❌ Returns 0 models
+GET /api/v1/models?type=audio
+
+# ✅ Correct
+GET /api/v1/models?type=tts
+Speech-to-text models (nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3) do NOT appear in any /models query. You must know the model names in advance. They are hardcoded here for your convenience.
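Since every non-text capability hides behind a ?type= filter, a small discovery helper avoids the pitfalls above. A minimal sketch — URL construction only, so you can plug in any HTTP client:

```python
# Build one /models query per type filter; a plain GET /models returns
# text models only, and speech-to-text models never appear at all.
BASE = "https://api.venice.ai/api/v1"
MODEL_TYPES = ["text", "image", "video", "tts", "embedding", "music"]

def discovery_urls(base: str = BASE) -> list[str]:
    # Note: the speech filter is "tts" — "audio" returns an empty list.
    return [f"{base}/models?type={t}" for t in MODEL_TYPES]

for url in discovery_urls():
    print(url)
```

Speech-to-text models still won't show up in any of these queries; hardcode the two STT model names listed above.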
Venice's image endpoints use different paths and response formats than OpenAI:
+| | OpenAI | Venice |
|---|---|---|
| Endpoint | /images/generations | /image/generate |
| Response | data[0].b64_json | images[0] (raw base64) |
| SDK | client.images.generate() | Use requests.post() or fetch() directly |
Agents using the OpenAI SDK's client.images.generate() will fail. Use raw HTTP requests for images.
Unlike text/image/TTS (which return results immediately), video and music use an async pattern:
+1. POST /video/queue → get queue_id
2. POST /video/retrieve with queue_id → poll until Content-Type: video/mp4

There is no synchronous video/music endpoint. See the Video and Audio sections for complete polling examples.
+| Type | Returns | Count |
|---|---|---|
text (default) | Chat/completion LLMs | ~68 |
image | Image generation models | ~26 |
video | Text-to-video and image-to-video | ~65 |
tts | Text-to-speech models | 3 |
embedding | Vector embedding models | 1 |
music | Music generation models | ~6 |
| (none / omitted) | Text models only | ~68 |
Speech-to-text: use nvidia/parakeet-tdt-0.6b-v3 or openai/whisper-large-v3 directly — they work but won't appear in any /models query.
+GET /models
+ +Returns models filtered by type. Defaults to text-only.
+ +# Discover image models
+curl "https://api.venice.ai/api/v1/models?type=image" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover TTS models (NOT ?type=audio!)
+curl "https://api.venice.ai/api/v1/models?type=tts" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+# Discover video models
+curl "https://api.venice.ai/api/v1/models?type=video" \
+ -H "Authorization: Bearer $VENICE_API_KEY"
+
+{
+ "id": "zai-org-glm-4.7",
+ "type": "text",
+ "object": "model",
+ "owned_by": "venice.ai",
+ "model_spec": {
+ "availableContextTokens": 198000,
+ "capabilities": {
+ "supportsFunctionCalling": true,
+ "supportsResponseSchema": true,
+ "supportsWebSearch": true,
+ "supportsReasoning": true,
+ "supportsReasoningEffort": true
+ }
+ }
+}
+
+| Type | Recommended Model | Why |
|---|---|---|
| Text (general) | zai-org-glm-4.7 | Best balance of cost, speed, and capability. Private. |
| Text (uncensored) | venice-uncensored | No content filtering. Private. |
| Text (cheap/fast) | zai-org-glm-4.7-flash | $0.13/M input. Great for classification. |
| Text (vision) | qwen3-vl-235b-a22b | Image understanding + text. |
| Image | venice-sd35 | $0.01/image, Private, works with all features. |
| Image (quality) | recraft-v4-pro | $0.29/image, highest quality. |
| TTS | tts-kokoro | 50+ voices, cheapest ($3.50/1M chars). |
| STT | nvidia/parakeet-tdt-0.6b-v3 | Fast, accurate. NOT in /models API. |
| Embedding | text-embedding-bge-m3 | Only option. 1024 dimensions. |
| Video | wan-2.6-text-to-video | Good quality, reasonable price. |
| Music | ace-step-15 | $0.03-0.08 per song. Cheapest. |
Venice is a privacy-first, uncensored AI API platform offering text generation, image creation, audio synthesis, video generation, music, and embeddings — all with zero data retention and full OpenAI SDK compatibility.
+ +Venice provides permissionless access to AI models with no content filtering, making it ideal for developers building applications that require uncensored outputs, privacy guarantees, and full control over AI interactions.
+ +| Tier | How It Works |
|---|---|
| Anonymized | Third-party models (Claude, GPT, Gemini, Grok) with all identifying metadata stripped before forwarding. |
| Private | Zero data retention. Self-hosted open-source models. No logs, no storage. |
| TEE | Models running inside hardware-secured enclaves (Intel TDX / NVIDIA CC). Venice cannot access the computation. |
| E2EE | End-to-end encrypted. Prompts encrypted client-side before sending. Only the TEE can decrypt them. |
- Uncensored models (e.g. venice-uncensored)
- OpenAI-compatible: change the base_url and you're done
- Venice-specific extensions via venice_parameters

Generate an API key at venice.ai/settings/api.
+ +export VENICE_API_KEY='your-api-key-here'
+
+# Python
+pip install openai
+
+# Node.js
+npm install openai
+
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="your-api-key",
+ base_url="https://api.venice.ai/api/v1"
+)
+
+response = client.chat.completions.create(
+ model="venice-uncensored",
+ messages=[{"role": "user", "content": "Hello World!"}]
+)
+
+print(response.choices[0].message.content)
+
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ apiKey: process.env.VENICE_API_KEY,
+ baseURL: "https://api.venice.ai/api/v1",
+});
+
+const completion = await client.chat.completions.create({
+ model: "venice-uncensored",
+ messages: [{ role: "user", content: "Hello World!" }],
+});
+
+console.log(completion.choices[0].message.content);
+
+curl https://api.venice.ai/api/v1/chat/completions \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "venice-uncensored",
+ "messages": [{"role": "user", "content": "Hello World!"}]
+ }'
+
+
+All requests require HTTP Bearer authentication:
+Authorization: Bearer VENICE_API_KEY
+
+API keys are managed at venice.ai/settings/api. Keep your key secret — never expose it in client-side code.
+ +You can also manage keys programmatically via the /api/v1/api_keys endpoints.
Venice implements the OpenAI API specification. Any OpenAI client library works — just change the base URL:
+ +| Language | Config Change |
|---|---|
| Python | base_url="https://api.venice.ai/api/v1" |
| JavaScript | baseURL: "https://api.venice.ai/api/v1" |
| Go | client.BaseURL = "https://api.venice.ai/api/v1" |
| cURL | Replace https://api.openai.com/v1 with https://api.venice.ai/api/v1 |
| PHP / C# / Java / Swift | Set base URL to https://api.venice.ai/api/v1 |
- venice_parameters — additional config for web search, characters, reasoning control, etc.
- Venice injects a default system prompt (disable with include_venice_system_prompt: false).
- Use Venice model IDs (e.g. zai-org-glm-4.7), not OpenAI's.
- Pass ?type=image, ?type=tts, etc. to discover non-text models. See Agent Pitfalls.
- Image endpoints differ in path (/image/generate vs /images/generations) and response format (images[] vs data[].b64_json). Do not use client.images.generate().

| Endpoint | Compatible? | Notes |
|---|---|---|
/chat/completions | ✅ Yes | Full drop-in. Tools, streaming, structured output all work. |
/audio/speech | ✅ Yes | Same request/response format as OpenAI TTS. |
/audio/transcriptions | ✅ Yes | Same multipart format. Use Venice model names. |
/embeddings | ✅ Yes | Same request/response format. |
/models | ⚠️ Partial | Same format but defaults to text-only. Must filter by type. |
/image/generate | ❌ No | Different path AND response format. Use raw HTTP. |
/video/queue | ❌ No | Venice-specific async pattern. |
POST /chat/completions
+ +The primary text generation endpoint. Supports text, vision, tool calling, streaming, and multimodal inputs (images, audio, video).
+ +| Field | Type | Description |
|---|---|---|
model | string (required) | Model ID, e.g. zai-org-glm-4.7 |
messages | array (required) | Array of message objects with role and content |
temperature | number | Sampling temperature (0-2). Default: model-specific |
max_tokens | integer | Max tokens in the response |
top_p | number | Nucleus sampling (0-1) |
frequency_penalty | number | Penalize repeated tokens (-2 to 2) |
presence_penalty | number | Penalize new topic tokens (-2 to 2) |
stream | boolean | Enable SSE streaming |
tools | array | Function definitions for tool calling |
response_format | object | Structured output schema (JSON mode) |
reasoning_effort | string | Control reasoning depth: none, low, medium, high, xhigh, max |
venice_parameters | object | Venice-specific extensions (see below) |
| Role | Purpose |
|---|---|
system | Instructions for model behavior |
user | Prompts or questions |
assistant | Previous model responses (multi-turn) |
tool | Function calling results |
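The roles above compose into a running conversation history: each turn appends the assistant's reply before the next user message so the model sees full context. A minimal sketch:

```python
# Multi-turn history: system prompt first, then alternating user/assistant.
def add_turn(messages: list, assistant_reply: str, next_user_prompt: str) -> list:
    messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": next_user_prompt})
    return messages

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name a canal city."},
]
# In practice the reply comes from response.choices[0].message.content.
add_turn(history, "Venice.", "What country is it in?")
```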
The venice_parameters object extends the OpenAI spec with Venice-specific features. Pass it as a top-level field in your request body.
| Parameter | Type | Default | Description |
|---|---|---|---|
enable_web_search | "off" / "on" / "auto" | "off" | Enable real-time web search. Additional pricing applies ($10/1K requests). |
enable_web_scraping | boolean | false | Scrape up to 5 URLs detected in user message. $10/1K URLs. |
enable_x_search | boolean | false | Enable xAI native search (web + X/Twitter) for Grok models. |
enable_web_citations | boolean | false | Include [REF]0[/REF] citations in web search results. |
include_search_results_in_stream | boolean | false | Experimental: emit search results as first stream chunk. |
return_search_results_as_documents | boolean | false | Return results as venice_web_search_documents tool call (LangChain compatible). |
include_venice_system_prompt | boolean | true | Include Venice's default system prompts alongside yours. |
strip_thinking_response | boolean | false | Strip <think> blocks from response (legacy tag format). |
disable_thinking | boolean | false | Disable reasoning and strip thinking blocks entirely. |
character_slug | string | — | Use a specific AI character persona. |
Venice parameters can also be appended directly to the model name as URL-style suffixes. Useful for SDKs that don't support extra body parameters:
+"model": "zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true"
+
+| SDK | Syntax |
|---|---|
| cURL / raw JSON | "venice_parameters": { ... } at top level |
| Python OpenAI | extra_body={"venice_parameters": { ... }} |
| JavaScript OpenAI | venice_parameters: { ... } at top level (TypeScript: // @ts-ignore) |
| Go / PHP / C# / Java | Use model suffix syntax |
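For SDKs that only accept a model string, the suffix form can be generated mechanically. A hedged sketch — the helper below is a convenience, not part of any SDK:

```python
# Fold venice_parameters into the model name as key=value pairs, lowering
# booleans to "true"/"false" to match the URL-style suffix syntax.
def with_suffix(model: str, params: dict) -> str:
    suffix = "&".join(
        f"{k}={str(v).lower() if isinstance(v, bool) else v}"
        for k, v in params.items()
    )
    return f"{model}:{suffix}" if suffix else model

params = {"enable_web_search": "auto", "enable_web_citations": True}
print(with_suffix("zai-org-glm-4.7", params))
# → zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true
```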
Set stream: true for Server-Sent Events (SSE) streaming:
# Python
+stream = client.chat.completions.create(
+ model="venice-uncensored",
+ messages=[{"role": "user", "content": "Write a story"}],
+ stream=True
+)
+for chunk in stream:
+ if chunk.choices and chunk.choices[0].delta.content is not None:
+ print(chunk.choices[0].delta.content, end="")
+
+// JavaScript
+const stream = await client.chat.completions.create({
+ model: "venice-uncensored",
+ messages: [{ role: "user", content: "Write a story" }],
+ stream: true
+});
+for await (const chunk of stream) {
+ if (chunk.choices?.[0]?.delta?.content) {
+ process.stdout.write(chunk.choices[0].delta.content);
+ }
+}
+
+
+Force the model to output JSON matching a specific schema using response_format:
{
+ "model": "venice-uncensored",
+ "messages": [
+ {"role": "system", "content": "You are a helpful math tutor."},
+ {"role": "user", "content": "solve 8x + 31 = 2"}
+ ],
+ "response_format": {
+ "type": "json_schema",
+ "json_schema": {
+ "name": "math_response",
+ "strict": true,
+ "schema": {
+ "type": "object",
+ "properties": {
+ "steps": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "explanation": {"type": "string"},
+ "output": {"type": "string"}
+ },
+ "required": ["explanation", "output"],
+ "additionalProperties": false
+ }
+ },
+ "final_answer": {"type": "string"}
+ },
+ "required": ["steps", "final_answer"],
+ "additionalProperties": false
+ }
+ }
+ }
+}
+
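With response_format set, message.content arrives as a JSON string matching the schema. A minimal parsing sketch, where raw stands in for response.choices[0].message.content:

```python
import json

# Simulated structured-output content for the math_response schema above.
raw = ('{"steps": [{"explanation": "Subtract 31 from both sides", '
       '"output": "8x = -29"}], "final_answer": "x = -29/8"}')

result = json.loads(raw)  # content is guaranteed to match the schema
for step in result["steps"]:
    print(step["explanation"], "->", step["output"])
print("Answer:", result["final_answer"])
```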
Requirements:
- strict must be true
- additionalProperties must be false at every level
- Every property must be listed in the required tag. Use "type": ["string", "null"] for optional fields.
- Check for supportsResponseSchema: true in /v1/models

Some models produce visible chain-of-thought reasoning. Thinking appears in a separate reasoning_content field, keeping content clean.
response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=[{"role": "user", "content": "What is 15% of 240?"}]
+)
+thinking = response.choices[0].message.reasoning_content
+answer = response.choices[0].message.content
+
+| Value | Description |
|---|---|
none | Disables reasoning |
minimal | Basic reasoning |
low | Light reasoning for simple problems |
medium | Balanced (recommended default) |
high | Deep reasoning for complex problems |
xhigh | Extra-high depth |
max | Maximum capability |
# Pass via reasoning object
+extra_body={"reasoning": {"effort": "high"}}
+
+# Or flat format
+extra_body={"reasoning_effort": "high"}
+
+| Model | Supported Values |
|---|---|
| GPT-5.2 | none, low, medium, high, xhigh |
| Claude Opus 4.6 | low, medium, high, max |
| Claude Opus 4.5, Sonnet 4.5/4.6 | low, medium, high |
| Gemini 3 Pro | low, high |
| Gemini 3.1 Pro | low, medium, high |
| GLM 4.7, Qwen 3 Thinking, Kimi K2.5, MiniMax | low, medium, high |
| Grok models | Not supported |
| DeepSeek R1 | Built-in only, not configurable |
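A small guard can catch unsupported effort values before a request is sent. Hedged sketch — the dict mirrors one row of the table above, and only zai-org-glm-4.7 is a model ID confirmed elsewhere in this document; fill in the actual IDs you use:

```python
# Allowed reasoning_effort values per model (illustrative subset; extend
# from the support table above with real model IDs).
SUPPORTED_EFFORT = {
    "zai-org-glm-4.7": {"low", "medium", "high"},
}

def check_effort(model: str, effort: str) -> bool:
    allowed = SUPPORTED_EFFORT.get(model)
    return allowed is not None and effort in allowed

print(check_effort("zai-org-glm-4.7", "high"))   # True
print(check_effort("zai-org-glm-4.7", "xhigh"))  # False
```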
# Recommended: Venice-level toggle
+extra_body={"reasoning": {"enabled": False}}
+
+# Alternative: provider-level (only some models)
+extra_body={"reasoning": {"effort": "none"}}
+
+
+Pass images alongside text using vision-capable models. Images can be URLs or base64 data URIs.
+ +response = client.chat.completions.create(
+ model="qwen3-vl-235b-a22b",
+ messages=[{
+ "role": "user",
+ "content": [
+ {"type": "text", "text": "What is in this image?"},
+ {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
+ ]
+ }]
+)
+
+Vision models: qwen3-vl-235b-a22b, mistral-small-3-2-24b-instruct (with suffix), and E2EE variants.
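For local files, encode the image as a base64 data URI and pass it in the same image_url slot. A minimal sketch:

```python
import base64

def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw image bytes as a data URI usable in an image_url part."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"

# In practice: image_bytes = open("photo.jpg", "rb").read()
part = {
    "type": "image_url",
    "image_url": {"url": to_data_uri(b"\xff\xd8\xff", "image/jpeg")},
}
print(part["image_url"]["url"][:23])
```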
Define tools for models to call external APIs. Works with the OpenAI function calling spec. Below is a complete round-trip showing the full loop.
+ +import json
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+tools = [{
+ "type": "function",
+ "function": {
+ "name": "get_weather",
+ "description": "Get current weather for a location",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "location": {"type": "string", "description": "City name"}
+ },
+ "required": ["location"]
+ }
+ }
+}]
+
+messages = [{"role": "user", "content": "What's the weather in NYC?"}]
+
+response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=messages,
+ tools=tools,
+ tool_choice="auto" # "auto" | "required" | {"type":"function","function":{"name":"get_weather"}}
+)
+
+# The response.choices[0].message looks like:
+# {
+# "role": "assistant",
+# "content": null,
+# "tool_calls": [
+# {
+# "id": "call_abc123",
+# "type": "function",
+# "function": {
+# "name": "get_weather",
+# "arguments": "{\"location\": \"New York\"}"
+# }
+# }
+# ]
+# }
+
+assistant_message = response.choices[0].message
+tool_call = assistant_message.tool_calls[0]
+function_args = json.loads(tool_call.function.arguments)
+
+# Execute your actual function
+weather_result = {"temperature": 72, "condition": "sunny", "humidity": 45}
+
+# Append the assistant's tool call message + the tool result
+messages.append(assistant_message)
+messages.append({
+ "role": "tool",
+ "tool_call_id": tool_call.id, # must match the tool_call id
+ "content": json.dumps(weather_result) # result as a JSON string
+})
+
+# Get the final response
+final_response = client.chat.completions.create(
+ model="zai-org-glm-4.7",
+ messages=messages,
+ tools=tools
+)
+
+print(final_response.choices[0].message.content)
+# "The current weather in New York is 72°F and sunny with 45% humidity."
+
+| Value | Behavior |
|---|---|
"auto" | Model decides whether to call a tool (default) |
"required" | Model must call at least one tool |
"none" | Model must not call any tools |
{"type": "function", "function": {"name": "get_weather"}} | Force a specific tool |
role: "tool" message for each tool_call_id. Parallel tool calls are not compatible with structured response_format.
+Models with function calling: zai-org-glm-4.7, zai-org-glm-5, qwen3-4b, mistral-small-3-2-24b-instruct, llama-3.2-3b, and all Claude / GPT / Gemini / Grok models. Check supportsFunctionCalling in the /models endpoint.
Enable real-time web search on any text model:
+ +# Via venice_parameters
+{
+ "model": "zai-org-glm-4.7",
+ "messages": [{"role": "user", "content": "Latest AI news"}],
+ "venice_parameters": {
+ "enable_web_search": "auto",
+ "enable_web_citations": true
+ }
+}
+
+# Via model suffix
+{
+ "model": "zai-org-glm-4.7:enable_web_search=on&enable_web_citations=true",
+ "messages": [{"role": "user", "content": "Latest AI news"}]
+}
+
+Automatically scrapes up to 5 URLs detected in the user message:
+"venice_parameters": {
+ "enable_web_scraping": true
+}
+
+For Grok models, enables xAI's native search across web + X/Twitter:
+"venice_parameters": {
+ "enable_x_search": true
+}
+
+| Feature | Price |
|---|---|
| Web Search | $10.00 / 1K requests |
| Web Scraping | $10.00 / 1K URLs |
| X Search | $10.00 / 1K results |
POST /image/generate
+ +/image/generate (not /images/generations) and returns images[0] (not data[0].b64_json). Do NOT use the OpenAI SDK's client.images.generate() — use raw HTTP requests instead.
+curl https://api.venice.ai/api/v1/image/generate \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "venice-sd35",
+ "prompt": "A cyberpunk city with neon lights and rain",
+ "width": 1024,
+ "height": 1024,
+ "format": "webp"
+ }'
+
+| Field | Type | Default | Description |
|---|---|---|---|
model | string (required) | — | Image model ID |
prompt | string (required) | — | Image description (max 7500 chars) |
width | integer | 1024 | Width in pixels (max 1280) |
height | integer | 1024 | Height in pixels (max 1280) |
format | jpeg / png / webp | webp | Output format |
negative_prompt | string | — | What to exclude from the image |
cfg_scale | number | 7.5 | Prompt adherence (0-20) |
seed | integer | random | Reproducibility seed |
variants | integer | 1 | Number of images (1-4, requires return_binary: false) |
style_preset | string | — | e.g. "3D Model", "Anime", etc. |
aspect_ratio | string | — | e.g. "1:1", "16:9" (certain models) |
resolution | string | — | "1K", "2K", "4K" (certain models like Nano Banana) |
safe_mode | boolean | true | Blur adult content |
return_binary | boolean | false | Return raw image bytes instead of base64 JSON |
embed_exif_metadata | boolean | false | Embed prompt info in EXIF |
hide_watermark | boolean | false | Hide Venice watermark |
{
+ "id": "generate-image-1234567890",
+ "images": [
+ "/9j/4AAQSkZJRgABAQ..." // base64-encoded image data
+ ],
+ "timing": {
+ "total": 3200,
+ "inferenceDuration": 2800,
+ "inferencePreprocessingTime": 150,
+ "inferenceQueueTime": 250
+ }
+}
+
+import base64, requests, os
+
+resp = requests.post("https://api.venice.ai/api/v1/image/generate",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"model": "venice-sd35", "prompt": "A sunset over Venice", "width": 1024, "height": 1024}
+).json()
+
+with open("output.webp", "wb") as f:
+ f.write(base64.b64decode(resp["images"][0]))
+
+Returns raw image bytes with Content-Type: image/webp (or image/png, image/jpeg depending on format). Save directly:
curl https://api.venice.ai/api/v1/image/generate \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"venice-sd35","prompt":"A sunset","return_binary":true}' \
+ -o output.webp
+
+| Model | ID | Price | Privacy |
|---|---|---|---|
| Recraft V4 Pro | recraft-v4-pro | $0.29/img | Anonymized |
| GPT Image 1.5 | gpt-image-1-5 | $0.26/img | Anonymized |
| Nano Banana Pro | nano-banana-pro | $0.18-$0.35 | Anonymized |
| Qwen Image 2 Pro | qwen-image-2-pro | $0.10/img | Anonymized |
| Flux 2 Max | flux-2-max | $0.09/img | Anonymized |
| Venice SD35 | venice-sd35 | $0.01/img | Private |
| Qwen Image | qwen-image | $0.01/img | Private |
| Chroma | chroma | $0.01/img | Private |
| Z-Image Turbo | z-image-turbo | $0.01/img | Private |
POST /image/edit
+AI-powered inpainting. Send base64-encoded image + text prompt. Returns edited image as raw binary (Content-Type: image/png).
# Python — edit an image and save result
+import base64, requests, os
+
+with open("photo.jpg", "rb") as f:
+ image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/edit",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"prompt": "Make it look like a watercolor painting", "image": image_b64}
+)
+
+with open("edited.png", "wb") as f:
+ f.write(resp.content)
+
+# cURL
+curl -X POST https://api.venice.ai/api/v1/image/edit \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"prompt": "Colorize", "image": "'$(base64 -i photo.jpg)'"}' \
+ -o edited.png
+
+Edit models: qwen-image (default), flux-2-max-edit, gpt-image-1-5-edit, qwen-image-2-edit, nano-banana-pro-edit, seedream-v4-edit, grok-imagine-edit.
POST /image/multi-edit
+Combine and edit up to 3 images with layered inputs. Send an array of base64 images with per-layer prompts. Returns binary image data.
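The exact request fields for /image/multi-edit are not documented above, so the payload below is a hypothetical sketch — prompt and images are assumed field names; confirm them against the Swagger spec before use:

```python
import base64

def multi_edit_payload(image_bytes_list: list[bytes], prompt: str) -> dict:
    """Assemble a multi-edit body. Field names here are assumptions."""
    if not 1 <= len(image_bytes_list) <= 3:
        raise ValueError("multi-edit accepts 1-3 images")
    return {
        "prompt": prompt,  # assumed field name
        "images": [base64.b64encode(b).decode() for b in image_bytes_list],
    }

payload = multi_edit_payload([b"layer-1", b"layer-2"], "Blend into a collage")
```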
+ +POST /image/upscale
+Returns upscaled image as raw binary (Content-Type: image/png).
# Python
+import base64, requests, os
+
+with open("small.jpg", "rb") as f:
+ image_b64 = base64.b64encode(f.read()).decode()
+
+resp = requests.post("https://api.venice.ai/api/v1/image/upscale",
+ headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"},
+ json={"image": image_b64, "scale": 4} # 2 or 4
+)
+
+with open("upscaled.png", "wb") as f:
+ f.write(resp.content)
+
+# cURL
+curl -X POST https://api.venice.ai/api/v1/image/upscale \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image": "'$(base64 -i small.jpg)'", "scale": 2}' \
+ -o upscaled.png
+
+Pricing: 2x = $0.02, 4x = $0.08. Model: upscaler.
POST /image/background-remove
+Returns a PNG with transparent background. Accepts base64, file upload (multipart), or URL.
+ +# Via URL (easiest)
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image_url": "https://example.com/photo.jpg"}' \
+ -o no-bg.png
+
+# Via base64
+curl -X POST https://api.venice.ai/api/v1/image/background-remove \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"image": "'$(base64 -i photo.jpg)'"}' \
+ -o no-bg.png
+
+Model: bria-bg-remover. $0.03/image. Max file size: 25MB.
POST /audio/speech
+ +{
+ "input": "Hello, welcome to Venice Voice.",
+ "model": "tts-kokoro",
+ "voice": "af_sky",
+ "response_format": "mp3",
+ "speed": 1.0
+}
+
+| Field | Type | Description |
|---|---|---|
input | string (required) | Text to speak (max 4096 chars) |
model | string | tts-kokoro, tts-qwen3-0-6b, or tts-qwen3-1-7b |
voice | string | Voice ID (60+ options). Default: af_sky |
response_format | string | mp3, opus, aac, flac, wav, pcm |
speed | number | 0.25 to 4.0 (default 1.0) |
streaming | boolean | Stream sentence by sentence |
language | string | Qwen 3 TTS only: Auto, English, Chinese, Spanish, French, etc. |
prompt | string | Qwen 3 TTS only: emotion/style prompt (max 500 chars) |
af_sky, af_nova, af_bella, af_heart, am_adam, am_echo, am_liam, am_michael, bf_emma, bf_lily, bm_george, zf_xiaobei, jm_kumo, ff_siwis, pf_dora, and many more.
Vivian, Serena, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee, Uncle_Fu.
Content-Type matches your response_format (e.g. audio/mpeg for mp3). Save directly to file.
+# cURL — save TTS output
+curl -X POST https://api.venice.ai/api/v1/audio/speech \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"input":"Hello world","model":"tts-kokoro","voice":"af_sky"}' \
+ -o speech.mp3
+
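The same call from Python. A stdlib-only sketch (urllib instead of requests, so there are no extra dependencies) — not an official client:

```python
import json, os, urllib.request

def speech_body(text: str, model: str = "tts-kokoro",
                voice: str = "af_sky", fmt: str = "mp3") -> dict:
    """Request body for POST /audio/speech (OpenAI-compatible shape)."""
    return {"input": text, "model": model, "voice": voice,
            "response_format": fmt}

def save_speech(text: str, path: str = "speech.mp3") -> None:
    req = urllib.request.Request(
        "https://api.venice.ai/api/v1/audio/speech",
        data=json.dumps(speech_body(text)).encode(),
        headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY', '')}",
                 "Content-Type": "application/json"},
    )
    # Response is raw audio bytes; write them straight to disk.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        f.write(resp.read())
```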
+POST /audio/transcriptions
+Multipart form upload. Supports WAV, FLAC, MP3, M4A, AAC, MP4.
+ +# cURL
+curl -X POST https://api.venice.ai/api/v1/audio/transcriptions \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -F file=@recording.mp3 \
+ -F model=nvidia/parakeet-tdt-0.6b-v3 \
+ -F response_format=json
+
+# Python
+from openai import OpenAI
+
+client = OpenAI(api_key="your-key", base_url="https://api.venice.ai/api/v1")
+
+with open("recording.mp3", "rb") as f:
+ transcript = client.audio.transcriptions.create(
+ model="nvidia/parakeet-tdt-0.6b-v3",
+ file=f,
+ response_format="json"
+ )
+print(transcript.text)
+
+Models: nvidia/parakeet-tdt-0.6b-v3, openai/whisper-large-v3. Add timestamps=true for word-level timing.
POST /audio/queue → poll /audio/retrieve
+ +Same async pattern as video. Queue a job, get a queue_id, poll until audio bytes are returned.
# Step 1: Get a price quote (optional)
+curl -X POST https://api.venice.ai/api/v1/audio/quote \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","duration_seconds":60}'
+# → {"quote": 0.03}
+
+# Step 2: Queue generation
+curl -X POST https://api.venice.ai/api/v1/audio/queue \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","prompt":"An upbeat electronic track with synth leads"}'
+# → {"model":"ace-step-15","queue_id":"abc-123","status":"QUEUED"}
+
+# Step 3: Poll until complete (returns audio bytes when done)
+curl -X POST https://api.venice.ai/api/v1/audio/retrieve \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"ace-step-15","queue_id":"abc-123"}' \
+ -o music.mp3
+# While processing: → {"status":"PROCESSING","average_execution_time":20000,"execution_duration":5200}
+# When complete: → raw audio bytes (Content-Type: audio/mpeg)
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Music model ID |
prompt | string (required) | Description of the audio to generate |
lyrics_prompt | string | Lyrics for lyric-capable models |
duration_seconds | integer | Duration hint in seconds |
force_instrumental | boolean | Force instrumental (no vocals) |
voice | string | Voice selection for voice-enabled models |
Models: ace-step-15, elevenlabs-music, minimax-music-v2, stable-audio-25. Sound effects: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio.
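The queue/retrieve loop above can be wrapped in Python; completion is signalled by the Content-Type switching from JSON to audio bytes. A stdlib-only sketch with simple fixed-interval polling (add timeouts and error handling for production use):

```python
import json, os, time, urllib.request

API = "https://api.venice.ai/api/v1"

def is_ready(content_type: str) -> bool:
    """True once /audio/retrieve returns media instead of a status JSON."""
    return content_type.startswith("audio/")

def _post(path: str, body: dict):
    req = urllib.request.Request(
        f"{API}{path}", data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY', '')}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()

def generate_music(prompt: str, model: str = "ace-step-15",
                   out: str = "music.mp3") -> str:
    _, raw = _post("/audio/queue", {"model": model, "prompt": prompt})
    queue_id = json.loads(raw)["queue_id"]
    while True:
        ctype, raw = _post("/audio/retrieve",
                           {"model": model, "queue_id": queue_id})
        if is_ready(ctype):               # done: raw audio bytes
            with open(out, "wb") as f:
                f.write(raw)
            return out
        time.sleep(10)                    # still QUEUED / PROCESSING
```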
POST /video/queue → poll /video/retrieve
+ +Asynchronous: queue a job, get a queue_id, poll until complete.
{
+ "model": "wan-2.5-preview-text-to-video",
+ "prompt": "Commerce in Venice, Italy",
+ "duration": "10s",
+ "resolution": "720p",
+ "aspect_ratio": "16:9"
+}
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Video model ID |
prompt | string (required) | Max 2500 chars |
duration | string (required) | 5s or 10s |
resolution | string | 480p, 720p, 1080p |
aspect_ratio | string | e.g. 16:9 |
image_url | string | Reference image for image-to-video models |
negative_prompt | string | What to avoid |
audio | boolean | Generate audio (if model supports it) |
# Step 1: Get a price quote (optional but recommended)
+curl -X POST https://api.venice.ai/api/v1/video/quote \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"model":"wan-2.6-text-to-video","duration":"10s","resolution":"720p"}'
+# → {"quote": 0.35}
+
+# Step 2: Queue the video generation
+curl -X POST https://api.venice.ai/api/v1/video/queue \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "wan-2.6-text-to-video",
+ "prompt": "A drone shot over Venice canals at golden hour",
+ "duration": "10s",
+ "resolution": "720p",
+ "aspect_ratio": "16:9"
+ }'
+# Response:
+# {"model": "wan-2.6-text-to-video", "queue_id": "550e8400-e29b-41d4-a716-446655440000"}
+
+# Step 3: Poll /video/retrieve until complete
+# While processing → returns JSON:
+# {"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200}
+#
+# When complete → returns raw MP4 bytes (Content-Type: video/mp4)
+
+curl -X POST https://api.venice.ai/api/v1/video/retrieve \
+ -H "Authorization: Bearer $VENICE_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "wan-2.6-text-to-video",
+ "queue_id": "550e8400-e29b-41d4-a716-446655440000",
+ "delete_media_on_completion": true
+ }' \
+ -o output.mp4
+
+import requests, time, os
+
+API = "https://api.venice.ai/api/v1"
+HEADERS = {"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}",
+ "Content-Type": "application/json"}
+
+# Queue
+q = requests.post(f"{API}/video/queue", headers=HEADERS, json={
+ "model": "wan-2.6-text-to-video",
+ "prompt": "A drone shot over Venice canals at golden hour",
+ "duration": "10s", "resolution": "720p"
+}).json()
+
+queue_id = q["queue_id"]
+print(f"Queued: {queue_id}")
+
+# Poll
+while True:
+ r = requests.post(f"{API}/video/retrieve", headers=HEADERS, json={
+ "model": "wan-2.6-text-to-video",
+ "queue_id": queue_id,
+ "delete_media_on_completion": True
+ })
+ if r.headers.get("Content-Type", "").startswith("video/"):
+ with open("output.mp4", "wb") as f:
+ f.write(r.content)
+ print("Done! Saved output.mp4")
+ break
+ else:
+ status = r.json()
+ elapsed = status.get("execution_duration", 0) / 1000
+ eta = status.get("average_execution_time", 0) / 1000
+ print(f"Processing... {elapsed:.0f}s / ~{eta:.0f}s est.")
+ time.sleep(10)
+
+| Field | Type | Description |
|---|---|---|
model | string (required) | Same model used in queue |
queue_id | string (required) | ID returned by /video/queue |
delete_media_on_completion | boolean | Delete from storage after download (default: false) |
| Model | ID (Text-to-Video) | Privacy |
|---|---|---|
| Veo 3.1 | veo3.1-full-text-to-video | Anon |
| Sora 2 Pro | sora-2-pro-text-to-video | Anon |
| Kling V3 Pro | kling-v3-pro-text-to-video | Anon |
| Wan 2.6 | wan-2.6-text-to-video | Anon |
| Longcat | longcat-text-to-video | Private |
| LTX 2.0 | ltx-2-full-text-to-video | Anon |
POST /embeddings
+ +{
+ "model": "text-embedding-bge-m3",
+ "input": "Privacy-first AI infrastructure",
+ "encoding_format": "float"
+}
+
+Model: text-embedding-bge-m3. Input: $0.15/1M tokens, Output: $0.60/1M tokens. Privacy: Private. Max input: 8192 tokens. Batch: up to 2048 inputs per request.
{
+ "object": "list",
+ "model": "text-embedding-bge-m3",
+ "data": [
+ {
+ "object": "embedding",
+ "index": 0,
+ "embedding": [0.0023064255, -0.009327292, 0.015797377, ...] // 1024-dimensional float vector
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 8,
+ "total_tokens": 8
+ }
+}
+
+Vector dimensions: text-embedding-bge-m3 produces 1024-dimensional vectors by default. You can optionally reduce dimensions via the dimensions parameter.
Encoding formats: "float" (array of numbers) or "base64" (compact binary).
Batch input: Pass an array of strings to embed multiple inputs in one request.
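A batch /embeddings call plus cosine similarity is the usual building block for semantic search. The similarity function is shown on toy vectors below; in practice a and b come from data[i].embedding:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical direction → 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```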
+ + +GET /characters
+ +List and filter AI character personas. Use a character's slug in venice_parameters.character_slug for chat completions.
| Param | Description |
|---|---|
| search | Search by name, description, or tags |
| categories | Filter by category (e.g. roleplay, philosophy) |
| tags | Filter by tags |
| modelId | Filter by model |
| isAdult | true or false |
| sortBy | featured, highestRating, mostRecent, imports, etc. |
| limit / offset | Pagination (max 100) |
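A minimal stdlib sketch of a filtered listing using the params above. The `character_query` helper is illustrative, and the assumption that results live under a `data` key is ours; the live call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import urllib.parse
import urllib.request

def character_query(**params):
    """Build a /characters URL from the filter params above."""
    return "https://api.venice.ai/api/v1/characters?" + urllib.parse.urlencode(params)

url = character_query(search="philosophy", limit=10)

if os.environ.get("VENICE_API_KEY"):
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"})
    with urllib.request.urlopen(req) as r:
        # Assumed response shape: {"data": [{"slug": ...}, ...]}
        for c in json.load(r).get("data", []):
            print(c.get("slug"))
```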
{
+ "model": "venice-uncensored",
+ "messages": [{"role": "user", "content": "What is the meaning of life?"}],
+ "venice_parameters": {
+ "character_slug": "alan-watts"
+ }
+}
+
+
The only way to achieve reasonable user privacy is to avoid collecting information in the first place.
Models with tee-* prefix run inside hardware-secured enclaves (Intel TDX, NVIDIA CC). Venice cannot access the computation.
response = client.chat.completions.create(
+ model="tee-qwen3-5-122b-a10b",
+ messages=[{"role": "user", "content": "Explain quantum computing"}]
+)
+
+GET /tee/attestation?model=...&nonce=...
+Returns cryptographic proof the model runs in a genuine TEE. Fields: verified, nonce, tee_provider, intel_quote, nvidia_payload, signing_key, signing_address.
GET /tee/signature?model=...&request_id=...
+Proves a response came from the attested enclave.
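A sketch of the attestation round-trip: generate a fresh nonce, fetch the attestation, and sanity-check the response. The `check_attestation` helper is illustrative, and the network call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import secrets
import urllib.request

def check_attestation(att, expected_nonce):
    """Basic sanity checks on a /tee/attestation response body."""
    return bool(att.get("verified")) and att.get("nonce") == expected_nonce

# A fresh random nonce proves the quote was generated for this request.
nonce = secrets.token_hex(16)

if os.environ.get("VENICE_API_KEY"):
    url = ("https://api.venice.ai/api/v1/tee/attestation"
           f"?model=tee-qwen3-5-122b-a10b&nonce={nonce}")
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"})
    with urllib.request.urlopen(req) as r:
        att = json.load(r)
    assert check_attestation(att, nonce), "attestation failed or nonce mismatch"
    print("attested signing key:", att["signing_key"])
```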
+ +Models with e2ee-* prefix add client-side encryption on top of TEE. Your prompts are encrypted before leaving your device.
Crypto stack: ECDH on secp256k1 → HKDF-SHA256 → AES-256-GCM.
1. Fetch attestation: GET /tee/attestation?model=e2ee-glm-4-7-p&nonce=<random-hex>. Response includes signing_key (model's public key).
2. Verify it: check verified: true, nonce match, and optionally parse the Intel TDX quote.
3. Send requests with these headers:
   - X-Venice-TEE-Client-Pub-Key — your public key (hex)
   - X-Venice-TEE-Model-Pub-Key — model's public key from attestation (hex)
   - X-Venice-TEE-Signing-Algo — "secp256k1"

| Model | ID | Context |
|---|---|---|
| GLM 4.7 | e2ee-glm-4-7-p | 128K |
| GLM 4.7 Flash | e2ee-glm-4-7-flash-p | 198K |
| GLM 5 | e2ee-glm-5 | 198K |
| Gemma 3 27B | e2ee-gemma-3-27b-p | 40K |
| GPT OSS 120B | e2ee-gpt-oss-120b-p | 128K |
| Qwen3 30B | e2ee-qwen3-30b-a3b-p | 256K |
| Qwen3 VL 30B | e2ee-qwen3-vl-30b-a3b-p | 128K |
| Qwen3.5 122B | e2ee-qwen3-5-122b-a10b | 128K |
| Venice Uncensored | e2ee-venice-uncensored-24b-p | 32K |
| Qwen 2.5 7B | e2ee-qwen-2-5-7b-p | 32K |
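The crypto stack above can be sketched with the `cryptography` package. This is a sketch under stated assumptions: the HKDF salt/info values and message framing here are guesses, not Venice's documented wire format, and the "model" keypair stands in for the attestation's `signing_key` (which you would parse via `EllipticCurvePublicKey.from_encoded_point`).

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_aes_key(private_key, peer_public_key):
    """ECDH on secp256k1, then HKDF-SHA256 down to a 256-bit AES-GCM key."""
    shared = private_key.exchange(ec.ECDH(), peer_public_key)
    # salt/info are illustrative guesses; check Venice's client code for real values
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"").derive(shared)

# Stand-in for the enclave's keypair (normally only its public key is known to you)
model_priv = ec.generate_private_key(ec.SECP256K1())
client_priv = ec.generate_private_key(ec.SECP256K1())

key = derive_aes_key(client_priv, model_priv.public_key())
nonce = os.urandom(12)  # 96-bit AES-GCM nonce, unique per message
ciphertext = AESGCM(key).encrypt(nonce, b"Explain quantum computing", None)

# The enclave reaches the same key from the other side of the exchange:
peer_key = derive_aes_key(model_priv, client_priv.public_key())
assert AESGCM(peer_key).decrypt(nonce, ciphertext, None) == b"Explain quantum computing"
```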
Reduces latency (up to 80%) and costs (up to 90%) by reusing processed input tokens on prefix-matched requests.
+ +Routing hint for cache affinity. Same key → same server → higher hit rate.
+{
+ "model": "claude-opus-45",
+ "prompt_cache_key": "session-abc-123",
+ "messages": [...]
+}
+
+{
+ "usage": {
+ "prompt_tokens": 5500,
+ "prompt_tokens_details": {
+ "cached_tokens": 5000,
+ "cache_creation_input_tokens": 0
+ }
+ }
+}
+
+| Provider | Min Tokens | Lifetime | Read Discount |
|---|---|---|---|
| Anthropic (Claude) | ~4,000 | 5 min | 90% |
| OpenAI (GPT) | 1,024 | 5-10 min | 90% |
| Google (Gemini) | ~1,024 | 1 hour | 75-90% |
| xAI (Grok) | ~1,024 | 5 min | 75-88% |
| DeepSeek | ~1,024 | 5 min | 50% |
| MiniMax | ~1,024 | 5 min | 90% |
| Moonshot (Kimi) | ~1,024 | 5 min | 50% |
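Given a usage payload like the sample shown earlier in this section, a tiny helper (the name is ours) makes hit rates easy to log:

```python
def cache_hit_rate(usage):
    """Fraction of prompt tokens served from the provider's prefix cache."""
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    total = usage.get("prompt_tokens", 0)
    return cached / total if total else 0.0

usage = {"prompt_tokens": 5500,
         "prompt_tokens_details": {"cached_tokens": 5000, "cache_creation_input_tokens": 0}}
print(f"cache hit rate: {cache_hit_rate(usage):.0%}")  # → cache hit rate: 91%
```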
Discover models programmatically: GET /models
+ +Key response fields per model: id, type, model_spec.availableContextTokens, model_spec.capabilities (supportsFunctionCalling, supportsResponseSchema, supportsWebSearch, supportsReasoning, supportsReasoningEffort).
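Capability-based discovery can be sketched as below. The `models_with` helper is illustrative; the field paths follow the spec above, and the live call only fires when `VENICE_API_KEY` is set.

```python
import json
import os
import urllib.request

def models_with(models_json, capability):
    """IDs of models whose model_spec.capabilities flags the given capability."""
    return [
        m["id"]
        for m in models_json.get("data", [])
        if m.get("model_spec", {}).get("capabilities", {}).get(capability)
    ]

if os.environ.get("VENICE_API_KEY"):
    req = urllib.request.Request(
        "https://api.venice.ai/api/v1/models",
        headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as r:
        print(models_with(json.load(r), "supportsFunctionCalling"))
```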
| Use Case | Model ID | Context |
|---|---|---|
| Flagship / complex tasks | zai-org-glm-4.7 | 198K |
| Flagship v2 | zai-org-glm-5 | 198K |
| Balanced general use | llama-3.3-70b | 128K |
| Fast / cost-efficient | qwen3-4b | 40K |
| Uncensored | venice-uncensored | 32K |
| Vision | qwen3-vl-235b-a22b | 256K |
| Long context coding | qwen3-coder-480b-a35b-instruct | 256K |
| Deep reasoning | qwen3-235b-a22b-thinking-2507 | 128K |
| GPT (via Venice) | openai-gpt-52 | 256K |
| Claude (via Venice) | claude-opus-4-6 | 1000K |
| Gemini (via Venice) | gemini-3-1-pro-preview | 1000K |
| Grok (via Venice) | grok-4-20-beta | 2000K |
| Image generation (budget) | venice-sd35 | — |
| Image generation (quality) | recraft-v4-pro | — |
| TTS | tts-kokoro | — |
| Embeddings | text-embedding-bge-m3 | — |
Prices per 1M tokens unless noted. All USD. Full pricing: docs.venice.ai/overview/pricing
+ +| Model | ID | Input | Output | Cache Read | Context | Privacy |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | $6.00 | $30.00 | $0.60 | 1000K | Anon |
| GPT-5.2 | openai-gpt-52 | $2.19 | $17.50 | $0.22 | 256K | Anon |
| Grok 4.20 | grok-4-20-beta | $2.50 | $7.50 | $0.25 | 2000K | Anon |
| GLM 4.7 | zai-org-glm-4.7 | $0.55 | $2.65 | $0.11 | 198K | Private |
| GLM 5 | zai-org-glm-5 | $1.00 | $3.20 | $0.20 | 198K | Private |
| DeepSeek V3.2 | deepseek-v3.2 | $0.33 | $0.48 | $0.16 | 160K | Private |
| Llama 3.3 70B | llama-3.3-70b | $0.70 | $2.80 | — | 128K | Private |
| Venice Uncensored | venice-uncensored | $0.20 | $0.90 | — | 32K | Private |
| Qwen 3 Coder 480B | qwen3-coder-480b-a35b-instruct | $0.75 | $3.00 | — | 256K | Private |
| GLM 4.7 Flash | zai-org-glm-4.7-flash | $0.13 | $0.50 | — | 128K | Private |
| Qwen 3.5 9B | qwen3-5-9b | $0.05 | $0.15 | — | 256K | Private |
| Method | Details |
|---|---|
| USD | Credit card. Credits never expire. |
| Crypto | Cryptocurrency. Same rates as USD. |
| DIEM Staking | Each DIEM = $1/day of credits that refresh daily. |
| Pro Subscription | One-time $10 API credit when upgrading to Pro. |
X402 is an open standard for internet-native payments using the HTTP 402 Payment Required status code. Venice supports X402 to let agents authenticate with a crypto wallet and pay for inference automatically — no API key required.
| Endpoint | Auth Header | Purpose |
|---|---|---|
| POST /chat/completions | X-Sign-In-With-X | Paid inference (only endpoint currently supported) |
| GET /x402/balance/{address} | X-Sign-In-With-X | Check spendable balance |
| GET /x402/transactions/{address} | X-Sign-In-With-X | View transaction history |
| POST /x402/top-up | X-402-Payment | Add USDC balance (different header!) |
Venice X402 uses a SIWE (Sign In With Ethereum) message, signed with EIP-191, then base64-encoded:
+ +import { Wallet } from 'ethers'
+import { SiweMessage, generateNonce } from 'siwe'
+
+const signer = new Wallet(process.env.PRIVATE_KEY)
+
+const siwe = new SiweMessage({
+ domain: 'outerface.venice.ai',
+ address: signer.address,
+ statement: 'Sign in to Venice AI',
+ uri: 'https://outerface.venice.ai/api/v1/chat/completions', // must match the route you're calling
+ version: '1',
+ chainId: 8453, // Base
+ nonce: generateNonce(),
+ issuedAt: new Date().toISOString(),
+ expirationTime: new Date(Date.now() + 10 * 60 * 1000).toISOString(), // 10 min
+})
+
+const message = siwe.prepareMessage()
+const signature = await signer.signMessage(message)
+
+const headerValue = Buffer.from(JSON.stringify({
+ address: signer.address.toLowerCase(),
+ message,
+ signature,
+ chainId: 8453,
+ timestamp: Date.now(),
+}), 'utf8').toString('base64')
+
+uri in the SIWE message must match the Venice route you're calling. Generate a fresh header for each different endpoint. Headers expire after 10 minutes.
+curl -X GET "https://api.venice.ai/api/v1/x402/balance/0xYOUR_ADDRESS" \
+ -H "X-Sign-In-With-X: $BALANCE_AUTH"
+
+# Response:
+{
+ "success": true,
+ "data": {
+ "walletAddress": "0xyour_wallet_address",
+ "balanceUsd": 12.5,
+ "canConsume": true,
+ "minimumTopUpUsd": 5,
+ "suggestedTopUpUsd": 10,
+ "diemBalanceUsd": 5.25
+ }
+}
+
+curl -X POST https://api.venice.ai/api/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -H "X-Sign-In-With-X: $CHAT_AUTH" \
+ -d '{
+ "model": "kimi-k2-5",
+ "messages": [{"role": "user", "content": "Hello from an x402-authenticated wallet."}]
+ }'
+
+# First call without payment header to get requirements:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up
+# → 402 response with paymentInfo:
+{
+ "error": "PAYMENT_REQUIRED",
+ "message": "Send x402 payment via X-402-Payment header",
+ "paymentInfo": {
+ "receiverWallet": "0xRECEIVER_WALLET",
+ "network": "eip155:8453",
+ "token": "USDC",
+ "tokenAddress": "0xUSDC_TOKEN_ADDRESS",
+ "minimumAmountUsd": 5,
+ "suggestedAmountUsd": 10
+ }
+}
+
+# Then retry with signed X-402-Payment header:
+curl -X POST https://api.venice.ai/api/v1/x402/top-up \
+ -H "X-402-Payment: $X402_PAYMENT"
+# → {"success":true,"data":{"amountCredited":10,"newBalance":22.5}}
+
+curl -X GET "https://api.venice.ai/api/v1/x402/transactions/0xYOUR_ADDRESS?limit=10&offset=0" \
+ -H "X-Sign-In-With-X: $TX_AUTH"
+Returns entries like TOP_UP, CHARGE, REFUND with requestId and modelId for spend correlation.
npm install @venice-ai/x402-client
+
+import { VeniceClient } from '@venice-ai/x402-client'
+
+const venice = new VeniceClient(process.env.WALLET_KEY)
+
+const response = await venice.chat({
+ model: 'kimi-k2-5',
+ messages: [{ role: 'user', content: 'Hello!' }]
+})
+
+| Error | HTTP | Cause / Fix |
|---|---|---|
| Authentication failed | 401 | Regenerate SIWE header. Check: valid base64, signature matches wallet, chain is 8453, not expired. |
| Unauthorized | 403 | Wallet in header doesn't match {address} path parameter. |
| Insufficient balance | 402 | Wallet auth worked but balance too low. Check /x402/balance, top up if needed. |
| PAYMENT_REQUIRED | 402 | Expected on first top-up call. Use returned paymentInfo to build X-402-Payment. |
| X402_INVALID_PAYMENT | 400 | Malformed payment header. Rebuild from scratch. |
If you're sending an agent to use Venice via X402, give it this context:
+Use the Venice API with your wallet.
+
+Auth: Build a SIWE message (domain: outerface.venice.ai, chain: 8453, URI: the Venice route
+you're calling), sign with EIP-191, base64-encode as JSON, send as X-Sign-In-With-X header.
+Headers expire after 10 minutes — regenerate per request.
+
+Inference: POST https://api.venice.ai/api/v1/chat/completions
+ Header: X-Sign-In-With-X: <base64>
+ Body: {"model":"kimi-k2-5","messages":[{"role":"user","content":"Hello"}]}
+
+Balance: GET https://api.venice.ai/api/v1/x402/balance/<address>
+Transactions: GET https://api.venice.ai/api/v1/x402/transactions/<address>
+DIEM staked on the wallet is used automatically. No DIEM = top up with USDC on Base (min $5).
+Guide: https://docs.venice.ai/overview/guides/x402-venice-api
+
+Full documentation: docs.venice.ai/overview/guides/x402-venice-api | Client library: veniceai/x402-client
+ + +Check your limits: GET /api_keys/rate_limits
+ +| Tier | Requests/min | Tokens/min | Example Models |
|---|---|---|---|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, grok-41-fast, qwen3-coder-480b |
| Type | Requests/min |
|---|---|
| Image | 20 |
| Audio | 60 |
| Embedding | 500 |
| Video (queue) | 40 |
| Video (retrieve) | 120 |
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Max requests in current window |
| x-ratelimit-remaining-requests | Requests remaining |
| x-ratelimit-reset-requests | Unix timestamp when window resets |
| x-ratelimit-limit-tokens | Max tokens per minute |
| x-ratelimit-remaining-tokens | Tokens remaining |
| x-ratelimit-reset-tokens | Seconds until token limit resets |
Abuse Protection: 20+ failed requests in 30 seconds → blocked for 30 seconds.
+ +Partner Tier: Significantly higher limits available. Contact api@venice.ai.
+ + +| Code | HTTP | Meaning |
|---|---|---|
| AUTHENTICATION_FAILED | 401 | Invalid or missing API key |
| AUTHENTICATION_FAILED_INACTIVE_KEY | 401 | Pro subscription inactive |
| INVALID_API_KEY | 401 | API key format invalid |
| INSUFFICIENT_BALANCE | 402 | No USD or DIEM balance remaining |
| UNAUTHORIZED | 403 | No access to this resource |
| INVALID_REQUEST | 400 | Bad parameters |
| INVALID_MODEL | 400 | Model doesn't exist |
| CHARACTER_NOT_FOUND | 404 | Character slug not found |
| MODEL_NOT_FOUND | 404 | Model not found |
| INVALID_CONTENT_TYPE | 415 | Wrong Content-Type header |
| INVALID_FILE_SIZE | 413 | File too large |
| INVALID_IMAGE_FORMAT | 400 | Unsupported image format |
| CORRUPTED_IMAGE | 400 | Image file unreadable |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests |
| INFERENCE_FAILED | 500 | Model inference error |
| UPSCALE_FAILED | 500 | Upscaling error |
| UNKNOWN_ERROR | 500 | Unexpected server error |
// Standard error (most endpoints)
+{"error": "Rate limit exceeded"}
+
+// Detailed error (validation failures — 400)
+{
+ "error": "Invalid request parameters",
+ "details": {
+ "model": {"_errors": ["Invalid model specified"]},
+ "prompt": {"_errors": ["Field is required"]}
+ }
+}
+
+// Content violation (422 — video/audio)
+{
+ "error": "Your prompt violates the content policy",
+ "suggested_prompt": "A cinematic instrumental track inspired by stormy weather"
+}
+
+Retry strategy: use exponential backoff for 429, 500, 503 errors. Check x-ratelimit-reset-requests header for 429.
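That retry strategy can be sketched with the standard library; the helper names are illustrative.

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, status, reset_header=None, now=None):
    """Seconds to wait: honor x-ratelimit-reset-requests on 429, else 1s, 2s, 4s... capped at 60s."""
    if status == 429 and reset_header:
        now = time.time() if now is None else now
        return min(max(float(reset_header) - now, 0.0), 60.0)
    return float(min(2 ** attempt, 60))

def request_with_backoff(req, max_tries=5):
    """Retry 429/500/503 responses; re-raise anything else or the final failure."""
    for attempt in range(max_tries):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code not in (429, 500, 503) or attempt == max_tries - 1:
                raise
            time.sleep(backoff_delay(attempt, e.code,
                                     e.headers.get("x-ratelimit-reset-requests")))
```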
| Header | Purpose |
|---|---|
| CF-RAY | Unique request ID (log for support) |
| x-venice-version | API version/revision |
| x-venice-model-id | Model used for inference |
| x-venice-model-name | Friendly model name |
| x-venice-model-deprecation-warning | Deprecation notice |
| x-venice-model-deprecation-date | When model will be removed |
| x-venice-balance-usd | USD balance before request |
| x-venice-balance-diem | DIEM balance before request |
| x-venice-is-blurred | Image was blurred (Safe Venice) |
| x-venice-is-content-violation | Content policy violation |
| x-ratelimit-* | Rate limiting info (see above) |
| x-pagination-* | Pagination metadata |
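A small helper (the name is ours) pulls out the signals worth monitoring from each response:

```python
def venice_signals(headers):
    """Extract monitoring-worthy Venice response headers (case-insensitive lookup)."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "request_id": h.get("cf-ray"),
        "balance_usd": float(h["x-venice-balance-usd"]) if "x-venice-balance-usd" in h else None,
        "balance_diem": float(h["x-venice-balance-diem"]) if "x-venice-balance-diem" in h else None,
        "deprecation": h.get("x-venice-model-deprecation-warning"),
    }

sig = venice_signals({"CF-RAY": "8abc123", "x-venice-balance-usd": "12.50"})
print(sig["request_id"], sig["balance_usd"])
```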
Venice is a drop-in replacement for OpenAI. Same SDK, same code — just change two values:
+# Python
+client = OpenAI(
+ api_key="your-venice-api-key", # ← Change 1
+ base_url="https://api.venice.ai/api/v1" # ← Change 2
+)
+
+# Node.js
+const client = new OpenAI({
+ apiKey: 'your-venice-api-key',
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+# Environment variables (many libraries auto-read these)
+OPENAI_API_KEY=your-venice-api-key
+OPENAI_BASE_URL=https://api.venice.ai/api/v1
+
+| OpenAI Model | Venice Equivalent | Type | Pricing (In/Out per 1M) |
|---|---|---|---|
| gpt-4o | zai-org-glm-4.7 Private | Text | $0.55 / $2.65 |
| gpt-4o | openai-gpt-52 Anon | Text | $2.19 / $17.50 |
| gpt-4o-mini | qwen3-4b | Text | $0.05 / $0.15 |
| o1 / o3 | qwen3-235b-a22b-thinking-2507 | Reasoning | $0.45 / $3.50 |
| gpt-4-vision | qwen3-vl-235b-a22b | Vision | $0.25 / $1.50 |
| text-embedding-3-small | text-embedding-bge-m3 | Embeddings | $0.15 / $0.60 |
| dall-e-3 | qwen-image Private | Image | $0.01/img |
| whisper | nvidia/parakeet-tdt-0.6b-v3 | STT | $0.0001/sec |
| tts-1 | tts-kokoro | TTS | $3.50/1M chars |
| Framework | Change Required |
|---|---|
| LangChain | base_url in ChatOpenAI |
| Vercel AI SDK | baseURL in createOpenAI |
| CrewAI | OPENAI_API_BASE env var |
| LlamaIndex | api_base in OpenAI |
| AutoGen | base_url in config |
| Haystack | api_base_url in OpenAIGenerator |
| Claude Code | Use claude-code-router |
| Cursor | Custom API endpoint in settings |
| Continue.dev | apiBase in config.json |
Full migration guide: docs.venice.ai/overview/guides/openai-migration
+ +pip install langchain langchain-openai openai
+
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+ model="venice-uncensored",
+ api_key="your-venice-api-key",
+ base_url="https://api.venice.ai/api/v1",
+ temperature=0.7,
+)
+
+response = llm.invoke("Explain privacy-preserving AI.")
+print(response.content)
+
+from langchain_openai import OpenAIEmbeddings
+
+embeddings = OpenAIEmbeddings(
+ model="text-embedding-bge-m3",
+ api_key="your-venice-api-key",
+ base_url="https://api.venice.ai/api/v1",
+ check_embedding_ctx_length=False, # Required for Venice
+)
+
+from langchain_community.vectorstores import FAISS
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+
+documents = ["Venice offers Private and Anonymized privacy levels."]  # your corpus here
+
+def format_docs(docs):
+    return "\n\n".join(doc.page_content for doc in docs)
+
+vectorstore = FAISS.from_texts(documents, embeddings)
+retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
+
+rag_prompt = ChatPromptTemplate.from_messages([
+    ("system", "Answer based only on this context:\n\n{context}"),
+    ("user", "{question}"),
+])
+
+rag_chain = (
+    {"context": retriever | format_docs, "question": RunnablePassthrough()}
+    | rag_prompt | llm | StrOutputParser()
+)
+
+answer = rag_chain.invoke("What privacy levels does Venice offer?")
+
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.tools import tool
+from langchain.agents import create_tool_calling_agent, AgentExecutor
+
+llm = ChatOpenAI(model="zai-org-glm-4.7", api_key="...", base_url="https://api.venice.ai/api/v1")
+
+@tool
+def get_price(model_id: str) -> str:
+    """Get pricing for a Venice AI model."""
+    prices = {"venice-uncensored": "$0.20/$0.90", "zai-org-glm-4.7": "$0.55/$2.65"}
+    return prices.get(model_id, "Not found")
+
+# Tool-calling agents require an agent_scratchpad placeholder in the prompt
+prompt = ChatPromptTemplate.from_messages([
+    ("system", "You are a helpful assistant."),
+    ("user", "{input}"),
+    ("placeholder", "{agent_scratchpad}"),
+])
+
+agent = create_tool_calling_agent(llm, [get_price], prompt)
+executor = AgentExecutor(agent=agent, tools=[get_price])
+result = executor.invoke({"input": "What's the cheapest model?"})
+
+llm_with_search = ChatOpenAI(
+ model="venice-uncensored",
+ api_key="...",
+ base_url="https://api.venice.ai/api/v1",
+ extra_body={"venice_parameters": {"enable_web_search": "auto"}}
+)
+
+Full guide: docs.venice.ai/overview/guides/langchain
+ +npm install ai @ai-sdk/openai
+
+// lib/venice.ts
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+ apiKey: process.env.VENICE_API_KEY!,
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+// Use .chat() to ensure compatibility with Venice's chat completions endpoint
+export const venice = (modelId: string) => openai.chat(modelId);
+
+Always use .chat() — the default openai('model') syntax may use newer OpenAI endpoints Venice doesn't support yet.
+// app/api/chat/route.ts
+import { streamText } from 'ai';
+import { venice } from '@/lib/venice';
+
+export async function POST(req: Request) {
+ const { messages } = await req.json();
+ const result = streamText({
+ model: venice('venice-uncensored'),
+ system: 'You are a helpful, privacy-respecting AI assistant.',
+ messages,
+ });
+ return result.toDataStreamResponse();
+}
+
+import { streamText, tool } from 'ai';
+import { z } from 'zod';
+
+const result = streamText({
+ model: venice('zai-org-glm-4.7'),
+ messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
+ tools: {
+ getWeather: tool({
+ description: 'Get current weather',
+ parameters: z.object({ location: z.string() }),
+ execute: async ({ location }) => ({ temperature: 22, condition: 'Sunny', location }),
+ }),
+ },
+});
+
+import { generateObject } from 'ai';
+import { z } from 'zod';
+
+const { object } = await generateObject({
+ model: venice('venice-uncensored'),
+ schema: z.object({
+ recipe: z.object({
+ name: z.string(),
+ ingredients: z.array(z.string()),
+ steps: z.array(z.string()),
+ }),
+ }),
+ prompt: 'Generate a recipe for chocolate chip cookies.',
+});
+
+import { embed } from 'ai';
+import { createOpenAI } from '@ai-sdk/openai';
+
+const openai = createOpenAI({
+ apiKey: process.env.VENICE_API_KEY!,
+ baseURL: 'https://api.venice.ai/api/v1',
+});
+
+const { embedding } = await embed({
+ model: openai.textEmbeddingModel('text-embedding-bge-m3'),
+ value: 'Privacy-first AI infrastructure',
+});
+
+Full guide: docs.venice.ai/overview/guides/vercel-ai-sdk
+ +OpenClaw is an open-source AI gateway connecting messaging platforms (WhatsApp, Telegram, Discord, Slack, iMessage) to AI models. Venice is a built-in provider.
+ +# Install
+curl -fsSL https://openclaw.ai/install.sh | bash
+# Or: npm install -g openclaw@latest
+
+# Onboard (select Venice as provider, paste API key)
+openclaw onboard
+
+# Set model
+openclaw models set venice/zai-org-glm-5
+
+# Start
+openclaw tui # Terminal UI
+openclaw dashboard # Web dashboard
+openclaw gateway # Messaging channels
+
+| Use Case | Model | Privacy |
|---|---|---|
| General | venice/zai-org-glm-5 | Private |
| Reasoning | venice/kimi-k2-5 | Private |
| Coding | venice/claude-opus-4-6 | Anon |
| Vision | venice/qwen3-vl-235b-a22b | Private |
| Uncensored | venice/venice-uncensored | Private |
# Install image/video generation skill
+openclaw skills install nhannah/venice-ai-media
+
+Full guide: docs.venice.ai/overview/guides/openclaw-bot | OpenClaw Venice provider docs
+ +| Integration | Type | Setup |
|---|---|---|
| Eliza (ai16z) | Agent framework | Set modelProvider: "venice" in character config. Configure VENICE_API_KEY + model env vars. |
| Coinbase AgentKit | Agent framework | Native Venice support. |
| Cursor IDE | Coding | Custom API endpoint in settings. |
| Cline (VS Code) | Coding | Set Venice base URL + API key. |
| ROO Code (VS Code) | Coding | Set Venice base URL + API key. |
| VOID IDE | Coding | Set Venice base URL + API key. |
| Brave Leo | Browser | Venice as Leo AI backend. |
| Aider | Coding | AI pair programming in terminal. |
| Open WebUI | Assistant | Self-hosted chat UI with Venice. |
| LibreChat | Assistant | Multi-provider chat with Venice. |
git clone https://github.com/ai16z/eliza.git
+cd eliza
+cp .env.example .env
+# Edit .env: set VENICE_API_KEY, SMALL_VENICE_MODEL, MEDIUM_VENICE_MODEL, LARGE_VENICE_MODEL
+# Create character in /characters/your_char.character.json with modelProvider: "venice"
+pnpm i && pnpm build && pnpm start
+pnpm start --characters="characters/your_char.character.json"
+
+
+Use Claude Code CLI with Venice for pay-per-token access to Claude Opus/Sonnet models.
+ +The claude-code-router is an open-source local proxy that intercepts Claude Code requests and redirects them to Venice.
+ +# Install Claude Code + Router
+npm install -g @anthropic-ai/claude-code
+npm install -g @musistudio/claude-code-router
+
+# Create config
+mkdir -p ~/.claude-code-router
+
+Create ~/.claude-code-router/config.json:
{
+ "APIKEY": "",
+ "LOG": true,
+ "LOG_LEVEL": "info",
+ "API_TIMEOUT_MS": 600000,
+ "HOST": "127.0.0.1",
+ "Providers": [{
+ "name": "venice",
+ "api_base_url": "https://api.venice.ai/api/v1/chat/completions",
+ "api_key": "your-venice-api-key-here",
+ "models": ["claude-opus-45", "claude-sonnet-45", "claude-opus-4-6", "claude-sonnet-4-6"],
+ "transformer": {"use": ["anthropic"]}
+ }],
+ "Router": {
+ "default": "venice,claude-opus-45",
+ "think": "venice,claude-opus-45",
+ "background": "venice,claude-opus-45",
+ "longContext": "venice,claude-opus-45",
+ "longContextThreshold": 100000
+ }
+}
+
+# Launch
+ccr start
+ccr code
+# Or: eval "$(ccr activate)" && claude
+
+| Model | Venice ID | Best For |
|---|---|---|
| Claude Opus 4.5 | claude-opus-45 | Complex reasoning, large refactors |
| Claude Sonnet 4.5 | claude-sonnet-45 | Fast iteration, everyday coding |
| Claude Opus 4.6 | claude-opus-4-6 | Complex reasoning, large refactors |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | Fast iteration, everyday coding |
- Switch models with /model venice,claude-sonnet-45 inside Claude Code
- Run ccr ui for a browser-based config editor
- Router slots: default, think, background, longContext

Official MCP (Model Context Protocol) server for Claude Code, Cline, and AI agents: veniceai/venice-mcp-server
+ +Allows AI agents to interact with Venice API endpoints as MCP tools.
Best practices:
- Watch x-ratelimit-remaining-* headers. Implement exponential backoff.
- Monitor x-venice-balance-usd / x-venice-balance-diem to avoid service interruptions.
- Log CF-RAY header values for troubleshooting with Venice support.
- Handle x-venice-model-deprecation-warning headers proactively.
- Disable Venice's default system prompt when supplying your own (include_venice_system_prompt: false).
- Use a stable prompt_cache_key for conversations.
- For structured outputs, set strict: true and additionalProperties: false.

Model picks:
- Uncensored / private: venice-uncensored (Private, zero data retention)
- Budget: zai-org-glm-4.7-flash ($0.13 input) or qwen3-5-9b ($0.05 input)
- Frontier quality: claude-opus-4-6 or openai-gpt-52
- Long context: grok-4-20-beta (2M tokens) or gemini-3-1-pro-preview (1M tokens)
- Maximum privacy: e2ee-* models for end-to-end encryption
- Vision: qwen3-vl-235b-a22b
- Tool calling: zai-org-glm-4.7, any Claude, GPT, or Grok model

| Endpoint | Method | Purpose |
|---|---|---|
| /chat/completions | POST | Text generation, vision, tools, streaming |
| /image/generate | POST | Text-to-image generation |
| /image/edit | POST | AI image editing / inpainting |
| /image/multi-edit | POST | Multi-image layered editing |
| /image/upscale | POST | Image upscaling (2x or 4x) |
| /image/background-remove | POST | Remove image backgrounds |
| /audio/speech | POST | Text-to-speech (50+ voices) |
| /audio/transcriptions | POST | Speech-to-text |
| /audio/queue | POST | Music generation (async) |
| /audio/retrieve | GET | Retrieve generated audio |
| /audio/quote | POST | Get audio generation price quote |
| /video/queue | POST | Video generation (async) |
| /video/retrieve | POST | Retrieve generated video |
| /video/quote | POST | Get video generation price quote |
| /embeddings | POST | Generate vector embeddings |
| /models | GET | List all available models |
| /characters | GET | List AI character personas |
| /api_keys | GET/POST | Manage API keys |
| /api_keys/rate_limits | GET | Check rate limit status |
| /tee/attestation | GET | Verify TEE attestation |
| /tee/signature | GET | Verify TEE response signature |
+ Generated from docs.venice.ai on April 3, 2026.
+ Venice API version: 20260403 · This page is designed to be consumed by AI agents.
+