diff --git a/examples/inference/chat_completions.mdx b/examples/inference/chat_completions.mdx
index 74be871..9dcfceb 100644
--- a/examples/inference/chat_completions.mdx
+++ b/examples/inference/chat_completions.mdx
@@ -1,105 +1,473 @@
 ---
 title: '💬 Chat Completions'
-description: 'How to integrate an LLM into your application through the chat completions endpoint.'
+description: 'Integrate an LLM into your application through the `/chat/completions` API.'
 ---
 
-### Chat Completions
+## Quick Start
 
-The chat completions API is compatible with OpenAI, so you can use any supported client to interact with the model. Simply pass in the model you want to use and use https://hub.oxen.ai/api as your base_url.
+The Oxen.ai chat completions API is fully [OpenAI-compatible](https://platform.openai.com/docs/api-reference/chat). You can use the OpenAI SDK, `curl`, or any HTTP client that speaks the OpenAI chat format.
+
+**Base URL:** `https://hub.oxen.ai/api`
+
+**Endpoint:** `POST /chat/completions`
+
+Browse [all available models](https://www.oxen.ai/ai/models).
 ```bash cURL
 curl -X POST https://hub.oxen.ai/api/chat/completions \
--H "Authorization: Bearer $OXEN_API_KEY" \
--H "Content-Type: application/json" \
--d '{
-  "model": "moonshotai/Kimi-K2-Thinking",
-  "messages": [{"role": "user", "content": "Hello, how are you?"}]
-}'
+  -H "Authorization: Bearer $OXEN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "messages": [
+      {"role": "user", "content": "What is a great name for an ox?"}
+    ]
+  }'
 ```
 
-```python Python
-import openai
+```python Python (OpenAI SDK)
+from openai import OpenAI
 import os
 
-client = openai.OpenAI(
-    api_key=os.getenv("OXEN_API_KEY"),
-    base_url="https://hub.oxen.ai/api"
+client = OpenAI(
+    api_key=os.environ["OXEN_API_KEY"],
+    base_url="https://hub.oxen.ai/api",
 )
 
 response = client.chat.completions.create(
-    model="moonshotai/Kimi-K2-Thinking",
-    messages=[{"role": "user", "content": "What is a great name for an ox that also manages your AI infrastructure?"}]
+    model="claude-sonnet-4-6",
+    messages=[
+        {"role": "user", "content": "What is a great name for an ox?"}
+    ]
 )
 
-print(response.output['content'][0]['text'])
+print(response.choices[0].message.content)
 ```
 
-If you want to send an image to a model that supports vision such as GPT-4o or Claude, you can add a message with the `image_url` type.
+
+## Authentication
+
+Every request requires a Bearer token in the `Authorization` header. You can find your API key in your [account settings](https://www.oxen.ai/settings/profile).
+
+```bash
+Authorization: Bearer $OXEN_API_KEY
+```
+
+API key
+
+## Response Format
+
+The API returns an OpenAI-compatible JSON response:
+
+```json
+{
+  "id": "chatcmpl-af41f027-e4d5-4c4b-ac40-625fb4ebfb1e",
+  "object": "chat.completion",
+  "created": 1774040155,
+  "model": "claude-sonnet-4-6",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "How about \"Beauregard\"?"
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 11,
+    "completion_tokens": 4,
+    "total_tokens": 15
+  }
+}
+```
+
+| Field | Description |
+|---|---|
+| `id` | Unique identifier for the completion |
+| `object` | Always `"chat.completion"` |
+| `created` | Unix timestamp of when the completion was created |
+| `model` | The model that generated the response |
+| `choices` | Array of completion choices (typically one) |
+| `choices[].message.content` | The generated text |
+| `choices[].finish_reason` | Why generation stopped: `"stop"` (natural end), `"length"` (hit `max_tokens`), or `"tool_calls"` (the model requested a tool call) |
+| `usage` | Token counts for the request |
+
+## Parameters
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `model` | string | *required* | Model name, e.g. `"claude-sonnet-4-6"`, `"gpt-4o"`, `"gemini-3-flash-preview"` |
+| `messages` | array | *required* | Array of message objects with `role` and `content` |
+| `max_tokens` | integer | model default | Maximum number of tokens to generate |
+| `temperature` | float | model default | Sampling temperature (0-2). Lower is more deterministic. |
+| `stream` | boolean | `false` | Enable [streaming](#streaming) with server-sent events |
+
+### Messages
+
+Each message in the `messages` array has a `role` and `content`:
+
+| Role | Description |
+|---|---|
+| `system` | Sets the behavior and context for the model |
+| `user` | The user's input |
+| `assistant` | Previous model responses (for multi-turn conversations) |
+
+```json
+{
+  "messages": [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is the capital of France?"},
+    {"role": "assistant", "content": "The capital of France is Paris."},
+    {"role": "user", "content": "What is its population?"}
+  ]
+}
+```
+
+## Streaming
+
+Set `"stream": true` to receive responses as server-sent events (SSE). Each event is a `chat.completion.chunk` object with a `delta` instead of a `message`.
-```bash cURL (image url)
+```bash cURL
 curl -X POST https://hub.oxen.ai/api/chat/completions \
--H "Authorization: Bearer $OXEN_API_KEY" \
--H "Content-Type: application/json" \
--d '{
-  "model": "gpt-4o",
-  "messages": [
-    {
-      "role": "user",
-      "content": [
-        {
-          "type": "text",
-          "text": "What is in this image?"
-        },
+  -H "Authorization: Bearer $OXEN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "messages": [
+      {"role": "user", "content": "Write a haiku about data."}
+    ],
+    "stream": true
+  }'
+```
+
+```python Python (OpenAI SDK)
+from openai import OpenAI
+import os
+
+client = OpenAI(
+    api_key=os.environ["OXEN_API_KEY"],
+    base_url="https://hub.oxen.ai/api",
+)
+
+stream = client.chat.completions.create(
+    model="claude-sonnet-4-6",
+    messages=[
+        {"role": "user", "content": "Write a haiku about data."}
+    ],
+    stream=True
+)
+
+for chunk in stream:
+    if chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+print()
+```
+
+
+
+Each SSE line is prefixed with `data: ` and contains a JSON chunk:
+
+```json
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1774040190,"model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"hello"},"finish_reason":null}]}
+```
+
+The stream ends with:
+
+```
+data: [DONE]
+```
+
+## Vision
+
+Models that support vision (such as `gpt-4o` or `claude-sonnet-4-6`) accept images in the `messages` array. For full details and examples, including base64 encoding and video understanding, see [Vision Language Models](/examples/inference/vision_language_models).
+
+
+
+```bash cURL
+curl -X POST https://hub.oxen.ai/api/chat/completions \
+  -H "Authorization: Bearer $OXEN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {"type": "text", "text": "What is in this image?"},
+          {"type": "image_url", "image_url": {"url": "https://oxen.ai/assets/images/homepage/hero-ox.png"}}
+        ]
+      }
+    ]
+  }'
+```
+
+```python Python (OpenAI SDK)
+from openai import OpenAI
+import os
+
+client = OpenAI(
+    api_key=os.environ["OXEN_API_KEY"],
+    base_url="https://hub.oxen.ai/api",
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o",
+    messages=[
         {
-          "type": "image_url",
-          "image_url": {
-            "url": "https://oxen.ai/assets/images/homepage/hero-ox.png"
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is in this image?"},
+                {"type": "image_url", "image_url": {"url": "https://oxen.ai/assets/images/homepage/hero-ox.png"}},
+            ],
+        }
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
+
+
+## Tool use
+
+Tool calling (function calling) follows the same [OpenAI Chat Completions tool format](https://platform.openai.com/docs/guides/function-calling). You send a `tools` array describing each function’s JSON Schema; the model may reply with `tool_calls` instead of plain text. You execute those functions in your app, then send the results back in new `tool` messages so the model can finish the answer.
+
+| Concept | Description |
+|---|---|
+| `tools` | Array of `{ "type": "function", "function": { "name", "description", "parameters" } }` objects. `parameters` is a JSON Schema object for the arguments. |
+| `tool_choice` | Optional. `"auto"` (default) lets the model decide; `"none"` disables tools; or force a specific function with `{"type": "function", "function": {"name": "..."}}`. |
+| Assistant `tool_calls` | When `finish_reason` is `"tool_calls"`, `choices[0].message.tool_calls` lists each call with `id`, `function.name`, and `function.arguments` (a JSON string). |
+| `tool` messages | Each result uses `role: "tool"`, `tool_call_id` matching the call’s `id`, and `content` as a string (often JSON your tool returned). |
+
+### Raw `curl`: first request (tools only)
+
+The model may respond with `tool_calls` instead of user-facing `content`:
+
+```bash
+curl -X POST https://hub.oxen.ai/api/chat/completions \
+  -H "Authorization: Bearer $OXEN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "messages": [
+      {"role": "user", "content": "What is the weather in Paris?"}
+    ],
+    "tools": [
+      {
+        "type": "function",
+        "function": {
+          "name": "get_weather",
+          "description": "Get current weather for a city",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "city": {"type": "string", "description": "City name"}
+            },
+            "required": ["city"]
           }
         }
-      ]
+      }
+    ]
+  }'
+```
+
+Example assistant payload (abbreviated):
+
+```json
+{
+  "choices": [
+    {
+      "finish_reason": "tool_calls",
+      "index": 0,
+      "message": {
+        "content": "I'll check the current weather in Paris for you right away!",
+        "role": "assistant",
+        "tool_calls": [
+          {
+            "function": {
+              "arguments": "{\"city\":\"Paris\"}",
+              "name": "get_weather"
+            },
+            "id": "toolu_014F6XpjMvKbTgV7D5wBzqCn",
+            "index": 1,
+            "type": "function"
+          }
+        ]
+      }
     }
-  ]
-}'
+  ],
+  "created": 1774809792,
+  "id": "chatcmpl-1ce4aeac-6c34-468a-ba6b-b96c5372a1dc",
+  "model": "claude-sonnet-4-6",
+  "object": "chat.completion",
+  "usage": {
+    "completion_tokens": 67,
+    "prompt_tokens": 572,
+    "total_tokens": 639
+  }
+}
+```
+
-```bash cURL (base64 encoded image)
+Run your function locally, then call the API again with the full transcript: original messages, the assistant message including `tool_calls`, and one `tool` message per call.
+Replace IDs and `tool_calls` with values from the first response. Repeat until `finish_reason` is `"stop"` (or `"length"`) and there are no new `tool_calls`.
+
+### Follow-up request: `curl` and OpenAI Python SDK
+
+The follow-up HTTP body matches what the OpenAI SDK builds when you append assistant and `tool` messages in a loop.
+
+
+
+```bash cURL
 curl -X POST https://hub.oxen.ai/api/chat/completions \
--H "Authorization: Bearer $OXEN_API_KEY" \
--H "Content-Type: application/json" \
--d '{
-  "model": "claude-3-7-sonnet",
-  "messages": [
-    {
-      "role": "user",
-      "content": [
-        {
-          "type": "text",
-          "text": "What is in this image?"
-        },
-        {
-          "type": "image_url",
-          "image_url": {
-            "url": "data:image/jpeg;base64,YOUR_BASE64_ENCODED_IMAGE_HERE"
+  -H "Authorization: Bearer $OXEN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "messages": [
+      {
+        "role": "user",
+        "content": "What is the weather in Paris?"
+      },
+      {
+        "role": "assistant",
+        "content": null,
+        "tool_calls": [
+          {
+            "id": "toolu_01ABC",
+            "type": "function",
+            "function": {
+              "name": "get_weather",
+              "arguments": "{\"city\": \"Paris\"}"
+            }
+          }
+        ]
+      },
+      {
+        "role": "tool",
+        "tool_call_id": "toolu_01ABC",
+        "content": "{\"temperature_c\": 18, \"conditions\": \"Partly cloudy\"}"
+      }
+    ],
+    "tools": [
+      {
+        "type": "function",
+        "function": {
+          "name": "get_weather",
+          "description": "Get current weather for a city",
+          "parameters": {
+            "type": "object",
+            "properties": {
+              "city": {
+                "type": "string",
+                "description": "City name"
+              }
+            },
+            "required": ["city"]
           }
         }
-      ]
-    }
-  ]
-}'
+      }
+    ]
+  }'
+```
+
+```python Python (OpenAI SDK)
+from openai import OpenAI
+import json
+import os
+
+client = OpenAI(
+    api_key=os.environ["OXEN_API_KEY"],
+    base_url="https://hub.oxen.ai/api",
+)
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_weather",
+            "description": "Get current weather for a city",
+            "parameters": {
+                "type": "object",
"properties": { + "city": {"type": "string", "description": "City name"}, + }, + "required": ["city"], + }, + }, + }, +] + +messages = [{"role": "user", "content": "What is the weather in Paris?"}] + +def get_weather(city: str) -> str: + # Your real implementation would call a weather API. + return json.dumps({"temperature_c": 18, "conditions": "Partly cloudy"}) + +while True: + response = client.chat.completions.create( + model="claude-sonnet-4-6", + messages=messages, + tools=tools + ) + choice = response.choices[0] + msg = choice.message + + if not msg.tool_calls: + print(msg.content) + break + + messages.append(msg) + for call in msg.tool_calls: + name = call.function.name + args = json.loads(call.function.arguments or "{}") + if name == "get_weather": + output = get_weather(args["city"]) + else: + output = json.dumps({"error": f"unknown tool: {name}"}) + messages.append( + { + "role": "tool", + "tool_call_id": call.id, + "content": output, + } + ) ``` -## Playground Interface +## Errors -The model playground allows you to quickly test out the boundaries of any model by chatting with it in the UI. This is a great way to kick the tires of a model you [fine-tuned](/getting-started/fine-tuning) after deploying it. +The API returns errors as JSON with an `error` object and a standard HTTP status code. -Chat Interface +| Status | Meaning | +|---|---| +| `400` | Bad request (missing model, empty messages, invalid parameters) | +| `401` | Invalid or missing API key | +| `429` | Rate limit exceeded | +| `500` | Internal server error | +```json +{ + "error": { + "message": "You must specify a model to call" + } +} +``` +## Playground + +The [model playground](https://www.oxen.ai/ai/models) lets you test any model interactively before writing code. This is also a great way to test models you've [fine-tuned](/getting-started/fine-tuning) after deploying them. 
+
+Chat Interface
diff --git a/examples/inference/vision_language_models.mdx b/examples/inference/vision_language_models.mdx
index 452ca06..398a141 100644
--- a/examples/inference/vision_language_models.mdx
+++ b/examples/inference/vision_language_models.mdx
@@ -1,6 +1,6 @@
 ---
 title: '👁️ Vision Language Models'
-description: 'How to integrate a vision language model into your application through the `/chat/completions` endpoint.'
+description: 'Leverage image understanding with the `/chat/completions` endpoint.'
 ---
 
 ## What are VLMs?
@@ -13,7 +13,7 @@ Here is the [list of supported models](https://www.oxen.ai/ai/models?modalities=
 
 ## Image Understanding
 
-The `/chat/completions` endpoint supports vision language models for image understanding. If you want to send an image to a model that supports vision such as Qwen3-VL or gpt-4o, you can add a message with the `image_url` type.
+The `/chat/completions` endpoint supports vision language models for image understanding. If you want to send an image to a model that supports vision such as Qwen3-VL, Qwen3.5, or Gemini 3 Pro/Flash, you can add a message with the `image_url` type.
 
 ### Using Image URLs
 
@@ -24,7 +24,7 @@ curl -X POST https://hub.oxen.ai/api/chat/completions \
   -H "Authorization: Bearer $OXEN_API_KEY" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "gpt-4o",
+    "model": "gemini-3-1-pro-preview",
     "messages": [
       {
         "role": "user",
@@ -57,7 +57,7 @@ curl -X POST https://hub.oxen.ai/api/chat/completions \
   -H "Authorization: Bearer $OXEN_API_KEY" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "claude-3-7-sonnet",
+    "model": "claude-sonnet-4-6",
     "messages": [
       {
         "role": "user",
@@ -105,7 +105,7 @@ base64_image = encode_image("path/to/your/image.jpg")
 
 # Send the request with base64 encoded image
 response = client.chat.completions.create(
-    model="claude-3-7-sonnet",
+    model="claude-sonnet-4-6",
     messages=[
         {
             "role": "user",
@@ -250,7 +250,7 @@ print(response.choices[0].message.content)
 
 ## Playground Interface
 
-Want to test out prompts without writing any code? You can use the [playground interface](https://www.oxen.ai/ai/models/gpt-4o) to chat with a model. This is a great way to kick the tires of a base model or a model you [fine-tuned](/getting-started/fine-tuning) after deploying it.
+Want to test out prompts without writing any code? You can use the [playground interface](https://www.oxen.ai/ai/models/claude-sonnet-4-6) to chat with a model. This is a great way to kick the tires of a base model or a model you [fine-tuned](/getting-started/fine-tuning) after deploying it.
 
 Chat Interface
 
@@ -262,5 +262,5 @@ Oxen.ai also allows you to fine-tune vision language models on your own data. Th
 
 Once the model has been fine-tuned, you can easily deploy the model behind an inference endpoint and start the evaluation loop over again.
 
-Learn more about [fine-tuning VLMs](/examples/fine-tuning/image_generation).
+Learn more about [fine-tuning VLMs](/examples/fine-tuning/image_understanding).
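The base64 flow touched by the hunks above (read a local image, encode it, embed it as a `data:` URL in the `image_url` field) can be wrapped in one small helper. A minimal sketch; the function name and the JPEG fallback are illustrative, not part of the Oxen.ai API.

```python
import base64
import mimetypes

def image_data_url(path: str) -> str:
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# The result slots directly into a vision message:
# {"type": "image_url", "image_url": {"url": image_data_url("path/to/your/image.jpg")}}
```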
diff --git a/images/auth_key.png b/images/auth_key.png
index 09d4d22..cbc0126 100644
Binary files a/images/auth_key.png and b/images/auth_key.png differ