Commit aa6192d

server : add Anthropic Messages API support

1 parent 583cb83 commit aa6192d

File tree

8 files changed: +1712 −6 lines changed

tools/server/README.md

Lines changed: 72 additions & 0 deletions
@@ -7,6 +7,7 @@ Set of LLM REST APIs and a simple web front end to interact with llama.cpp.

**Features:**

* LLM inference of F16 and quantized models on GPU and CPU
* [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
* [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) compatible chat completions
* Reranking endpoint (https://github.com/ggml-org/llama.cpp/pull/9510)
* Parallel decoding with multi-user support
* Continuous batching
@@ -1343,6 +1344,77 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
  }'
```

### POST `/v1/messages`: Anthropic-compatible Messages API

Given a list of `messages`, returns the assistant's response. Streaming is supported via Server-Sent Events. While no strong claims of compatibility with the Anthropic API spec are made, in our experience it suffices to support many apps.

*Options:*

See the [Anthropic Messages API documentation](https://docs.anthropic.com/en/api/messages). Tool use requires the `--jinja` flag.
`model`: Model identifier (required)

`messages`: Array of message objects with `role` and `content` (required)

`max_tokens`: Maximum tokens to generate (default: 4096)

`system`: System prompt as string or array of content blocks

`temperature`: Sampling temperature 0-1 (default: 1.0)

`top_p`: Nucleus sampling (default: 1.0)

`top_k`: Top-k sampling

`stop_sequences`: Array of stop sequences

`stream`: Enable streaming (default: false)

`tools`: Array of tool definitions (requires `--jinja`)

`tool_choice`: Tool selection mode (`{"type": "auto"}`, `{"type": "any"}`, or `{"type": "tool", "name": "..."}`)

*Examples:*
```shell
curl http://localhost:8080/v1/messages \
    -H "Content-Type: application/json" \
    -H "x-api-key: your-api-key" \
    -d '{
        "model": "gpt-4",
        "max_tokens": 1024,
        "system": "You are a helpful assistant.",
        "messages": [
            {"role": "user", "content": "Hello!"}
        ]
    }'
```
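A successful (non-streaming) reply follows the Anthropic Messages shape, with the assistant's output split into typed `content` blocks. A minimal client-side sketch of pulling the text out of such a response; the sample body below is illustrative, not actual server output:

```python
import json

def extract_text(response: dict) -> str:
    """Concatenate the text blocks from an Anthropic-style Messages response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )

# Sample response body in the Anthropic Messages shape (illustrative values).
sample = json.loads('''{
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help?"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 8}
}''')

print(extract_text(sample))
```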
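Tool definitions for the `tools` parameter use Anthropic's `name`/`description`/`input_schema` shape. A hedged sketch of building such a request body, with a hypothetical `get_weather` tool that is not part of llama.cpp (remember to start the server with `--jinja`):

```python
import json

# Hypothetical tool in Anthropic's tool-definition shape; the tool name
# and its JSON Schema are illustrative.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

request_body = {
    "model": "gpt-4",
    "max_tokens": 1024,
    "tools": [get_weather],
    "tool_choice": {"type": "auto"},
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
}

print(json.dumps(request_body, indent=2))
```

Send the serialized body to `/v1/messages` as in the curl example above; with `tool_choice` of `{"type": "any"}` the model is forced to call some tool, and `{"type": "tool", "name": "get_weather"}` forces that specific one.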
### POST `/v1/messages/count_tokens`: Token Counting

Counts the number of tokens in a request without generating a response.

Accepts the same parameters as `/v1/messages`. The `max_tokens` parameter is not required.

*Example:*
```shell
curl http://localhost:8080/v1/messages/count_tokens \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": "Hello!"}
        ]
    }'
```
*Response:*

```json
{"input_tokens": 10}
```
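When `stream` is set to `true` on `/v1/messages`, output arrives as Server-Sent Events using Anthropic's streaming event types (`message_start`, `content_block_delta`, `message_stop`, ...). A minimal sketch of accumulating the text deltas from such a stream; the sample payload below is made up for illustration:

```python
import json

def collect_stream_text(sse_payload: str) -> str:
    """Accumulate text deltas from Anthropic-style SSE `data:` lines."""
    text = []
    for line in sse_payload.splitlines():
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)

# Illustrative sample of a streamed response (event names follow the
# Anthropic streaming format; the values are invented).
sample_stream = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_123", "role": "assistant"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}}

event: message_stop
data: {"type": "message_stop"}
"""

print(collect_stream_text(sample_stream))
```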
## More examples

### Interactive mode
