feat(inference): add chat completion message listing endpoint #5459
skamenan7 wants to merge 36 commits into ogx-ai:main
Conversation
✱ Stainless preview builds

This PR will update the SDKs listed below. Edit this comment to update it; it will appear in the SDK's changelogs.

✅ llama-stack-client-openapi studio · code · diff
✅ llama-stack-client-python studio · code · diff
✅ llama-stack-client-go studio · conflict
✅ llama-stack-client-node studio · conflict

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. (See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork)
Force-pushed 617ff74 to ed847f6
Force-pushed 8c2b05d to f051ef4
Force-pushed 2e577bf to 57af7f5
Force-pushed c4aa828 to eb9adcb
Recording workflow completed. Providers: azure. Recordings have been generated and will be committed automatically by the companion workflow. Fork PR: recordings will be committed if you have "Allow edits from maintainers" enabled.

✅ Recordings committed successfully. Recordings from the integration tests have been committed to this PR.
62a79e6 to
d138aa4
Compare
cdoern left a comment:
Small review: initially looking at this, I see new conformance issues we should fix.
### Chat

**Score:** 98.5% · **Issues:** 5 · **Missing:** 1
**Score:** 98.2% · **Issues:** 7 · **Missing:** 1
I kept this PR scoped to the actual schema fix, so I only changed the API side here.
There's still one remaining Chat conformance item on GET /chat/completions/{completion_id}/messages. It looks like a schema-shape mismatch in the OpenAI conformance check rather than an actual endpoint-behavior issue. I didn't make that conformance-side change here, but I can update it separately if you want.
…formance tooling. Keep the API-side schema fix in this change while leaving the remaining messages conformance false-positive to a separate OpenAI coverage follow-up. Signed-off-by: skamenan7 <skamenan@redhat.com>
| Property | Issues |
|----------|--------|
| `responses.200.content.application/json.properties.data.items` | Type added: ['object'] |
This is a net-new issue introduced by the new route. Can we fix this? If it's not possible, let me know.
Yeah, fixed this in 2e6284b73.
The remaining item was coming from the OpenAI coverage diff on ChatCompletionMessageList.data.items rather than the route behavior itself. OpenAI wraps that item schema differently (allOf vs our bare $ref), so I normalized that in scripts/openai_coverage.py for the conformance check.
The /chat/completions/{completion_id}/messages entry is back to 0 issues now.
PTAL @cdoern at the above change.
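For reference, the normalization involved can be sketched roughly as follows. This is a hypothetical stand-in, not the actual helper in scripts/openai_coverage.py: it collapses an `allOf` wrapper holding a single `$ref` into the bare `$ref`, so the two spellings compare as equal during the coverage diff.

```python
# Hypothetical sketch: treat `allOf: [{"$ref": X}]` and a bare `{"$ref": X}`
# as equivalent when diffing schemas, so OpenAI's allOf wrapping of
# ChatCompletionMessageList.data.items does not register as a regression.
def normalize_item_schema(schema: dict) -> dict:
    all_of = schema.get("allOf")
    if isinstance(all_of, list) and len(all_of) == 1 and set(all_of[0]) == {"$ref"}:
        return {"$ref": all_of[0]["$ref"]}
    return schema
```

An `allOf` with multiple members (or extra keys alongside the `$ref`) is left untouched, since that wrapping is semantically meaningful.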
Normalize ChatCompletionMessageList items during OpenAI coverage checks so the new messages endpoint no longer reports a false-positive schema regression. Signed-off-by: skamenan7 <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>
Adds GET /v1/chat/completions/{completion_id}/messages, the OpenAI-compatible
endpoint for listing messages from a stored chat completion. The route reads
from the inference store, flattens input and output messages into a single
paginated list with synthetic IDs, and supports after, limit, and order
query params.
Signed-off-by: Sumanth Kamenani <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>
…s pagination. Signed-off-by: skamenan7 <skamenan@redhat.com> Made-with: Cursor Signed-off-by: skamenan7 <skamenan@redhat.com>
Co-Authored-By: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Prevent the chat completion messages listing endpoint from failing when stored input messages include multipart file content. Filter unsupported parts in the listing response and extend the regression coverage for both file-only and mixed multipart inputs. Signed-off-by: skamenan7 <skamenan@redhat.com>
…formance tooling. Keep the API-side schema fix in this change while leaving the remaining messages conformance false-positive to a separate OpenAI coverage follow-up. Signed-off-by: skamenan7 <skamenan@redhat.com>
Normalize ChatCompletionMessageList items during OpenAI coverage checks so the new messages endpoint no longer reports a false-positive schema regression. Signed-off-by: skamenan7 <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>
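The multipart-content fix mentioned in the commits above can be sketched as follows. This is a simplified illustration, not the actual implementation; names are hypothetical. Stored message content is either a plain string or a list of typed parts, and unsupported file parts are dropped before the message is returned by the listing endpoint.

```python
# Hypothetical sketch: strip unsupported multipart parts (e.g. file inputs)
# from stored message content before the listing endpoint returns it.
def filter_message_content(content):
    if isinstance(content, str):
        return content  # plain text passes through unchanged
    text_parts = [p for p in content if p.get("type") == "text"]
    # A file-only message degrades to empty text instead of raising.
    return text_parts if text_parts else ""
```

This covers both regression cases mentioned: file-only inputs and mixed text-plus-file inputs.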
What does this PR do?
Adds GET /v1/chat/completions/{completion_id}/messages, the OpenAI-compatible endpoint for listing messages from a stored chat completion.
The route reads from the inference store, flattens input and output messages into a single paginated list with synthetic IDs (`{completion_id}-{index}`), and supports `after`, `limit`, and `order` query params. It follows the same pattern as GET /conversations/{id}/items.
According to https://github.com/llamastack/llama-stack/blob/main/docs/docs/api-openai/conformance.mdx#chat, the /messages route is missing.
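The flattening and cursor pagination described above can be sketched as a small stand-alone model. This is illustrative only, not the inference-store implementation; the function name and signature are hypothetical.

```python
# Hypothetical sketch: flatten input + output messages, assign synthetic
# "{completion_id}-{index}" IDs, then apply order / after / limit paging.
def list_messages(completion_id, input_msgs, output_msgs,
                  after=None, limit=20, order="asc"):
    msgs = [{"id": f"{completion_id}-{i}", **m}
            for i, m in enumerate(input_msgs + output_msgs)]
    if order == "desc":
        msgs.reverse()
    if after is not None:
        ids = [m["id"] for m in msgs]
        if after not in ids:
            # surfaces as a 400 at the HTTP layer
            raise ValueError(f"cursor '{after}' not found in completion '{completion_id}'")
        msgs = msgs[ids.index(after) + 1:]
    page = msgs[:limit]
    return {"object": "list", "data": page,
            "first_id": page[0]["id"] if page else None,
            "last_id": page[-1]["id"] if page else None,
            "has_more": len(msgs) > limit}
```

Note the cursor is applied before the limit, so `has_more` reflects whether messages remain past the returned page, matching the paginated-list shape in the test plan below.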
Test Plan
Started a local server with Ollama (remote::ollama provider, llama3.2:3b),
then ran these against it:
Create a completion:
{"id": "chatcmpl-899", "model": "ollama/llama3.2:3b", "content": "Hello!"}

List messages (the new route):

{
  "object": "list",
  "data": [
    {"id": "chatcmpl-899-0", "role": "system", "content": "Be brief."},
    {"id": "chatcmpl-899-1", "role": "user", "content": "Say hello"},
    {"id": "chatcmpl-899-2", "role": "assistant", "content": "Hello!"}
  ],
  "first_id": "chatcmpl-899-0",
  "last_id": "chatcmpl-899-2",
  "has_more": false
}

Pagination (limit=1, then cursor):

{"data": [{"id": "chatcmpl-899-0", "role": "system"}], "has_more": true}
{"data": [{"id": "chatcmpl-899-1", "role": "user"}, {"id": "chatcmpl-899-2", "role": "assistant"}], "has_more": false}

Invalid cursor returns 400:

{"error": {"message": "Failed to list chat completion messages: cursor 'bogus' not found in completion 'chatcmpl-899'."}}

Multi-turn (4 input + 1 output = 5 messages):

[
  {"id": "chatcmpl-623-0", "role": "system", "content": "You are a math tutor. Be brief."},
  {"id": "chatcmpl-623-1", "role": "user", "content": "What is 2+2?"},
  {"id": "chatcmpl-623-2", "role": "assistant", "content": "4"},
  {"id": "chatcmpl-623-3", "role": "user", "content": "What is 4+4?"},
  {"id": "chatcmpl-623-4", "role": "assistant", "content": "8"}
]

Unit tests
Unit tests in tests/unit/utils/inference/test_inference_store.py covering the new behavior:

uv run pytest tests/unit/utils/inference/test_inference_store.py -v -k "list_chat_completion_messages"

Integration test (against running server):

uv run pytest tests/integration/inference/ -v -k "messages" \
  --text-model ollama/llama3.2:3b --stack-config=http://localhost:8321

Passes. Tests pagination with limit=1, after cursor, role and ID assertions.