feat: support structured outputs (response_format) in chat completions#43
giwaov wants to merge 2 commits into OpenGradient:main
Conversation
Wire the OpenAI-compatible `response_format` parameter through the chat completion pipeline:

- Bind `response_format` to the LangChain model via `model.bind()` for `json_object` and `json_schema` types (`text` is a no-op)
- Apply to both streaming and non-streaming code paths
- Include `response_format` in the canonical request dict so TEE hashing covers the requested output format
- Add 14 unit tests covering parsing, hash-dict serialization, model binding, and interaction with tool calling

Closes OpenGradient#14
Pull request overview
This PR adds OpenAI-compatible structured outputs support to the chat completions pipeline by threading the response_format request parameter through to LangChain model invocation and including it in the TEE request hash.
Changes:
- Bind `response_format` onto the LangChain chat model (non-streaming + streaming) via `model.bind(response_format=...)` for non-`text` formats.
- Include `response_format` in the canonical `_chat_request_to_dict(...)` used for deterministic TEE hashing/signing.
- Add a new unit test module covering parsing, hashing inclusion/determinism, and non-streaming model binding behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tee_gateway/controllers/chat_controller.py` | Wires `response_format` into both non-streaming and streaming invocation flows and into the canonical request hash dict. |
| `tests/test_structured_outputs.py` | Adds unit tests for request parsing, hash dict inclusion, and non-streaming bind behavior (including tool binding interaction). |
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)
```
The response_format binding logic is duplicated in both the non-streaming and streaming paths. Consider extracting it into a small helper (e.g., that takes model and response_format) so behavior stays consistent and future changes don't accidentally diverge between the two codepaths.
```python
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
```
This block attempts to support non-dict response_format values (via getattr(rf, 'type', ...)), but then constructs rf_dict as only {type: ...} and would drop any attached json_schema payload. Either enforce that response_format must be a dict (and reject/normalize earlier) or fully serialize object forms so json_schema details are preserved.
Suggested change (replacing `rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}`):

```python
if isinstance(rf, dict):
    rf_dict = rf
else:
    # Attempt to fully serialize object forms so json_schema and other
    # fields are preserved instead of being reduced to {"type": ...}.
    rf_dict = None
    # Prefer Pydantic-style serialization if available.
    model_dump = getattr(rf, "model_dump", None)
    if callable(model_dump):
        try:
            dumped = model_dump()
            if isinstance(dumped, dict):
                rf_dict = dumped
        except Exception:
            rf_dict = None
    # Fallback: use public attributes from __dict__ if present.
    if rf_dict is None and hasattr(rf, "__dict__"):
        try:
            rf_dict = {
                k: v
                for k, v in vars(rf).items()
                if not k.startswith("_")
            }
        except Exception:
            rf_dict = None
    # Final fallback: at least preserve the type.
    if rf_dict is None:
        rf_dict = {"type": rf_type}
```
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
```
The comment says this only binds for json_object or json_schema, but the condition actually binds for any non-text type. Either tighten the check to the supported types or update the comment to reflect the actual behavior (non-text pass-through).
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)
```
New response_format support is added to the streaming pipeline here, but the added tests only exercise the non-streaming path. Please add a unit test that verifies model.bind(response_format=...) is applied before model.stream(...) (and that it composes correctly with bind_tools(...)).
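One possible shape for such a test, using a stub model that records call order rather than invoking the real controller (the stub and the mirrored call sequence are assumptions about the streaming path's behavior):

```python
class RecordingModel:
    """Stub LangChain-style model that records the order of bind/stream calls."""

    def __init__(self):
        self.calls = []

    def bind(self, **kwargs):
        self.calls.append(("bind", kwargs))
        return self  # real bind returns a new runnable; self suffices for a stub

    def bind_tools(self, tools):
        self.calls.append(("bind_tools", tools))
        return self

    def stream(self, messages):
        self.calls.append(("stream", messages))
        return iter(())


def test_streaming_binds_response_format_before_stream():
    model = RecordingModel()
    rf = {"type": "json_object"}
    # Mirror the streaming path: tools first, then response_format, then stream.
    model = model.bind_tools([{"name": "lookup"}])
    model = model.bind(response_format=rf)
    list(model.stream([("user", "hi")]))

    names = [name for name, _ in model.calls]
    assert names.index("bind") < names.index("stream")
    assert ("bind", {"response_format": rf}) in model.calls
```

A test against the real `_create_streaming_response` would patch the model factory to return such a stub and assert the same ordering.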
```python
if chat_request.response_format:
    d["response_format"] = chat_request.response_format
```
response_format is inserted into the canonical hash dict without normalization. If response_format is ever provided as a non-dict object (which the binding logic above appears to support), json.dumps(..., sort_keys=True) will fail here and the request will 500. Consider normalizing response_format into a JSON-serializable dict in one place (and using the same normalization for both hashing and model binding).
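A sketch of one shared normalization point (the function name is hypothetical; the `model_dump` handling assumes Pydantic-style objects):

```python
import json


def normalize_response_format(rf):
    """Reduce response_format to a JSON-serializable dict (or None).

    Intended to be used for both the canonical hash dict and model
    binding, so the hashed value always matches what is sent to the model.
    """
    if rf is None:
        return None
    if isinstance(rf, dict):
        normalized = rf
    else:
        # Pydantic-style objects can serialize themselves; otherwise
        # fall back to preserving only the type.
        dump = getattr(rf, "model_dump", None)
        normalized = dump() if callable(dump) else {"type": getattr(rf, "type", "text")}
    # Fail fast here if anything non-serializable slipped through,
    # rather than 500-ing later inside the hashing code.
    json.dumps(normalized, sort_keys=True)
    return normalized
```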
Summary
Implements OpenAI-compatible structured outputs support by wiring the `response_format` parameter through the chat completion pipeline, as requested in #14.

Changes
- `tee_gateway/controllers/chat_controller.py` (`_create_non_streaming_response`): After tool binding, checks `response_format`. If the type is `json_object` or `json_schema`, binds it to the LangChain model via `model.bind(response_format=...)`. The `text` type is a no-op (default behavior).
- (`_create_streaming_response`): Identical logic applied after tool binding.
- (`_chat_request_to_dict`): Includes `response_format` in the canonical serialized dict so the TEE signature covers the requested output format.
- `tests/test_structured_outputs.py`: 14 unit tests covering:
  - Parsing `response_format` from request dicts (`text`, `json_object`, `json_schema`, and absent)
  - Verifying that `bind_tools` and `bind(response_format=...)` chain correctly

Design Decisions
- `llm_backend.py`: The `response_format` is bound per-request via `model.bind()` after retrieving the cached model, following the same pattern already used for tool binding. This keeps the LRU cache clean (keyed only on model/temperature/max_tokens).
- The `response_format` dict is forwarded as-is to LangChain, which handles provider-specific translation. This maintains OpenAI API compatibility and works with all supported providers (OpenAI, Anthropic, Google, xAI).

Supported Formats
Per the OpenAPI spec already defined in the repo:
- `{type: text}`: plain text (default, no-op)
- `{type: json_object}`: JSON mode
- `{type: json_schema, json_schema: {name: ..., schema: {...}, strict: true}}`: strict schema-constrained output

Closes #14
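For illustration, a `json_schema` request might look like the following (the model name, schema contents, and field values are hypothetical; the shape follows the OpenAI structured-outputs format described above):

```python
import json

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the event details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["title", "date"],
                "additionalProperties": False,
            },
        },
    },
}

# Deterministic serialization of the canonical dict, so the TEE
# signature covers the requested output format along with the rest
# of the request.
canonical = json.dumps(request, sort_keys=True)
```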