feat: support structured outputs (response_format) in chat completions#43
giwaov wants to merge 2 commits into OpenGradient:main
Conversation
Wire the OpenAI-compatible `response_format` parameter through the chat completion pipeline:

- Bind `response_format` to the LangChain model via `model.bind()` for `json_object` and `json_schema` types (`text` is a no-op)
- Apply to both streaming and non-streaming code paths
- Include `response_format` in the canonical request dict so TEE hashing covers the requested output format
- Add 14 unit tests covering parsing, hash-dict serialization, model binding, and interaction with tool calling

Closes OpenGradient#14
Pull request overview
This PR adds OpenAI-compatible structured outputs support to the chat completions pipeline by threading the response_format request parameter through to LangChain model invocation and including it in the TEE request hash.
Changes:
- Bind `response_format` onto the LangChain chat model (non-streaming + streaming) via `model.bind(response_format=...)` for non-`text` formats.
- Include `response_format` in the canonical `_chat_request_to_dict(...)` used for deterministic TEE hashing/signing.
- Add a new unit test module covering parsing, hashing inclusion/determinism, and non-streaming model binding behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tee_gateway/controllers/chat_controller.py` | Wires `response_format` into both non-streaming and streaming invocation flows and into the canonical request hash dict. |
| `tests/test_structured_outputs.py` | Adds unit tests for request parsing, hash dict inclusion, and non-streaming bind behavior (including tool binding interaction). |
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)
```
The response_format binding logic is duplicated in both the non-streaming and streaming paths. Consider extracting it into a small helper (e.g., that takes model and response_format) so behavior stays consistent and future changes don't accidentally diverge between the two codepaths.
```python
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
```
This block attempts to support non-dict response_format values (via getattr(rf, 'type', ...)), but then constructs rf_dict as only {type: ...} and would drop any attached json_schema payload. Either enforce that response_format must be a dict (and reject/normalize earlier) or fully serialize object forms so json_schema details are preserved.
Suggested change (replacing `rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}`):

```python
if isinstance(rf, dict):
    rf_dict = rf
else:
    # Attempt to fully serialize object forms so json_schema and other
    # fields are preserved instead of being reduced to {"type": ...}.
    rf_dict = None
    # Prefer Pydantic-style serialization if available.
    model_dump = getattr(rf, "model_dump", None)
    if callable(model_dump):
        try:
            dumped = model_dump()
            if isinstance(dumped, dict):
                rf_dict = dumped
        except Exception:
            rf_dict = None
    # Fallback: use public attributes from __dict__ if present.
    if rf_dict is None and hasattr(rf, "__dict__"):
        try:
            rf_dict = {
                k: v
                for k, v in vars(rf).items()
                if not k.startswith("_")
            }
        except Exception:
            rf_dict = None
    # Final fallback: at least preserve the type.
    if rf_dict is None:
        rf_dict = {"type": rf_type}
```
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
```
The comment says this only binds for json_object or json_schema, but the condition actually binds for any non-text type. Either tighten the check to the supported types or update the comment to reflect the actual behavior (non-text pass-through).
```python
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)
```
New response_format support is added to the streaming pipeline here, but the added tests only exercise the non-streaming path. Please add a unit test that verifies model.bind(response_format=...) is applied before model.stream(...) (and that it composes correctly with bind_tools(...)).
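One possible shape for such a test, using a stub model that records call order rather than invoking the real controller (the stub and the mirrored call sequence are assumptions about the streaming path's behavior):

```python
class RecordingModel:
    """Stub LangChain-style model that records the order of bind/stream calls."""

    def __init__(self):
        self.calls = []

    def bind(self, **kwargs):
        self.calls.append(("bind", kwargs))
        return self  # real bind returns a new runnable; self suffices for a stub

    def bind_tools(self, tools):
        self.calls.append(("bind_tools", tools))
        return self

    def stream(self, messages):
        self.calls.append(("stream", messages))
        return iter(())


def test_streaming_binds_response_format_before_stream():
    model = RecordingModel()
    rf = {"type": "json_object"}
    # Mirror the streaming path: tools first, then response_format, then stream.
    model = model.bind_tools([{"name": "lookup"}])
    model = model.bind(response_format=rf)
    list(model.stream([("user", "hi")]))

    names = [name for name, _ in model.calls]
    assert names.index("bind") < names.index("stream")
    assert ("bind", {"response_format": rf}) in model.calls
```

A test against the real `_create_streaming_response` would patch the model factory to return such a stub and assert the same ordering.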
```python
if chat_request.response_format:
    d["response_format"] = chat_request.response_format
```
response_format is inserted into the canonical hash dict without normalization. If response_format is ever provided as a non-dict object (which the binding logic above appears to support), json.dumps(..., sort_keys=True) will fail here and the request will 500. Consider normalizing response_format into a JSON-serializable dict in one place (and using the same normalization for both hashing and model binding).
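A sketch of one shared normalization point (the function name is hypothetical; the `model_dump` handling assumes Pydantic-style objects):

```python
import json


def normalize_response_format(rf):
    """Reduce response_format to a JSON-serializable dict (or None).

    Intended to be used for both the canonical hash dict and model
    binding, so the hashed value always matches what is sent to the model.
    """
    if rf is None:
        return None
    if isinstance(rf, dict):
        normalized = rf
    else:
        # Pydantic-style objects can serialize themselves; otherwise
        # fall back to preserving only the type.
        dump = getattr(rf, "model_dump", None)
        normalized = dump() if callable(dump) else {"type": getattr(rf, "type", "text")}
    # Fail fast here if anything non-serializable slipped through,
    # rather than 500-ing later inside the hashing code.
    json.dumps(normalized, sort_keys=True)
    return normalized
```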
Summary
Implements OpenAI-compatible structured outputs support by wiring the `response_format` parameter through the chat completion pipeline, as requested in #14.

Changes
- `tee_gateway/controllers/chat_controller.py` (`_create_non_streaming_response`): After tool binding, checks `response_format`. If the type is `json_object` or `json_schema`, binds it to the LangChain model via `model.bind(response_format=...)`. The `text` type is a no-op (default behavior).
- (`_create_streaming_response`): Identical logic applied after tool binding.
- (`_chat_request_to_dict`): Includes `response_format` in the canonical serialized dict so the TEE signature covers the requested output format.
- `tests/test_structured_outputs.py`: 14 unit tests covering:
  - Parsing `response_format` from request dicts (`text`, `json_object`, `json_schema`, and absent)
  - Verifying that `bind_tools` and `bind(response_format=...)` chain correctly

Design Decisions
- `llm_backend.py`: The `response_format` is bound per-request via `model.bind()` after retrieving the cached model, following the same pattern already used for tool binding. This keeps the LRU cache clean (keyed only on model/temperature/max_tokens).
- The `response_format` dict is forwarded as-is to LangChain, which handles provider-specific translation. This maintains OpenAI API compatibility and works with all supported providers (OpenAI, Anthropic, Google, xAI).

Supported Formats
Per the OpenAPI spec already defined in the repo:
- `{type: text}`: plain text (default, no-op)
- `{type: json_object}`: JSON mode
- `{type: json_schema, json_schema: {name: ..., schema: {...}, strict: true}}`: strict schema-constrained output

Closes #14
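For illustration, a `json_schema` request might look like the following (the model name, schema contents, and field values are hypothetical; the shape follows the OpenAI structured-outputs format described above):

```python
import json

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the event details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["title", "date"],
                "additionalProperties": False,
            },
        },
    },
}

# Deterministic serialization of the canonical dict, so the TEE
# signature covers the requested output format along with the rest
# of the request.
canonical = json.dumps(request, sort_keys=True)
```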