
feat: support structured outputs (response_format) in chat completions#43

Open

giwaov wants to merge 2 commits into OpenGradient:main from giwaov:feat/structured-outputs

Conversation

@giwaov
Contributor

@giwaov giwaov commented Mar 23, 2026

Summary

Implements OpenAI-compatible structured outputs support by wiring the response_format parameter through the chat completion pipeline, as requested in #14.

Changes

tee_gateway/controllers/chat_controller.py

  • Non-streaming path (_create_non_streaming_response): After tool binding, checks response_format. If the type is json_object or json_schema, binds it to the LangChain model via model.bind(response_format=...). The text type is a no-op (default behavior).
  • Streaming path (_create_streaming_response): Identical logic applied after tool binding.
  • TEE hash dict (_chat_request_to_dict): Includes response_format in the canonical serialized dict so the TEE signature covers the requested output format.

tests/test_structured_outputs.py

14 unit tests covering:

  • Parsing response_format from request dicts (text, json_object, json_schema, and absent)
  • Inclusion in the TEE hash dict (presence, absence, determinism, differentiation)
  • Model binding behavior (json_object binds, json_schema binds full schema, text does not bind, absent does not bind)
  • Interaction with tool calling (both bind_tools and bind(response_format=...) chain correctly)
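The hash-dict determinism and differentiation checks listed above can be sketched in isolation (a sketch only; `canonical_hash_payload` is a hypothetical stand-in for the gateway's canonical serialization, and the request dicts are invented examples):

```python
import json

def canonical_hash_payload(d):
    # Deterministic serialization: the same logical request always yields
    # the same bytes, so the TEE hash/signature is stable.
    return json.dumps(d, sort_keys=True, separators=(",", ":"))

req_a = {"model": "gpt-4o", "response_format": {"type": "json_object"}}
req_b = {"response_format": {"type": "json_object"}, "model": "gpt-4o"}  # same request, different key order
req_c = {"model": "gpt-4o", "response_format": {"type": "text"}}

assert canonical_hash_payload(req_a) == canonical_hash_payload(req_b)  # determinism
assert canonical_hash_payload(req_a) != canonical_hash_payload(req_c)  # differentiation
```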

Design Decisions

  • No changes to llm_backend.py: The response_format is bound per-request via model.bind() after retrieving the cached model, following the same pattern already used for tool binding. This keeps the LRU cache clean (keyed only on model/temperature/max_tokens).
  • Pass-through approach: The response_format dict is forwarded as-is to LangChain, which handles provider-specific translation. This maintains OpenAI API compatibility and works with all supported providers (OpenAI, Anthropic, Google, xAI).
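The per-request binding pattern described above can be sketched as follows (`get_cached_model`, the plain-dict request shape, and the field names are assumptions for illustration, not the gateway's actual signatures):

```python
def build_model_for_request(chat_request, get_cached_model):
    """Retrieve the cached base model, then layer per-request bindings.

    `get_cached_model` stands in for an LRU-cached factory keyed only on
    model/temperature/max_tokens; tool and response_format bindings are
    applied afterwards so they never enter the cache key.
    """
    model = get_cached_model(
        chat_request["model"],
        chat_request.get("temperature"),
        chat_request.get("max_tokens"),
    )
    if chat_request.get("tools"):
        model = model.bind_tools(chat_request["tools"])
    rf = chat_request.get("response_format")
    if rf and rf.get("type", "text") != "text":
        model = model.bind(response_format=rf)  # json_object / json_schema
    return model
```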

Supported Formats

Per the OpenAPI spec already defined in the repo:

  • {type: text}: plain text (default, no-op)
  • {type: json_object}: JSON mode
  • {type: json_schema, json_schema: {name: ..., schema: {...}, strict: true}}: strict schema-constrained output
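For illustration, the three formats as Python request-payload fragments (the `weather` schema is an invented example, not from the repo):

```python
text_format = {"type": "text"}  # default; the gateway treats this as a no-op

json_mode = {"type": "json_object"}  # model must emit valid JSON

strict_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {"temp_c": {"type": "number"}},
            "required": ["temp_c"],
        },
        "strict": True,
    },
}
```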

Closes #14

Wire the OpenAI-compatible response_format parameter through the chat
completion pipeline:

- Bind response_format to LangChain model via model.bind() for
  json_object and json_schema types (text is a no-op)
- Apply to both streaming and non-streaming code paths
- Include response_format in the canonical request dict so TEE
  hashing covers the requested output format
- Add 14 unit tests covering parsing, hash-dict serialization,
  model binding, and interaction with tool calling

Closes OpenGradient#14
@adambalogh adambalogh requested a review from kylexqian March 23, 2026 21:36

Copilot AI left a comment


Pull request overview

This PR adds OpenAI-compatible structured outputs support to the chat completions pipeline by threading the response_format request parameter through to LangChain model invocation and including it in the TEE request hash.

Changes:

  • Bind response_format onto the LangChain chat model (non-streaming + streaming) via model.bind(response_format=...) for non-text formats.
  • Include response_format in the canonical _chat_request_to_dict(...) used for deterministic TEE hashing/signing.
  • Add a new unit test module covering parsing, hashing inclusion/determinism, and non-streaming model binding behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Reviewed files:

  • tee_gateway/controllers/chat_controller.py: Wires response_format into both non-streaming and streaming invocation flows and into the canonical request hash dict.
  • tests/test_structured_outputs.py: Adds unit tests for request parsing, hash dict inclusion, and non-streaming bind behavior (including tool binding interaction).


Comment on lines +85 to +96
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)


Copilot AI Apr 1, 2026


The response_format binding logic is duplicated in both the non-streaming and streaming paths. Consider extracting it into a small helper (e.g., that takes model and response_format) so behavior stays consistent and future changes don't accidentally diverge between the two codepaths.
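One possible shape for the suggested helper, mirroring the PR's inline logic so both codepaths can call it (the name `_bind_response_format` is hypothetical):

```python
def _bind_response_format(model, response_format):
    """Bind a non-text response_format onto the model; no-op otherwise."""
    if not response_format:
        return model
    rf_type = (
        response_format.get("type", "text")
        if isinstance(response_format, dict)
        else getattr(response_format, "type", "text")
    )
    if rf_type == "text":
        return model  # default behavior, nothing to bind
    rf_dict = (
        response_format
        if isinstance(response_format, dict)
        else {"type": rf_type}
    )
    return model.bind(response_format=rf_dict)
```

Both the streaming and non-streaming paths would then reduce to `model = _bind_response_format(model, chat_request.response_format)`.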

        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}

Copilot AI Apr 1, 2026


This block attempts to support non-dict response_format values (via getattr(rf, 'type', ...)), but then constructs rf_dict as only {type: ...} and would drop any attached json_schema payload. Either enforce that response_format must be a dict (and reject/normalize earlier) or fully serialize object forms so json_schema details are preserved.

Suggested change
rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
if isinstance(rf, dict):
    rf_dict = rf
else:
    # Attempt to fully serialize object forms so json_schema and other
    # fields are preserved instead of being reduced to {"type": ...}.
    rf_dict = None
    # Prefer Pydantic-style serialization if available.
    model_dump = getattr(rf, "model_dump", None)
    if callable(model_dump):
        try:
            dumped = model_dump()
            if isinstance(dumped, dict):
                rf_dict = dumped
        except Exception:
            rf_dict = None
    # Fallback: use public attributes from __dict__ if present.
    if rf_dict is None and hasattr(rf, "__dict__"):
        try:
            rf_dict = {
                k: v
                for k, v in vars(rf).items()
                if not k.startswith("_")
            }
        except Exception:
            rf_dict = None
    # Final fallback: at least preserve the type.
    if rf_dict is None:
        rf_dict = {"type": rf_type}

Comment on lines +85 to +93
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":

Copilot AI Apr 1, 2026


The comment says this only binds for json_object or json_schema, but the condition actually binds for any non-text type. Either tighten the check to the supported types or update the comment to reflect the actual behavior (non-text pass-through).

Comment on lines +211 to +222
# Bind response_format if provided (json_object or json_schema)
if chat_request.response_format:
    rf = chat_request.response_format
    rf_type = (
        rf.get("type", "text")
        if isinstance(rf, dict)
        else getattr(rf, "type", "text")
    )
    if rf_type != "text":
        rf_dict = rf if isinstance(rf, dict) else {"type": rf_type}
        model = model.bind(response_format=rf_dict)


Copilot AI Apr 1, 2026


New response_format support is added to the streaming pipeline here, but the added tests only exercise the non-streaming path. Please add a unit test that verifies model.bind(response_format=...) is applied before model.stream(...) (and that it composes correctly with bind_tools(...)).
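A self-contained sketch of such a test, using mocks to stand in for the LangChain model rather than the gateway's real streaming handler (the wiring is an assumption):

```python
from unittest.mock import MagicMock

def test_streaming_binds_response_format_before_stream():
    # Fake model whose bind() returns a new mock, so we can assert that
    # stream() is called on the *bound* model, not the base one.
    base = MagicMock(name="base_model")
    bound = MagicMock(name="bound_model")
    base.bind.return_value = bound

    rf = {"type": "json_object"}
    model = base
    if rf.get("type", "text") != "text":  # same guard as the PR
        model = model.bind(response_format=rf)
    list(model.stream([]))                # streaming invocation

    base.bind.assert_called_once_with(response_format=rf)
    bound.stream.assert_called_once()
    base.stream.assert_not_called()
```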

Comment on lines +508 to +509
if chat_request.response_format:
    d["response_format"] = chat_request.response_format

Copilot AI Apr 1, 2026


response_format is inserted into the canonical hash dict without normalization. If response_format is ever provided as a non-dict object (which the binding logic above appears to support), json.dumps(..., sort_keys=True) will fail here and the request will 500. Consider normalizing response_format into a JSON-serializable dict in one place (and using the same normalization for both hashing and model binding).
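A single normalization point shared by hashing and binding might look like this (a sketch; `_normalize_response_format` is a hypothetical name):

```python
def _normalize_response_format(rf):
    """Return response_format as a JSON-serializable dict, or None."""
    if rf is None:
        return None
    if isinstance(rf, dict):
        return rf
    # Pydantic-style objects expose model_dump(); otherwise keep the type.
    dump = getattr(rf, "model_dump", None)
    if callable(dump):
        return dump()
    return {"type": getattr(rf, "type", "text")}
```

With both `_chat_request_to_dict` and the bind sites routed through it, `json.dumps(..., sort_keys=True)` always receives a plain dict.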



Development

Successfully merging this pull request may close these issues.

Add support for structured outputs
