
LCORE-1262: Use context data class in responses#1612

Open
asimurka wants to merge 2 commits into lightspeed-core:main from asimurka:refactor_responses_endpoint

Conversation


@asimurka asimurka commented Apr 28, 2026

Description

Introduces a ResponsesContext data class that carries context attributes across function calls, moves the ResponsesApiParams model into a separate file, and moves parameter echoing from the request model onto ResponsesApiParams.

TODO: simplify _queue_responses_splunk_event to use contexts. Will be done in a separate PR.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Refactor
    • Centralized request context for the responses endpoint, giving more consistent moderation, filtering, streaming, and telemetry across request flows.
  • New Features
    • Introduced a typed request-parameter model to improve request validation and consistent parameter echoing.
  • Tests
    • Unit tests updated to exercise the new context and parameter model, improving coverage and reliability.


coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@asimurka has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 49 minutes and 20 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d06f709-27ce-4c33-9c5c-97fe618b72f5

📥 Commits

Reviewing files that changed from the base of the PR and between 3d08335 and c9d74ee.

📒 Files selected for processing (3)
  • src/app/endpoints/responses.py
  • src/models/common/responses/responses_api_params.py
  • tests/unit/app/endpoints/test_responses.py

Walkthrough

ResponsesApiParams was moved to a dedicated models module and ResponsesContext was added. The /responses handlers were refactored to accept ResponsesApiParams and ResponsesContext, and multiple import paths and tests were updated accordingly.

Changes

  • Model Additions (src/models/common/responses/responses_api_params.py, src/models/common/responses/responses_context.py): Added the ResponsesApiParams Pydantic model (with custom model_dump() and echoed_params()) and the ResponsesContext Pydantic model to hold request-scoped state (client, auth, moderation, RAG, background tasks, timing, flags).
  • Endpoint Refactor (src/app/endpoints/responses.py): Refactored handlers to construct and accept ResponsesApiParams and ResponsesContext; centralized moderation, inline-RAG, streaming/non-streaming flow control, MCP chunk filtering, telemetry, and persistence to read from the context rather than many discrete parameters.
  • Import Path Updates (src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, src/utils/responses.py): Updated imports to reference ResponsesApiParams from models.common.responses.responses_api_params instead of utils.types.
  • Legacy Removal (src/utils/types.py, src/models/requests.py): Removed the legacy ResponsesApiParams definition and its model_dump() behavior from utils/types.py, and removed ResponsesRequest.echoed_params() from src/models/requests.py.
  • Tests, API params import changes (tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py, tests/unit/utils/test_types.py): Updated tests to import ResponsesApiParams from the new model location; no test logic changes beyond import wiring.
  • Tests, handler and telemetry updates (tests/unit/app/endpoints/test_responses.py, tests/unit/app/endpoints/test_responses_splunk.py): Refactored tests to use the build_api_params_and_context() helper and pass api_params + context; updated moderation mocks to typed results; adjusted assertions to inspect api_params and updated MCP chunk/event tests.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Endpoint as ResponsesEndpoint
  participant Moderation as ShieldModeration
  participant RAG as InlineRAG
  participant Llama as AsyncLlamaStackClient
  participant Telemetry as Telemetry/DB

  Client->>Endpoint: POST /responses (request)
  Endpoint->>Endpoint: build ResponsesApiParams, ResponsesContext
  Endpoint->>Moderation: async check (context.moderation_result)
  Moderation-->>Endpoint: moderation_result
  Endpoint->>RAG: fetch inline RAG docs (context.inline_rag_context)
  RAG-->>Endpoint: RAG docs
  Endpoint->>Llama: invoke model stream or non-stream with api_params
  Llama-->>Endpoint: chunks / final response
  Endpoint->>Telemetry: persist telemetry & splunk (uses context)
  Endpoint-->>Client: stream events / final response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 5 passed
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main refactoring change: introducing a context data class in the responses endpoint to centralize shared request-scoped data.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



available_quotas = get_available_quotas(
quota_limiters=configuration.quota_limiters, user_id=context.auth[0]
)
moderation_result = cast(ShieldModerationBlocked, context.moderation_result)

Only entered when blocked by moderation - type checker unable to infer this.
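The narrowing described in this comment can be illustrated with a small sketch. Class and field names are assumptions mirroring the snippet above; the point is that `cast()` is a runtime no-op that only tells the type checker which union member this branch handles:

```python
from typing import cast


class ShieldModerationBlocked:
    """Hypothetical blocked-moderation result, mirroring the snippet above."""

    decision = "blocked"


class Context:
    # Typed broadly elsewhere (a union of moderation outcomes); inside the
    # blocked branch it is always the blocked variant, which the type
    # checker cannot infer on its own.
    moderation_result: object = ShieldModerationBlocked()


context = Context()
# cast() does nothing at runtime; it only narrows the static type so the
# attributes of the blocked variant can be accessed without errors.
moderation_result = cast(ShieldModerationBlocked, context.moderation_result)
```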

available_quotas={},
output_text="",
**echoed_params,
**api_params.echoed_params(configuration.rag_id_mapping),

Pass rag mapping to reduce dependencies between modules.
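A rough sketch of this design choice, under assumed names: the caller passes the RAG id mapping into `echoed_params()`, so the model module never imports the global configuration itself. The field `vector_store_ids` and the mapping contents are illustrative, not the actual schema:

```python
from typing import Any


class ResponsesApiParams:
    """Carries request parameters; echoes a subset back to the client."""

    def __init__(self, model: str, vector_store_ids: list[str]) -> None:
        self.model = model
        self.vector_store_ids = vector_store_ids

    def echoed_params(self, rag_id_mapping: dict[str, str]) -> dict[str, Any]:
        """Translate internal RAG ids to user-facing ids before echoing."""
        return {
            "model": self.model,
            "vector_store_ids": [
                rag_id_mapping.get(vid, vid) for vid in self.vector_store_ids
            ],
        }


params = ResponsesApiParams(model="gpt-x", vector_store_ids=["vs_internal_1"])
echoed = params.echoed_params({"vs_internal_1": "product-docs"})
```

Injecting the mapping as an argument keeps the model module free of a dependency on the configuration module, which is the stated goal of the change.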

# These are handled internally by LCS and should not be forwarded
# to clients that don't understand the mcp_call item type.
if _should_filter_mcp_chunk(
chunk, event_type, configured_mcp_labels, server_mcp_output_indices
@asimurka asimurka Apr 28, 2026

event_type is an attribute of chunk

topic_summary=topic_summary,
)
if not shield_blocked:
if context.moderation_result.decision == "passed":

Used positive condition

else 0
),
input_tokens=turn_summary.token_usage.input_tokens,
output_tokens=turn_summary.token_usage.output_tokens,

TurnSummary already has default factory for token_usage, which already has 0 defaults for tokens
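The default-factory behavior this comment relies on can be sketched as follows. Field names follow the snippet above; the real classes may differ. `TokenUsage` zero-initializes its counters and `TurnSummary` creates a fresh `TokenUsage` per instance, so the token fields are always safe to read without None checks:

```python
from dataclasses import dataclass, field


@dataclass
class TokenUsage:
    """Token counters with zero defaults."""

    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class TurnSummary:
    """Summary of a turn; token_usage is a fresh TokenUsage per instance."""

    token_usage: TokenUsage = field(default_factory=TokenUsage)


summary = TurnSummary()
```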

Comment thread src/models/requests.py
raise ValueError("You cannot provide context by moderation response.")
return value

def echoed_params(self) -> dict[str, Any]:

Replaced under ResponsesApiParams instead of request

Comment thread src/utils/types.py
type ResponseInput = str | list[ResponseItem]


class ResponsesApiParams(BaseModel):

Replaced to a separate file


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/responses.py (1)

344-355: ⚠️ Potential issue | 🔴 Critical

Don't construct ResponsesApiParams from updated_request.model_dump().

That dump has already removed nested MCP authorization, so api_params.tools never carries the credentials that ResponsesApiParams.model_dump() is trying to preserve. Any MCP request that depends on auth will reach responses.create() without it.

Suggested fix
-    api_params = ResponsesApiParams.model_validate(updated_request.model_dump())
+    api_params = ResponsesApiParams.model_validate(
+        {
+            **updated_request.model_dump(exclude={"tools"}),
+            "tools": updated_request.tools,
+        }
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 344 - 355, The code currently
builds ResponsesApiParams via
ResponsesApiParams.model_validate(updated_request.model_dump()), which loses
nested MCP authorization (so api_params.tools lacks credentials); instead
construct/validate ResponsesApiParams from the original Pydantic model (e.g.
pass updated_request itself or a deep model_copy) or explicitly merge the nested
authorization into the dict before validation so api_params.tools retains
credentials; update the creation site (the ResponsesApiParams instantiation) to
use updated_request (or reconstructed data including
updated_request.authorization) rather than updated_request.model_dump().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 820-825: In the branch that appends streamed turns (the if
checking api_params.store and api_params.previous_response_id and
latest_response_object), replace usage of the flattened context.input_text with
the original structured input api_params.input when calling
append_turn_items_to_conversation so the stored conversation matches the
non-streaming path; update the call site where
append_turn_items_to_conversation(context.client, api_params.conversation,
context.input_text, latest_response_object.output, ...) to pass api_params.input
instead and keep the rest of the parameters unchanged.

In `@src/models/common/responses/responses_api_params.py`:
- Around line 29-49: The constant _ECHOED_FIELDS should be annotated with
Final[set[str]]; import Final from typing and change the declaration to use a
Final type hint (e.g., add "from typing import Final" and update
"_ECHOED_FIELDS: Final[set[str]] = { ... }"), keeping the same string elements
so tooling recognizes it as a constant.

In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 73-85: The test helper is building parameter/context objects from
serialized data and bypassing validation (using updated_request.model_dump() and
ResponsesContext.model_construct), which hides API contract regressions; change
the construction to use the live models and run validation: create
ResponsesApiParams from the updated_request object itself (not model_dump) —
e.g. pass updated_request into ResponsesApiParams.model_validate or the normal
constructor — and replace ResponsesContext.model_construct with
ResponsesContext.model_validate or the validated constructor so the
ResponsesContext (and nested authorization/mcp fields) are validated by the
model.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 35cf3482-3033-42fb-ab19-f917eda7f747

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea1e91 and b923543.

📒 Files selected for processing (13)
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/streaming_query.py
  • src/models/common/responses/responses_api_params.py
  • src/models/common/responses/responses_context.py
  • src/models/requests.py
  • src/utils/responses.py
  • src/utils/types.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
💤 Files with no reviewable changes (2)
  • src/models/requests.py
  • src/utils/types.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/utils/responses.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/utils/responses.py
  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
🧠 Learnings (11)
📚 Learning: 2026-02-26T13:56:51.366Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1229
File: src/app/endpoints/query.py:296-305
Timestamp: 2026-02-26T13:56:51.366Z
Learning: In src/app/endpoints/query.py, responses_params.input is guaranteed to be a str (not structured ResponseInput) because it's constructed via prepare_input(query_request) which always returns str. Therefore, cast(str, responses_params.input) is safe in this endpoint's code path.

Applied to files:

  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • tests/unit/app/endpoints/test_responses_splunk.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (7)
src/app/endpoints/streaming_query.py (1)

62-62: Import wiring looks correct.

ResponsesApiParams now comes from the centralized models module, and the endpoint logic stays unchanged.

Also applies to: 119-119

tests/unit/app/endpoints/test_query.py (1)

15-15: Test import updated cleanly.

This keeps the spec aligned with the centralized ResponsesApiParams model used by the endpoint code.

tests/unit/utils/test_types.py (1)

13-19: Good regression coverage for ResponsesApiParams.model_dump().

The import now matches the centralized model location, and the new tests cover the auth reinjection behavior plus mixed-tool cases well.

Also applies to: 150-274

tests/unit/app/endpoints/test_streaming_query.py (1)

67-67: Import update is consistent with the refactor.

The test now references the same centralized ResponsesApiParams model as src/app/endpoints/streaming_query.py.

src/utils/responses.py (1)

94-94: Import relocation is correct.

This keeps prepare_responses_params and related helpers aligned with the new centralized ResponsesApiParams model.

src/app/endpoints/query.py (1)

26-26: Import path is aligned with the centralized model.

query.py now uses the new ResponsesApiParams source without changing handler behavior.

tests/unit/app/endpoints/test_responses_splunk.py (1)

26-27: Centralized helper wiring is consistent across the Splunk tests.

Using build_api_params_and_context(...) here matches the new request/context flow and removes a lot of duplicated inline setup.

Also applies to: 260-275, 341-356, 424-439, 485-500, 567-582, 652-667, 728-743

Comment thread src/app/endpoints/responses.py
Comment thread src/models/common/responses/responses_api_params.py
Comment thread tests/unit/app/endpoints/test_responses.py Outdated
@asimurka asimurka marked this pull request as draft April 28, 2026 13:07
@asimurka asimurka marked this pull request as ready for review April 28, 2026 13:29
@asimurka asimurka force-pushed the refactor_responses_endpoint branch from 3d08335 to e3f330f Compare April 28, 2026 13:36

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 703-704: The filtering logic in the turn summary population is
inverted: currently it skips non-server outputs; change the condition so that
when context.filter_server_tools is true it skips server-deployed outputs
instead. Locate the check using context.filter_server_tools and
is_server_deployed_output(item) (the line with "if context.filter_server_tools
and not is_server_deployed_output(item): continue") and invert it to "if
context.filter_server_tools and is_server_deployed_output(item): continue" so
server-tool outputs are filtered out as intended.

In `@src/models/common/responses/responses_api_params.py`:
- Around line 6-23: The imports currently use the submodule form "from
llama_stack_api.openai_responses import ..." which is unsupported; update them
to the flat root form by importing all symbols from llama_stack_api (e.g.,
OpenAIResponseInputTool, OpenAIResponseInputToolChoice, OpenAIResponsePrompt,
OpenAIResponseReasoning, OpenAIResponseText, OpenAIResponseToolMCP) and keep the
existing local aliases (InputTool, ToolChoice, Prompt, Reasoning, Text,
OutputToolMCP) so callers of those symbols (functions/classes that reference
InputTool, ToolChoice, Prompt, Reasoning, Text, OutputToolMCP) continue to work.
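The inverted filter condition described in the first comment above can be sketched with hypothetical helpers. The server label set and the item shape are illustrative assumptions; the point is that when `filter_server_tools` is set, server-deployed outputs are the ones skipped:

```python
from typing import Any

# Hypothetical configured server labels; in lightspeed-stack these come
# from the MCP server configuration.
CONFIGURED_MCP_LABELS = {"lcs-rag"}


def is_server_deployed_output(item: dict[str, Any]) -> bool:
    """Identify outputs produced by server-injected MCP tools."""
    return item.get("server_label") in CONFIGURED_MCP_LABELS


def visible_outputs(
    items: list[dict[str, Any]], filter_server_tools: bool
) -> list[dict[str, Any]]:
    """Drop server-tool outputs when filtering is enabled."""
    kept: list[dict[str, Any]] = []
    for item in items:
        if filter_server_tools and is_server_deployed_output(item):
            continue  # corrected direction: skip server-deployed outputs
        kept.append(item)
    return kept
```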

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 177705cd-3314-4961-8071-4ba53188b2c7

📥 Commits

Reviewing files that changed from the base of the PR and between b923543 and 3d08335.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • src/models/common/responses/responses_api_params.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (10)
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-06T20:18:11.336Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:11.336Z
Learning: In the lightspeed-stack codebase (src/app/endpoints/), `tools: Optional[list[Any]]` for MCP tool definitions is an intentional, consistent pattern used across all inference endpoints (query, responses, streaming_query, rlsapi_v1). Do not flag this as a typing issue — changing it in isolation would break consistency.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (2)
src/models/common/responses/responses_api_params.py (1)

128-175: The ResponsesApiParams serialization split is clean and robust.

Having model_dump() restore MCP authorization for outbound requests while echoed_params() serializes response-safe tool data is a good separation of concerns.
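The two-view split praised above can be sketched with a plain class; this is a hypothetical simplification, not the real ResponsesApiParams (which is a Pydantic model with more fields):

```python
class ResponsesApiParamsSketch:
    """Minimal sketch of the outbound vs. echoed serialization split."""

    def __init__(self, model: str, tools: list[dict]):
        self.model = model
        # Tool dicts may carry an "authorization" entry for MCP servers.
        self.tools = tools

    def model_dump(self) -> dict:
        # Outbound view: authorization preserved so MCP calls can authenticate.
        return {"model": self.model, "tools": self.tools}

    def echoed_params(self) -> dict:
        # Response-safe view: drop authorization before echoing to clients.
        safe_tools = [
            {k: v for k, v in tool.items() if k != "authorization"}
            for tool in self.tools
        ]
        return {"model": self.model, "tools": safe_tools}


params = ResponsesApiParamsSketch(
    "some-model", [{"server_label": "s", "authorization": "Bearer secret"}]
)
```

The outbound dump keeps the credential; the echoed view strips it.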

src/app/endpoints/responses.py (1)

825-831: Good fix: the conversation append now preserves structured input.

Using api_params.input here keeps streaming and non-streaming storage aligned for non-text or mixed structured input payloads.

Also applies to: 958-963

@asimurka asimurka force-pushed the refactor_responses_endpoint branch from e3f330f to c9d74ee Compare April 28, 2026 13:40
{
    **updated_request.model_dump(exclude={"tools"}),
    "tools": updated_request.tools,
}
Contributor Author

@asimurka asimurka Apr 28, 2026


This is a temporary solution that will be refined in a separate PR. It prevents authorization headers from being dropped on model dump.

