
LCORE-1262: Use context data class in responses#1612

Open
asimurka wants to merge 2 commits into lightspeed-core:main from asimurka:refactor_responses_endpoint

Conversation


@asimurka asimurka commented Apr 28, 2026

Description

Introduces a ResponsesContext data class that carries context attributes across function calls, moves the ResponsesApiParams model into a separate file, and moves parameter echoing from the request model onto ResponsesApiParams.

TODO: simplify _queue_responses_splunk_event to use contexts. Will be done in a separate PR.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Refactor
    • Centralized request context for the responses endpoint, giving more consistent moderation, filtering, streaming, and telemetry across request flows.
  • New Features
    • Introduced a typed request-parameter model to improve request validation and consistent parameter echoing.
  • Tests
    • Unit tests updated to exercise the new context and parameter model, improving coverage and reliability.


coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@asimurka has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 49 minutes and 20 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d06f709-27ce-4c33-9c5c-97fe618b72f5

📥 Commits

Reviewing files that changed from the base of the PR and between 3d08335 and c9d74ee.

📒 Files selected for processing (3)
  • src/app/endpoints/responses.py
  • src/models/common/responses/responses_api_params.py
  • tests/unit/app/endpoints/test_responses.py

Walkthrough

ResponsesApiParams was moved to a dedicated models module and ResponsesContext was added. The /responses handlers were refactored to accept ResponsesApiParams and ResponsesContext, and multiple import paths and tests were updated accordingly.

Changes

  • Model Additions (src/models/common/responses/responses_api_params.py, src/models/common/responses/responses_context.py): Added the ResponsesApiParams Pydantic model (with custom model_dump() and echoed_params()) and the ResponsesContext Pydantic model to hold request-scoped state (client, auth, moderation, RAG, background tasks, timing, flags).
  • Endpoint Refactor (src/app/endpoints/responses.py): Refactored handlers to construct and accept ResponsesApiParams and ResponsesContext; centralized moderation, inline-RAG, streaming/non-streaming flow control, MCP chunk filtering, telemetry, and persistence to read from the context rather than many discrete parameters.
  • Import Path Updates (src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, src/utils/responses.py): Updated imports to reference ResponsesApiParams from models.common.responses.responses_api_params instead of utils.types.
  • Legacy Removal (src/utils/types.py, src/models/requests.py): Removed the legacy ResponsesApiParams definition and its model_dump() behavior from utils/types.py, and removed ResponsesRequest.echoed_params() from src/models/requests.py.
  • Tests, API params import changes (tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py, tests/unit/utils/test_types.py): Updated tests to import ResponsesApiParams from the new model location; no test logic changes beyond import wiring.
  • Tests, handler and telemetry updates (tests/unit/app/endpoints/test_responses.py, tests/unit/app/endpoints/test_responses_splunk.py): Refactored tests to use the build_api_params_and_context() helper and pass api_params + context; updated moderation mocks to typed results; adjusted assertions to inspect api_params and updated MCP chunk/event tests.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Endpoint as ResponsesEndpoint
  participant Moderation as ShieldModeration
  participant RAG as InlineRAG
  participant Llama as AsyncLlamaStackClient
  participant Telemetry as Telemetry/DB

  Client->>Endpoint: POST /responses (request)
  Endpoint->>Endpoint: build ResponsesApiParams, ResponsesContext
  Endpoint->>Moderation: async check (context.moderation_result)
  Moderation-->>Endpoint: moderation_result
  Endpoint->>RAG: fetch inline RAG docs (context.inline_rag_context)
  RAG-->>Endpoint: RAG docs
  Endpoint->>Llama: invoke model stream or non-stream with api_params
  Llama-->>Endpoint: chunks / final response
  Endpoint->>Telemetry: persist telemetry & splunk (uses context)
  Endpoint-->>Client: stream events / final response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 5 passed
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main refactoring change: introducing a context data class in the responses endpoint to centralize shared request-scoped data.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



available_quotas = get_available_quotas(
quota_limiters=configuration.quota_limiters, user_id=context.auth[0]
)
moderation_result = cast(ShieldModerationBlocked, context.moderation_result)

Only entered when blocked by moderation - type checker unable to infer this.
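The narrowing described in this comment can be illustrated with a small sketch. Class and field names are assumptions mirroring the snippet above; the point is that `cast()` is a runtime no-op that only tells the type checker which union member this branch handles:

```python
from typing import cast


class ShieldModerationBlocked:
    """Hypothetical blocked-moderation result, mirroring the snippet above."""

    decision = "blocked"


class Context:
    # Typed broadly elsewhere (a union of moderation outcomes); inside the
    # blocked branch it is always the blocked variant, which the type
    # checker cannot infer on its own.
    moderation_result: object = ShieldModerationBlocked()


context = Context()
# cast() does nothing at runtime; it only narrows the static type so the
# attributes of the blocked variant can be accessed without errors.
moderation_result = cast(ShieldModerationBlocked, context.moderation_result)
```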

available_quotas={},
output_text="",
**echoed_params,
**api_params.echoed_params(configuration.rag_id_mapping),

Pass rag mapping to reduce dependencies between modules.
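A rough sketch of this design choice, under assumed names: the caller passes the RAG id mapping into `echoed_params()`, so the model module never imports the global configuration itself. The field `vector_store_ids` and the mapping contents are illustrative, not the actual schema:

```python
from typing import Any


class ResponsesApiParams:
    """Carries request parameters; echoes a subset back to the client."""

    def __init__(self, model: str, vector_store_ids: list[str]) -> None:
        self.model = model
        self.vector_store_ids = vector_store_ids

    def echoed_params(self, rag_id_mapping: dict[str, str]) -> dict[str, Any]:
        """Translate internal RAG ids to user-facing ids before echoing."""
        return {
            "model": self.model,
            "vector_store_ids": [
                rag_id_mapping.get(vid, vid) for vid in self.vector_store_ids
            ],
        }


params = ResponsesApiParams(model="gpt-x", vector_store_ids=["vs_internal_1"])
echoed = params.echoed_params({"vs_internal_1": "product-docs"})
```

Injecting the mapping as an argument keeps the model module free of a dependency on the configuration module, which is the stated goal of the change.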

# These are handled internally by LCS and should not be forwarded
# to clients that don't understand the mcp_call item type.
if _should_filter_mcp_chunk(
chunk, event_type, configured_mcp_labels, server_mcp_output_indices
@asimurka asimurka Apr 28, 2026

event_type is an attribute of chunk

topic_summary=topic_summary,
)
if not shield_blocked:
if context.moderation_result.decision == "passed":

Used positive condition

else 0
),
input_tokens=turn_summary.token_usage.input_tokens,
output_tokens=turn_summary.token_usage.output_tokens,

TurnSummary already has default factory for token_usage, which already has 0 defaults for tokens
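The default-factory behavior this comment relies on can be sketched as follows. Field names follow the snippet above; the real classes may differ. `TokenUsage` zero-initializes its counters and `TurnSummary` creates a fresh `TokenUsage` per instance, so the token fields are always safe to read without None checks:

```python
from dataclasses import dataclass, field


@dataclass
class TokenUsage:
    """Token counters with zero defaults."""

    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class TurnSummary:
    """Summary of a turn; token_usage is a fresh TokenUsage per instance."""

    token_usage: TokenUsage = field(default_factory=TokenUsage)


summary = TurnSummary()
```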

Comment thread src/models/requests.py
raise ValueError("You cannot provide context by moderation response.")
return value

def echoed_params(self) -> dict[str, Any]:

Replaced under ResponsesApiParams instead of request

Comment thread src/utils/types.py
type ResponseInput = str | list[ResponseItem]


class ResponsesApiParams(BaseModel):

Replaced to a separate file


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/responses.py (1)

344-355: ⚠️ Potential issue | 🔴 Critical

Don't construct ResponsesApiParams from updated_request.model_dump().

That dump has already removed nested MCP authorization, so api_params.tools never carries the credentials that ResponsesApiParams.model_dump() is trying to preserve. Any MCP request that depends on auth will reach responses.create() without it.

Suggested fix
-    api_params = ResponsesApiParams.model_validate(updated_request.model_dump())
+    api_params = ResponsesApiParams.model_validate(
+        {
+            **updated_request.model_dump(exclude={"tools"}),
+            "tools": updated_request.tools,
+        }
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 344 - 355, The code currently
builds ResponsesApiParams via
ResponsesApiParams.model_validate(updated_request.model_dump()), which loses
nested MCP authorization (so api_params.tools lacks credentials); instead
construct/validate ResponsesApiParams from the original Pydantic model (e.g.
pass updated_request itself or a deep model_copy) or explicitly merge the nested
authorization into the dict before validation so api_params.tools retains
credentials; update the creation site (the ResponsesApiParams instantiation) to
use updated_request (or reconstructed data including
updated_request.authorization) rather than updated_request.model_dump().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 820-825: In the branch that appends streamed turns (the if
checking api_params.store and api_params.previous_response_id and
latest_response_object), replace usage of the flattened context.input_text with
the original structured input api_params.input when calling
append_turn_items_to_conversation so the stored conversation matches the
non-streaming path; update the call site where
append_turn_items_to_conversation(context.client, api_params.conversation,
context.input_text, latest_response_object.output, ...) to pass api_params.input
instead and keep the rest of the parameters unchanged.

In `@src/models/common/responses/responses_api_params.py`:
- Around line 29-49: The constant _ECHOED_FIELDS should be annotated with
Final[set[str]]; import Final from typing and change the declaration to use a
Final type hint (e.g., add "from typing import Final" and update
"_ECHOED_FIELDS: Final[set[str]] = { ... }"), keeping the same string elements
so tooling recognizes it as a constant.

In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 73-85: The test helper is building parameter/context objects from
serialized data and bypassing validation (using updated_request.model_dump() and
ResponsesContext.model_construct), which hides API contract regressions; change
the construction to use the live models and run validation: create
ResponsesApiParams from the updated_request object itself (not model_dump) —
e.g. pass updated_request into ResponsesApiParams.model_validate or the normal
constructor — and replace ResponsesContext.model_construct with
ResponsesContext.model_validate or the validated constructor so the
ResponsesContext (and nested authorization/mcp fields) are validated by the
model.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 35cf3482-3033-42fb-ab19-f917eda7f747

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea1e91 and b923543.

📒 Files selected for processing (13)
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/streaming_query.py
  • src/models/common/responses/responses_api_params.py
  • src/models/common/responses/responses_context.py
  • src/models/requests.py
  • src/utils/responses.py
  • src/utils/types.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
💤 Files with no reviewable changes (2)
  • src/models/requests.py
  • src/utils/types.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/utils/responses.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_types.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/app/endpoints/test_responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/utils/responses.py
  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
🧠 Learnings (11)
📚 Learning: 2026-02-26T13:56:51.366Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1229
File: src/app/endpoints/query.py:296-305
Timestamp: 2026-02-26T13:56:51.366Z
Learning: In src/app/endpoints/query.py, responses_params.input is guaranteed to be a str (not structured ResponseInput) because it's constructed via prepare_input(query_request) which always returns str. Therefore, cast(str, responses_params.input) is safe in this endpoint's code path.

Applied to files:

  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/responses/responses_context.py
  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • tests/unit/app/endpoints/test_responses_splunk.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (7)
src/app/endpoints/streaming_query.py (1)

62-62: Import wiring looks correct.

ResponsesApiParams now comes from the centralized models module, and the endpoint logic stays unchanged.

Also applies to: 119-119

tests/unit/app/endpoints/test_query.py (1)

15-15: Test import updated cleanly.

This keeps the spec aligned with the centralized ResponsesApiParams model used by the endpoint code.

tests/unit/utils/test_types.py (1)

13-19: Good regression coverage for ResponsesApiParams.model_dump().

The import now matches the centralized model location, and the new tests cover the auth reinjection behavior plus mixed-tool cases well.

Also applies to: 150-274

tests/unit/app/endpoints/test_streaming_query.py (1)

67-67: Import update is consistent with the refactor.

The test now references the same centralized ResponsesApiParams model as src/app/endpoints/streaming_query.py.

src/utils/responses.py (1)

94-94: Import relocation is correct.

This keeps prepare_responses_params and related helpers aligned with the new centralized ResponsesApiParams model.

src/app/endpoints/query.py (1)

26-26: Import path is aligned with the centralized model.

query.py now uses the new ResponsesApiParams source without changing handler behavior.

tests/unit/app/endpoints/test_responses_splunk.py (1)

26-27: Centralized helper wiring is consistent across the Splunk tests.

Using build_api_params_and_context(...) here matches the new request/context flow and removes a lot of duplicated inline setup.

Also applies to: 260-275, 341-356, 424-439, 485-500, 567-582, 652-667, 728-743

Comment thread src/app/endpoints/responses.py
Comment thread src/models/common/responses/responses_api_params.py
Comment thread tests/unit/app/endpoints/test_responses.py Outdated
@asimurka asimurka marked this pull request as draft April 28, 2026 13:07
@asimurka asimurka marked this pull request as ready for review April 28, 2026 13:29
@asimurka asimurka force-pushed the refactor_responses_endpoint branch from 3d08335 to e3f330f Compare April 28, 2026 13:36

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 703-704: The filtering logic in the turn summary population is
inverted: currently it skips non-server outputs; change the condition so that
when context.filter_server_tools is true it skips server-deployed outputs
instead. Locate the check using context.filter_server_tools and
is_server_deployed_output(item) (the line with "if context.filter_server_tools
and not is_server_deployed_output(item): continue") and invert it to "if
context.filter_server_tools and is_server_deployed_output(item): continue" so
server-tool outputs are filtered out as intended.

In `@src/models/common/responses/responses_api_params.py`:
- Around line 6-23: The imports currently use the submodule form "from
llama_stack_api.openai_responses import ..." which is unsupported; update them
to the flat root form by importing all symbols from llama_stack_api (e.g.,
OpenAIResponseInputTool, OpenAIResponseInputToolChoice, OpenAIResponsePrompt,
OpenAIResponseReasoning, OpenAIResponseText, OpenAIResponseToolMCP) and keep the
existing local aliases (InputTool, ToolChoice, Prompt, Reasoning, Text,
OutputToolMCP) so callers of those symbols (functions/classes that reference
InputTool, ToolChoice, Prompt, Reasoning, Text, OutputToolMCP) continue to work.
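The inverted filter condition described in the first comment above can be sketched with hypothetical helpers. The server label set and the item shape are illustrative assumptions; the point is that when `filter_server_tools` is set, server-deployed outputs are the ones skipped:

```python
from typing import Any

# Hypothetical configured server labels; in lightspeed-stack these come
# from the MCP server configuration.
CONFIGURED_MCP_LABELS = {"lcs-rag"}


def is_server_deployed_output(item: dict[str, Any]) -> bool:
    """Identify outputs produced by server-injected MCP tools."""
    return item.get("server_label") in CONFIGURED_MCP_LABELS


def visible_outputs(
    items: list[dict[str, Any]], filter_server_tools: bool
) -> list[dict[str, Any]]:
    """Drop server-tool outputs when filtering is enabled."""
    kept: list[dict[str, Any]] = []
    for item in items:
        if filter_server_tools and is_server_deployed_output(item):
            continue  # corrected direction: skip server-deployed outputs
        kept.append(item)
    return kept
```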

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 177705cd-3314-4961-8071-4ba53188b2c7

📥 Commits

Reviewing files that changed from the base of the PR and between b923543 and 3d08335.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • src/models/common/responses/responses_api_params.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/models/common/responses/responses_api_params.py
  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (10)
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-06T20:18:11.336Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:11.336Z
Learning: In the lightspeed-stack codebase (src/app/endpoints/), `tools: Optional[list[Any]]` for MCP tool definitions is an intentional, consistent pattern used across all inference endpoints (query, responses, streaming_query, rlsapi_v1). Do not flag this as a typing issue — changing it in isolation would break consistency.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
🔇 Additional comments (2)
src/models/common/responses/responses_api_params.py (1)

128-175: The ResponsesApiParams serialization split is clean and robust.

Having model_dump() restore MCP authorization for outbound requests while echoed_params() serializes response-safe tool data is a good separation of concerns.
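The two-view split praised above can be sketched with a plain class; this is a hypothetical simplification, not the real ResponsesApiParams (which is a Pydantic model with more fields):

```python
class ResponsesApiParamsSketch:
    """Minimal sketch of the outbound vs. echoed serialization split."""

    def __init__(self, model: str, tools: list[dict]):
        self.model = model
        # Tool dicts may carry an "authorization" entry for MCP servers.
        self.tools = tools

    def model_dump(self) -> dict:
        # Outbound view: authorization preserved so MCP calls can authenticate.
        return {"model": self.model, "tools": self.tools}

    def echoed_params(self) -> dict:
        # Response-safe view: drop authorization before echoing to clients.
        safe_tools = [
            {k: v for k, v in tool.items() if k != "authorization"}
            for tool in self.tools
        ]
        return {"model": self.model, "tools": safe_tools}


params = ResponsesApiParamsSketch(
    "some-model", [{"server_label": "s", "authorization": "Bearer secret"}]
)
```

The outbound dump keeps the credential; the echoed view strips it.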

src/app/endpoints/responses.py (1)

825-831: Good fix: the conversation append now preserves structured input.

Using api_params.input here keeps streaming and non-streaming storage aligned for non-text or mixed structured input payloads.

Also applies to: 958-963

@asimurka asimurka force-pushed the refactor_responses_endpoint branch from e3f330f to c9d74ee Compare April 28, 2026 13:40
{
    **updated_request.model_dump(exclude={"tools"}),
    "tools": updated_request.tools,
}
Contributor Author

@asimurka asimurka Apr 28, 2026


This is a temporary solution that will be refined in a separate PR. It prevents authorization headers from being dropped on model dump.

