LCORE-1262: Use context data class in responses #1612

asimurka wants to merge 2 commits into lightspeed-core:main from
Conversation
Walkthrough

Changes

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Endpoint as ResponsesEndpoint
    participant Moderation as ShieldModeration
    participant RAG as InlineRAG
    participant Llama as AsyncLlamaStackClient
    participant Telemetry as Telemetry/DB
    Client->>Endpoint: POST /responses (request)
    Endpoint->>Endpoint: build ResponsesApiParams, ResponsesContext
    Endpoint->>Moderation: async check (context.moderation_result)
    Moderation-->>Endpoint: moderation_result
    Endpoint->>RAG: fetch inline RAG docs (context.inline_rag_context)
    RAG-->>Endpoint: RAG docs
    Endpoint->>Llama: invoke model stream or non-stream with api_params
    Llama-->>Endpoint: chunks / final response
    Endpoint->>Telemetry: persist telemetry & splunk (uses context)
    Endpoint-->>Client: stream events / final response
```
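The request flow in the diagram above can be sketched as a small asyncio orchestration. All names here (`run_moderation`, `fetch_inline_rag`, `invoke_model`) are illustrative placeholders, not the endpoint's actual functions, and running moderation and RAG retrieval concurrently is an assumption of this sketch, not something the diagram guarantees.

```python
import asyncio


# Hypothetical stand-ins for the real moderation / RAG / model calls.
async def run_moderation(text: str) -> str:
    return "blocked" if "forbidden" in text else "passed"


async def fetch_inline_rag(text: str) -> list[str]:
    return [f"doc-for:{text}"]


async def invoke_model(text: str, rag_docs: list[str]) -> str:
    return f"answer({text}, docs={len(rag_docs)})"


async def handle_responses(text: str) -> str:
    # Moderation and RAG retrieval are independent of each other in this
    # sketch, so they can run concurrently before the model is invoked.
    moderation, rag_docs = await asyncio.gather(
        run_moderation(text), fetch_inline_rag(text)
    )
    if moderation == "blocked":
        return "refused"
    return await invoke_model(text, rag_docs)


result = asyncio.run(handle_responses("hello"))
print(result)  # answer(hello, docs=1)
```

The telemetry/persistence step from the diagram is omitted here to keep the sketch focused on the request path.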
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ Passed checks (5 passed)
```python
available_quotas = get_available_quotas(
    quota_limiters=configuration.quota_limiters, user_id=context.auth[0]
)
moderation_result = cast(ShieldModerationBlocked, context.moderation_result)
```
This branch is only entered when the request was blocked by moderation; the type checker cannot infer that, hence the cast.
```python
available_quotas={},
output_text="",
**echoed_params,
**api_params.echoed_params(configuration.rag_id_mapping),
```
The RAG ID mapping is passed in explicitly to reduce dependencies between modules.
```python
# These are handled internally by LCS and should not be forwarded
# to clients that don't understand the mcp_call item type.
if _should_filter_mcp_chunk(
    chunk, event_type, configured_mcp_labels, server_mcp_output_indices
```
`event_type` is an attribute of `chunk`, so it does not need to be passed as a separate argument.
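The suggestion above, reading the event type off the chunk instead of threading it through as a parameter, might look roughly like this. The function name, event-type strings, and attribute names are illustrative assumptions, not the project's actual helper:

```python
from types import SimpleNamespace


def should_filter_mcp_chunk(chunk, configured_mcp_labels: set[str]) -> bool:
    """Illustrative predicate: hide server-side MCP call events from clients.

    The real helper takes more parameters; this sketch only demonstrates
    deriving the event type from the chunk itself.
    """
    event_type = getattr(chunk, "type", "")
    if not event_type.startswith("response.mcp_call"):
        return False
    # Only filter events belonging to server-configured MCP labels.
    return getattr(chunk, "server_label", None) in configured_mcp_labels


chunk = SimpleNamespace(type="response.mcp_call.completed", server_label="internal-rag")
print(should_filter_mcp_chunk(chunk, {"internal-rag"}))  # True
```

Client-supplied MCP tools carry their own labels, so chunks from servers not in `configured_mcp_labels` pass through unfiltered.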
```python
topic_summary=topic_summary,
)
if not shield_blocked:
if context.moderation_result.decision == "passed":
```
Switched to a positive condition (`decision == "passed"`) for readability.
```python
else 0
),
input_tokens=turn_summary.token_usage.input_tokens,
output_tokens=turn_summary.token_usage.output_tokens,
```
`TurnSummary` already has a default factory for `token_usage`, which in turn defaults both token counts to 0, so the explicit `else 0` fallback is unnecessary.
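The reviewer's point, that nested defaults already yield zeroed token counts, can be illustrated with stdlib dataclasses; the project's models are Pydantic, but `default_factory` behaves analogously, and the field names here are simplified:

```python
from dataclasses import dataclass, field


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class TurnSummary:
    # default_factory gives every summary its own zeroed TokenUsage, so
    # callers never need a fallback like `usage.input_tokens if usage else 0`.
    token_usage: TokenUsage = field(default_factory=TokenUsage)


summary = TurnSummary()
print(summary.token_usage.input_tokens)  # 0
```

Using `default_factory` rather than a shared default instance also ensures each summary mutates its own counter object, not a class-level singleton.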
```python
raise ValueError("You cannot provide context by moderation response.")
return value


def echoed_params(self) -> dict[str, Any]:
```
Moved under `ResponsesApiParams` instead of the request model.
```python
type ResponseInput = str | list[ResponseItem]


class ResponsesApiParams(BaseModel):
```
Moved to a separate file.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/responses.py (1)
344-355: ⚠️ Potential issue | 🔴 Critical

Don't construct `ResponsesApiParams` from `updated_request.model_dump()`.

That dump has already removed nested MCP authorization, so `api_params.tools` never carries the credentials that `ResponsesApiParams.model_dump()` is trying to preserve. Any MCP request that depends on auth will reach `responses.create()` without it.

Suggested fix:

```diff
- api_params = ResponsesApiParams.model_validate(updated_request.model_dump())
+ api_params = ResponsesApiParams.model_validate(
+     {
+         **updated_request.model_dump(exclude={"tools"}),
+         "tools": updated_request.tools,
+     }
+ )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 344 - 355, The code currently builds ResponsesApiParams via ResponsesApiParams.model_validate(updated_request.model_dump()), which loses nested MCP authorization (so api_params.tools lacks credentials); instead construct/validate ResponsesApiParams from the original Pydantic model (e.g. pass updated_request itself or a deep model_copy) or explicitly merge the nested authorization into the dict before validation so api_params.tools retains credentials; update the creation site (the ResponsesApiParams instantiation) to use updated_request (or reconstructed data including updated_request.authorization) rather than updated_request.model_dump().
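The failure mode flagged above can be reproduced in miniature without the project code: once serialization strips credentials, re-validating from that dump can never recover them, while merging the live objects back in does. Plain dicts stand in for the Pydantic models here, so this is a sketch of the mechanism, not the actual serializers:

```python
def dump_tool(tool: dict) -> dict:
    # Mimics a serializer that redacts credentials on model_dump().
    return {k: v for k, v in tool.items() if k != "authorization"}


def dump_request(request: dict) -> dict:
    return {**request, "tools": [dump_tool(t) for t in request["tools"]]}


request = {
    "model": "m",
    "tools": [{"server_label": "rag", "authorization": "Bearer x"}],
}

# Rebuilding purely from the dump loses the nested authorization for good.
lossy = dump_request(request)
assert "authorization" not in lossy["tools"][0]

# The suggested fix: exclude tools from the dump, carry the live objects over.
fixed = {
    **{k: v for k, v in dump_request(request).items() if k != "tools"},
    "tools": request["tools"],
}
print(fixed["tools"][0]["authorization"])  # Bearer x
```

The same shape applies to the real models: validate from the original Pydantic object (or a dict that re-injects the nested fields) rather than from a redacting dump.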
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 820-825: In the branch that appends streamed turns (the if
checking api_params.store and api_params.previous_response_id and
latest_response_object), replace usage of the flattened context.input_text with
the original structured input api_params.input when calling
append_turn_items_to_conversation so the stored conversation matches the
non-streaming path; update the call site where
append_turn_items_to_conversation(context.client, api_params.conversation,
context.input_text, latest_response_object.output, ...) to pass api_params.input
instead and keep the rest of the parameters unchanged.
In `@src/models/common/responses/responses_api_params.py`:
- Around line 29-49: The constant _ECHOED_FIELDS should be annotated with
Final[set[str]]; import Final from typing and change the declaration to use a
Final type hint (e.g., add "from typing import Final" and update
"_ECHOED_FIELDS: Final[set[str]] = { ... }"), keeping the same string elements
so tooling recognizes it as a constant.
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 73-85: The test helper is building parameter/context objects from
serialized data and bypassing validation (using updated_request.model_dump() and
ResponsesContext.model_construct), which hides API contract regressions; change
the construction to use the live models and run validation: create
ResponsesApiParams from the updated_request object itself (not model_dump) —
e.g. pass updated_request into ResponsesApiParams.model_validate or the normal
constructor — and replace ResponsesContext.model_construct with
ResponsesContext.model_validate or the validated constructor so the
ResponsesContext (and nested authorization/mcp fields) are validated by the
model.
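The distinction the prompt draws, `model_construct` skipping validation while `model_validate` runs it, can be sketched with a minimal hand-rolled class; Pydantic itself is not imported here, and this toy only imitates the relevant behavior:

```python
class ResponsesContextSketch:
    """Toy stand-in showing validated vs. unvalidated construction."""

    def __init__(self, user_id: str):
        self.user_id = user_id

    @classmethod
    def model_validate(cls, data: dict) -> "ResponsesContextSketch":
        # Validated path: bad input fails loudly, like Pydantic's model_validate.
        if not isinstance(data.get("user_id"), str):
            raise ValueError("user_id must be a string")
        return cls(user_id=data["user_id"])

    @classmethod
    def model_construct(cls, **data) -> "ResponsesContextSketch":
        # Unvalidated path: silently accepts anything, hiding contract breaks.
        obj = cls.__new__(cls)
        obj.__dict__.update(data)
        return obj


ok = ResponsesContextSketch.model_validate({"user_id": "alice"})
bad = ResponsesContextSketch.model_construct(user_id=123)  # no error raised
print(ok.user_id, bad.user_id)  # alice 123
```

This is why tests built on `model_construct` can keep passing after an API contract regression: the invalid `123` above would only be caught on the validated path.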
---
Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 344-355: The code currently builds ResponsesApiParams via
ResponsesApiParams.model_validate(updated_request.model_dump()), which loses
nested MCP authorization (so api_params.tools lacks credentials); instead
construct/validate ResponsesApiParams from the original Pydantic model (e.g.
pass updated_request itself or a deep model_copy) or explicitly merge the nested
authorization into the dict before validation so api_params.tools retains
credentials; update the creation site (the ResponsesApiParams instantiation) to
use updated_request (or reconstructed data including
updated_request.authorization) rather than updated_request.model_dump().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 35cf3482-3033-42fb-ab19-f917eda7f747
📒 Files selected for processing (13)
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/models/common/responses/responses_api_params.py
- src/models/common/responses/responses_context.py
- src/models/requests.py
- src/utils/responses.py
- src/utils/types.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_responses_splunk.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_types.py
💤 Files with no reviewable changes (2)
- src/models/requests.py
- src/utils/types.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

- Use absolute imports for internal modules: `from authentication import get_auth_dependency`
- Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
- Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
- Check `constants.py` for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Type aliases defined at module level for clarity
- Use `Final[type]` as type hint for all constants
- All functions require docstrings with brief descriptions
- Complete type annotations for parameters and return types in functions
- Use `typing_extensions.Self` for model validators in Pydantic models
- Use modern union type syntax `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional type hints
- Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns; return new data structures instead
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use standard log levels with clear purposes: debug, info, warning, error
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
- Use ABC for abstract base classes with `@abstractmethod` decorators
- Use `@model_validator` and `@field_validator` for Pydantic model validation
- Complete type annotations for all class attributes; use specific types, not `Any`
- Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:
- tests/unit/app/endpoints/test_query.py
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/utils/responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_types.py
- src/models/common/responses/responses_context.py
- src/models/common/responses/responses_api_params.py
- tests/unit/app/endpoints/test_responses_splunk.py
- tests/unit/app/endpoints/test_responses.py
- src/app/endpoints/responses.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

- Use pytest for all unit and integration tests
- Do not use unittest; pytest is the standard for this project
- Use `pytest-mock` for AsyncMock objects in tests
- Use marker `pytest.mark.asyncio` for async tests
- Unit tests require 60% coverage, integration tests 10%

Files:
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_types.py
- tests/unit/app/endpoints/test_responses_splunk.py
- tests/unit/app/endpoints/test_responses.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

- Use FastAPI `HTTPException` with appropriate status codes for API endpoints

Files:
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/responses.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

- Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models

Files:
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/utils/responses.py
- src/models/common/responses/responses_context.py
- src/models/common/responses/responses_api_params.py
- src/app/endpoints/responses.py
🧠 Learnings (11)
📚 Learning: 2026-02-26T13:56:51.366Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1229
File: src/app/endpoints/query.py:296-305
Timestamp: 2026-02-26T13:56:51.366Z
Learning: In src/app/endpoints/query.py, responses_params.input is guaranteed to be a str (not structured ResponseInput) because it's constructed via prepare_input(query_request) which always returns str. Therefore, cast(str, responses_params.input) is safe in this endpoint's code path.
Applied to files:
- tests/unit/app/endpoints/test_query.py
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
- src/models/common/responses/responses_context.py
- src/models/common/responses/responses_api_params.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
- src/models/common/responses/responses_context.py
- src/models/common/responses/responses_api_params.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
tests/unit/app/endpoints/test_responses_splunk.py
📚 Learning: 2026-04-16T19:08:38.217Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1524
File: src/app/endpoints/responses.py:523-529
Timestamp: 2026-04-16T19:08:38.217Z
Learning: In lightspeed-stack (`src/app/endpoints/responses.py`), the predicate `server_label in configured_mcp_labels` is the established, intentional pattern for identifying server-deployed MCP tools across `_sanitize_response_dict`, `_is_server_mcp_output_item`, and `_should_filter_mcp_chunk`. Client-supplied tools cannot collide with configured server labels because `server_label` is a server-side field set by lightspeed-stack during tool injection; clients send `function` tools or MCP tools pointing at their own servers with different labels. Do not flag this predicate as a false-positive collision risk in code review.
Applied to files:
- tests/unit/app/endpoints/test_responses.py
- src/app/endpoints/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling
Applied to files:
src/app/endpoints/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.
Applied to files:
src/app/endpoints/responses.py
🔇 Additional comments (7)

src/app/endpoints/streaming_query.py (1)

62-62: Import wiring looks correct. `ResponsesApiParams` now comes from the centralized models module, and the endpoint logic stays unchanged. Also applies to: 119-119

tests/unit/app/endpoints/test_query.py (1)

15-15: Test import updated cleanly. This keeps the spec aligned with the centralized `ResponsesApiParams` model used by the endpoint code.

tests/unit/utils/test_types.py (1)

13-19: Good regression coverage for `ResponsesApiParams.model_dump()`. The import now matches the centralized model location, and the new tests cover the auth reinjection behavior plus mixed-tool cases well. Also applies to: 150-274

tests/unit/app/endpoints/test_streaming_query.py (1)

67-67: Import update is consistent with the refactor. The test now references the same centralized `ResponsesApiParams` model as `src/app/endpoints/streaming_query.py`.

src/utils/responses.py (1)

94-94: Import relocation is correct. This keeps `prepare_responses_params` and related helpers aligned with the new centralized `ResponsesApiParams` model.

src/app/endpoints/query.py (1)

26-26: Import path is aligned with the centralized model. `query.py` now uses the new `ResponsesApiParams` source without changing handler behavior.

tests/unit/app/endpoints/test_responses_splunk.py (1)

26-27: Centralized helper wiring is consistent across the Splunk tests. Using `build_api_params_and_context(...)` here matches the new request/context flow and removes a lot of duplicated inline setup. Also applies to: 260-275, 341-356, 424-439, 485-500, 567-582, 652-667, 728-743
Force-pushed 3d08335 to e3f330f (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 703-704: The filtering logic in the turn summary population is
inverted: currently it skips non-server outputs; change the condition so that
when context.filter_server_tools is true it skips server-deployed outputs
instead. Locate the check using context.filter_server_tools and
is_server_deployed_output(item) (the line with "if context.filter_server_tools
and not is_server_deployed_output(item): continue") and invert it to "if
context.filter_server_tools and is_server_deployed_output(item): continue" so
server-tool outputs are filtered out as intended.
In `@src/models/common/responses/responses_api_params.py`:
- Around line 6-23: The imports currently use the submodule form "from
llama_stack_api.openai_responses import ..." which is unsupported; update them
to the flat root form by importing all symbols from llama_stack_api (e.g.,
OpenAIResponseInputTool, OpenAIResponseInputToolChoice, OpenAIResponsePrompt,
OpenAIResponseReasoning, OpenAIResponseText, OpenAIResponseToolMCP) and keep the
existing local aliases (InputTool, ToolChoice, Prompt, Reasoning, Text,
OutputToolMCP) so callers of those symbols (functions/classes that reference
InputTool, ToolChoice, Prompt, Reasoning, Text, OutputToolMCP) continue to work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 177705cd-3314-4961-8071-4ba53188b2c7
📒 Files selected for processing (2)
src/app/endpoints/responses.pysrc/models/common/responses/responses_api_params.py
🔇 Additional comments (2)

src/models/common/responses/responses_api_params.py (1)

128-175: `ResponsesApiParams` serialization split is clean and robust. `model_dump()` restoring MCP authorization for outbound requests while `echoed_params()` serializes response-safe tool data is a good separation.

src/app/endpoints/responses.py (1)

825-831: Good fix: conversation append now preserves structured input. Using `api_params.input` here keeps streaming/non-streaming storage aligned for non-text or mixed structured input payloads. Also applies to: 958-963
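The two-serializations idea praised above, a full outbound dump alongside a response-safe echoed subset gated by a `Final`-annotated field whitelist, might look roughly like this; the field names and class shape are illustrative, not the project's actual `_ECHOED_FIELDS` or model:

```python
from typing import Any, Final

# Whitelist of request fields that are safe to echo back in responses.
_ECHOED_FIELDS: Final[set[str]] = {"model", "temperature", "instructions"}


class ApiParamsSketch:
    """Toy model with a full outbound dump and a response-safe echoed subset."""

    def __init__(self, **fields: Any):
        self._fields = fields

    def model_dump(self) -> dict[str, Any]:
        # Full payload for the outbound model call (the real model would
        # restore MCP authorization here).
        return dict(self._fields)

    def echoed_params(self) -> dict[str, Any]:
        # Response-safe subset: only whitelisted fields reach the client.
        return {k: v for k, v in self._fields.items() if k in _ECHOED_FIELDS}


params = ApiParamsSketch(
    model="m", temperature=0.2, tools=[{"authorization": "secret"}]
)
print(sorted(params.echoed_params()))  # ['model', 'temperature']
```

Keeping the whitelist as a module-level `Final` constant matches the review guidance that all constants carry a `Final[...]` annotation.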
Force-pushed e3f330f to c9d74ee (Compare)
```python
{
    **updated_request.model_dump(exclude={"tools"}),
    "tools": updated_request.tools,
}
```
Temporary solution that will be refined in a separate PR. This prevents authorization headers from being dropped on model dump.
Description

Introduces a `ResponsesContext` data class that carries context attributes across function calls. Moves the `ResponsesApiParams` model into a separate file and moves parameter echoing off the request.

TODO: simplify `_queue_responses_splunk_event` to use contexts. Will be done in a separate PR.

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit