RSPEED-2849: add user_agent to ResponsesEventData for CLA/Goose differentiation#1620
RSPEED-2849: add user_agent to ResponsesEventData for CLA/Goose differentiation#1620Lifto wants to merge 3 commits intolightspeed-core:mainfrom
Conversation
…se differentiation Adds a sanitized user_agent field to ResponsesEventData and the Splunk event payload, enabling differentiation between Goose and other clients in telemetry. Extracts and sanitizes the User-Agent header (strips control characters, truncates to 128 chars) before storing. Closes RSPEED-2849
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis change adds User-Agent header extraction and sanitization with length limiting, then propagates the sanitized value through the responses request handling pipeline into Splunk telemetry events. The extracted value is computed once and threaded through streaming, non-streaming, and error-path handlers. Changes
Sequence Diagram(s)sequenceDiagram
participant Request as Incoming Request
participant Handler as responses_endpoint_handler
participant Sanitizer as _get_user_agent
participant ResponseHandler as Response Handler<br/>(Streaming/Non-streaming)
participant EventBuilder as build_responses_event
participant Splunk as Splunk Event
Request->>Handler: HTTP request with User-Agent header
Handler->>Sanitizer: Extract header value
Sanitizer->>Sanitizer: Remove control chars, truncate to 128 chars
Sanitizer->>Handler: Return sanitized user_agent
Handler->>ResponseHandler: Pass user_agent through pipeline
ResponseHandler->>EventBuilder: Include user_agent in ResponsesEventData
EventBuilder->>Splunk: Emit event with user_agent field
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/responses.py (1)
394-423:⚠️ Potential issue | 🟡 MinorUpdate docstrings to include the new
user_agentparameter.The three modified function signatures now accept
user_agent, but theirArgssections do not document it.Suggested patch
async def handle_streaming_response( @@ Args: @@ background_tasks: FastAPI background task manager for telemetry events rh_identity_context: Tuple of (org_id, system_id) from RH identity + user_agent: Sanitized User-Agent string from request headers, or None. Returns: StreamingResponse with SSE-formatted events @@ async def generate_response( @@ Args: @@ background_tasks: FastAPI background task manager for telemetry events rh_identity_context: Tuple of (org_id, system_id) from RH identity shield_blocked: Whether the request was blocked by a shield + user_agent: Sanitized User-Agent string from request headers, or None. Yields: SSE-formatted strings from the generator @@ async def handle_non_streaming_response( @@ Args: @@ filter_server_tools: Whether to filter server-deployed MCP tool output background_tasks: FastAPI background task manager for telemetry events rh_identity_context: Tuple of (org_id, system_id) from RH identity + user_agent: Sanitized User-Agent string from request headers, or None. Returns: ResponsesResponse with the completed responseAs per coding guidelines:
All functions require docstrings with brief descriptionsandFollow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections.Also applies to: 919-951, 998-1028
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 394 - 423, The docstrings for the streaming response handlers (e.g., handle_streaming_response) were not updated to document the new user_agent parameter; update the Google-style docstrings for handle_streaming_response and the two other modified functions referenced in the diff to include a brief entry for user_agent under Args/Parameters (type Optional[str], brief description like "Optional user-agent string from the request"), and ensure the Parameters/Returns sections follow the existing Google convention and ordering; keep wording concise and consistent with other parameter descriptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Line 115: The module constant _USER_AGENT_MAX_LENGTH should be annotated with
Final[int]; update its declaration to use a Final type hint (e.g., from typing
import Final) so it reads as a constant per repo standards (refer to symbol
_USER_AGENT_MAX_LENGTH and ensure you add the appropriate import for Final if
missing).
---
Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 394-423: The docstrings for the streaming response handlers (e.g.,
handle_streaming_response) were not updated to document the new user_agent
parameter; update the Google-style docstrings for handle_streaming_response and
the two other modified functions referenced in the diff to include a brief entry
for user_agent under Args/Parameters (type Optional[str], brief description like
"Optional user-agent string from the request"), and ensure the
Parameters/Returns sections follow the existing Google convention and ordering;
keep wording concise and consistent with other parameter descriptions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 405bfbf9-bdae-4248-ba56-9c84f24fa2d8
📒 Files selected for processing (4)
src/app/endpoints/responses.pysrc/observability/formats/responses.pytests/unit/app/endpoints/test_responses_splunk.pytests/unit/observability/formats/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: Pylinter
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Import FastAPI dependencies with:from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with:from llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Usetyping_extensions.Selffor model validators in Pydantic models
Use modern union type syntaxstr | intinstead ofUnion[str, int]
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Useasync deffor I/O operations and external API calls
HandleAPIConnectionErrorfrom Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with@abstractmethoddecorators
Use@model_validatorand@field_validatorfor Pydantic model validation
Complete type annotations for all class attributes; use specific types, notAny
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/observability/formats/responses.pytests/unit/app/endpoints/test_responses_splunk.pytests/unit/observability/formats/test_responses.pysrc/app/endpoints/responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend
ConfigurationBasefor config,BaseModelfor data models
Files:
src/observability/formats/responses.pysrc/app/endpoints/responses.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Usepytest-mockfor AsyncMock objects in tests
Use markerpytest.mark.asynciofor async tests
Unit tests require 60% coverage, integration tests 10%
Files:
tests/unit/app/endpoints/test_responses_splunk.pytests/unit/observability/formats/test_responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoints
Files:
src/app/endpoints/responses.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.py
🔇 Additional comments (4)
tests/unit/app/endpoints/test_responses_splunk.py (1)
721-773: Good coverage for User-Agent sanitization behavior.These tests exercise the core edge cases (empty/control-only headers, control-char stripping, and max-length truncation) and give strong regression protection for the new telemetry field path.
tests/unit/observability/formats/test_responses.py (1)
23-35: LGTM — user_agent serialization tests are complete and clear.The added fixture and assertions cover both “set” and “None” cases, which matches the new event contract.
Also applies to: 117-163
src/observability/formats/responses.py (1)
27-27: Clean event-schema extension foruser_agent.The dataclass and builder updates are aligned, and the field is propagated without breaking existing defaults.
Also applies to: 49-49
src/app/endpoints/responses.py (1)
118-137: Telemetry plumbing is consistent across success, blocked, and error paths.Good job threading
user_agentthrough every_queue_responses_splunk_eventpath so observability behavior stays uniform.Also applies to: 390-391, 447-457, 475-520, 537-538, 977-995, 1050-1180
|
Fixed the docstring nit as well — added |
Summary
user_agent: Optional[str] = Nonefield toResponsesEventDatadataclass insrc/observability/formats/responses.py_get_user_agent()helper insrc/app/endpoints/responses.pythat extracts and sanitizes theUser-Agentrequest header (strips control characters, truncates to 128 chars, returnsNonewhen absent)user_agentthrough all_queue_responses_splunk_event()call sites so it appears in every Splunk HEC payload for the/responsesendpointNonewhen absent, field present in Splunk serializationMotivation
Enables Splunk telemetry to differentiate between Goose and other clients (e.g. CLA) hitting the
/responsesendpoint. Closes RSPEED-2849 / AIA-Issue-001.Security
The raw
User-Agentstring is user-controlled input. The implementation:ord(c) < 32(control characters) and explicit\r/\nNonefor absent or empty headersTesting
make verifypasses: pylint 10.00/10, pyright 0 errors, ruff cleanmake test-unitpasses: 2118 passed, 1 pre-existing failure (test_user_data_collection_wrong_directory_path)Summary by CodeRabbit
Release Notes
Observability
Tests