RSPEED-2849: add user_agent to ResponsesEventData for CLA/Goose differentiation by Lifto · Pull Request #1620 · lightspeed-core/lightspeed-stack

Lifto · 2026-04-28T17:55:29Z

Summary

Adds user_agent: Optional[str] = None field to ResponsesEventData dataclass in src/observability/formats/responses.py
Adds _get_user_agent() helper in src/app/endpoints/responses.py that extracts and sanitizes the User-Agent request header (strips control characters, truncates to 128 chars, returns None when absent)
Passes user_agent through all _queue_responses_splunk_event() call sites so it appears in every Splunk HEC payload for the /responses endpoint
Adds unit tests covering: populated from header, sanitized/truncated, None when absent, field present in Splunk serialization
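A minimal sketch of the data-model side, assuming a plain `@dataclass` and a builder that always emits the `user_agent` key. `ResponsesEventDataSketch`, `build_event_payload`, and the `model`/`status` fields are hypothetical stand-ins. The real `ResponsesEventData` and `build_responses_event` carry other fields that this description does not show.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResponsesEventDataSketch:
    """Hypothetical subset of an event record; only user_agent comes from the PR."""

    model: str  # assumed field, for illustration only
    status: str  # assumed field, for illustration only
    # New optional field; the None default keeps existing constructor calls valid.
    user_agent: Optional[str] = None


def build_event_payload(data: ResponsesEventDataSketch) -> dict[str, Any]:
    """Build the JSON-serializable event body sent to Splunk HEC."""
    return {
        "model": data.model,
        "status": data.status,
        # Always emit the key, so an absent header shows up as null, not a missing field.
        "user_agent": data.user_agent,
    }
```

Because the key is always present, a request without the header serializes as `"user_agent": null` instead of silently dropping the field.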

Motivation

Enables Splunk telemetry to differentiate between Goose and other clients (e.g. CLA) hitting the /responses endpoint. Closes RSPEED-2849 / AIA-Issue-001.

Security

The raw User-Agent string is user-controlled input. The implementation:

Strips all characters with ord(c) < 32 (control characters) and explicit \r/\n
Truncates to a maximum of 128 characters
Returns None for absent or empty headers
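The three rules above can be sketched as a small pure function. This assumes the raw header value has already been read from the request (the PR's `_get_user_agent()` reads it from the FastAPI request itself), and `sanitize_user_agent` is a hypothetical name. The sketch also treats a value that is empty after stripping as absent, matching the "control-only header" case covered by the tests.

```python
from typing import Final, Optional

# Module constant annotated with Final, as the repo guidelines ask.
_USER_AGENT_MAX_LENGTH: Final[int] = 128


def sanitize_user_agent(raw: Optional[str]) -> Optional[str]:
    """Return a log-safe User-Agent value, or None if nothing usable remains."""
    if not raw:
        return None
    # Drop C0 control characters (ord < 32); this range already includes \r and \n.
    cleaned = "".join(c for c in raw if ord(c) >= 32)
    if not cleaned:
        return None
    # Truncate after stripping, so the cap applies to the characters actually kept.
    return cleaned[:_USER_AGENT_MAX_LENGTH]
```

For example, `sanitize_user_agent("goose/1.2\r\nX-Injected: 1")` returns `"goose/1.2X-Injected: 1"`, so a header-injection attempt cannot split the logged value across lines. Note that an `ord(c) < 32` filter leaves DEL (`\x7f`) and the C1 range untouched; the sketch follows the rule as stated rather than widening it.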

Testing

make verify passes: pylint 10.00/10, pyright 0 errors, ruff clean
make test-unit passes: 2118 passed, 1 pre-existing failure (test_user_data_collection_wrong_directory_path)

Summary by CodeRabbit

Release Notes

Observability
- User-Agent header information is now captured and included in telemetry events.
- Automatic sanitization removes control characters and enforces a 128-character limit on values.
Tests
- Added test coverage for User-Agent header extraction, sanitization, and telemetry event payload inclusion.

…se differentiation

Adds a sanitized user_agent field to ResponsesEventData and the Splunk event payload, enabling differentiation between Goose and other clients in telemetry. Extracts and sanitizes the User-Agent header (strips control characters, truncates to 128 chars) before storing. Closes RSPEED-2849

coderabbitai · 2026-04-28T17:55:43Z

Warning

Rate limit exceeded

@Lifto has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 20 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e04d6e0-7ba8-4765-a076-e0fe84013b29

📥 Commits

Reviewing files that changed from the base of the PR and between 6dff618 and b0ec99a.

📒 Files selected for processing (2)

AGENTS.md
src/app/endpoints/responses.py

Walkthrough

This change adds User-Agent header extraction and sanitization with length limiting, then propagates the sanitized value through the responses request handling pipeline into Splunk telemetry events. The extracted value is computed once and threaded through streaming, non-streaming, and error-path handlers.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| User-Agent Extraction & Pipeline | `src/app/endpoints/responses.py` | Implements `_get_user_agent` function to extract, sanitize (remove control characters), and truncate User-Agent header to 128 characters. Plumbs sanitized value through `responses_endpoint_handler` into streaming/non-streaming response handlers and Splunk telemetry calls. |
| Telemetry Data Model | `src/observability/formats/responses.py` | Extends `ResponsesEventData` dataclass with optional `user_agent` field (defaults to `None`). Updates `build_responses_event` to emit `user_agent` key in event payload. |
| Test Coverage | `tests/unit/app/endpoints/test_responses_splunk.py`, `tests/unit/observability/formats/test_responses.py` | Adds `TestGetUserAgent` test suite covering header extraction, null handling, control character sanitization, and 128-character truncation logic. Introduces tests verifying `user_agent` field defaults, inclusion in events, and correct propagation through telemetry pipeline. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Request as Incoming Request
    participant Handler as responses_endpoint_handler
    participant Sanitizer as _get_user_agent
    participant ResponseHandler as Response Handler<br/>(Streaming/Non-streaming)
    participant EventBuilder as build_responses_event
    participant Splunk as Splunk Event

    Request->>Handler: HTTP request with User-Agent header
    Handler->>Sanitizer: Extract header value
    Sanitizer->>Sanitizer: Remove control chars, truncate to 128 chars
    Sanitizer->>Handler: Return sanitized user_agent
    Handler->>ResponseHandler: Pass user_agent through pipeline
    ResponseHandler->>EventBuilder: Include user_agent in ResponsesEventData
    EventBuilder->>Splunk: Emit event with user_agent field
```

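The "computed once and threaded through" flow in the diagram can be sketched without FastAPI. Everything here is a hypothetical stand-in: `events` replaces the background Splunk queue, `queue_event` replaces `_queue_responses_splunk_event`, the two handlers replace the real streaming and non-streaming functions, and the `fail` flag only simulates a backend error. The sketch assumes lower-cased header keys and omits sanitization for brevity.

```python
from typing import Optional

# Hypothetical in-memory sink standing in for the background Splunk queue.
events: list[dict[str, Optional[str]]] = []


def queue_event(status: str, user_agent: Optional[str]) -> None:
    """Record one telemetry event (stand-in for a Splunk HEC submission)."""
    events.append({"status": status, "user_agent": user_agent})


def handle_streaming(user_agent: Optional[str]) -> None:
    """Stand-in for the streaming path; emits its event when the stream ends."""
    queue_event("streamed", user_agent)


def handle_non_streaming(user_agent: Optional[str], fail: bool = False) -> None:
    """Stand-in for the non-streaming path; `fail` simulates a backend error."""
    if fail:
        raise RuntimeError("simulated backend failure")
    queue_event("completed", user_agent)


def endpoint(headers: dict[str, str], stream: bool, fail: bool = False) -> None:
    """Read the header once, then hand the same value to every path."""
    user_agent = headers.get("user-agent")  # assumes lower-cased keys
    try:
        if stream:
            handle_streaming(user_agent)
        else:
            handle_non_streaming(user_agent, fail=fail)
    except RuntimeError:
        # The error path reuses the value captured above instead of re-reading it.
        queue_event("error", user_agent)
        raise
```

Capturing the value once at the top of the handler means the streaming, non-streaming, and error paths cannot disagree about which client made the request.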
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a user_agent field to ResponsesEventData for client differentiation, which aligns with the primary purpose of the changeset. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/app/endpoints/responses.py (1)

394-423: ⚠️ Potential issue | 🟡 Minor

Update docstrings to include the new user_agent parameter.

The three modified function signatures now accept user_agent, but their Args sections do not document it.

Suggested patch

```diff
 async def handle_streaming_response(
@@
     Args:
@@
         background_tasks: FastAPI background task manager for telemetry events
         rh_identity_context: Tuple of (org_id, system_id) from RH identity
+        user_agent: Sanitized User-Agent string from request headers, or None.
     Returns:
         StreamingResponse with SSE-formatted events
@@
 async def generate_response(
@@
     Args:
@@
         background_tasks: FastAPI background task manager for telemetry events
         rh_identity_context: Tuple of (org_id, system_id) from RH identity
         shield_blocked: Whether the request was blocked by a shield
+        user_agent: Sanitized User-Agent string from request headers, or None.
     Yields:
         SSE-formatted strings from the generator
@@
 async def handle_non_streaming_response(
@@
     Args:
@@
         filter_server_tools: Whether to filter server-deployed MCP tool output
         background_tasks: FastAPI background task manager for telemetry events
         rh_identity_context: Tuple of (org_id, system_id) from RH identity
+        user_agent: Sanitized User-Agent string from request headers, or None.
     Returns:
         ResponsesResponse with the completed response
```

As per coding guidelines: all functions require docstrings with brief descriptions, following Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections.

Also applies to: 919-951, 998-1028

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 394 - 423, The docstrings for
the streaming response handlers (e.g., handle_streaming_response) were not
updated to document the new user_agent parameter; update the Google-style
docstrings for handle_streaming_response and the two other modified functions
referenced in the diff to include a brief entry for user_agent under
Args/Parameters (type Optional[str], brief description like "Optional user-agent
string from the request"), and ensure the Parameters/Returns sections follow the
existing Google convention and ordering; keep wording concise and consistent
with other parameter descriptions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Line 115: The module constant _USER_AGENT_MAX_LENGTH should be annotated with
Final[int]; update its declaration to use a Final type hint (e.g., from typing
import Final) so it reads as a constant per repo standards (refer to symbol
_USER_AGENT_MAX_LENGTH and ensure you add the appropriate import for Final if
missing).

---

Outside diff comments:
In `@src/app/endpoints/responses.py`:
- Around line 394-423: The docstrings for the streaming response handlers (e.g.,
handle_streaming_response) were not updated to document the new user_agent
parameter; update the Google-style docstrings for handle_streaming_response and
the two other modified functions referenced in the diff to include a brief entry
for user_agent under Args/Parameters (type Optional[str], brief description like
"Optional user-agent string from the request"), and ensure the
Parameters/Returns sections follow the existing Google convention and ordering;
keep wording concise and consistent with other parameter descriptions.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 405bfbf9-bdae-4248-ba56-9c84f24fa2d8

📥 Commits

Reviewing files that changed from the base of the PR and between c50425e and 6dff618.

📒 Files selected for processing (4)

src/app/endpoints/responses.py
src/observability/formats/responses.py
tests/unit/app/endpoints/test_responses_splunk.py
tests/unit/observability/formats/test_responses.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: E2E: server mode / ci / group 1
GitHub Check: Pylinter

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

src/observability/formats/responses.py
tests/unit/app/endpoints/test_responses_splunk.py
tests/unit/observability/formats/test_responses.py
src/app/endpoints/responses.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

src/observability/formats/responses.py
src/app/endpoints/responses.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

tests/unit/app/endpoints/test_responses_splunk.py
tests/unit/observability/formats/test_responses.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

src/app/endpoints/responses.py

🧠 Learnings (1)

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/responses.py

🔇 Additional comments (4)

tests/unit/app/endpoints/test_responses_splunk.py (1)

721-773: Good coverage for User-Agent sanitization behavior.

These tests exercise the core edge cases (empty/control-only headers, control-char stripping, and max-length truncation) and give strong regression protection for the new telemetry field path.

tests/unit/observability/formats/test_responses.py (1)

23-35: LGTM — user_agent serialization tests are complete and clear.

The added fixture and assertions cover both “set” and “None” cases, which matches the new event contract.

Also applies to: 117-163

src/observability/formats/responses.py (1)

27-27: Clean event-schema extension for user_agent.

The dataclass and builder updates are aligned, and the field is propagated without breaking existing defaults.

Also applies to: 49-49

src/app/endpoints/responses.py (1)

118-137: Telemetry plumbing is consistent across success, blocked, and error paths.

Good job threading user_agent through every _queue_responses_splunk_event path so observability behavior stays uniform.

Also applies to: 390-391, 447-457, 475-520, 537-538, 977-995, 1050-1180

Lifto · 2026-04-28T18:18:22Z

Fixed the docstring nit as well — added user_agent: Sanitized User-Agent string from request headers, or None. to the Args section of handle_streaming_response, generate_response, and handle_non_streaming_response.

coderabbitai Bot reviewed Apr 28, 2026


Comment thread src/app/endpoints/responses.py Outdated

fix: add Final annotation and docstring updates for user_agent

b1b141d

Lifto changed the title ~~feat(observability): add user_agent to ResponsesEventData for CLA/Goose differentiation~~ RSPEED-2849: add user_agent to ResponsesEventData for CLA/Goose differentiation Apr 28, 2026

docs: add PR title prefix requirement to AGENTS.md

b0ec99a

Lifto mentioned this pull request Apr 28, 2026

RSPEED-2849: add endpoint label to LLM Prometheus metrics #1624

Open

Lifto wants to merge 3 commits into lightspeed-core:main from Lifto:feat/rspeed-2849-user-agent
