RSPEED-2849: add endpoint label to LLM Prometheus metrics #1624
Lifto wants to merge 4 commits into lightspeed-core:main from
Conversation
…se differentiation

Adds a sanitized `user_agent` field to `ResponsesEventData` and the Splunk event payload, enabling differentiation between Goose and other clients in telemetry. Extracts and sanitizes the User-Agent header (strips control characters, truncates to 128 chars) before storing.

Closes RSPEED-2849
Walkthrough
The pull request adds endpoint path tracking and user-agent handling across the API layer. It threads a constant …

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/app/endpoints/streaming_query.py (1)
330-335: 🛠️ Refactor suggestion | 🟠 Major

Docstrings need the new `endpoint_path` parameter.

Both updated function signatures include `endpoint_path`, but their Args sections do not document it.

Proposed docstring update

```diff
 Args:
     responses_params: The Responses API parameters
     context: The response generator context
+    endpoint_path: API endpoint path used for metric labeling.
@@
 Args:
     turn_response: The streaming response from Llama Stack
     context: The response generator context
     turn_summary: TurnSummary to populate during streaming
+    endpoint_path: API endpoint path used for metric labeling.
```

As per coding guidelines, "All functions require docstrings with brief descriptions and complete type annotations for parameters and return types".

Also applies to: 700-704
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 330 - 335: Two functions in this module now accept the endpoint_path parameter but their docstrings were not updated; add an Args entry for endpoint_path to each affected function's docstring describing it and its type (e.g., endpoint_path: str — the request endpoint path used by the response generator), keep the existing parameter order and type annotations intact, and ensure the Args section format matches the other parameters (including the tuple[AsyncIterator[str], TurnSummary] return description) for both functions whose signatures were changed to include endpoint_path.

src/utils/shields.py (1)
74-83: ⚠️ Potential issue | 🟠 Major

`detect_shield_violations` uses unlabeled `.inc()` on a labeled Counter at line 80.

The metric `llm_calls_validation_errors_total` requires an `endpoint` label (defined in `src/metrics/__init__.py` line 44), but line 80 calls `.inc()` without labels. This causes a Prometheus ValueError at runtime when the code path is executed. The function must receive `endpoint_path` and call `labels(endpoint_path).inc()` to match the pattern used correctly in `run_shield_moderation` (line 183).

Proposed fix

```diff
-def detect_shield_violations(output_items: list[Any]) -> bool:
+def detect_shield_violations(output_items: list[Any], endpoint_path: str) -> bool:
@@
-    metrics.llm_calls_validation_errors_total.inc()
+    metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc()
```

Update the function docstring to document the new `endpoint_path` parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 74 - 83: The detect_shield_violations function is incrementing a labeled Counter with .inc() without labels causing a Prometheus ValueError; add an endpoint_path parameter to detect_shield_violations, update its docstring to document endpoint_path, and replace metrics.llm_calls_validation_errors_total.inc() with metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc() so the metric is incremented with the required endpoint label (referencing detect_shield_violations and metrics.llm_calls_validation_errors_total).

src/utils/responses.py (1)
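The failure mode flagged here can be illustrated without installing prometheus_client. The stand-in class below is a toy, not the library: it only mimics the library's documented behavior of rejecting an unlabeled `.inc()` on a metric declared with label names, and of requiring `.labels(...)` first.

```python
class _Child:
    """Handle for one concrete label combination on a counter."""

    def __init__(self, samples: dict, key: tuple[str, ...]) -> None:
        self.samples, self.key = samples, key

    def inc(self) -> None:
        self.samples[self.key] = self.samples.get(self.key, 0.0) + 1.0


class LabeledCounter:
    """Toy stand-in mimicking prometheus_client's labeled-Counter contract."""

    def __init__(self, labelnames: tuple[str, ...]) -> None:
        self.labelnames = labelnames
        self.samples: dict[tuple[str, ...], float] = {}

    def labels(self, *labelvalues: str) -> _Child:
        if len(labelvalues) != len(self.labelnames):
            raise ValueError("incorrect number of label values")
        return _Child(self.samples, labelvalues)

    def inc(self) -> None:
        # prometheus_client likewise raises when labelnames were declared
        # but no label values were supplied.
        if self.labelnames:
            raise ValueError("counter metric is missing label values")
        self.samples[()] = self.samples.get((), 0.0) + 1.0


validation_errors = LabeledCounter(("endpoint",))
validation_errors.labels("/v1/query").inc()  # the fixed pattern: label first
```

Calling `validation_errors.inc()` directly would raise ValueError, which is exactly the runtime failure the review comment describes.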
1450-1499: ⚠️ Potential issue | 🟠 Major

Keep `build_turn_summary()` free of metric side effects.

This helper still calls `extract_token_usage()`, so it increments `llm_calls_total` and token counters while building a summary. Callers in `src/app/endpoints/responses.py` (lines 1079-1081) and `src/app/endpoints/rlsapi_v1.py` (line 722) already extracted usage earlier, so those paths are counted twice; the non-streaming blocked `/v1/responses` path also turns a synthetic zero-usage response into a real LLM call.

♻️ Suggested direction

```diff
 def build_turn_summary(  # pylint: disable=too-many-arguments,too-many-positional-arguments
     response: Optional[OpenAIResponseObject],
     model: str,
     endpoint_path: str,
     vector_store_ids: Optional[list[str]] = None,
     rag_id_mapping: Optional[dict[str, str]] = None,
     filter_server_tools: bool = False,
+    token_usage: Optional[TokenCounter] = None,
 ) -> TurnSummary:
@@
-    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
-    summary.token_usage = extract_token_usage(response.usage, model, endpoint_path)
+    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
+    summary.token_usage = token_usage or extract_token_usage(
+        response.usage, model, endpoint_path
+    )
     return summary
```

Then pass the already-computed `token_usage` from call sites that need it before summary construction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/responses.py` around lines 1450 - 1499, build_turn_summary currently has a side effect of calling extract_token_usage (incrementing metrics); remove that call and add an explicit token_usage parameter (e.g., token_usage: Optional[TokenUsage] = None) so callers supply the already-computed usage instead of rebuilding it; update call sites that previously relied on build_turn_summary (notably the handlers in src/app/endpoints/responses.py and src/app/endpoints/rlsapi_v1.py) to compute token_usage once via extract_token_usage(response.usage, model, endpoint_path) and pass it into build_turn_summary, and ensure build_turn_summary simply assigns the provided token_usage to summary.token_usage without calling extract_token_usage itself.
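The suggested direction can be sketched with simplified stand-ins for the project's types. The real `TurnSummary` and `TokenCounter` carry more fields and the real builder takes more parameters, so treat this as shape only: the point is that the builder becomes a pure function and metric increments stay at the extraction site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenCounter:
    """Simplified stand-in for the project's token usage record."""

    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class TurnSummary:
    """Simplified stand-in for the project's turn summary."""

    llm_response: str = ""
    token_usage: Optional[TokenCounter] = None


def build_turn_summary(
    response_text: str,
    token_usage: Optional[TokenCounter] = None,
) -> TurnSummary:
    """Pure builder: callers pass precomputed usage, so no metrics fire here."""
    return TurnSummary(llm_response=response_text, token_usage=token_usage)
```

A caller that already ran the usage extraction once simply passes its result in, so each LLM call is counted at exactly one place regardless of how many summaries get built from the same response.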
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 318-323: The function signature for retrieve_response_generator
currently uses an empty-string default for endpoint_path which can emit
accidental endpoint="" metrics; change the signature to require endpoint_path:
str (remove the = "" default) in retrieve_response_generator and the other
similar function at the second occurrence (lines ~688-693), then update all call
sites to pass a concrete endpoint_path value and adjust any tests or callers
that relied on the default; also update related docstrings/comments to reflect
the parameter is required.
In `@src/metrics/__init__.py`:
- Around line 31-58: The endpoint label values used by the Prometheus counters
llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total,
llm_token_sent_total and llm_token_received_total must be centralized: define a
shared set/enum/Literal of allowed endpoint strings in constants.py (or reuse
existing ones there) and replace any hardcoded/inline endpoint values at each
metric emission site to reference that central constant to prevent new series
from typos/empties; update any code that constructs the label value for the
"endpoint" label to validate/normalize against that central constant set and
fall back to a single explicit "unknown" constant if needed.
---
Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 330-335: Two functions in this module now accept the endpoint_path
parameter but their docstrings were not updated; add an Args entry for
endpoint_path to each affected function's docstring describing it and its type
(e.g., endpoint_path: str — the request endpoint path used by the response
generator), keep the existing parameter order and type annotations intact, and
ensure the Args section format matches the other parameters (including the
tuple[AsyncIterator[str], TurnSummary] return description) for both functions
whose signatures were changed to include endpoint_path.
In `@src/utils/responses.py`:
- Around line 1450-1499: build_turn_summary currently has a side effect of
calling extract_token_usage (incrementing metrics); remove that call and add an
explicit token_usage parameter (e.g., token_usage: Optional[TokenUsage] = None)
so callers supply the already-computed usage instead of rebuilding it; update
call sites that previously relied on build_turn_summary (notably the handlers in
src/app/endpoints/responses.py and src/app/endpoints/rlsapi_v1.py) to compute
token_usage once via extract_token_usage(response.usage, model, endpoint_path)
and pass it into build_turn_summary, and ensure build_turn_summary simply
assigns the provided token_usage to summary.token_usage without calling
extract_token_usage itself.
In `@src/utils/shields.py`:
- Around line 74-83: The detect_shield_violations function is incrementing a
labeled Counter with .inc() without labels causing a Prometheus ValueError; add
an endpoint_path parameter to detect_shield_violations, update its docstring to
document endpoint_path, and replace
metrics.llm_calls_validation_errors_total.inc() with
metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc() so the
metric is incremented with the required endpoint label (referencing
detect_shield_violations and metrics.llm_calls_validation_errors_total).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 98aed2a2-7c3d-43f5-9f88-cb41d554dc91
📒 Files selected for processing (14)
- AGENTS.md
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/rlsapi_v1.py
- src/app/endpoints/streaming_query.py
- src/metrics/__init__.py
- src/observability/formats/responses.py
- src/utils/responses.py
- src/utils/shields.py
- tests/unit/app/endpoints/test_metrics.py
- tests/unit/app/endpoints/test_responses_splunk.py
- tests/unit/observability/formats/test_responses.py
- tests/unit/utils/test_responses.py
- tests/unit/utils/test_shields.py
💤 Files with no reviewable changes (1)
- tests/unit/app/endpoints/test_metrics.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
- Import FastAPI dependencies from `fastapi` module (e.g., `APIRouter, HTTPException, Request, status, Depends`)
- Import Llama Stack client as `from llama_stack_client import AsyncLlamaStackClient`
- All modules must start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from `log.py` for module logging
- Use `Final[type]` as type hint for all constants
- Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
- All functions require docstrings with brief descriptions and complete type annotations for parameters and return types
- Use `typing_extensions.Self` type hint for model validators
- Use modern Union syntax `str | int` instead of `Union[str, int]` for type hints
- Use `Optional[Type]` for nullable types in function signatures
- Use snake_case naming with descriptive, action-oriented function names (e.g., get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns - return new data structures instead of modifying input parameters
- Use `async def` for I/O operations and external API calls in Python functions
- Handle `APIConnectionError` from Llama Stack in error handling logic
- Use standard log levels with clear purposes: `debug()` for diagnostics, `info()` for general execution, `warning()` for unexpected events, `error()` for serious problems
- All classes require descriptive docstrings explaining purpose
- Use PascalCase class names with descriptive names and standard suffixes: `Configuration`, `Error/Exception`, `Resolver`, `Interface`
- Pydantic configuration classes must extend `ConfigurationBase`, data models must extend `BaseModel`
- Abstract classes must use ABC with `@abstractmethod` decorators for interface definitions
- Pydantic models must use `@model_validator` and `@field_validator` for validation logic
- Use complete type annotations for all class attributes with specific types...
Files:
src/observability/formats/responses.py, src/utils/shields.py, src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, src/metrics/__init__.py, src/app/endpoints/responses.py, src/utils/responses.py, src/app/endpoints/rlsapi_v1.py
AGENTS.md
📄 CodeRabbit inference engine (CLAUDE.md)
Document agent implementations and their configurations in AGENTS.md
Files:
AGENTS.md
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
- Use pytest for all unit and integration tests, do not use unittest framework
- Use `pytest-mock` for AsyncMock objects in test files
- Use `pytest.mark.asyncio` marker for asynchronous test functions
Files:
tests/unit/app/endpoints/test_responses_splunk.py, tests/unit/observability/formats/test_responses.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_shields.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, src/app/endpoints/responses.py, src/app/endpoints/rlsapi_v1.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package `__init__.py` files must contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always check `pyproject.toml` for existing dependencies before adding new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always verify current library versions in `pyproject.toml` rather than assuming versions
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Check `constants.py` for shared constants before defining new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Unit tests require 60% code coverage, integration tests require 10% coverage
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make format` to auto-format code with black and ruff before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make verify` to execute all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Create unit tests for all new code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Ensure all tests pass before submission
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use pylint with `source-roots = "src"` configuration for static analysis
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Never commit secrets/keys, use environment variables for sensitive data
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use bandit for security issue detection in code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: PR titles MUST start with a JIRA issue key prefix (LCORE-, RSPEED-, MGTM-, OLS-, RHDHPAI-, LEADS-)
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use `uv sync --group dev --group llslibdev` for setting up development dependencies
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always use `uv run` prefix for executing commands in the project
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack
Timestamp: 2026-04-28T21:14:00.744Z
Learning: Follow existing code patterns in the module you're modifying
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md
Applied to files:
AGENTS.md
📚 Learning: 2026-03-02T15:57:10.746Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-stack PR: 1250
File: AGENTS.md:11-12
Timestamp: 2026-03-02T15:57:10.746Z
Learning: In AGENTS.md, Linting Tools section should list actual tool names (e.g., mypy, pylint, pyright, ruff, black) instead of make target names (e.g., check-types). Ensure the section documents the individual tools that perform linting, not the Makefile targets that invoke them, so readers understand which tools are used and how to configure/run them directly.
Applied to files:
AGENTS.md
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/responses.pysrc/app/endpoints/rlsapi_v1.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Applied to files:
src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/app/endpoints/rlsapi_v1.py
🪛 LanguageTool
AGENTS.md
[uncategorized] ~198-~198: The official name of this software platform is spelled with a capital “H”.
Context: ...es this via pr-title-checker (config: .github/pr-title-checker-config.json). Allowe...
(GITHUB)
🔇 Additional comments (7)
AGENTS.md (1)
196-204: Nice addition for CI title enforcement visibility. This is clear and gives contributors concrete pass/fail examples for PR naming.
src/utils/shields.py (1)
122-126: Good propagation of endpoint label into shield moderation metrics. The new `endpoint_path` threading and labeled increment in the blocked branch are aligned with the endpoint-level observability objective. Also applies to: 181-184
tests/unit/app/endpoints/test_responses_splunk.py (1)
721-773: User-Agent sanitization tests are well-targeted. These cases cover the important behavioral edges (empty, control chars, truncation, valid pass-through).
tests/unit/utils/test_responses.py (1)
2232-2300: Good update of metric tests for endpoint-labeled counters. The call signatures and label assertions are consistent with the new `endpoint` dimension. Also applies to: 2553-2596
tests/unit/utils/test_shields.py (1)
121-123: Nice synchronization of shield tests with endpoint-labeled metrics. The updated call signatures and label assertion reduce drift with the production metric behavior.
Also applies to: 152-154, 194-201, 234-236, 265-267, 331-333, 349-350, 382-383
src/observability/formats/responses.py (1)
27-27: Telemetry schema and payload mapping for `user_agent` look good. Defaulting to `None` keeps compatibility while enabling the new field. Also applies to: 49-49
tests/unit/observability/formats/test_responses.py (1)
23-35: Good coverage for `user_agent` defaulting and payload emission. These tests validate both explicit and omitted-field behavior cleanly.
Also applies to: 117-163
```python
async def retrieve_response_generator(
    responses_params: ResponsesApiParams,
    context: ResponseGeneratorContext,
    endpoint_path: str = "",
) -> tuple[AsyncIterator[str], TurnSummary]:
    """
```

Make endpoint_path required to avoid accidental empty-label metrics.

Using `endpoint_path: str = ""` can silently emit an extra `endpoint=""` series if a caller omits the argument. That breaks the intended bounded endpoint set.

Proposed fix

```diff
 async def retrieve_response_generator(
     responses_params: ResponsesApiParams,
     context: ResponseGeneratorContext,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> tuple[AsyncIterator[str], TurnSummary]:
@@
 async def response_generator(  # pylint: disable=too-many-branches,too-many-statements,too-many-locals
     turn_response: AsyncIterator[OpenAIResponseObjectStream],
     context: ResponseGeneratorContext,
     turn_summary: TurnSummary,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> AsyncIterator[str]:
```

Also applies to: 688-693
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/streaming_query.py` around lines 318 - 323, The function
signature for retrieve_response_generator currently uses an empty-string default
for endpoint_path which can emit accidental endpoint="" metrics; change the
signature to require endpoint_path: str (remove the = "" default) in
retrieve_response_generator and the other similar function at the second
occurrence (lines ~688-693), then update all call sites to pass a concrete
endpoint_path value and adjust any tests or callers that relied on the default;
also update related docstrings/comments to reflect the parameter is required.
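One way to see why the required parameter matters is the sketch below. It is not the project's function: the parameter types are stand-ins, and making `endpoint_path` keyword-only is an addition beyond the suggested diff, chosen here so an omitted endpoint fails loudly at the call site instead of recording `endpoint=""`.

```python
import asyncio


async def retrieve_response_generator(
    responses_params: object,  # stand-in for ResponsesApiParams
    context: object,           # stand-in for ResponseGeneratorContext
    *,
    endpoint_path: str,        # required: no "" default that could leak an empty label
) -> str:
    """Sketch: return the label value that would be attached to the metrics."""
    return endpoint_path


# A caller must now choose a concrete endpoint; omitting it raises TypeError
# at call time rather than silently creating an endpoint="" series.
label = asyncio.run(
    retrieve_response_generator(None, None, endpoint_path="/v1/streaming_query")
)
```

With the old `= ""` default, the same omission would have succeeded and minted a fifth, unintended label value.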
```diff
 llm_calls_total = Counter(
-    "ls_llm_calls_total", "LLM calls counter", ["provider", "model"]
+    "ls_llm_calls_total", "LLM calls counter", ["provider", "model", "endpoint"]
 )

 # Metric that counts how many LLM calls failed
 llm_calls_failures_total = Counter(
-    "ls_llm_calls_failures_total", "LLM calls failures", ["provider", "model"]
+    "ls_llm_calls_failures_total",
+    "LLM calls failures",
+    ["provider", "model", "endpoint"],
 )

 # Metric that counts how many LLM calls had validation errors
 llm_calls_validation_errors_total = Counter(
-    "ls_llm_validation_errors_total", "LLM validation errors"
+    "ls_llm_validation_errors_total", "LLM validation errors", ["endpoint"]
 )

 # TODO(lucasagomes): Add metric for token usage
 # https://issues.redhat.com/browse/LCORE-411
 llm_token_sent_total = Counter(
-    "ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model"]
+    "ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model", "endpoint"]
 )

 # TODO(lucasagomes): Add metric for token usage
 # https://issues.redhat.com/browse/LCORE-411
 llm_token_received_total = Counter(
-    "ls_llm_token_received_total", "LLM tokens received", ["provider", "model"]
+    "ls_llm_token_received_total",
+    "LLM tokens received",
+    ["provider", "model", "endpoint"],
 )
```
🧹 Nitpick | 🔵 Trivial
Centralize the allowed endpoint label values.
These counters now depend on endpoint strings supplied from several call chains, so a typo or empty fallback will create a new Prometheus series silently. Please move the four supported endpoint paths behind shared constants (or an enum/Literal) and reuse them at every metric call site to preserve the bounded-cardinality guarantee.
Based on learnings: Check constants.py for shared constants before defining new ones.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/metrics/__init__.py` around lines 31 - 58, The endpoint label values used
by the Prometheus counters llm_calls_total, llm_calls_failures_total,
llm_calls_validation_errors_total, llm_token_sent_total and
llm_token_received_total must be centralized: define a shared set/enum/Literal
of allowed endpoint strings in constants.py (or reuse existing ones there) and
replace any hardcoded/inline endpoint values at each metric emission site to
reference that central constant to prevent new series from typos/empties; update
any code that constructs the label value for the "endpoint" label to
validate/normalize against that central constant set and fall back to a single
explicit "unknown" constant if needed.
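The centralization the comment asks for could look like the following sketch. The constant names and the "unknown" fallback are assumptions for illustration, not names taken from the repository's constants.py.

```python
from typing import Final

# The four bounded endpoint label values described in the PR.
ENDPOINT_QUERY: Final[str] = "/v1/query"
ENDPOINT_STREAMING_QUERY: Final[str] = "/v1/streaming_query"
ENDPOINT_RESPONSES: Final[str] = "/v1/responses"
ENDPOINT_INFER: Final[str] = "/v1/infer"
ENDPOINT_UNKNOWN: Final[str] = "unknown"  # hypothetical single fallback value

KNOWN_ENDPOINTS: Final[frozenset[str]] = frozenset(
    {ENDPOINT_QUERY, ENDPOINT_STREAMING_QUERY, ENDPOINT_RESPONSES, ENDPOINT_INFER}
)


def normalize_endpoint_label(endpoint_path: str) -> str:
    """Clamp any input to the bounded label set so typos cannot mint new series."""
    return endpoint_path if endpoint_path in KNOWN_ENDPOINTS else ENDPOINT_UNKNOWN
```

Metric emission sites would then call `labels(normalize_endpoint_label(endpoint_path))`, keeping Prometheus cardinality at the four known values plus at most one explicit fallback.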
Summary
- Adds an endpoint label to all 5 LLM Prometheus metrics (llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total, llm_token_sent_total, llm_token_received_total) to enable per-endpoint traffic differentiation in Grafana
- Threads endpoint_path: str through all call chains from the 4 endpoint handlers (/v1/responses, /v1/infer, /v1/streaming_query, /v1/query) down to the metric increment sites _increment_llm_call_metric, extract_token_usage, and run_shield_moderation
RSPEED-2849 — enables CLA vs Goose traffic differentiation in Grafana dashboards by labeling each LLM metric with the originating API endpoint.
Cardinality
The endpoint label has exactly 4 bounded values:
- /v1/infer
- /v1/responses
- /v1/streaming_query
- /v1/query

No cardinality explosion risk.
Stacking
This PR stacks on #1620 (feat/rspeed-2849-user-agent). Please merge #1620 first.

Summary by CodeRabbit
Release Notes
Improvements
Documentation