
RSPEED-2943: centralize metrics recording #1622

Open
major wants to merge 1 commit into lightspeed-core:main from major:metrics-recording-refactor

Conversation

Contributor

@major major commented Apr 28, 2026

Description

Centralizes Prometheus metric updates behind a small metrics.recording facade so new metrics can be added without spreading Prometheus object details across application code. Existing metric names, labels, and output stay unchanged.
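
As a minimal sketch of the call-site change: the "before" metric access below is quoted from the review comments further down this page, while the facade signature in the "after" function is an assumption about the new API, not the merged code.

```python
import metrics
from metrics import recording


def count_request_before(path: str, status_code: int) -> None:
    # Before: application code manipulates the Prometheus counter directly.
    metrics.rest_api_calls_total.labels(path, status_code).inc()


def count_request_after(path: str, status_code: int) -> None:
    # After: application code goes through the recording facade;
    # the exact parameters here are an assumption.
    recording.record_rest_api_call(path, status_code)
```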

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: OpenCode
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2943
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • uv run pytest tests/unit/metrics/test_recording.py tests/unit/app/test_main_middleware.py tests/unit/utils/test_responses.py tests/unit/utils/test_shields.py tests/unit/app/endpoints/test_streaming_query.py tests/unit/app/endpoints/test_rlsapi_v1.py
    • Result: 361 passed, 3 warnings
  • uv run black --check src/app/endpoints/rlsapi_v1.py src/app/endpoints/streaming_query.py src/app/main.py src/metrics/__init__.py src/metrics/recording.py src/utils/responses.py src/utils/shields.py tests/unit/app/endpoints/test_streaming_query.py tests/unit/app/test_main_middleware.py tests/unit/metrics/test_recording.py tests/unit/utils/test_responses.py tests/unit/utils/test_shields.py
    • Result: 12 files would be left unchanged
  • uv run make verify
    • Result: passed black, pylint, pyright, ruff, pydocstyle, and mypy

Summary by CodeRabbit

Release Notes

  • Refactor

    • Streamlined metrics recording through a centralized recording API. Direct metric updates across LLM calls, token usage tracking, and REST API monitoring are replaced with a unified approach whose improved error handling ensures metrics are captured reliably.
  • Tests

    • Updated unit tests to validate the new centralized metrics recording system.

Contributor

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 46 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization, which allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eef7b2c7-c04f-4d99-bd78-b0b303683762

📥 Commits

Reviewing files that changed from the base of the PR and between 27816d5 and 7a76f83.

📒 Files selected for processing (12)
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • src/utils/responses.py
  • src/utils/shields.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/test_main_middleware.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py

Walkthrough

The pull request introduces a new metrics.recording facade module that centralizes Prometheus metric recording operations. Direct metric counter increments throughout the codebase are refactored to call standardized recording functions for LLM operations, REST API tracking, and response duration measurement. Corresponding test updates verify the new recording layer.

Changes

Cohort / File(s) / Summary

  • Recording Facade (src/metrics/recording.py)
    New module introducing a centralized recording API with the context manager measure_response_duration() and functions for REST API tracking, LLM call/failure/validation tracking, and token usage recording. Includes try/except error handling for robustness. A hedged sketch of this module follows the table.
  • Endpoint Metrics Refactoring (src/app/endpoints/rlsapi_v1.py, src/app/endpoints/streaming_query.py, src/app/main.py)
    Refactors direct Prometheus metric increments to use metrics.recording functions: LLM failure and call tracking in endpoints, response duration and REST API call counting in middleware.
  • Utility Metrics Refactoring (src/utils/responses.py, src/utils/shields.py)
    Replaces direct metric manipulation with recording function calls: token usage recording via record_llm_token_usage() and LLM calls via record_llm_call() in responses; validation error tracking via record_llm_validation_error() in shields.
  • Metric Documentation (src/metrics/__init__.py)
    Updates inline documentation comments for the llm_token_sent_total and llm_token_received_total counters to clarify their purpose; no metric definitions or behavior changed.
  • Test Updates (tests/unit/app/endpoints/test_streaming_query.py, tests/unit/app/test_main_middleware.py, tests/unit/metrics/test_recording.py, tests/unit/utils/test_responses.py, tests/unit/utils/test_shields.py)
    New test module for the recording facade; existing tests updated to mock recording functions instead of direct Prometheus metrics, including verification of the context manager and error-handling paths.
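
Based on the walkthrough and the review comments, the facade's exception-safe helpers plausibly look like the sketch below. Only the function names and the two Prometheus objects (response_duration_seconds, rest_api_calls_total) are taken from this PR; the bodies and signatures are assumptions.

```python
"""Sketch of the metrics.recording facade; not the merged code."""

import logging
from contextlib import contextmanager
from typing import Iterator

import metrics  # package holding the Prometheus metric objects

logger = logging.getLogger(__name__)


@contextmanager
def measure_response_duration(path: str) -> Iterator[None]:
    """Time a request against response_duration_seconds without ever raising."""
    try:
        timer = metrics.response_duration_seconds.labels(path).time()
    except Exception:  # pylint: disable=broad-except
        logger.warning("Failed to start response timer", exc_info=True)
        yield  # the request must proceed even if telemetry fails
        return
    with timer:
        yield


def record_rest_api_call(path: str, status_code: int) -> None:
    """Count one REST API call; metric failures are logged, not raised."""
    try:
        metrics.rest_api_calls_total.labels(path, status_code).inc()
    except Exception:  # pylint: disable=broad-except
        logger.warning("Failed to record REST API call", exc_info=True)
```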

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change, centralizing metrics recording behind a facade, which aligns with the primary refactoring across the entire changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 84.31%, above the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/metrics/recording.py`:
- Around line 17-35: The metrics calls in measure_response_duration and
record_rest_api_call can raise and currently bubble into request handling; wrap
the metrics interactions in try/except to prevent telemetry failures from
affecting API responses. For measure_response_duration
(metrics.response_duration_seconds.labels(path).time()) catch exceptions around
creating/entering the timing context and ensure the contextmanager still yields
once so the request proceeds; log the exception (non-fatal) instead of
re-raising. For record_rest_api_call wrap
metrics.rest_api_calls_total.labels(path, status_code).inc() in a try/except,
log any exception and swallow it so the request flow is unchanged. Use the
existing project logger or a safe fallback logger for the error messages.

In `@tests/unit/utils/test_shields.py`:
- Line 201: The test currently patches
utils.shields.recording.record_llm_validation_error but never asserts it was
invoked; update the default-message blocked-path test to assert the patched
recorder was called (e.g., assert_called_once() or assert_called_with(...)
depending on expected args) after exercising the blocked-path branch,
referencing the patched symbol
"utils.shields.recording.record_llm_validation_error" to locate the mock and
ensure the metric/instrumentation is verified.
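
For the second comment, a minimal pytest sketch of the missing assertion: the patch target is taken verbatim from the comment, while the function that drives the blocked-path branch is hypothetical.

```python
from unittest.mock import patch

from utils import shields


def test_blocked_path_records_validation_error() -> None:
    """The default-message blocked path should record one validation error."""
    with patch("utils.shields.recording.record_llm_validation_error") as recorder:
        # Hypothetical driver for the blocked-path branch; the real test
        # exercises it through the shield-violation flow.
        shields.handle_violation(message=None)

    recorder.assert_called_once()
```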

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 11106912-7360-4785-a9c3-0bff9353f7f6

📥 Commits

Reviewing files that changed from the base of the PR and between 93b2bb5 and 27816d5.

📒 Files selected for processing (12)
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • src/utils/responses.py
  • src/utils/shields.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/test_main_middleware.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: Pyright
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Pylinter
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: mypy
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/utils/shields.py
  • src/metrics/__init__.py
  • src/app/main.py
  • src/metrics/recording.py
  • src/utils/responses.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
  • tests/unit/app/test_main_middleware.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoints

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/utils/shields.py
  • src/metrics/__init__.py
  • src/app/main.py
  • src/metrics/recording.py
  • src/utils/responses.py
**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/metrics/test_recording.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
  • tests/unit/app/test_main_middleware.py
🧠 Learnings (6)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/main.py
  • src/utils/responses.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • tests/unit/utils/test_responses.py
🔇 Additional comments (12)
src/metrics/recording.py (1)

38-90: Good defensive isolation on LLM metric writes.

These helper functions correctly prevent metric-recording failures from impacting request/inference flows and keep diagnostics in logs.
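
A sketch of the defensive pattern this comment refers to: the helper name comes from the walkthrough, but the `llm_calls_total` metric object and the signature are assumptions.

```python
import logging

import metrics

logger = logging.getLogger(__name__)


def record_llm_call(provider: str, model: str) -> None:
    """Count one LLM call; a failed metric write is logged, never raised."""
    try:
        metrics.llm_calls_total.labels(provider, model).inc()
    except Exception:  # pylint: disable=broad-except
        logger.warning("Failed to record LLM call metric", exc_info=True)
```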

src/metrics/__init__.py (1)

45-50: Comment-only clarification looks good.

The wording now clearly matches the sent/received token counters, with no behavioral changes.

tests/unit/app/endpoints/test_streaming_query.py (1)

392-392: Mock target migration is correct.

These patches now align with app.endpoints.streaming_query.recording.record_llm_call, keeping tests consistent with the new metrics facade integration.

Also applies to: 479-479, 577-577, 673-673, 773-773
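
The migration pattern these tests follow, sketched with a hypothetical test body; only the patch target comes from the comment above.

```python
from unittest.mock import patch


def test_streaming_query_records_llm_call() -> None:
    # Previously the tests patched the Prometheus object itself; now they
    # patch the facade function at its point of use.
    with patch(
        "app.endpoints.streaming_query.recording.record_llm_call"
    ) as record_llm_call:
        ...  # drive the streaming endpoint under test (omitted)

    record_llm_call.assert_called_once()
```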

src/app/endpoints/streaming_query.py (1)

61-61: LLM call instrumentation migration is correctly scoped.

Line 286 preserves existing label inputs while removing direct metric coupling from endpoint logic.

Also applies to: 286-286

src/app/endpoints/rlsapi_v1.py (1)

26-26: Inference failure metric recording is correctly centralized.

Line 450 keeps provider/model labeling semantics intact while routing through the recording facade.

Also applies to: 450-450

src/utils/shields.py (1)

20-20: Validation-error metric paths are consistently centralized.

Both shield-violation branches now record via the same facade helper, which improves instrumentation consistency.

Also applies to: 80-80, 181-181

tests/unit/utils/test_shields.py (1)

57-58: Patch target migration for shield metrics is correct.

These tests now correctly patch/assert utils.shields.recording.record_llm_validation_error across violation and non-violation paths.

Also applies to: 67-67, 71-72, 81-81, 85-86, 95-95, 99-100, 106-106, 162-163, 194-194

src/app/main.py (1)

24-24: Middleware integration with the recording facade is correct.

The updated calls preserve existing behavior while removing direct Prometheus object usage from middleware code.

Also applies to: 185-185, 190-190
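
A hedged sketch of how the middleware plausibly uses the facade after this change; the function names come from the walkthrough, while the middleware shape and argument order are assumptions.

```python
from fastapi import FastAPI, Request
from metrics import recording

app = FastAPI()


@app.middleware("http")
async def rest_api_metrics(request: Request, call_next):
    """Record duration and call count for every request via the facade."""
    path = request.url.path
    with recording.measure_response_duration(path):
        response = await call_next(request)
    recording.record_rest_api_call(path, response.status_code)
    return response
```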

tests/unit/app/test_main_middleware.py (1)

4-4: Metrics middleware tests are correctly migrated to the recording facade.

The updated patches/assertions validate the right integration points and preserve expected route/status behavior after the refactor.

Also applies to: 169-173, 182-184, 192-196, 213-214, 223-227, 242-243

tests/unit/metrics/test_recording.py (1)

1-132: New metrics.recording unit tests provide solid coverage for facade behavior.

Success and failure branches are both exercised, including timer context-manager usage and warning-on-failure semantics.
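
One of the warning-on-failure tests might look like this sketch; the patched metric object and the caplog-based assertion are assumptions.

```python
from unittest.mock import patch

from metrics import recording


def test_record_rest_api_call_swallows_metric_errors(caplog) -> None:
    """A failing Prometheus write must be logged as a warning, not raised."""
    with patch("metrics.rest_api_calls_total") as counter:
        counter.labels.side_effect = RuntimeError("boom")
        recording.record_rest_api_call("/v1/query", 200)  # must not raise

    assert any(record.levelname == "WARNING" for record in caplog.records)
```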

src/utils/responses.py (1)

93-93: extract_token_usage refactor cleanly centralizes metric updates without changing behavior.

The usage-present and usage-absent paths both still record LLM calls correctly, while token counters now route through the shared recording facade.

Also applies to: 925-926, 937-945
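
An illustrative call-site shape for the token accounting after this refactor: record_llm_call and record_llm_token_usage are named in the walkthrough, but the parameters and the usage attributes here are assumptions.

```python
from metrics import recording


def record_inference_metrics(provider: str, model: str, usage) -> None:
    """Record one LLM call and, when usage data is present, its token counts."""
    recording.record_llm_call(provider, model)
    if usage is not None:
        recording.record_llm_token_usage(
            provider,
            model,
            sent=usage.input_tokens,        # hypothetical attribute names
            received=usage.output_tokens,
        )
```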

tests/unit/utils/test_responses.py (1)

2227-2230: Token usage tests are correctly aligned with the new recording facade API.

The patched targets and call assertions now match the production integration points introduced by the refactor.

Also applies to: 2236-2240, 2247-2248, 2253-2254, 2265-2269, 2273-2275, 2282-2283, 2287-2288

Comment thread: src/metrics/recording.py (Outdated)
Comment thread: tests/unit/utils/test_shields.py (Outdated)
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the metrics-recording-refactor branch from 27816d5 to 7a76f83 on April 28, 2026 at 20:48
