Skip to content

fix(metrics): correct PromQL for Active Sessions and recording rules …#10

Merged
like-a-freedom merged 1 commit intomasterfrom
fix/promql-active-sessions
Dec 20, 2025
Merged

fix(metrics): correct PromQL for Active Sessions and recording rules …#10
like-a-freedom merged 1 commit intomasterfrom
fix/promql-active-sessions

Conversation

@like-a-freedom
Copy link
Copy Markdown
Owner

…(count_over_time -> sum by(session_hash))

Copilot AI review requested due to automatic review settings December 20, 2025 11:44
@like-a-freedom like-a-freedom merged commit f9e0b87 into master Dec 20, 2025
8 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the PromQL queries used to calculate active session counts across recording rules and dashboard panels. The primary issue addressed is that trackly_session_heartbeat is a counter metric with multiple labels (session_hash and activity), which requires aggregating across activities before counting unique sessions.

Key Changes:

  • Fixed session counting logic by wrapping count_over_time with sum by (session_hash) to aggregate across activity labels before counting
  • Removed duplicate fieldConfig blocks in the dashboard JSON (cleanup)
  • Added missing comma in JSON syntax

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
observability/recording_rules.yml Updated PromQL expressions for all session-related recording rules (sessions_active_, sessions_churn_risk, retention_, new_sessions_*) to use sum by (session_hash) aggregation before counting
observability/dashboards/product-metrics.json Updated Active Sessions dashboard queries to match recording rules pattern, removed duplicate fieldConfig blocks, fixed JSON syntax with missing comma

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- record: sessions_active_30d
expr: |
count(count_over_time(trackly_session_heartbeat[30d]) by (session_hash))
count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[30d]))) > 0)
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[30d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).

Copilot uses AI. Check for mistakes.
"targets": [
{
"expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]) by (session_hash))",
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]))) > 0)",
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[24h]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.

Suggested change
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]))) > 0)",
"expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h])))",

Copilot uses AI. Check for mistakes.
"targets": [
{
"expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]) by (session_hash))",
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]))) > 0)",
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[7d]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.

Suggested change
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]))) > 0)",
"expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d])))",

Copilot uses AI. Check for mistakes.
"targets": [
{
"expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]) by (session_hash))",
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]))) > 0)",
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[30d]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.

Suggested change
"expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]))) > 0)",
"expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d])))",

Copilot uses AI. Check for mistakes.
- record: sessions_active_1d
expr: |
count(count_over_time(trackly_session_heartbeat[1d]) by (session_hash))
count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[1d]))) > 0)
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[1d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).

Copilot uses AI. Check for mistakes.
- record: sessions_active_7d
expr: |
count(count_over_time(trackly_session_heartbeat[7d]) by (session_hash))
count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[7d]))) > 0)
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[7d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).

Copilot uses AI. Check for mistakes.
@like-a-freedom like-a-freedom deleted the fix/promql-active-sessions branch December 31, 2025 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants