fix(metrics): correct PromQL for Active Sessions and recording rules …#10
fix(metrics): correct PromQL for Active Sessions and recording rules …#10like-a-freedom merged 1 commit intomasterfrom
Conversation
…(count_over_time -> sum by(session_hash))
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical bug in the PromQL queries used to calculate active session counts across recording rules and dashboard panels. The primary issue addressed is that trackly_session_heartbeat is a counter metric with multiple labels (session_hash and activity), which requires aggregating across activities before counting unique sessions.
Key Changes:
- Fixed session counting logic by wrapping
count_over_timewithsum by (session_hash)to aggregate across activity labels before counting - Removed duplicate
fieldConfigblocks in the dashboard JSON (cleanup) - Added missing comma in JSON syntax
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| observability/recording_rules.yml | Updated PromQL expressions for all session-related recording rules (sessions_active_, sessions_churn_risk, retention_, new_sessions_*) to use sum by (session_hash) aggregation before counting |
| observability/dashboards/product-metrics.json | Updated Active Sessions dashboard queries to match recording rules pattern, removed duplicate fieldConfig blocks, fixed JSON syntax with missing comma |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - record: sessions_active_30d | ||
| expr: | | ||
| count(count_over_time(trackly_session_heartbeat[30d]) by (session_hash)) | ||
| count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[30d]))) > 0) |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[30d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).
| "targets": [ | ||
| { | ||
| "expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]) by (session_hash))", | ||
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]))) > 0)", |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[24h]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h]))) > 0)", | |
| "expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[24h])))", |
| "targets": [ | ||
| { | ||
| "expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]) by (session_hash))", | ||
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]))) > 0)", |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[7d]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d]))) > 0)", | |
| "expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[7d])))", |
| "targets": [ | ||
| { | ||
| "expr": "count(count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]) by (session_hash))", | ||
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]))) > 0)", |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job="trackly_backend"}[30d]))) for clarity and consistency with the recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter.
| "expr": "count((sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d]))) > 0)", | |
| "expr": "count(sum by (session_hash) (count_over_time(trackly_session_heartbeat{job=\"trackly_backend\"}[30d])))", |
| - record: sessions_active_1d | ||
| expr: | | ||
| count(count_over_time(trackly_session_heartbeat[1d]) by (session_hash)) | ||
| count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[1d]))) > 0) |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[1d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).
| - record: sessions_active_7d | ||
| expr: | | ||
| count(count_over_time(trackly_session_heartbeat[7d]) by (session_hash)) | ||
| count((sum by (session_hash) (count_over_time(trackly_session_heartbeat[7d]))) > 0) |
There was a problem hiding this comment.
The filtering condition > 0 is redundant in this context. The count() function already excludes time series with no samples. Since sum by (session_hash) (count_over_time(...)) will only produce results for session_hashes that have at least one sample, the > 0 filter doesn't change the result. Consider simplifying to count(sum by (session_hash) (count_over_time(trackly_session_heartbeat[7d]))) for clarity and consistency with the other recording rules that use sum by (session_hash) (count_over_time(...)) without the > 0 filter (e.g., lines 27, 29, 58-81).
…(count_over_time -> sum by(session_hash))