fix: calculate value of total field in OpenObserve logs, traces and spans response using a count query#65
Conversation
…pans response using a count query Signed-off-by: Nilushan Costa <nilushan@wso2.com>
📝 WalkthroughWalkthroughVersion updates for two Helm charts (logs 0.4.1→0.4.2, tracing 0.2.0→0.2.1). Added separate count query execution paths in both logging and tracing clients to retrieve total counts independently from paginated search results, with new query builders and count-response parsing helpers. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
observability-tracing-openobserve/internal/openobserve/client.go (1)
327-342: Consider adding a nil check for defensive programming.The function handles missing hits and missing "total" field gracefully, but will panic if
respisnil. While current callers always pass non-nil responses after successfulexecuteSearchQuery, a nil check would make this helper more robust for future use.♻️ Optional: Add nil check
func extractTotalCount(resp *OpenObserveResponse) int { + if resp == nil { + return 0 + } if len(resp.Hits) > 0 {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-tracing-openobserve/internal/openobserve/client.go` around lines 327 - 342, The helper extractTotalCount should guard against a nil response to avoid panics; update extractTotalCount to check if resp == nil (and return 0) before accessing resp.Hits, keeping the existing logic for handling empty hits and missing "total"; reference the extractTotalCount function and the resp parameter to locate where to add the nil check.observability-logs-openobserve/internal/openobserve/client.go (1)
187-198: Silent fallback to 0 when count extraction fails.The
extractTotalCountfunction returns 0 when it cannot extract the total count (empty hits, missing "total" key, or wrong type). This could mask issues where the count query returns an unexpected response format.Consider logging a warning when extraction fails, to aid debugging:
♻️ Suggested improvement with logging
// extractTotalCount extracts the total count from a count query response. // The response is expected to have hits[0].total as the count value. -func extractTotalCount(resp *OpenObserveResponse) int { +func extractTotalCount(resp *OpenObserveResponse, logger *slog.Logger) int { if len(resp.Hits) > 0 { if total, ok := resp.Hits[0]["total"]; ok { if v, ok := total.(float64); ok { return int(v) } + logger.Warn("Unexpected type for total count", slog.Any("total", total)) } + } else { + logger.Warn("Count query returned no hits") } return 0 }This would require updating the call sites to pass the logger:
TotalCount: extractTotalCount(countResp, c.logger),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-openobserve/internal/openobserve/client.go` around lines 187 - 198, The extractTotalCount function currently swallows failures and returns 0; change its signature to accept a logger (e.g., extractTotalCount(resp *OpenObserveResponse, logger Logger)) and inside, when hits are empty, "total" missing, or type assertion fails, emit a warning via logger with contextual info (e.g., what part failed and resp contents) before returning 0; update all call sites (e.g., where TotalCount: extractTotalCount(countResp, c.logger) is set) to pass c.logger, and ensure the logger type matches the existing c.logger (or adapt to the project's logging interface).observability-logs-openobserve/internal/openobserve/queries.go (1)
171-176: Debug logging usesfmt.Printfinstead of structured logger.The debug logging in both count query functions uses
fmt.Printfwhich bypasses the structured logger and goes directly to stdout. This is inconsistent with the rest of the codebase and makes log aggregation harder.Consider using the logger parameter for consistency:
♻️ Suggested refactor for generateComponentLogsCountQuery (lines 171-176)
if logger.Enabled(nil, slog.LevelDebug) { if prettyJSON, err := json.MarshalIndent(query, "", " "); err == nil { - fmt.Printf("Generated count query for component logs:\n") - fmt.Println(string(prettyJSON)) + logger.Debug("Generated count query for component logs", slog.String("query", string(prettyJSON))) } }♻️ Suggested refactor for generateWorkflowLogsCountQuery (lines 217-222)
if logger.Enabled(nil, slog.LevelDebug) { if prettyJSON, err := json.MarshalIndent(query, "", " "); err == nil { - fmt.Printf("Generated count query for workflow logs:\n") - fmt.Println(string(prettyJSON)) + logger.Debug("Generated count query for workflow logs", slog.String("query", string(prettyJSON))) } }Note: The same pattern exists in the other query generation functions (lines 284-288, 352-357, 436-441), so this is a pre-existing pattern. Consider addressing all of them in a follow-up.
Also applies to: 217-222
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@observability-logs-openobserve/internal/openobserve/queries.go` around lines 171 - 176, Replace the direct stdout debug prints in generateComponentLogsCountQuery and generateWorkflowLogsCountQuery with structured logs using the provided logger: when logger.Enabled(nil, slog.LevelDebug) and json.MarshalIndent succeeds, call the logger's debug-level method (e.g., logger.Debug / logger.Log with slog.LevelDebug) and pass a clear message plus the pretty JSON as a structured field (e.g., "query" or "pretty_query") instead of using fmt.Printf / fmt.Println; apply the same change to the other similar helpers (the other query generation functions flagged) for consistency.observability-logs-openobserve/internal/openobserve/client_test.go (1)
63-155: Consider adding test coverage for count query failures.The current tests cover the happy path where both the search query and count query succeed. However, the client code can fail at the count query stage (lines 226-233 and 266-273 in
client.go). Consider adding tests that verify proper error handling when:
- The count query generation fails
- The count query execution fails (e.g., server returns error for count query but not for main query)
Would you like me to generate test cases for count query failure scenarios?
Also applies to: 175-229
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@observability-logs-openobserve/internal/openobserve/client.go`:
- Around line 187-198: The extractTotalCount function currently swallows
failures and returns 0; change its signature to accept a logger (e.g.,
extractTotalCount(resp *OpenObserveResponse, logger Logger)) and inside, when
hits are empty, "total" missing, or type assertion fails, emit a warning via
logger with contextual info (e.g., what part failed and resp contents) before
returning 0; update all call sites (e.g., where TotalCount:
extractTotalCount(countResp, c.logger) is set) to pass c.logger, and ensure the
logger type matches the existing c.logger (or adapt to the project's logging
interface).
In `@observability-logs-openobserve/internal/openobserve/queries.go`:
- Around line 171-176: Replace the direct stdout debug prints in
generateComponentLogsCountQuery and generateWorkflowLogsCountQuery with
structured logs using the provided logger: when logger.Enabled(nil,
slog.LevelDebug) and json.MarshalIndent succeeds, call the logger's debug-level
method (e.g., logger.Debug / logger.Log with slog.LevelDebug) and pass a clear
message plus the pretty JSON as a structured field (e.g., "query" or
"pretty_query") instead of using fmt.Printf / fmt.Println; apply the same change
to the other similar helpers (the other query generation functions flagged) for
consistency.
In `@observability-tracing-openobserve/internal/openobserve/client.go`:
- Around line 327-342: The helper extractTotalCount should guard against a nil
response to avoid panics; update extractTotalCount to check if resp == nil (and
return 0) before accessing resp.Hits, keeping the existing logic for handling
empty hits and missing "total"; reference the extractTotalCount function and the
resp parameter to locate where to add the nil check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6293c27-2e35-49db-98e5-e1dbf3b6490c
⛔ Files ignored due to path filters (4)
observability-logs-openobserve/internal/api/gen/models.gen.gois excluded by!**/gen/**observability-logs-openobserve/internal/api/gen/server.gen.gois excluded by!**/gen/**observability-tracing-openobserve/internal/api/gen/models.gen.gois excluded by!**/gen/**observability-tracing-openobserve/internal/api/gen/server.gen.gois excluded by!**/gen/**
📒 Files selected for processing (10)
observability-logs-openobserve/README.mdobservability-logs-openobserve/helm/Chart.yamlobservability-logs-openobserve/internal/openobserve/client.goobservability-logs-openobserve/internal/openobserve/client_test.goobservability-logs-openobserve/internal/openobserve/queries.goobservability-tracing-openobserve/README.mdobservability-tracing-openobserve/helm/Chart.yamlobservability-tracing-openobserve/internal/openobserve/client.goobservability-tracing-openobserve/internal/openobserve/client_test.goobservability-tracing-openobserve/internal/openobserve/queries.go
Purpose
The total field in query responses for both traces and logs endpoints returns an incorrect count that is capped at the limit parameter value, rather than the true number of matching records. This breaks pagination logic for API consumers.
openchoreo/openchoreo#2889
Goals
Return the accurate total count of matching records in the total field, independent of the limit/size parameter, for all query endpoints across both the tracing and logs modules.
Approach
Each query endpoint now executes a separate count query against OpenObserve's _search API with size: 0 and a SELECT count(*) as total SQL statement (using count(distinct trace_id) for traces) with the same filter conditions as the original query. The true total is extracted from hits[0].total in the count response.
Summary by CodeRabbit
Chores
Documentation
New Features