Skip to content

fix(kafka): add message processing failure counter#477

Open
dscervantes wants to merge 3 commits intoRedHatInsights:masterfrom
dscervantes:bot/RHCLOUD-45237
Open

fix(kafka): add message processing failure counter#477
dscervantes wants to merge 3 commits intoRedHatInsights:masterfrom
dscervantes:bot/RHCLOUD-45237

Conversation

@dscervantes
Copy link
Copy Markdown
Contributor

Summary

  • Wire up the existing payload_tracker_message_process_errors Prometheus counter (IncMessageProcessErrors()) at every failure return path in the Kafka message handler
  • Covers: JSON unmarshal errors, payload upsert failures, status/service/source table entry creation errors, and DB insert retry exhaustion
  • The counter and helper function already existed in internal/endpoints/metrics.go but were never called

RHCLOUD-45237

Test plan

  • Verify go build ./... passes (confirmed locally)
  • Verify go vet ./... shows no new warnings (confirmed — existing warnings are pre-existing)
  • CI tests pass (requires DB — runs in CI)
  • Confirm payload_tracker_message_process_errors metric increments on message processing failures in staging

🤖 Generated with Claude Code

RHCLOUD-45237
Wire up the existing IncMessageProcessErrors() prometheus counter at
every failure return path in the kafka message handler: JSON unmarshal
errors, payload upsert failures, status/service/source table entry
creation errors, and DB insert retry exhaustion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread internal/kafka/handler.go
RHCLOUD-45237
Also count invalid request ID as a message processing error for
consistency with all other early-return error paths in onMessage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
skarekrow
skarekrow previously approved these changes Apr 30, 2026
Copy link
Copy Markdown
Contributor

@dehort dehort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments. Other than that, it looks good to me!

Comment thread internal/kafka/handler.go Outdated
} else {
l.Log.Error("ERROR: Unmarshaling Payload Status Event: ", err)
}
endpoints.IncMessageProcessErrors()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be IncInvalidConsumerRequestIDs().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — changed to IncInvalidConsumerRequestIDs(). Also moved the counter out of validateRequestID() so the validation function stays side-effect-free and the counter is only at the call site (avoids double-counting). Fixed in 08e148d.

Comment thread internal/kafka/handler.go Outdated
}

if !validateRequestID(cfg.RequestConfig.ValidateRequestIDLength, payloadStatus.RequestID) {
endpoints.IncMessageProcessErrors()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be IncInvalidConsumerRequestIDs().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — changed to IncInvalidConsumerRequestIDs() here as well. Since validateRequestID() was calling it internally, I removed it from there to avoid double-counting on the same counter.

RHCLOUD-45237
Use IncInvalidConsumerRequestIDs() instead of IncMessageProcessErrors()
for unmarshal and request ID validation failures. Move counter out of
validateRequestID to the call site to avoid double-counting and keep
the validation function free of side effects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants