
Add metrics for failed topic load operation #18963

@codelipenghui

Description


Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, we have topic load metrics like the following:

topic_load_times{cluster="standalone",quantile="0.5"} 140.0
topic_load_times{cluster="standalone",quantile="0.75"} 183.0
topic_load_times{cluster="standalone",quantile="0.95"} 249.0
topic_load_times{cluster="standalone",quantile="0.99"} 249.0
topic_load_times{cluster="standalone",quantile="0.999"} 249.0
topic_load_times{cluster="standalone",quantile="0.9999"} 249.0
topic_load_times_count{cluster="standalone"} 6.0
topic_load_times_sum{cluster="standalone"} 955.0
topic_load_times_created{cluster="standalone"} 1.671240308864E9

However, we cannot detect whether any topics failed to load because of
ZooKeeper or BookKeeper problems.

It would be better to add a new metric for failed topic load operations so that
users can set up alerts based on it.

Solution

Add a topic_load_failed_count metric.
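A minimal sketch of the idea, assuming a topic load is represented as a CompletableFuture that completes exceptionally when loading fails (for example, on a metadata store or BookKeeper error). The class TopicLoadFailureMetrics and its methods are hypothetical and not part of Pulsar's API; a real implementation would register the counter with the broker's existing Prometheus metrics rather than render the text by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch: counts topic loads that complete exceptionally and
 * renders the value in the Prometheus text exposition format.
 * Names are illustrative only, not Pulsar's actual classes or methods.
 */
public class TopicLoadFailureMetrics {
    private final String cluster;
    // LongAdder tolerates many concurrent increments from broker threads.
    private final LongAdder failed = new LongAdder();

    public TopicLoadFailureMetrics(String cluster) {
        this.cluster = cluster;
    }

    /** Increment the counter once for a failed topic load. */
    public void recordFailure() {
        failed.increment();
    }

    /** Attach to a topic-load future so that any exceptional completion is counted. */
    public <T> CompletableFuture<T> track(CompletableFuture<T> loadFuture) {
        return loadFuture.whenComplete((topic, error) -> {
            if (error != null) {
                recordFailure();
            }
        });
    }

    public long failedCount() {
        return failed.sum();
    }

    /** Render one sample in the same style as the existing topic_load_times output. */
    public String toPrometheusText() {
        return "# TYPE topic_load_failed_count counter\n"
            + "topic_load_failed_count{cluster=\"" + cluster + "\"} "
            + (double) failed.sum() + "\n";
    }

    public static void main(String[] args) {
        TopicLoadFailureMetrics metrics = new TopicLoadFailureMetrics("standalone");
        // Simulate one successful load and one load that fails.
        metrics.track(CompletableFuture.completedFuture("persistent://public/default/ok"));
        metrics.track(CompletableFuture.failedFuture(new RuntimeException("metadata store timeout")));
        System.out.print(metrics.toPrometheusText());
    }
}
```

Counting exceptional completions of the load future, rather than matching specific ZooKeeper or BookKeeper exception types, keeps the counter independent of the failure cause, so an alert can simply fire whenever the value increases.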

Alternatives

No response

Anything else?

The metrics change requires a proposal.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata


Assignees

Labels

Stale, type/enhancement (The enhancements for the existing features or docs, e.g. reduce memory usage of the delayed messages)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
