feat: Kafka bootstrap and runtime health probes #76

Open
bigbluechief wants to merge 1 commit into develop from
feature/CT-2384_kafka_health_probes

bigbluechief commented Mar 23, 2026

Introduce Kafka-aware readiness and liveness health handling for the consumer.

Readiness is now based on the initial Kafka bootstrap instead of a fixed startup delay. The application reports not ready until the blocking listeners (the entity listener and the relation-update listener) have consumed up to their startup end offsets. After that it stays ready for the rest of the pod lifetime.
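
For reviewers, the readiness rule described above can be sketched roughly as follows. This is a simplified illustration in plain Java, not the code in this PR: the class `BootstrapTracker`, its method names, and the partition keys are hypothetical, and it assumes the end offsets are captured once at startup and that the listener reports its consumer position (the next offset to fetch) per partition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker: ready once every partition reached its startup end offset. */
final class BootstrapTracker {
    // partition key -> end offset captured at startup (exclusive upper bound)
    private final Map<String, Long> pendingEndOffsets = new ConcurrentHashMap<>();
    // Latched flag: once true it is never reset, matching "ready for the pod lifetime".
    private volatile boolean ready;

    BootstrapTracker(Map<String, Long> startupEndOffsets) {
        // Partitions that were empty at startup have nothing to catch up on.
        startupEndOffsets.forEach((partition, end) -> {
            if (end > 0) {
                pendingEndOffsets.put(partition, end);
            }
        });
        ready = pendingEndOffsets.isEmpty();
    }

    /** Report the consumer position (next offset to fetch) for a partition. */
    void updatePosition(String partition, long position) {
        Long end = pendingEndOffsets.get(partition);
        if (end != null && position >= end) {
            pendingEndOffsets.remove(partition);
        }
        if (pendingEndOffsets.isEmpty()) {
            ready = true;
        }
    }

    boolean isReady() {
        return ready;
    }

    /** Number of partitions still catching up; a candidate value for a gauge. */
    int pendingPartitions() {
        return pendingEndOffsets.size();
    }
}
```

Because the end offsets are fixed at startup, records produced afterwards count as normal lag and do not delay or revoke readiness in this sketch.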

Liveness is now separated from bootstrap and tracks Kafka runtime health for registered listeners. It reacts to Spring Kafka runtime events such as non-responsive consumers, failed starts and stopped consumers, while using a grace period to avoid false positives from short interruptions. Normal lag and quiet topics do not make the pod unhealthy.
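
The grace-period behavior can be illustrated with a small plain-Java sketch. It is an assumption-laden simplification rather than the implementation in this PR: `KafkaLivenessTracker` and its methods are hypothetical names, and the mapping from Spring Kafka events (for example non-responsive, failed-to-start, or stopped consumers) to `problemObserved` and `recovered` calls is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker: a listener problem only fails liveness after a grace period. */
final class KafkaLivenessTracker {
    private final Duration gracePeriod;
    // listener id -> time at which the current, unresolved problem was first seen
    private final Map<String, Instant> problemSince = new ConcurrentHashMap<>();

    KafkaLivenessTracker(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    /** Record a runtime problem; repeated reports keep the first timestamp. */
    void problemObserved(String listenerId, Instant now) {
        problemSince.putIfAbsent(listenerId, now);
    }

    /** Clear the problem, for example when the consumer resumes polling. */
    void recovered(String listenerId) {
        problemSince.remove(listenerId);
    }

    /** Live unless some listener has had an open problem for longer than the grace period. */
    boolean isLive(Instant now) {
        return problemSince.values().stream()
                .noneMatch(since -> Duration.between(since, now).compareTo(gracePeriod) > 0);
    }
}
```

Lag and quiet topics never call `problemObserved` in this model, so they cannot make the pod unhealthy. Only an explicit problem that stays unresolved past the grace period does.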

Also add Micrometer metrics for bootstrap progress and runtime Kafka health, including bootstrap duration, pending partitions, runtime problem counters and unhealthy state gauges.

Update actuator health group configuration and add documentation for the new startup/readiness/liveness model, Kafka-specific health behavior, metrics and Kubernetes probe configuration.

bigbluechief requested review from alstad and nozoz on March 23, 2026, 08:56
bigbluechief self-assigned this on Mar 23, 2026