Description
The Kubernetes resource collector fetches metrics by making individual GET
requests to metrics.k8s.io for each pod, instead of a single LIST call. With a
large number of pods this causes client-go's default rate limiter
(5 QPS / burst 10) to be exceeded, resulting in continuous warnings:
"Waited before sending request" delay="1.98s" reason="client-side throttling, not priority and fairness" verb="GET" URL=".../apis/metrics.k8s.io/v1beta1/namespaces//pods/"
Steps to reproduce
Deploy maintenant on a cluster with 50+ pods. With the default resource
collector interval of 10s and ~70 pods, the effective request rate is
~7 req/s (70 requests every 10s), consistently exceeding client-go's default
limit of 5 QPS.
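The arithmetic behind those numbers can be checked with a small sketch. It assumes one GET per pod per collection cycle, as described above; the helper names requestRate and budgetPerCycle are hypothetical and not taken from maintenant.

```go
package main

import "fmt"

// requestRate returns the average requests per second when one GET is
// issued per pod every intervalSeconds (assumption: one request per pod).
func requestRate(pods int, intervalSeconds float64) float64 {
	return float64(pods) / intervalSeconds
}

// budgetPerCycle returns how many requests a sustained QPS limit allows
// during one collection interval, ignoring the one-off burst allowance.
func budgetPerCycle(qps, intervalSeconds float64) float64 {
	return qps * intervalSeconds
}

func main() {
	const (
		defaultQPS = 5.0  // client-go default, as quoted in this issue
		interval   = 10.0 // default resource collector interval in seconds
		pods       = 70   // pod count from the Environment section
	)
	rate := requestRate(pods, interval)
	budget := budgetPerCycle(defaultQPS, interval)
	fmt.Printf("rate: %.1f req/s, limit: %.1f req/s\n", rate, defaultQPS)
	fmt.Printf("requests per cycle: %d, sustained budget per cycle: %.0f\n", pods, budget)
}
```

With 70 requests per cycle against a sustained budget of 50, the limiter cannot catch up between cycles, which would match the continuous throttling warnings.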
Expected behaviour
Use a single LIST request per namespace (or cluster-wide):
GET /apis/metrics.k8s.io/v1beta1/pods
or
GET /apis/metrics.k8s.io/v1beta1/namespaces/{ns}/pods
This would reduce the number of API calls from O(pods) to O(namespaces) or O(1)
per collection cycle, staying well within the default rate limit.
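A minimal sketch of the collector side, assuming the LIST endpoint returns a standard PodMetricsList JSON document: decode the response once, then look pods up by "namespace/name" instead of issuing one GET per pod. The type and function names here are hypothetical, the sample payload is made up for illustration, and this is not how maintenant is currently structured. With the k8s.io/metrics clientset the equivalent call would likely be PodMetricses(namespace).List(ctx, metav1.ListOptions{}), but that is not shown here to keep the example free of external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerMetrics mirrors the per-container part of a PodMetrics item.
type containerMetrics struct {
	Name  string            `json:"name"`
	Usage map[string]string `json:"usage"` // e.g. "cpu", "memory" quantities
}

// podMetrics holds only the fields this sketch needs.
type podMetrics struct {
	Metadata struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"metadata"`
	Containers []containerMetrics `json:"containers"`
}

// podMetricsList is the envelope returned by a LIST on .../pods.
type podMetricsList struct {
	Items []podMetrics `json:"items"`
}

// indexPodMetrics decodes one LIST response body and indexes the items by
// "namespace/name", so per-pod lookups need no further API calls.
func indexPodMetrics(body []byte) (map[string]podMetrics, error) {
	var list podMetricsList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decode PodMetricsList: %w", err)
	}
	byPod := make(map[string]podMetrics, len(list.Items))
	for _, item := range list.Items {
		byPod[item.Metadata.Namespace+"/"+item.Metadata.Name] = item
	}
	return byPod, nil
}

// sampleList is an illustrative, made-up payload, not real cluster output.
const sampleList = `{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {"metadata": {"name": "web-0", "namespace": "default"},
     "containers": [{"name": "web", "usage": {"cpu": "12m", "memory": "64Mi"}}]},
    {"metadata": {"name": "db-0", "namespace": "data"},
     "containers": [{"name": "db", "usage": {"cpu": "30m", "memory": "256Mi"}}]}
  ]
}`

func main() {
	byPod, err := indexPodMetrics([]byte(sampleList))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, pm := range byPod {
		for _, c := range pm.Containers {
			fmt.Printf("%s/%s cpu=%s memory=%s\n", key, c.Name, c.Usage["cpu"], c.Usage["memory"])
		}
	}
}
```

One response then serves every pod in the cycle, so the request count no longer grows with the number of pods.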
Impact
- Warnings flood the logs continuously (every 10s)
- Metrics collection is artificially delayed by ~2s per throttled request
- Workaround: exclude namespaces via MAINTENANT_K8S_EXCLUDE_NAMESPACES to
  reduce the pod count, but this only mitigates the issue
Environment
- maintenant: ghcr.io/kolapsis/maintenant:latest
- Kubernetes: k3s
- Pod count: ~70