Describe the bug:
We're using the logging-operator within a specific namespace, and an unrelated application in another namespace has created a large number of secrets (large both in count and in size). The logging-operator gets OOMKilled every time it starts up; removing all of those secrets makes the problem go away; recreating them brings the OOMKills back; and creating them gradually lets us watch the memory consumption of the operator process inside the pod creep up.
I'm not experienced enough to submit a PR to fix it myself, but my semi-educated guess is that the problem is in https://github.com/kube-logging/logging-operator/blob/master/main.go#L337-L360 : it looks like every object type not explicitly listed there (such as Secrets) is cached at cluster scope by default. I think that adding a DefaultNamespaces entry as well would give those other resources a namespaced fallback.
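Something along these lines is what I have in mind. This is only a minimal sketch, not the actual main.go code, and it assumes controller-runtime v0.16+ style cache options; `managerOptions` and `watchNamespace` are placeholders for however the operator wires up the -watch-namespace flag:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// managerOptions is a hypothetical helper showing the suggested cache setup.
func managerOptions(watchNamespace string) ctrl.Options {
	return ctrl.Options{
		Cache: cache.Options{
			// Any resource type without its own ByObject entry (Secrets included)
			// falls back to these namespaces instead of being cached cluster-wide.
			DefaultNamespaces: map[string]cache.Config{
				watchNamespace: {},
			},
		},
	}
}
```

With that fallback in place, the informer for Secrets would only LIST/WATCH the watched namespace, so the cache size would no longer depend on what other applications create elsewhere in the cluster.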
Expected behaviour:
The logging-operator should only consume memory to 'remember' objects within the namespace it has been told to watch, so that misbehaving applications elsewhere on the cluster cannot affect its memory consumption.
Steps to reproduce the bug:
Create a running instance of the logging-operator pod (version 4.5.3) in namespace A, running with the arguments -enable-leader-election=false -watch-namespace=A, and create several thousand secrets in namespace B (a sketch for doing this in bulk follows below). Regardless of the order you do these two things in, memory usage of the operator process will creep up until the pod is OOMKilled, assuming you're using the default limit (via helm) of 128M and are enthusiastic about secret creation.
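For the secret-creation part, a small client-go sketch like the one below is enough to reproduce the growth (this assumes kubeconfig access to the cluster; the secret count, payload size, and the namespace name "other" are illustrative, kubectl in a loop works just as well):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	targetNamespace := "other" // stands in for "namespace B"
	payload := make([]byte, 32*1024) // ~32 KiB per secret; adjust to taste

	// Create a few thousand secrets that the operator has no business caching.
	for i := 0; i < 5000; i++ {
		_, err := cs.CoreV1().Secrets(targetNamespace).Create(context.Background(), &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("bulk-secret-%d", i)},
			Data:       map[string][]byte{"blob": payload},
		}, metav1.CreateOptions{})
		if err != nil {
			panic(err)
		}
	}
}
```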
Environment details:
- Kubernetes version: v1.25.13+k3s1
- Local Linux host (not cloud)
- logging-operator version: 4.5.3
- Install method: official helm charts
- Logs: I've attached the HTTP logs from around the time the operator starts up, showing that it does a LIST of multiple resource types within a namespace, but a LIST of other resource types (including secrets) at cluster scope. In this particular run the namespace containing the operator was c8yedge and the namespace containing the secrets was other, but in this case we'd removed most of the secrets so the bug wasn't triggered. k3s-filtered.log
- Resource definition: pod.txt
/kind bug