[8.19](backport #48836) Fix packetbeat cache janitor goroutine leak #49800
mergify[bot] wants to merge 1 commit into 8.19
* packetbeat: stop cache janitor goroutines on protocol teardown

Protocol plugins (dns, tcp, mysql, pgsql, mongodb, thrift, amqp, nfs/rpc, icmp) start cache janitor goroutines via StartJanitor() but never call StopJanitor() when the plugin is destroyed during a configuration reload. Each reload cycle leaks two goroutines (one from the DNS/protocol cache, one from the TCP stream cache) and their associated map allocations (~3 MB per cache with the default 64k-slot hash size). Under Fleet management, where policy revisions trigger frequent runner restarts, this causes unbounded memory growth.

Add a PluginCloser interface and Close() methods to all protocol plugins that use caches, and call them from the sniffer cleanup path so janitor goroutines are stopped when the sniffer is torn down.

* packetbeat: close pipeline clients when publisher worker exits

The TransactionPublisher.worker goroutine exits when p.done is closed during Stop(), but never calls client.Close() on the beat.Client it holds. Each configuration reload creates a new client via CreateReporter() that is never released, leaking pipeline client resources.

Add defer client.Close() to the worker so clients are properly released when the publisher stops.

* libbeat: make Cache.StopJanitor idempotent

StopJanitor closed the janitorQuit channel but never set it to nil, so a second call would panic by closing an already-closed channel. Set the channel to nil after closing it so repeated calls are safe.
* packetbeat: add goroutine leak regression test for decoder cleanup
* packetbeat: fix janitor leaks and decoder lifecycle on dynamic interface changes
* changelog: add fragment for janitor and decoder cleanup fixes
* changelog: fixup fragment
* packetbeat: fix golangci-lint findings in touched files
* packetbeat: more linter fixes
* filebeat: fix AD memberOf filter test expectations
* packetbeat: Add robust synchronization for Start/StopJanitor
* packetbeat/thrift: stop publisher goroutine on shutdown

Close the thrift publish queue in Close so publishTransactions can exit cleanly and avoid a lingering goroutine after shutdown.

Made-with: Cursor

* packetbeat: move protocols.Close() to sniffer shutdown

protocols.Close() was being called from per-decoder cleanup, but the protocols instance outlives the decoder: it is created once per interface in setupSniffer and reused across decoder rebuilds on link-type changes. Calling Close() on decoder replacement could stop protocol janitors while the analyzers were still in use and, in the case of Thrift, could panic on a double channel close.

Move protocols.Close() to run once after Sniffer.Run() exits, matching the actual lifetime of the protocols object. Decoder-level cleanup (ICMP, TCP, UDP) remains per-decoder as before.

(cherry picked from commit 70b37f2)

# Conflicts:
# packetbeat/beater/processor.go
# packetbeat/sniffer/sniffer.go
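The Thrift and protocols.Close() commits hinge on ownership: the protocols object lives as long as the sniffer, while decoders are rebuilt on link-type changes. The sketch below models that split, assuming a Thrift-like plugin whose publisher goroutine ranges over a queue channel. thriftPlugin, decoder and run are hypothetical names, the link-type values are arbitrary, and the sync.Once guard is an extra safety net of this sketch rather than something the commit message describes.

```go
package main

import (
	"fmt"
	"sync"
)

// thriftPlugin is hypothetical: its publisher goroutine drains a queue
// until the queue is closed.
type thriftPlugin struct {
	publishQueue chan string
	published    []string
	closeOnce    sync.Once
	wg           sync.WaitGroup
}

func newThriftPlugin() *thriftPlugin {
	p := &thriftPlugin{publishQueue: make(chan string, 16)}
	p.wg.Add(1)
	go p.publishTransactions()
	return p
}

// publishTransactions returns once publishQueue is closed and drained.
func (p *thriftPlugin) publishTransactions() {
	defer p.wg.Done()
	for t := range p.publishQueue {
		p.published = append(p.published, t)
	}
}

// Close closes the queue at most once (avoiding a double-close panic),
// then waits for the publisher goroutine to exit.
func (p *thriftPlugin) Close() {
	p.closeOnce.Do(func() { close(p.publishQueue) })
	p.wg.Wait()
}

// decoder borrows the plugin but does not own it, so its cleanup
// must not close the plugin.
type decoder struct {
	linkType int
	plugin   *thriftPlugin
}

func (d *decoder) cleanup() {} // per-decoder state only

// run simulates a sniffer loop that rebuilds the decoder on link-type changes.
func run(linkTypes []int, plugin *thriftPlugin) {
	var dec *decoder
	for _, lt := range linkTypes {
		if dec == nil || dec.linkType != lt {
			if dec != nil {
				dec.cleanup() // the plugin stays open across rebuilds
			}
			dec = &decoder{linkType: lt, plugin: plugin}
		}
		dec.plugin.publishQueue <- fmt.Sprintf("txn on link type %d", lt)
	}
	if dec != nil {
		dec.cleanup()
	}
}

func main() {
	plugin := newThriftPlugin() // created once per interface
	run([]int{1, 1, 113, 1}, plugin)
	plugin.Close() // once, after run returns: the plugin's real lifetime
	plugin.Close() // a repeated call is harmless thanks to sync.Once
	fmt.Println(len(plugin.published)) // prints 4
}
```

Closing the plugin inside decoder cleanup instead would close publishQueue while run still sends to it after the next rebuild, and a send on a closed channel panics in Go.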
Cherry-pick of 70b37f2 has failed. To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
TL;DR
Buildkite failed because unresolved merge-conflict markers were committed.

Remediation

Investigation details

Root Cause
A bad backport/cherry-pick left conflict markers behind.

Evidence

Verification

Follow-up
After resolving the two files, all the many

Note
🔒 Integrity filtering filtered 2 items. Integrity filtering activated and filtered the following items during workflow execution.
From workflow: PR Buildkite Detective
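The Buildkite summary attributes the failure to leftover conflict markers. Lines such as `<<<<<<< HEAD` are not valid Go, so any package containing them fails to compile. As a hypothetical illustration only (this is not part of the Beats tooling), a check for such lines could look like this:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findConflictMarkers returns the 1-based line numbers of lines that look
// like Git merge-conflict markers.
func findConflictMarkers(src string) []int {
	var found []int
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "<<<<<<<"),
			strings.HasPrefix(line, "|||||||"), // diff3-style base section
			strings.HasPrefix(line, ">>>>>>>"),
			line == "=======":
			found = append(found, n)
		}
	}
	return found
}

func main() {
	src := "package beater\n<<<<<<< HEAD\nvar a = 1\n=======\nvar a = 2\n>>>>>>> 70b37f2\n"
	fmt.Println(findConflictMarkers(src)) // prints [2 4 6]
}
```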
Proposed commit message
Checklist
- `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
- `./changelog/fragments` using the changelog tool.

Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs
This is an automatic backport of pull request #48836 done by [Mergify](https://mergify.com).