[8.19](backport #48836) Fix packetbeat cache janitor goroutine leak #49800

Open
mergify[bot] wants to merge 1 commit into 8.19 from mergify/bp/8.19/pr-48836

@mergify mergify bot commented Mar 31, 2026

This is an automatic backport of pull request #48836 done by [Mergify](https://mergify.com).

* packetbeat: stop cache janitor goroutines on protocol teardown

Protocol plugins (dns, tcp, mysql, pgsql, mongodb, thrift, amqp,
nfs/rpc, icmp) start cache janitor goroutines via StartJanitor() but
never call StopJanitor() when the plugin is destroyed during a
configuration reload. Each reload cycle leaks two goroutines (one from
the DNS/protocol cache, one from the TCP stream cache) and their
associated map allocations (~3 MB per cache with the default 64k-slot
hash size). Under Fleet management, where policy revisions trigger
frequent runner restarts, this causes unbounded memory growth.

Add a PluginCloser interface and Close() methods to all protocol
plugins that use caches, and call them from the sniffer cleanup path
so janitor goroutines are stopped when the sniffer is torn down.

* packetbeat: close pipeline clients when publisher worker exits

The TransactionPublisher.worker goroutine exits when p.done is closed
during Stop(), but never calls client.Close() on the beat.Client it
holds. Each configuration reload creates a new client via
CreateReporter() that is never released, leaking pipeline client
resources.

Add defer client.Close() to the worker so clients are properly
released when the publisher stops.

* libbeat: make Cache.StopJanitor idempotent

StopJanitor closed the janitorQuit channel but never nilled it out,
so a second call would panic on closing an already-closed channel.
Nil the channel after close so repeated calls are safe.

* packetbeat: add goroutine leak regression test for decoder cleanup

* packetbeat: fix janitor leaks and decoder lifecycle on dynamic interface changes

* changelog: add fragment for janitor and decoder cleanup fixes

* changelog: fixup fragment

* packetbeat: fix golangci-lint findings in touched files

* packetbeat: more linter fixes

* filebeat: fix AD memberOf filter test expectations

* packetbeat: Add robust synchronization for Start/StopJanitor

* packetbeat/thrift: stop publisher goroutine on shutdown

Close the thrift publish queue in Close so publishTransactions can exit cleanly and avoid a lingering goroutine after shutdown.

Made-with: Cursor

* packetbeat: move protocols.Close() to sniffer shutdown

protocols.Close() was being called from per-decoder cleanup, but the
protocols instance outlives the decoder — it is created once per
interface in setupSniffer and reused across decoder rebuilds on
link-type changes. Calling Close() on decoder replacement could stop
protocol janitors while the analyzers were still in use, and in the
case of Thrift could panic on double channel close.

Move protocols.Close() to run once after Sniffer.Run() exits, matching
the actual lifetime of the protocols object. Decoder-level cleanup
(ICMP, TCP, UDP) remains per-decoder as before.

(cherry picked from commit 70b37f2)

# Conflicts:
#	packetbeat/beater/processor.go
#	packetbeat/sniffer/sniffer.go
@mergify mergify bot added the backport label Mar 31, 2026
@mergify mergify bot requested a review from a team as a code owner March 31, 2026 10:32
@mergify mergify bot added the conflicts (There is a conflict in the backported pull request) label Mar 31, 2026
@mergify mergify bot requested a review from a team as a code owner March 31, 2026 10:32
@mergify mergify bot requested review from faec and khushijain21 and removed request for a team March 31, 2026 10:32
@botelastic botelastic bot added the needs_team (Indicates that the issue/PR needs a Team:* label) label Mar 31, 2026

mergify bot commented Mar 31, 2026

Cherry-pick of 70b37f2 has failed:

On branch mergify/bp/8.19/pr-48836
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit 70b37f21a.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   changelog/fragments/1771343134-packetbeat-libbeat-janitor-cleanup-lifecycle-fix.yaml
	modified:   libbeat/common/cache.go
	modified:   packetbeat/protos/amqp/amqp.go
	modified:   packetbeat/protos/dns/dns.go
	modified:   packetbeat/protos/icmp/icmp.go
	modified:   packetbeat/protos/icmp/icmp_test.go
	modified:   packetbeat/protos/mongodb/mongodb.go
	modified:   packetbeat/protos/mysql/mysql.go
	modified:   packetbeat/protos/nfs/rpc.go
	modified:   packetbeat/protos/pgsql/pgsql.go
	modified:   packetbeat/protos/protos.go
	modified:   packetbeat/protos/tcp/tcp.go
	modified:   packetbeat/protos/thrift/thrift.go
	modified:   packetbeat/publish/publish.go
	modified:   packetbeat/sniffer/decoders.go
	new file:   packetbeat/sniffer/decoders_test.go
	modified:   packetbeat/sniffer/sniffer_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   packetbeat/beater/processor.go
	both modified:   packetbeat/sniffer/sniffer.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@github-actions github-actions bot added the Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team), bugfix, and Team:Security-Linux Platform (Linux Platform Team in Security Solution) labels Mar 31, 2026
@botelastic botelastic bot removed the needs_team (Indicates that the issue/PR needs a Team:* label) label Mar 31, 2026
@elasticmachine

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@github-actions

TL;DR

Buildkite failed because unresolved merge-conflict markers were committed in packetbeat/sniffer/sniffer.go and packetbeat/beater/processor.go. Remove the conflict blocks and keep a single resolved version, then rerun Packetbeat checks.

Remediation

  • Resolve conflict markers in:
    • packetbeat/sniffer/sniffer.go
    • packetbeat/beater/processor.go
  • Re-run locally:
    • pre-commit run --all-files
    • make -C packetbeat check update && make check-no-changes
    • make -C x-pack/packetbeat check update && make check-no-changes

Investigation details

Root Cause

A bad backport/cherry-pick left conflict markers (<<<<<<<, =======, >>>>>>>) in committed source files. This is a code/integration error (merge resolution incomplete), not infra flakiness.

Evidence

  • Build: https://buildkite.com/elastic/beats/builds/43279
  • Key failing steps:
    • Packetbeat: Run pre-commit
    • Packetbeat: Run check/update
    • x-pack/packetbeat: Run pre-commit
    • x-pack/packetbeat: Run check/update
  • Log excerpts:
    • packetbeat/sniffer/sniffer.go:94: Merge conflict string '<<<<<<<' found
    • packetbeat/sniffer/sniffer.go:96: Merge conflict string '=======' found
    • packetbeat/sniffer/sniffer.go:98: Merge conflict string '>>>>>>>' found
    • packetbeat/beater/processor.go:251: Merge conflict string '<<<<<<<' found
    • packetbeat/beater/processor.go:253: Merge conflict string '=======' found
    • packetbeat/beater/processor.go:255: Merge conflict string '>>>>>>>' found
    • go vet/formatting then fail with syntax errors in the same files (unexpected <<, ==, >>).
  • Commit metadata for 0f0307abf7d44abc4992a8f55dc4eb3539364aba also includes:
    • # Conflicts:
    • packetbeat/beater/processor.go
    • packetbeat/sniffer/sniffer.go

Verification

  • Reproduced from provided Buildkite logs; local test execution was not run because failure is already explicit in CI logs and points to unresolved conflict markers.

Follow-up

After resolving the two files, the pre-commit failures across all pipelines should clear together, since check-merge-conflict scans the full repo.

Note

🔒 Integrity filtering filtered 2 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.


From workflow: PR Buildkite Detective
