
[DNM] perf: use DeepCopyUpdate to eliminate Clone+DeepUpdate allocations#49762

Open
strawgate wants to merge 1 commit into elastic:main from strawgate:claude-perf-deepcopyupdate

Conversation


@strawgate strawgate commented Mar 29, 2026

Proposed commit message

Replace Clone()+DeepUpdate() with single-pass DeepCopyUpdate() in add_fields, cloud/host/observer metadata, processing setup, and heartbeat eventext. DeepCopyUpdate creates fresh nested maps during merge instead of cloning the entire source first.

  • add_fields: Detect single-key wrapper shape at init, use DeepCopyUpdate on inner map only. Split @metadata handling to bypass event.deepUpdate overhead.
  • add_cloud_metadata: Per-value deep copy instead of full metadata.Clone(). Immutable values returned as-is.
  • add_host_metadata: Lock-free cache read via atomic timestamp + mapstr.Pointer. DeepCopyUpdate for merge.
  • add_observer_metadata, publisher/processing (3 sites), heartbeat/eventext: Clone()+DeepUpdate() → DeepCopyUpdate().

Per-processor benchmarks:

| Processor | Δ ns/op | Δ B/op | Δ allocs/op |
|---|---|---|---|
| add_fields (3-processor chain) | -47% | -29% | -29% |
| host metadata cache read | -92% (120→10 ns) | | |

E2E filebeat (benchmark input → mock ES, GOMAXPROCS=2, 30s): +15.8% EPS with 6× add_fields.

Depends on elastic/elastic-agent-libs#390 (temporarily pinned to fork).

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

None. Event output is identical to main. DeepCopyUpdate produces the same merged result as Clone()+DeepUpdate().

How to test this PR locally

go test -bench=BenchmarkAddFields -benchmem ./libbeat/processors/actions/addfields/
go test -bench=BenchmarkCacheRead -benchmem ./libbeat/processors/add_host_metadata/

Related issues

@strawgate strawgate requested review from a team as code owners March 29, 2026 03:48
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 29, 2026
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify

mergify bot commented Mar 29, 2026

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @strawgate? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, add the backport labels for the needed
branches, such as:

  • backport-8./d is the label that automatically backports to the 8./d branch, where /d is the digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@coderabbitai

coderabbitai bot commented Mar 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Replaces many Clone() + DeepUpdate() call sites with a single-pass DeepCopyUpdate() across libbeat (add_fields, add_host_metadata, add_observer_metadata, add_cloud_metadata, publisher pipeline merges, and heartbeat event merging). Adds fast-path logic and optimized merge flows to the add_fields processor plus comprehensive unit tests and benchmarks. Refactors host metadata caching to use an atomic timestamp and a mutex with a lock-free fast-read path. Updates go.mod dependency versions and adds a replace directive. Tests/benchmarks for cloud and add_fields processors were added or extended.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libbeat/processors/actions/addfields/add_fields.go`:
- Around line 140-154: The current check around af.singleKey/af.singleKeyInner
short-circuits and returns early when the top-level key exists but nested leaves
may be missing; remove or replace that shortcut so DeepCopyUpdateNoOverwrite
always runs for this case. Specifically, eliminate the early-return branch that
inspects event.Fields[af.singleKey] and returns when all immediate child keys
exist, or change it to perform a full recursive existence check against
af.singleKeyInner before returning; otherwise always call
event.Fields.DeepCopyUpdateNoOverwrite(mapstr.M{af.singleKey:
af.singleKeyInner}) so nested missing leaves (e.g., host.os.version) are merged
in.
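
The failure mode described in this comment can be reproduced with a small sketch. Everything here is hypothetical and written for illustration: `M`, `shallowAllKeysExist` (a model of the shortcut, not the code under review), and `mergeNoOverwrite` (a stand-in for a recursive no-overwrite merge, not the library's `DeepCopyUpdateNoOverwrite`). The sketch assumes that when the destination holds a non-map value where the source holds a map, the existing value is left alone; the real library may behave differently.

```go
package main

import "fmt"

// M is a stand-in for a nested event map (hypothetical).
type M = map[string]any

// shallowAllKeysExist models the shortcut: it looks only at immediate child keys.
func shallowAllKeysExist(dst, src M) bool {
	for k := range src {
		if _, ok := dst[k]; !ok {
			return false
		}
	}
	return true
}

// mergeNoOverwrite adds keys from src to dst, descending into nested maps
// and leaving values that already exist in dst untouched.
func mergeNoOverwrite(dst, src M) {
	for k, v := range src {
		cur, exists := dst[k]
		if !exists {
			if sub, ok := v.(M); ok {
				fresh := make(M, len(sub))
				mergeNoOverwrite(fresh, sub)
				dst[k] = fresh
			} else {
				dst[k] = v
			}
			continue
		}
		curMap, curOK := cur.(M)
		sub, subOK := v.(M)
		if curOK && subOK {
			mergeNoOverwrite(curMap, sub)
		}
		// Assumption: any other existing value is kept as-is.
	}
}

func main() {
	inner := M{"os": M{"version": "5.4"}}
	host := M{"os": M{"family": "linux"}}

	// The shallow check sees "os" and would return early.
	fmt.Println(shallowAllKeysExist(host, inner)) // true

	// The recursive merge still has work to do.
	mergeNoOverwrite(host, inner)
	fmt.Println(host["os"].(M)["version"]) // 5.4
	fmt.Println(host["os"].(M)["family"])  // linux
}
```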

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48d539a1-37c3-4b8e-9d6f-f57a9d3b0118

📥 Commits

Reviewing files that changed from the base of the PR and between 2d10f57 and d5c9596.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • go.mod
  • heartbeat/eventext/eventext.go
  • libbeat/processors/actions/addfields/add_fields.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/actions/addfields/add_fields_test.go
  • libbeat/processors/actions/rename.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata_optimize_test.go
  • libbeat/processors/add_host_metadata/add_host_metadata.go
  • libbeat/processors/add_host_metadata/add_host_metadata_test.go
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go

@strawgate strawgate changed the title perf: use DeepCopyUpdate to eliminate Clone+DeepUpdate allocations [DNM] perf: use DeepCopyUpdate to eliminate Clone+DeepUpdate allocations Mar 29, 2026
@strawgate strawgate force-pushed the claude-perf-deepcopyupdate branch from 0072955 to a40faea Compare March 30, 2026 13:49

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
libbeat/processors/actions/addfields/add_fields.go (1)

140-154: ⚠️ Potential issue | 🔴 Critical

Nested no-overwrite merge early return still incorrectly short-circuits on top-level key presence.

This check only verifies immediate child keys exist in dstMap, but DeepCopyUpdateNoOverwrite would recursively descend into nested maps. If af.singleKeyInner contains {"os": {"version": "1.0"}} and destination has {"os": {}}, this returns early and version is never added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libbeat/processors/actions/addfields/add_fields.go` around lines 140 - 154,
The early-return incorrectly assumes presence of top-level child keys implies
all nested fields exist; replace the shallow check around
af.singleKey/af.singleKeyInner with a deep-existence check that recursively
descends mapstr.M to verify every nested key from af.singleKeyInner exists in
event.Fields[af.singleKey] (or implement a helper like deepAllKeysExist(dstMap,
af.singleKeyInner) that walks nested maps and returns false if any key path is
missing). Only return early when the recursive check confirms every nested path
exists; otherwise call
event.Fields.DeepCopyUpdateNoOverwrite(mapstr.M{af.singleKey:
af.singleKeyInner}) as before.
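
One possible shape for the `deepAllKeysExist` helper named in the prompt is sketched below. It is an illustration only, operating on a hypothetical `M` alias rather than `mapstr.M`. As a conservative assumption, it returns false whenever the destination holds a non-map value where the source holds a map, so that the real merge decides what to do in that case.

```go
package main

import "fmt"

// M is a stand-in for a nested event map (hypothetical).
type M = map[string]any

// deepAllKeysExist reports whether every key path in src is present in dst.
func deepAllKeysExist(dst, src M) bool {
	for k, v := range src {
		cur, ok := dst[k]
		if !ok {
			return false
		}
		sub, isMap := v.(M)
		if !isMap {
			continue // leaf in src, and the key exists in dst
		}
		curMap, curIsMap := cur.(M)
		if !curIsMap {
			// Shape mismatch: do not take the shortcut (assumption).
			return false
		}
		if !deepAllKeysExist(curMap, sub) {
			return false
		}
	}
	return true
}

func main() {
	inner := M{"os": M{"version": "1.0"}}

	fmt.Println(deepAllKeysExist(M{"os": M{}}, inner))                 // false
	fmt.Println(deepAllKeysExist(M{"os": M{"version": "2.0"}}, inner)) // true
}
```

Whether this check is cheaper than simply calling the merge depends on how often the fast path hits, so it would need to be benchmarked before replacing the shortcut.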
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 64f08516-1c4b-4056-a1c1-e50c464cf010

📥 Commits

Reviewing files that changed from the base of the PR and between 0072955 and a40faea.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • go.mod
  • heartbeat/eventext/eventext.go
  • libbeat/processors/actions/addfields/add_fields.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/actions/addfields/add_fields_test.go
  • libbeat/processors/actions/rename.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata_optimize_test.go
  • libbeat/processors/add_host_metadata/add_host_metadata.go
  • libbeat/processors/add_host_metadata/add_host_metadata_test.go
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go
✅ Files skipped from review due to trivial changes (3)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/actions/addfields/add_fields_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • go.mod
  • libbeat/publisher/processing/default.go
  • heartbeat/eventext/eventext.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go
  • libbeat/processors/actions/rename.go
  • libbeat/processors/add_host_metadata/add_host_metadata_test.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata_optimize_test.go

@github-actions

TL;DR

All 4 failing Buildkite jobs are failing on the same libbeat/processors/actions rename tests due to an error.message mismatch (case-sensitive string mismatch), not an infra outage. Align the emitted rename processor error text with test expectations and rerun libbeat unit/integration suites.

Remediation

  • In the rename processor error path, make sure the wrapped message prefix matches the test fixture exactly ("Failed to rename fields in processor: ...", including capitalization).
  • Re-run:
    • cd libbeat && go test ./processors/actions -run TestRenameRun -count=1 -v
    • cd libbeat && mage build unitTest goUnitTest goFIPSOnlyUnitTest goIntegTest
Investigation details

Root Cause

TestRenameRun compares full event maps with reflect.DeepEqual (libbeat/processors/actions/rename_test.go:253).

The failing cases are the ones that assert exact error.message content (rename_test.go:84-105 and rename_test.go:179-204). Logs from failing jobs show the processor emitting lowercase text:

  • failed to rename fields in processor: target field b already exists...
  • failed to rename fields in processor: could not put value: a.c: 10...

But fixtures expect "Failed to rename fields in processor: ..." (capital F). Because the message string is part of newEvent.Fields, reflect.DeepEqual fails.

Evidence

  • Build: https://buildkite.com/elastic/beats/builds/43228
  • Failed jobs:
    • Libbeat: Ubuntu x86_64 Unit Tests
    • Libbeat: Ubuntu x86_64 Go Unit Tests with fips provider and requirefips build tag
    • Libbeat: Ubuntu x86_64 fips140=only Unit Tests
    • Libbeat: Go Integration Tests
  • Key log excerpt (all jobs):
    • === FAIL: libbeat/processors/actions TestRenameRun/overwrites_an_existing_field_which_is_not_allowed
    • === FAIL: libbeat/processors/actions TestRenameRun/rename_two_fields_into_the_same_name_space...
    • rename_test.go:253: Error: Should be true

Verification

  • Local targeted verification run:
    • cd libbeat && go test ./processors/actions -run 'TestRenameRun/(overwrites_an_existing_field_which_is_not_allowed|rename_two_fields_into_the_same_name_space\._this_fails_because_a_is_already_a_key,_renaming_of_a_needs_to_happen_first)' -count=1 -v
  • This confirms the failing assertions are in the two error-message-sensitive subtests.

Follow-up

  • If you intentionally changed message wording/casing, update the two expected error.message fixtures accordingly.
  • If wording change was unintentional, restore legacy capitalization in the processor to preserve existing behavior.

Note

🔒 Integrity filtering filtered 2 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.


What is this? | From workflow: PR Buildkite Detective


@strawgate strawgate force-pushed the claude-perf-deepcopyupdate branch from a40faea to 296ee8c Compare March 30, 2026 16:08

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
libbeat/processors/actions/addfields/add_fields.go (1)

140-154: ⚠️ Potential issue | 🟠 Major

Shallow key check may skip nested merges.

The early-return at line 149 checks only immediate child key presence (dstMap[sk]), not nested structure. If af.singleKeyInner has {"os": {"version": "5.4"}} and dstMap already has {"os": {"family": "linux"}}, the check sees os exists and returns — skipping the merge that would add os.version.

DeepCopyUpdateNoOverwrite descends recursively; this shortcut does not.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libbeat/processors/actions/addfields/add_fields.go` around lines 140 - 154,
The early-return uses a shallow presence check against dstMap (checking
dstMap[sk]) which wrongly skips nested merges when af.singleKeyInner contains
nested maps; update the logic around af.singleKey / af.singleKeyInner so you
either remove the shallow early-return or replace it with a recursive existence
check that descends maps and ensures all nested keys in af.singleKeyInner are
present in the corresponding nested dstMap before returning; reference the same
symbols (af.singleKey, af.singleKeyInner, dstMap,
event.Fields.DeepCopyUpdateNoOverwrite) and ensure the new check mirrors
DeepCopyUpdateNoOverwrite's recursion semantics so nested fields like {"os":
{"version": "5.4"}} will be merged correctly instead of being skipped.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e91b3a1d-f5c1-4ba1-a8e7-7aa1bd280832

📥 Commits

Reviewing files that changed from the base of the PR and between a40faea and 296ee8c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • go.mod
  • heartbeat/eventext/eventext.go
  • libbeat/processors/actions/addfields/add_fields.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/actions/addfields/add_fields_test.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata_optimize_test.go
  • libbeat/processors/add_host_metadata/add_host_metadata.go
  • libbeat/processors/add_host_metadata/add_host_metadata_test.go
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go
✅ Files skipped from review due to trivial changes (4)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • libbeat/processors/add_host_metadata/add_host_metadata.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • heartbeat/eventext/eventext.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go
  • libbeat/processors/actions/addfields/add_fields_test.go

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Mar 30, 2026
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 30, 2026
@pierrehilbert pierrehilbert added Team:obs-ds-hosted-services Label for the Observability Hosted Services team needs_team Indicates that the issue/PR needs a Team:* label labels Mar 30, 2026
@elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 30, 2026
Replace Clone()+DeepUpdate() with single-pass DeepCopyUpdate() in
add_fields, cloud/host/observer metadata, rename, processing, and
heartbeat eventext.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate strawgate force-pushed the claude-perf-deepcopyupdate branch from 4d2f151 to 8e0e2f4 Compare March 31, 2026 17:36

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 554: The go.mod currently contains a replace directive "replace
github.com/elastic/elastic-agent-libs => github.com/strawgate/elastic-agent-libs
v0.33.4-0.20260327142400-b15ccc340463"; remove that replace line from go.mod so
builds use the canonical github.com/elastic/elastic-agent-libs module, and if
you need a temporary override for local testing move the override out of the
committed go.mod (e.g., use a local go.work or documented developer step) before
merging.

In `@libbeat/processors/add_host_metadata/add_host_metadata_test.go`:
- Around line 500-556: Both tests (TestLoadDataFastPath and
TestCachedDataNotCorruptedByDownstreamMutation) must pin and restore the global
features.FQDN() flag so their behavior is deterministic: before calling
newWithHostInfoFactory set features.SetFQDN(false) (or true if you intend FQDN
on) and register a t.Cleanup to restore the previous value, then proceed to
construct the processor and run assertions; apply the same setup/cleanup pattern
in both tests so the eager cache warm-up inside newWithHostInfoFactory doesn't
depend on prior tests.
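
The pin-and-restore pattern suggested here can be sketched as follows. The flag `fqdnEnabled`, the helper `pinFQDN`, and the `fakeT` type are all hypothetical; the real tests would use the `features` package and `*testing.T`, whose exact setter is not shown in this thread. The helper accepts anything with a `Cleanup(func())` method, which `*testing.T` satisfies.

```go
package main

import "fmt"

// fqdnEnabled stands in for a process-wide feature flag (hypothetical).
var fqdnEnabled bool

// cleanuper is the subset of *testing.T that the helper needs.
type cleanuper interface{ Cleanup(func()) }

// pinFQDN sets the flag for the duration of a test and registers a cleanup
// that restores the previous value.
func pinFQDN(t cleanuper, v bool) {
	prev := fqdnEnabled
	fqdnEnabled = v
	t.Cleanup(func() { fqdnEnabled = prev })
}

// fakeT records cleanups so the sketch can run outside `go test`.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

// finish runs cleanups in last-in, first-out order, as the testing package does.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	fqdnEnabled = true // state left over from an earlier test

	t := &fakeT{}
	pinFQDN(t, false)
	fmt.Println(fqdnEnabled) // false while the test body runs

	t.finish()
	fmt.Println(fqdnEnabled) // true again after cleanup
}
```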

In `@libbeat/processors/add_host_metadata/add_host_metadata.go`:
- Around line 68-71: The cache TTL check uses time.Unix(0, unixNano) which
strips monotonic clock data; change the approach to use a monotonic baseline:
introduce a package-level monotonic start (e.g., startTime := time.Now()) and
store lastUpdate as elapsed monotonic nanoseconds via
time.Since(startTime).Nanoseconds() in hostMetadataCache.lastUpdate
(atomic.Int64), then compute TTL by comparing current elapsed nanoseconds
(time.Since(startTime).Nanoseconds()) against lastUpdate instead of
reconstructing a time.Time with time.Unix; update any methods that read/write
lastUpdate and the expiry check to use the new elapsed-monotonic value.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18fbe4d6-dff0-4d2f-8dc7-3474f56531d0

📥 Commits

Reviewing files that changed from the base of the PR and between 296ee8c and 8e0e2f4.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • go.mod
  • heartbeat/eventext/eventext.go
  • libbeat/processors/actions/addfields/add_fields.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/actions/addfields/add_fields_test.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata_optimize_test.go
  • libbeat/processors/add_host_metadata/add_host_metadata.go
  • libbeat/processors/add_host_metadata/add_host_metadata_test.go
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go
✅ Files skipped from review due to trivial changes (2)
  • changelog/fragments/1774757000-deepcopyupdate-allocations.yaml
  • libbeat/processors/actions/addfields/add_fields.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • heartbeat/eventext/eventext.go
  • libbeat/processors/add_observer_metadata/add_observer_metadata.go
  • libbeat/publisher/processing/default.go
  • libbeat/processors/actions/addfields/add_fields_benchmark_test.go
  • libbeat/processors/add_cloud_metadata/add_cloud_metadata.go


Labels

Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:obs-ds-hosted-services Label for the Observability Hosted Services team


3 participants