Skip to content

discovery: add system-probe-lite support#2610

Open
Yumasi wants to merge 6 commits intomainfrom
guillaume.pagnoux/add-sd-agent-suppor
Open

discovery: add system-probe-lite support#2610
Yumasi wants to merge 6 commits intomainfrom
guillaume.pagnoux/add-sd-agent-suppor

Conversation

@Yumasi
Copy link
Copy Markdown
Member

@Yumasi Yumasi commented Feb 18, 2026

What does this PR do?

Adds support for starting system-probe-lite (SPL) instead of the full system-probe in the operator when only discovery is enabled.

SPL is a privileged Rust binary that implements just the discovery module. Kubernetes needs a different approach than Docker/Linux since the operator is updated asynchronously to agent versions — we can't enable discovery by default and guarantee we won't start the full system-probe with default config.

How it works:

  • When only discovery is enabled (no other system-probe features), the system-probe container runs SPL as the entry point
  • If discovery.enabled is explicitly set and SPL doesn't exist, we fall back to the full system-probe binary (since the user opted in knowingly)
  • If only discovery.enabledByDefault is set (i.e. discovery was turned on without the user's explicit choice), SPL falls back to sleep infinity to avoid starting system-probe unexpectedly (or crashing) when using an older agent image without system-probe-lite (or without the discovery feature altogether). Note that discovery is not turned on by default in this PR, that's a separate, future step.
  • If other system-probe features are also enabled (NPM, USM, etc.), the regular system-probe binary is used directly.

Motivation

Match agent configuration.

Additional Notes

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: v7.78

However, like in the Helm chart change, it checks first for the sd-agent binary to exists in the container image. This means using an older agent image will simply use system-probe's discovery implementation instead of sd-agent.

Describe your test plan

  • Tested on an AWS cluster that it behaves as follow:
    • Default config i.e features.serviceDiscovery.enabled set to false: no system-probe container.
    • features.serviceDiscovery.enabled set to true: system-probe container starts; execs system-probe-lite.
    • The above + another system-probe feature (e.g features.npm.enabled): system-probe container starts; system-probe runs directly.
    • Discovery enabled + agent version without sd-agent: system-probe container starts; system-probe runs directly.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@Yumasi Yumasi added the enhancement New feature or request label Feb 18, 2026
@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch from b2401a9 to 56ee1e8 Compare February 18, 2026 14:31
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 89.47368% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.00%. Comparing base (4a814a5) to head (77d7137).

Files with missing lines Patch % Lines
...r/datadogagent/feature/servicediscovery/feature.go 88.57% 2 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2610      +/-   ##
==========================================
+ Coverage   38.94%   39.00%   +0.05%     
==========================================
  Files         313      313              
  Lines       27134    27167      +33     
==========================================
+ Hits        10567    10596      +29     
- Misses      15778    15780       +2     
- Partials      789      791       +2     
Flag Coverage Δ
unittests 39.00% <89.47%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ller/datadogagent/defaults/datadogagent_default.go 87.17% <100.00%> (+0.09%) ⬆️
...r/datadogagent/feature/servicediscovery/feature.go 82.27% <88.57%> (+3.11%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a814a5...77d7137. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch from 56ee1e8 to 1bd0750 Compare March 18, 2026 16:09
@Yumasi Yumasi added this to the v1.25.0 milestone Mar 19, 2026
@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch from 1bd0750 to 94b4da5 Compare March 19, 2026 10:04
@Yumasi Yumasi changed the title discovery: add sd-agent support discovery: add system-probe-lite support Mar 19, 2026
@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch from 94b4da5 to 98f90bb Compare March 19, 2026 11:06
@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch 8 times, most recently from b2a8245 to e1688ef Compare March 25, 2026 13:16
@Yumasi Yumasi force-pushed the guillaume.pagnoux/add-sd-agent-suppor branch from e1688ef to 5155320 Compare March 25, 2026 14:14
@levan-m levan-m modified the milestones: v1.25.0, v1.26.0 Mar 25, 2026
@Yumasi Yumasi marked this pull request as ready for review March 25, 2026 15:55
@Yumasi Yumasi requested a review from a team March 25, 2026 15:55
@Yumasi Yumasi requested review from a team as code owners March 25, 2026 15:55
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 515532045d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@domalessi domalessi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion to improve clarity! Approving so you're not blocked on me.

Copy link
Copy Markdown

@domalessi domalessi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

Yumasi and others added 3 commits March 27, 2026 10:15
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

// hasOtherSystemProbeFeatures returns true if any feature besides service discovery
// requires the full system-probe binary. When true, system-probe-lite cannot be used.
func hasOtherSystemProbeFeatures(features *v2alpha1.DatadogFeatures) bool {
Copy link
Copy Markdown
Contributor

@vitkyrka vitkyrka Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we ensure that list is correct and complete and remains that way? It's not obvious that I have to patch feature/servicediscovery/feature.go if I'm adding some unrelated feature that uses system-probe, and that a failure to do so may cause my feature to not work in some cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants