Add system-probe-lite support for discovery in Helm chart#2479

Draft
vitkyrka wants to merge 16 commits into main from vitkyrka/disco-lite

Conversation

vitkyrka (Contributor) commented Mar 13, 2026

What this PR does / why we need it:

Adds support for starting system-probe-lite (SPL) instead of the full system-probe in the Helm chart when only discovery is enabled.

SPL is a privileged Rust binary that implements just the discovery module. The goal of this support is to ensure that SPL is started instead of system-probe whenever only discovery is enabled.

How it works:

If any system-probe feature other than discovery is also enabled (NPM, USM, etc.), the regular system-probe binary is always used directly.

When only discovery is enabled (no other system-probe features):

  • The system-probe container automatically runs SPL as the entry point
  • If SPL fails to start (either because it doesn't exist in the image or potentially due to some other issue):
    • If discovery.enabled was explicitly set by the user, we fall back to the full system-probe binary (since the user opted in to discovery knowingly)
    • If only discovery.enabledByDefault is set (i.e. discovery was turned on without the user's explicit choice), we fall back to sleep infinity. This avoids starting system-probe unexpectedly, or crashing, on an older agent image that lacks system-probe-lite (or the discovery feature altogether).
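
The decision described above can be modelled roughly as follows. This is a minimal Python sketch for review purposes only, not the chart's template code: the function name `system_probe_command`, its parameters, and the returned strings are hypothetical, and the real SPL command line (with its flags) is not reproduced here.

```python
from typing import Optional


def system_probe_command(
    other_features_enabled: bool,
    discovery_enabled: Optional[bool],
    discovery_enabled_by_default: bool,
) -> Optional[str]:
    """Return an illustrative label for what the system-probe container runs.

    None means no system-probe container is needed at all.
    The strings are placeholders, not the chart's actual command line.
    """
    # An explicit discovery.enabled wins; only an unset value (None)
    # defers to discovery.enabledByDefault.
    if discovery_enabled is None:
        discovery_on = discovery_enabled_by_default
    else:
        discovery_on = discovery_enabled

    # Any other feature (NPM, USM, etc.) means the regular binary is used directly.
    if other_features_enabled:
        return "system-probe"

    # No other features and no discovery: nothing to run.
    if not discovery_on:
        return None

    # Discovery only: try SPL first, then pick the fallback based on
    # whether the user opted in explicitly.
    fallback = "system-probe" if discovery_enabled is True else "sleep infinity"
    return f"system-probe-lite || {fallback}"
```

For example, `system_probe_command(False, None, True)` yields the `sleep infinity` fallback, while `system_probe_command(False, True, False)` falls back to the full system-probe.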

Special notes for your reviewer:

The `discovery-enabled` helper uses `kindIs "invalid"` to distinguish between `enabled: false` (explicitly disabled) and `enabled` being unset/nil. This is important because a simple `or` would treat `false` and nil the same way, causing `enabledByDefault: true` to override an explicit `enabled: false`.
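
To illustrate the difference, here is a small Python analogue of the two approaches, with `None` standing in for an unset value. Both function names are hypothetical and this is not the Helm helper itself, only a sketch of the tri-state logic it is meant to implement.

```python
from typing import Optional


def naive_discovery_enabled(enabled: Optional[bool], enabled_by_default: bool) -> bool:
    # A plain "or" cannot tell False apart from unset (None):
    # an explicit False is overridden by enabled_by_default=True.
    return bool(enabled or enabled_by_default)


def tri_state_discovery_enabled(enabled: Optional[bool], enabled_by_default: bool) -> bool:
    # Only fall back to the default when the value is genuinely unset,
    # analogous to checking kindIs "invalid" in the template.
    if enabled is None:
        return enabled_by_default
    return enabled
```

With `enabled=False` and `enabled_by_default=True`, the naive version reports discovery as enabled, whereas the tri-state version respects the explicit opt-out.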

Note that discovery is not turned on by default in this PR; that is a separate, future step.

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • All commits are signed (see: signing commits)
  • Chart Version semver bump label has been added (use <chartName>/minor-version, <chartName>/patch-version, or <chartName>/no-version-bump)
  • For datadog or datadog-operator chart or value changes, update the test baselines (run: make update-test-baselines)

GitHub CI takes care of the items below, but they are still required:

  • Documentation has been updated with helm-docs (run: .github/helm-docs.sh)
  • CHANGELOG.md has been updated
  • Variables are documented in the README.md

vitkyrka and others added 5 commits March 13, 2026 17:15
When only discovery is enabled and no other system-probe feature is active,
use the lightweight system-probe-lite binary instead of full system-probe.
Falls back to system-probe if system-probe-lite is not available in the image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When discovery is enabled via enabledByDefault (rather than explicitly
by the user), the system-probe-lite fallback is now `sleep infinity`
instead of the full system-probe binary. This avoids running the heavy
system-probe for users who didn't explicitly opt in to discovery.

The enabledByDefault flag is set to false for now and will be changed
to true in the future to enable discovery by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add baseline golden-file tests and targeted unit tests to verify the
tri-state system-probe command: explicit enable (fallback system-probe),
enabledByDefault (fallback sleep infinity), other features (regular
system-probe), and no discovery (no system-probe container).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ature lists

Add a `discovery-enabled` helper with correct tri-state logic so that
explicit `discovery.enabled: false` overrides `enabledByDefault: true`.
Redefine `system-probe-feature` in terms of `system-probe-feature-except-discovery`
to eliminate the duplicated feature list. Add test for the enabled=false override.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the chart/datadog label Mar 13, 2026
vitkyrka and others added 2 commits March 13, 2026 17:26
Use an inline conditional for the fallback (system-probe vs sleep infinity)
instead of duplicating the entire system-probe-lite command line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vitkyrka added the datadog/minor-version label Mar 13, 2026
dd-octo-sts bot and others added 9 commits March 13, 2026 16:32
- bump version for datadog to 3.188.0 (minor-version)
- update changelog for datadog with version 3.188.0
- update readme for datadog
- update readme for datadog
- update readme for datadog
# Conflicts:
#	charts/datadog/CHANGELOG.md
#	charts/datadog/Chart.yaml
#	charts/datadog/README.md
Move our system-probe-lite changelog entry to its own version (3.196.0)
instead of incorrectly merging it into the upstream 3.188.0 entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regenerate discovery_spl baseline manifests to reflect the registry
change from gcr.io/datadoghq to registry.datadoghq.com.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When discovery is the only system-probe feature enabled, always use the
lightweight system-probe-lite binary. Previously this required setting
useSystemProbeLite: true explicitly, which meant enabling discovery by
default would start full system-probe since the flag was false by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that when discovery is explicitly disabled but another
system-probe feature (NPM) is enabled, the regular system-probe
binary is used without SPL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a baseline test where discovery.enabled=false overrides
enabledByDefault=true, verifying no system-probe container is rendered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

chart/datadog: This issue or pull request is related to the datadog chart
datadog/minor-version: Minor version bump for datadog chart
