Add flightrecorder feature to the operator#2785

Merged
misteriaud merged 20 commits intomainfrom
misteriaud/flightrecorder-feature
Mar 31, 2026

Member

@misteriaud misteriaud commented Mar 20, 2026

Context

The flight recorder is a lightweight Rust sidecar that passively receives a mirror of the Datadog Agent's pipeline signals (metrics, logs, trace stats) over a Unix socket and writes them as columnar Parquet files for offline analysis and debugging.

Key design properties:

  • No backpressure — a slow or unavailable sidecar never blocks the agent's pipeline
  • Minimal overhead — <10 MB RSS at median agent workloads
  • Crash-safe — the sidecar is independent; if it crashes, the agent continues normally

Summary

Adds flightrecorder sidecar support to the Datadog Operator. When the annotation agent.datadoghq.com/flightrecorder-enabled: "true" is set on the DatadogAgent resource, the operator:

  • Injects a flightrecorder sidecar container into the agent DaemonSet
  • Sets up a shared Unix socket volume (/var/run/flightrecorder) for agent-to-flightrecorder IPC
  • Sets up a data volume (/data/signals) for Parquet output files
  • Configures DD_FLIGHTRECORDER_ENABLED and DD_FLIGHTRECORDER_SOCKET_PATH env vars on the core agent and trace-agent containers
  • Configures DD_FLIGHTRECORDER_SOCKET_PATH and DD_FLIGHTRECORDER_OUTPUT_DIR env vars on the flightrecorder container

Usage

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  annotations:
    agent.datadoghq.com/flightrecorder-enabled: "true"
```

Changes

  • internal/controller/datadogagent/feature/flightrecorder/ — Feature implementation (volumes, env vars, container injection)
  • internal/controller/datadogagent/feature/utils/utils.go — Annotation constant
  • internal/controller/datadogagent/component/agent/default.go — Default flightrecorder container definition with resource limits

Test plan

  • Unit tests for feature enable/disable and volume/env var configuration
  • go build ./... passes
  • All E2E tests pass (K8s 1.19 through 1.32)
  • Local deployment in a kind cluster

🤖 Generated with Claude Code

misteriaud and others added 3 commits March 20, 2026 09:53
Flight Recorder is a Rust sidecar that records agent pipeline signals
(metrics, logs) to Vortex columnar files. This adds operator support so
users can enable it via `spec.features.flightRecorder.enabled` in the
DatadogAgent CRD.

The operator creates a flightrecorder container in the agent pod with:
- Shared emptyDir volume at /var/run/flightrecorder for Unix socket IPC
- Data volume at /data/signals for Vortex output files
- DD_FLIGHTRECORDER_ENABLED and DD_FLIGHTRECORDER_SOCKET_PATH env vars
  on the core agent container

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov-commenter commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 67.92453% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.05%. Comparing base (4a814a5) to head (0880bbe).

| File with missing lines | Patch % | Lines |
|---|---|---|
| ...controller/datadogagent/component/agent/default.go | 0.00% | 22 missing ⚠️ |
| ...ler/datadogagent/feature/flightrecorder/feature.go | 85.54% | 10 missing and 2 partials ⚠️ |

@@            Coverage Diff             @@
##             main    #2785      +/-   ##
==========================================
+ Coverage   38.94%   39.05%   +0.11%     
==========================================
  Files         313      314       +1     
  Lines       27134    27240     +106     
==========================================
+ Hits        10567    10639      +72     
- Misses      15778    15810      +32     
- Partials      789      791       +2     
| Flag | Coverage Δ |
|---|---|
| unittests | 39.05% <67.92%> (+0.11%) ⬆️ |

Flags with carried forward coverage won't be shown.

| File with missing lines | Coverage Δ |
|---|---|
| internal/controller/datadogagent/controller.go | 92.85% <ø> (ø) |
| ...nal/controller/datadogagent/feature/utils/utils.go | 0.00% <ø> (ø) |
| ...ontroller/datadogagent/override/podtemplatespec.go | 77.70% <100.00%> (+0.15%) ⬆️ |
| ...ler/datadogagent/feature/flightrecorder/feature.go | 85.54% <85.54%> (ø) |
| ...controller/datadogagent/component/agent/default.go | 43.61% <0.00%> (-1.38%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a814a5...0880bbe.

misteriaud and others added 6 commits March 20, 2026 13:43
Request 50Mi, limit 200Mi — based on DESIGN.md benchmarks showing
33-61MB RSS for the recorder sidecar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nv var instead of CRD field

Remove spec.features.flightRecorder from the DatadogAgent CRD schema. Instead,
Configure() detects DD_EXPERIMENTAL_FLIGHTRECORDER_ENABLED=true in either
spec.override.nodeAgent.env (component-level) or
spec.override.nodeAgent.containers.agent.env (container-level).

This avoids the CRD pruning issue where unknown fields are silently stripped
before the operator can read them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… changes

Restore controller-gen version to v0.17.3 (CI version) in CRD files that
were inadvertently regenerated with the local v0.16.3 toolchain. Revert
unrelated changes to dashboards/metrics/monitors/SLOs/RBAC that were side
effects of running make generate locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Set DD_FLIGHTRECORDER_ENABLED and DD_FLIGHTRECORDER_SOCKET_PATH on both
  the core agent and trace-agent containers
- Mount the flightrecorder socket volume on the trace-agent container

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@misteriaud misteriaud added the enhancement New feature or request label Mar 27, 2026
@misteriaud misteriaud marked this pull request as ready for review March 27, 2026 13:15
@misteriaud misteriaud requested a review from a team March 27, 2026 13:15
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfb840294f

misteriaud and others added 5 commits March 27, 2026 13:37
… feature

Switch from reading DD_EXPERIMENTAL_FLIGHTRECORDER_ENABLED in
spec.override.nodeAgent.env to the annotation-based pattern used by
other experimental features (e.g. privateactionrunner).

The feature is now enabled via:
  agent.datadoghq.com/flightrecorder-enabled: "true"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align with the convention used by the DogStatsD and APM sockets, which
live under /var/run/datadog/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…der.sock

Only one socket is used, so a subdirectory is unnecessary. Mount the
volume at /var/run/datadog and place the socket file directly as
flightrecorder.sock, consistent with dsd.socket and apm.socket.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r socket

Two volumes cannot mount at the same path — using /var/run/datadog
directly would conflict with the DogStatsD socket volume. Use a
dedicated subdirectory instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set DD_FLIGHTRECORDER_SOCKET_PATH and DD_FLIGHTRECORDER_OUTPUT_DIR on
  the flightrecorder container so it knows where to listen and write
- Remove DD_FLIGHTRECORDER_ENABLED from the flightrecorder container
  (it is agent-side only)
- Clean up example YAML: remove custom image overrides and clusterName
- Replace "Vortex" with "Parquet" in comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tbavelier tbavelier added this to the v1.26.0 milestone Mar 27, 2026
Contributor

@adel121 adel121 left a comment

LGTM 👍

@tbavelier to have a quick final look 🙇

misteriaud and others added 5 commits March 30, 2026 15:54
Move env var constants (DD_FLIGHTRECORDER_*) from common/envvar.go to
flightrecorder/const.go since they are only used within the feature.

Move shared volume names and paths to common/const.go so they can be
referenced by both the feature and the component default container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add volumeMountsForFlightRecorder() helper following the pattern used
by other containers. Remove memory requests/limits as they are never
set by default and can be configured via overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add FlightRecorderContainerName to the merger AllAgentContainers map
so that node-agent-wide env/envFrom overrides are applied to the
flightrecorder sidecar container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@misteriaud misteriaud merged commit bea964a into main Mar 31, 2026
37 checks passed
@misteriaud misteriaud deleted the misteriaud/flightrecorder-feature branch March 31, 2026 09:29
4 participants