

@sandipanpanda sandipanpanda commented Jan 8, 2026

Support GKE dcgm exporter labels

Summary by CodeRabbit

  • Chores
    • Improved Kubernetes pod detection for NVIDIA monitoring components by broadening name matching to support varied naming conventions.
    • Updated pod selection sources to consider additional identification labels, increasing accuracy and compatibility of metrics collection across diverse cluster setups.



coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Broadened the Prometheus kubernetes-pods relabeling to match DCGM exporter pods via the regex '.*dcgm-exporter.*', and added __meta_kubernetes_pod_label_app_kubernetes_io_name as an additional source_label to influence pod selection.
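To make the before/after behavior concrete, here is a minimal Go sketch that emulates a Prometheus keep rule. It assumes the documented semantics: source label values are joined with the default separator ";", and the regex must match the whole joined string. Go's regexp package implements the same RE2 syntax that Prometheus uses. The pods and label values are hypothetical. The "gke-style" pod simply illustrates a pod that sets app.kubernetes.io/name but not app.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keep reports whether a Prometheus-style keep rule would retain a target.
// Assumed semantics: source label values are joined with the default
// separator ";" and the regex must match the whole joined string.
func keep(expr string, values ...string) bool {
	re := regexp.MustCompile("^(?:" + expr + ")$")
	return re.MatchString(strings.Join(values, ";"))
}

func main() {
	// Hypothetical pods; an unset label contributes an empty string.
	pods := []struct{ name, app, appName string }{
		{"classic", "nvidia-dcgm-exporter", ""},
		{"gke-style", "", "dcgm-exporter"},
		{"unrelated", "nginx", "nginx"},
	}
	for _, p := range pods {
		// Old rule: literal regex over the app label only.
		before := keep("nvidia-dcgm-exporter", p.app)
		// New rule: broadened regex over app and app.kubernetes.io/name.
		after := keep(".*dcgm-exporter.*", p.app, p.appName)
		fmt.Printf("%-10s before=%v after=%v\n", p.name, before, after)
	}
}
```

Under these assumptions, the old rule drops the "gke-style" pod because its app label is empty. The new rule keeps that pod through its app.kubernetes.io/name value. The unrelated pod is dropped by both rules.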

Changes

  • Prometheus ConfigMap template (helm-chart/zxporter/templates/prometheus-configmap.yaml): Changed the relabel regex from the literal nvidia-dcgm-exporter to .*dcgm-exporter.*; added __meta_kubernetes_pod_label_app_kubernetes_io_name to source_labels.
  • Prometheus values (hack) (config/prometheus/hack.prometheus.values.yaml): Mirrored the change to broaden DCGM exporter matching and include __meta_kubernetes_pod_label_app_kubernetes_io_name alongside __meta_kubernetes_pod_label_app for the keep rule.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • dray92
  • Parthiba-Hazra

Poem

🐇 Regex hops through meadow rows,
Labels guide where DCGM goes.
Broad and nimble, tails a-flicker,
Pods now found by whisker-ticker.
🥕🎋

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main change: updating the Prometheus scrape configuration to use a broader pattern for matching dcgm-exporter pods.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.


@sandipanpanda sandipanpanda force-pushed the dcgm-label branch 2 times, most recently from 70319ef to 81bf4dc on January 8, 2026 08:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @helm-chart/zxporter/templates/prometheus-configmap.yaml:
- Around lines 92-96: The current relabeling broadens the regex to '.*dcgm-exporter.*' and adds the source label __meta_kubernetes_pod_label_app_kubernetes_io_name, which can match unintended pods. Tighten the match so that the combined source_labels (__meta_kubernetes_pod_label_app and __meta_kubernetes_pod_label_app_kubernetes_io_name) match only the intended dcgm-exporter pods. Matching exact label values or token boundaries would achieve this. Keep the existing prerequisite check for prometheus.io/scrape=true. Prometheus already anchors the whole regex at both ends, so extra anchors add nothing. \b does not help either, because '-' counts as a word boundary. The boundaries therefore have to be written explicitly around the ';' separator.
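One way to act on this suggestion is sketched below in Go, under the same assumptions as Prometheus relabeling: values are joined with ';' and the pattern is anchored at both ends. The tighter pattern is hypothetical and not taken from the PR. It keeps a pod only if one of the two label values is exactly dcgm-exporter or ends in -dcgm-exporter. Using [^;] to stay inside a single value is safe, because Kubernetes label values cannot contain ';'.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	// Pattern from the PR: "dcgm-exporter" anywhere in the joined string.
	broad = `.*dcgm-exporter.*`
	// Hypothetical tighter pattern: one label value must be exactly
	// "dcgm-exporter" or end in "-dcgm-exporter". [^;] keeps the match
	// inside a single label value (Kubernetes label values cannot contain ';').
	tight = `(?:[^;]*-)?dcgm-exporter;.*|.*;(?:[^;]*-)?dcgm-exporter`
)

// keep mimics a Prometheus keep rule over the app and
// app.kubernetes.io/name labels, joined with the default ";" separator.
func keep(expr, app, appName string) bool {
	re := regexp.MustCompile("^(?:" + expr + ")$")
	return re.MatchString(strings.Join([]string{app, appName}, ";"))
}

func main() {
	// {app, app.kubernetes.io/name}; the last two are the unintended
	// examples mentioned in the review.
	cases := [][2]string{
		{"nvidia-dcgm-exporter", ""},
		{"", "dcgm-exporter"},
		{"test-dcgm-exporter-backup", ""},
		{"", "dcgm-exporter-v2-dev"},
	}
	for _, c := range cases {
		fmt.Printf("app=%q name=%q broad=%v tight=%v\n",
			c[0], c[1], keep(broad, c[0], c[1]), keep(tight, c[0], c[1]))
	}
}
```

Both patterns keep the first two pods. Only the broad pattern keeps the last two. In the ConfigMap, the same pattern would go into the keep rule as regex: '(?:[^;]*-)?dcgm-exporter;.*|.*;(?:[^;]*-)?dcgm-exporter'. Whether the suffix form is narrow enough depends on the label values actually used in the target clusters, and this thread does not list them.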
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70319ef and 81bf4dc.

⛔ Files ignored due to path filters (4)
  • dist/backend-install.yaml is excluded by !**/dist/**
  • dist/install.yaml is excluded by !**/dist/**
  • dist/installer_updater.yaml is excluded by !**/dist/**
  • dist/prometheus.yaml is excluded by !**/dist/**
📒 Files selected for processing (2)
  • config/prometheus/hack.prometheus.values.yaml
  • helm-chart/zxporter/templates/prometheus-configmap.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Docker Image
  • GitHub Check: Build Docker Image
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
config/prometheus/hack.prometheus.values.yaml (1)

413-415: Consistent configuration applied.

The changes here mirror those in the helm chart template, ensuring consistency between the configuration files. The same OR-based label matching logic applies here: pods with "dcgm-exporter" in either the app or app.kubernetes.io/name labels will be matched.

Comment on lines 92 to +96
 - action: keep
-  regex: nvidia-dcgm-exporter
+  regex: '.*dcgm-exporter.*'
   source_labels:
   - __meta_kubernetes_pod_label_app
+  - __meta_kubernetes_pod_label_app_kubernetes_io_name

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find and examine the prometheus-configmap.yaml file
find . -name "prometheus-configmap.yaml" -type f | head -20

Repository: devzero-inc/zxporter

Length of output: 121


🏁 Script executed:

# Once we find it, check the lines in question
git ls-files | grep -i prometheus-configmap

Repository: devzero-inc/zxporter

Length of output: 119


🏁 Script executed:

# Search for the file
fd prometheus-configmap.yaml

Repository: devzero-inc/zxporter

Length of output: 119


🏁 Script executed:

# Read the prometheus-configmap.yaml file, focusing on lines 92-96
cat -n ./helm-chart/zxporter/templates/prometheus-configmap.yaml | sed -n '80,110p'

Repository: devzero-inc/zxporter

Length of output: 1264


🏁 Script executed:

# Also check the git history to see what changed
git log --oneline -20 -- ./helm-chart/zxporter/templates/prometheus-configmap.yaml

Repository: devzero-inc/zxporter

Length of output: 164


🏁 Script executed:

# Check if there's a git diff to see the actual changes
git diff HEAD~1 ./helm-chart/zxporter/templates/prometheus-configmap.yaml | head -100

Repository: devzero-inc/zxporter

Length of output: 250


🏁 Script executed:

# Get the full commit to see the changes
git show 81bf4dc --stat

Repository: devzero-inc/zxporter

Length of output: 17165


🏁 Script executed:

# Get the actual diff of this commit
git show 81bf4dc

Repository: devzero-inc/zxporter

Length of output: 50378


🏁 Script executed:

# Check if there's any documentation or comments in the file about this
cat -n ./helm-chart/zxporter/templates/prometheus-configmap.yaml | head -100

Repository: devzero-inc/zxporter

Length of output: 4085


🏁 Script executed:

# Let me check the git commit to see what the previous state was
git show 81bf4dc:helm-chart/zxporter/templates/prometheus-configmap.yaml | sed -n '80,110p'

Repository: devzero-inc/zxporter

Length of output: 1047


🌐 Web query:

Prometheus relabel_config multiple source_labels concatenation separator

💡 Result:

Yes. When you list multiple source_labels in a relabel_config, Prometheus concatenates their values using the separator (default ";"). It then matches the combined string against your regex and makes the capture groups available to replacement. In short:

  • separator default: ";" (you can set any string).
  • flow: concatenate source_labels with separator → match regex → use $1, $2… in replacement.

Example:
relabel_configs:
  - source_labels: [job, instance]
    separator: "_"
    regex: (.+)
    replacement: "$1"
    target_label: combined

This will set label combined to "job_instance" (e.g. "api_10.0.0.1"). (prometheus.io)
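The same concatenate-then-match flow can be reproduced with Go's regexp package. This is a minimal sketch of a replace step, not Prometheus's own code. The helper name replace is hypothetical, and the anchoring is assumed to be equivalent to wrapping the pattern in ^(?:...)$.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replace mimics a Prometheus replace step: join the source label values
// with sep, match the anchored regex, and expand the replacement template
// (e.g. "$1") from the capture groups. ok is false when nothing matches;
// in that case the target label would be left unchanged.
func replace(values []string, sep, expr, replacement string) (result string, ok bool) {
	re := regexp.MustCompile("^(?:" + expr + ")$")
	joined := strings.Join(values, sep)
	m := re.FindStringSubmatchIndex(joined)
	if m == nil {
		return "", false
	}
	return string(re.ExpandString(nil, replacement, joined, m)), true
}

func main() {
	// job="api", instance="10.0.0.1", joined with "_".
	combined, ok := replace([]string{"api", "10.0.0.1"}, "_", "(.+)", "$1")
	fmt.Println(combined, ok) // api_10.0.0.1 true
}
```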

Citations:


🌐 Web query:

Prometheus relabel action keep regex multiple source_labels behavior

💡 Result:

Short answer: when a relabel rule lists multiple source_labels, Prometheus concatenates their values with the configured separator (default ";"). For keep, drop, and replace, the regex is then matched against that single concatenated string, and the regex is anchored on both ends. hashmod also concatenates the source labels, but it hashes the result instead of matching a regex. With keep, targets whose concatenated value matches are kept and all others are dropped; drop does the inverse. Example: with source_labels: [app,region], separator: ":" and regex: "frontend:us-west", the rule matches only when app=="frontend" and region=="us-west". [1][2]

Sources:

  • Prometheus docs — relabeling & config (concatenation, separator, anchored regex, actions). [1]
  • PromLabs relabeling guide (keep/drop behavior and examples). [2]
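Here is a short Go sketch of the keep/drop behavior described above. It assumes the anchoring is equivalent to wrapping the pattern in ^(?:...)$, which simplifies how Prometheus compiles relabel regexes. The targets are hypothetical. The second one shows why anchoring matters: a substring search would accept us-west-2, but the anchored match does not.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matches joins source label values with sep and tests the regex against
// the whole joined string, as a Prometheus relabel rule would.
func matches(expr, sep string, values ...string) bool {
	re := regexp.MustCompile("^(?:" + expr + ")$")
	return re.MatchString(strings.Join(values, sep))
}

func main() {
	const expr = "frontend:us-west"
	// Hypothetical {app, region} pairs.
	targets := [][2]string{
		{"frontend", "us-west"},
		{"frontend", "us-west-2"},
		{"backend", "us-west"},
	}
	for _, t := range targets {
		m := matches(expr, ":", t[0], t[1])
		// keep retains a target when the regex matches; drop removes it.
		fmt.Printf("app=%s region=%s kept_by_keep=%v kept_by_drop=%v\n",
			t[0], t[1], m, !m)
	}
}
```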

🏁 Script executed:

# Verify if there's a previous version of this file in git history
git log --all --full-history -- helm-chart/zxporter/templates/prometheus-configmap.yaml | head -20

Repository: devzero-inc/zxporter

Length of output: 757


🏁 Script executed:

# Check if the file appears in any previous commits
git log --follow --all -p -- helm-chart/zxporter/templates/prometheus-configmap.yaml | head -100

Repository: devzero-inc/zxporter

Length of output: 3546


🏁 Script executed:

# Check commit 65f1bcf to see if it had any prometheus config
git show 65f1bcf --stat | head -30

Repository: devzero-inc/zxporter

Length of output: 1022


🏁 Script executed:

# Search for any previous prometheus config file in the repo history
git log --all --full-history --diff-filter=D -- "*prometheus*configmap*" | head -20

Repository: devzero-inc/zxporter

Length of output: 46


🏁 Script executed:

# Check if there are other prometheus config files in the repo
find . -name "*prometheus*" -type f | grep -v ".git"

Repository: devzero-inc/zxporter

Length of output: 426


🏁 Script executed:

# Let's see what changed in the prometheus-configmap.yaml at commit 65f1bcf
git show 65f1bcf:helm-chart/zxporter/templates/prometheus-configmap.yaml | grep -A 20 "kubernetes-pods"

Repository: devzero-inc/zxporter

Length of output: 716


🏁 Script executed:

# And compare to commit 81bf4dc
git show 81bf4dc:helm-chart/zxporter/templates/prometheus-configmap.yaml | grep -A 20 "kubernetes-pods"

Repository: devzero-inc/zxporter

Length of output: 743


🏁 Script executed:

# Get the actual diff between these two commits for the configmap
git diff 65f1bcf 81bf4dc -- helm-chart/zxporter/templates/prometheus-configmap.yaml | head -100

Repository: devzero-inc/zxporter

Length of output: 749


The regex pattern change correctly supports GKE dcgm-exporter with flexible labeling.

The changes appropriately:

  1. Broaden the regex from literal nvidia-dcgm-exporter to '.*dcgm-exporter.*' to accommodate label variations across different Kubernetes distributions
  2. Add __meta_kubernetes_pod_label_app_kubernetes_io_name to handle GKE's standard labeling scheme

Prometheus concatenates multiple source_labels with a separator (default ;) and matches the regex against the combined string. This effectively creates an OR condition: a pod passes this keep rule if either its app label or its app.kubernetes.io/name label contains "dcgm-exporter".
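One consequence of the concatenation is easy to miss: an unset label still contributes an empty value, so the separator is always present. The Go sketch below uses the same assumptions as before (';' separator, pattern anchored at both ends) with a hypothetical pod. It shows that the previous literal regex nvidia-dcgm-exporter would stop matching as soon as a second source label is added. The broadened '.*dcgm-exporter.*' matches the value in either position.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// joined returns the string a relabel rule matches against: source label
// values joined by the default separator ";". Unset labels are "".
func joined(values ...string) string { return strings.Join(values, ";") }

// anchored compiles expr so it must match the entire input.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

func main() {
	// Hypothetical pod: app is set, app.kubernetes.io/name is not.
	s := joined("nvidia-dcgm-exporter", "")
	fmt.Printf("%q\n", s)
	// The old literal regex fails: the joined value has a trailing ";".
	fmt.Println(anchored("nvidia-dcgm-exporter").MatchString(s))
	// The broadened regex matches "dcgm-exporter" in either label value.
	fmt.Println(anchored(".*dcgm-exporter.*").MatchString(s))
	fmt.Println(anchored(".*dcgm-exporter.*").MatchString(joined("", "dcgm-exporter")))
}
```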

The broader regex pattern could match unintended pods (e.g., "test-dcgm-exporter-backup", "dcgm-exporter-v2-dev"), but the configuration remains protected by the prometheus.io/scrape=true annotation requirement (checked in the preceding relabel rule) and any downstream metric filtering.


@sandipanpanda sandipanpanda merged commit 7b9694b into main Jan 9, 2026
24 checks passed
@sandipanpanda sandipanpanda deleted the dcgm-label branch January 9, 2026 16:11

3 participants