
@slashpai
Member

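Configure every ServiceMonitor that targets a DaemonSet workload to use EndpointSlice for service discovery instead of the default Endpoints API. A DaemonSet Service has roughly one endpoint per node, so its single Endpoints object grows with cluster size and is truncated by Kubernetes beyond 1000 addresses. EndpointSlices split the same set into smaller objects (up to 100 endpoints each by default), which improves scalability and performance when monitoring DaemonSet pods.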

Changes:

  • Add serviceDiscoveryRole: EndpointSlice to ServiceMonitor specs
  • Add discovery.k8s.io/endpointslices RBAC permissions to prometheus-k8s Roles
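As a rough illustration of the two changes above, here is a minimal sketch assuming a hypothetical DaemonSet Service labeled app: example-daemonset in a placeholder namespace example-namespace. The names, labels, and port are not taken from the actual bindata manifests. The existing RoleBinding that ties the prometheus-k8s Role to the Prometheus service account is assumed and not shown, and the Role's pre-existing rules are omitted.

```yaml
# Hypothetical ServiceMonitor for a DaemonSet; name, namespace,
# labels and port are placeholders for illustration only.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-daemonset-monitor
  namespace: example-namespace
spec:
  # Discover scrape targets from discovery.k8s.io EndpointSlices
  # instead of core/v1 Endpoints.
  serviceDiscoveryRole: EndpointSlice
  selector:
    matchLabels:
      app: example-daemonset
  endpoints:
  - port: metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: example-namespace
rules:
# Pre-existing rules of this Role are omitted here.
# Added rule: let Prometheus read EndpointSlices in this namespace.
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
```

Without the endpointslices rule, Prometheus would presumably be unable to list the slices and the affected targets would not be discovered, which is why both changes ship together.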

Related Epic: https://issues.redhat.com/browse/MON-4216

cc @simonpasquier

/hold to test changes


Signed-off-by: Jayapriya Pai <janantha@redhat.com>
@openshift-ci-robot added the jira/valid-reference label on Nov 17, 2025 (indicates that this PR references a valid Jira ticket of any type).
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 17, 2025

@slashpai: This pull request references MON-4432 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.


@openshift-ci bot added the do-not-merge/hold label on Nov 17, 2025 (indicates that a PR should not merge because someone has issued a /hold command).
@coderabbitai

coderabbitai bot commented Nov 17, 2025

Walkthrough

This change adds EndpointSlice-based service discovery support across multiple Prometheus monitoring configurations. ServiceMonitor resources are updated with serviceDiscoveryRole: EndpointSlice, and corresponding RBAC Role permissions are extended to grant Prometheus access to endpointslices resources in the discovery.k8s.io API group.

Changes

Cohort: EndpointSlice Discovery Configuration
Files: bindata/kube-proxy/monitor.yaml, bindata/network/frr-k8s/monitor.yaml, bindata/network/network-metrics/002-prometheus.yaml, bindata/network/openshift-sdn/monitor.yaml, bindata/network/ovn-kubernetes/common/monitor-node.yaml
Summary: Adds the serviceDiscoveryRole: EndpointSlice field to ServiceMonitor specs. Extends the prometheus-k8s Role with a new RBAC rule granting get, list, and watch permissions on the endpointslices resource in the discovery.k8s.io API group.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Homogeneous changes applied consistently across 5 manifests (low complexity pattern)
  • All modifications are YAML configuration additions with no code logic changes
  • Each file requires validation that the RBAC permissions are correctly specified and match the ServiceMonitor changes
  • Verify that the serviceDiscoveryRole: EndpointSlice field is valid for ServiceMonitor resources

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fda7a9f and 57acbaf.

📒 Files selected for processing (5)
  • bindata/kube-proxy/monitor.yaml (2 hunks)
  • bindata/network/frr-k8s/monitor.yaml (2 hunks)
  • bindata/network/network-metrics/002-prometheus.yaml (2 hunks)
  • bindata/network/openshift-sdn/monitor.yaml (3 hunks)
  • bindata/network/ovn-kubernetes/common/monitor-node.yaml (2 hunks)
🔇 Additional comments (5)
bindata/network/network-metrics/002-prometheus.yaml (1)

27-27: Consistent implementation across ServiceMonitor and RBAC.

Changes align with the pattern seen in other monitored workloads. The RBAC rule is correctly structured and grants get, list, and watch.

Also applies to: 65-72

bindata/network/frr-k8s/monitor.yaml (1)

60-60: Consistent pattern applied across frr-k8s ServiceMonitor and RBAC.

Changes follow the established pattern. Note that serviceDiscoveryRole is set at the ServiceMonitor spec level, so it applies to every entry under spec.endpoints, regardless of how many endpoints are defined.

Also applies to: 78-85
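For illustration, serviceDiscoveryRole sits directly under spec rather than inside each spec.endpoints entry, so a single setting covers every port the ServiceMonitor scrapes. A minimal sketch with hypothetical names; the port names metrics and extra-metrics are placeholders, not the actual frr-k8s ports.

```yaml
# Hypothetical multi-port ServiceMonitor; names and labels are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-multi-port-monitor
  namespace: example-namespace
spec:
  # Set once at spec level: applies to both entries under endpoints.
  serviceDiscoveryRole: EndpointSlice
  selector:
    matchLabels:
      app: example-daemonset
  endpoints:
  - port: metrics
  - port: extra-metrics
```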

bindata/network/openshift-sdn/monitor.yaml (1)

27-27: Correctly applied to multiple ServiceMonitors with consolidated RBAC.

Both ServiceMonitor resources (monitor-sdn and monitor-sdn-controller) correctly receive serviceDiscoveryRole: EndpointSlice. A single RBAC rule covers both, which avoids duplicating it.

Also applies to: 76-76, 116-123
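Because Role objects are namespaced, one endpointslices rule in the namespace's prometheus-k8s Role is enough for every ServiceMonitor that discovers Services in that namespace. A sketch of that layout, assuming the ServiceMonitors live in openshift-sdn and using hypothetical selector labels and port names (not copied from bindata/network/openshift-sdn/monitor.yaml):

```yaml
# Sketch: two ServiceMonitors in one namespace share a single Role rule.
# Namespace, selector labels and port names are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitor-sdn
  namespace: openshift-sdn
spec:
  serviceDiscoveryRole: EndpointSlice
  selector:
    matchLabels:
      app: sdn
  endpoints:
  - port: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitor-sdn-controller
  namespace: openshift-sdn
spec:
  serviceDiscoveryRole: EndpointSlice
  selector:
    matchLabels:
      app: sdn-controller
  endpoints:
  - port: metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: openshift-sdn
rules:
# One namespace-scoped rule covers EndpointSlice discovery
# for both ServiceMonitors above (other rules omitted).
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
```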

bindata/kube-proxy/monitor.yaml (1)

27-27: No issues found: serviceDiscoveryRole is valid and RBAC is correctly configured.

ServiceMonitor supports the serviceDiscoveryRole field, which accepts "EndpointSlice" or "Endpoints" and is documented in the Prometheus Operator API reference. The RBAC rule at lines 67-74 correctly grants the permissions required to read EndpointSlices.

bindata/network/ovn-kubernetes/common/monitor-node.yaml (1)

34-34: Verified: all DaemonSet ServiceMonitors have the serviceDiscoveryRole field, and all non-DaemonSet ServiceMonitors correctly lack it.

All five ServiceMonitors for DaemonSet workloads (kube-proxy, frr-k8s, network-metrics-daemon, sdn, ovnkube-node) now set serviceDiscoveryRole: EndpointSlice. The five ServiceMonitors without the field target Deployments, not DaemonSets, so no DaemonSet ServiceMonitor was missed.



@openshift-ci bot requested review from arghosh93 and miheer on Nov 17, 2025 06:53
@simonpasquier

/skip
/retest-required

@simonpasquier left a comment

/lgtm

Let's see whether the e2e failures are caused by the PR, but it looks OK to me.

@openshift-ci bot added the lgtm label on Nov 19, 2025 (indicates that a PR is ready to be merged).
@openshift-ci
Contributor

openshift-ci bot commented Nov 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: simonpasquier, slashpai
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slashpai
Member Author

/retest-required

@slashpai
Member Author

/retest-required

@slashpai
Member Author

/test e2e-aws-ovn-windows

@openshift-ci
Contributor

openshift-ci bot commented Nov 20, 2025

@slashpai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade (commit 57acbaf, required: false); rerun: /test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade
  • ci/prow/4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade (commit 57acbaf, required: false); rerun: /test 4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade
  • ci/prow/security (commit 57acbaf, required: false); rerun: /test security
  • ci/prow/frrk8s-e2e (commit 57acbaf, required: false); rerun: /test frrk8s-e2e
  • ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade (commit 57acbaf, required: false); rerun: /test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
  • ci/prow/e2e-aws-ovn-windows (commit 57acbaf, required: true); rerun: /test e2e-aws-ovn-windows


@simonpasquier

/test e2e-aws-ovn-windows
