Skip to content

Conversation

@sdodson
Copy link
Member

@sdodson sdodson commented Oct 28, 2025

Assisted by: Cursor (claude-sonnet-4.5)

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2025
An extension to the DNS configuration that defines a list of services
and namespaces which are appended to the existing cluster image registry
entry stored in the SERVICES environment variable for the dns-resolver
daemonset running in the openshift-dns namespace. This allows node
services to resolve cluster services and connect to them. Potential use
cases are for augmenting monitoring leveraging a cluster hosted service
for any other means within a node scoped process.
@sdodson sdodson force-pushed the nodeservice-etc-host-injection branch from cdfae8f to fbdf513 Compare October 28, 2025 13:17
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign candita for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 28, 2025

@sdodson: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint fbdf513 link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +7 to +11
- "@Miciah"
- "@candita"
- "@alebedev87"
- "@JoelSpeed"
- "@everettraven"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Miciah @candita @alebedev87 @JoelSpeed @everettraven PTAL

I'll fix up the markdown lint issues but otherwise I think this covers the basics for the proposed implementation.

Comment on lines +66 to +70
// namespace is the namespace of the service.
// This field is required and must not be empty.
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
Namespace string `json:"namespace,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an API PR to give feedback on, or would you prefer detailed API feedback here?

This will need a max length of 63, and a validation to ensure compatibility with DNS1123 label formatting and the appropriate description for that validation included in the godoc (we can copy/paste from other APIs for all of this)

Comment on lines +72 to +76
// name is the name of the service.
// This field is required and must not be empty.
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
Name string `json:"name,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be validated as DNS1035 label and the documentation updated to describe that validation

Can copy/paste this from other fields using the same validation

// +listType=map
// +listMapKey=namespace
// +listMapKey=name
NodeServices []NodeService `json:"nodeServices,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need both a minItems and maxItems, I think we previously discussed limiting this to something small (32?)

Comment on lines +146 to +148
3. The operator populates the node-resolver DaemonSet `SERVICES` environment variable with the list of service names
- Format: Comma-separated and/or space-separated list of service names in `name.namespace.svc` format
- Example: `SERVICES="image-registry.openshift-image-registry.svc, datastore.coredump.svc"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this already exist? If not, do we need the ambiguity of comma OR space separation? Cleaner to pick one no?

4. The DNS node resolver pods read the `SERVICES` environment variable on startup
5. The `update-node-resolver.sh` script runs on each node and:
- Parses the list of service names from the `SERVICES` environment variable (comma/space-separated)
- For each service, performs DNS lookups using `dig` to resolve the ClusterIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For services, isn't the ClusterIP allocated on the container network at the point where the service is first handled? Wondering if the nodes actually need to dig vs just being told here's the IP on the container network (moving the lookup to the operator not the daemonset)

Or are we looking at host network level routing to the services here?

- Flexible format: supports comma-separated, space-separated, or mixed delimiters

**Limitations:**
- **Service list changes require pod restarts**: When services are added to or removed from the `nodeServices` list in the DNS CR, the operator must update the DaemonSet `SERVICES` environment variable, triggering a rolling restart of all node resolver pods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often are these changes likely to happen?

4. The DNS node resolver pods read the `SERVICES` environment variable on startup
5. The `update-node-resolver.sh` script runs on each node and:
- Parses the list of service names from the `SERVICES` environment variable (comma/space-separated)
- For each service, performs DNS lookups using `dig` to resolve the ClusterIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is about providing an option prior to DNS services being completely operational, does this work? What DNS is going to be available at this point?

- Efficient: ~1.4KB for 20 services, minimal impact on environment variable limits
- Flexible format: supports comma-separated, space-separated, or mixed delimiters

**Limitations:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section appears to duplicate a number of the advantages?

- Different nodes temporarily track different service lists during the rollout

**For service IP changes** (ClusterIP of a tracked service changes):
- The `update-node-resolver.sh` script detects the IP change during its next 60-second polling cycle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the DNS infrastructure is up and running, does it make sense to continue to have these host entries as a fallback? CoreDNS has its own caching doesn't it, so once it has fetched each of these, it should be able to serve the other host components?

- The DNS node resolver runs with elevated privileges (required to modify `/etc/hosts`)
- Only services with valid ClusterIPs will be added (no external IPs or LoadBalancer IPs)
- Service namespace and name validation prevents injection attacks
- The resolver will not add entries that conflict with existing system entries (e.g., localhost)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At what point are these entries validated/pruned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants