Skip to content

Comments

WINC-1635, WINC-1592: enable log rotation for kubelet and kubeproxy services#3766

Draft
jrvaldes wants to merge 6 commits intoopenshift:masterfrom
jrvaldes:log-rotation-kubelet
Draft

WINC-1635, WINC-1592: enable log rotation for kubelet and kubeproxy services#3766
jrvaldes wants to merge 6 commits intoopenshift:masterfrom
jrvaldes:log-rotation-kubelet

Conversation

@jrvaldes
Copy link
Contributor

@jrvaldes jrvaldes commented Feb 6, 2026

This pull request introduces configurable log rotation and flushing for Windows kubelet services, allowing log file size, retention, and flush interval to be set via environment variables. It also adds helper functions to safely parse and use these environment variables, and includes unit tests to ensure their correctness.

Log rotation and flush configuration:

  • Added environment variable support for log file size (SERVICES_LOG_FILE_SIZE), log file age (SERVICES_LOG_FILE_AGE), and log flush interval (SERVICES_LOG_FLUSH_INTERVAL), with sensible defaults, to control log runner behavior in pkg/services/services.go. [1] [2]
  • Updated the kubelet service command generation to use a new helper function, getLogRunnerForCmd, which incorporates the log rotation and flush parameters. [1] [2]
  • Added the log flush interval to the kubelet configuration via the Logging field, using the new logsapi.LoggingConfiguration struct. [1] [2]

Helper functions and environment parsing:

  • Introduced getEnvQuantityOrDefault and getEnvDurationOrDefault functions to safely parse quantity and duration environment variables, falling back to defaults if invalid. [1] [2]

Testing:

  • Added unit tests for log runner command generation and environment variable parsing functions in pkg/services/services_test.go, covering various scenarios and edge cases.

Summary by CodeRabbit

  • New Features

    • Automatic log rotation for managed Windows services, configurable via environment variables (file size, age, flush interval)
    • Tooling to enable verbose operator debug logging and reduce service log file sizes (applies changes and ensures restart)
  • Updates

    • Kubelet configuration now includes structured logging with a 5s flush frequency
  • Tests

    • Added comprehensive tests for env parsing and log-runner command construction
  • Documentation

    • README updated with log rotation feature and enablement instructions
  • Chores

    • Dependency declarations adjusted for build consistency

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2026
@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/retitle WINC-1635: enable log rotation for kubelet service

@openshift-ci openshift-ci bot changed the title Log rotation kubelet WINC-1635: enable log rotation for kubelet service Feb 6, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 6, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 6, 2026

@jrvaldes: This pull request references WINC-1635 which is a valid jira issue.

Details

In response to this:

This pull request introduces configurable log rotation and flushing for Windows kubelet services, allowing log file size, retention, and flush interval to be set via environment variables. It also adds helper functions to safely parse and use these environment variables, and includes unit tests to ensure their correctness.

Log rotation and flush configuration:

  • Added environment variable support for log file size (SERVICES_LOG_FILE_SIZE), log file age (SERVICES_LOG_FILE_AGE), and log flush interval (SERVICES_LOG_FLUSH_INTERVAL), with sensible defaults, to control log runner behavior in pkg/services/services.go. [1] [2]
  • Updated the kubelet service command generation to use a new helper function, getLogRunnerForCmd, which incorporates the log rotation and flush parameters. [1] [2]
  • Added the log flush interval to the kubelet configuration via the Logging field, using the new logsapi.LoggingConfiguration struct. [1] [2]

Helper functions and environment parsing:

  • Introduced getEnvQuantityOrDefault and getEnvDurationOrDefault functions to safely parse quantity and duration environment variables, falling back to defaults if invalid. [1] [2]

Testing:

  • Added unit tests for log runner command generation and environment variable parsing functions in pkg/services/services_test.go, covering various scenarios and edge cases.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 336fc21 to ad7f5c8 Compare February 6, 2026 16:32
@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test ?

@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test lint

@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test unit

@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test images

@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 4469104 to f6a626f Compare February 6, 2026 17:57
@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

1 similar comment
@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 06d4c31 to ca06837 Compare February 7, 2026 04:49
@jrvaldes
Copy link
Contributor Author

jrvaldes commented Feb 7, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from ca06837 to f994aee Compare February 11, 2026 22:58
@jrvaldes
Copy link
Contributor Author

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from f994aee to 547445b Compare February 12, 2026 04:15
@jrvaldes
Copy link
Contributor Author

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 547445b to abd7af6 Compare February 12, 2026 13:07
@jrvaldes
Copy link
Contributor Author

/test vsphere-e2e-operator

@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds kubelet Logging configuration (FlushFrequency 5s), promotes k8s.io/component-base to a direct go.mod require, introduces environment-driven log rotation for managed Windows services with helpers, tests and package init-time env parsing, and adds enable_debug_logging() in hack/common.sh to toggle WMCO debug logging.

Changes

Cohort / File(s) Summary
Dependency change
go.mod
Moved k8s.io/component-base v0.34.4 from indirect to direct require.
Kubelet logging
pkg/nodeconfig/nodeconfig.go, pkg/nodeconfig/nodeconfig_test.go
Imported k8s.io/component-base/logs/api/v1 and added Logging to generated KubeletConfiguration (FlushFrequency = 5s, SerializeAsString enabled); test expectation updated from 0s to 5s.
Service init vars
pkg/services/init.go
New file: declares package-level vars (logFileSize, logFileAge, flushInterval) and initializes them from env using getEnvQuantity/getEnvDuration, logging errors on parse failures.
Service log runner & env parsing
pkg/services/services.go
Added env var constants (SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, SERVICES_LOG_FLUSH_INTERVAL), helpers (getEnvQuantity, getEnvDuration), and getLogRunnerForCmd to wrap service commands with kube-log-runner flags; updated service command construction to use the wrapper.
Service tests
pkg/services/services_test.go
New tests covering getEnvDuration, getEnvQuantity, and getLogRunnerForCmd, including ordering assertions and package-level state preservation/restoration.
Test automation helper
hack/common.sh
Added enable_debug_logging() to detect OLM v0/v1 and patch WMCO subscription or manager container to enable --debugLogging and set SERVICES_LOG_FILE_SIZE=1M; OLmv1 path applies envs in one call and restarts the deployment, includes verification retry loop and waits for rollout; explicit error handling on unknown OLM version.
Documentation
README.md
Added "Automatic log rotation for managed Windows services" feature section (appears duplicated in the diff); minor formatting tweaks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: enabling log rotation for kubelet and kubeproxy services with JIRA issue references.
Stable And Deterministic Test Names ✅ Passed All test names in the pull request are static and deterministic with no dynamic content like timestamps, UUIDs, or generated suffixes.
Test Structure And Quality ✅ Passed The test code meets all five quality requirements: single responsibility with focused test cases, proper setup/cleanup using t.Cleanup() and t.Setenv(), no timeouts needed for unit tests, meaningful assertion messages with context, and consistency with codebase conventions using testify and table-driven patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@jrvaldes
Copy link
Contributor Author

@coderabbitai review

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from abd7af6 to 5abf765 Compare February 16, 2026 22:07
@jrvaldes
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In `@pkg/services/services_test.go`:
- Around line 158-189: The test TestGetLogRunnerForCmd relies on package-level
defaults (logFileSize, logFileAge, flushInterval) which are initialized from
environment variables; make the test deterministic by setting those env vars at
test start (use t.Setenv for SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE,
SERVICES_LOG_FLUSH_INTERVAL) to the expected default values ("100M", "168h",
"5s") before calling getLogRunnerForCmd so the output always matches the
hardcoded expected strings.
🧹 Nitpick comments (1)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed.


In `@pkg/services/services_test.go`:
- Around line 158-189: The test TestGetLogRunnerForCmd relies on package-level
defaults (logFileSize, logFileAge, flushInterval) which are initialized from
environment variables; make the test deterministic by setting those env vars at
test start (use t.Setenv for SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE,
SERVICES_LOG_FLUSH_INTERVAL) to the expected default values ("100M", "168h",
"5s") before calling getLogRunnerForCmd so the output always matches the
hardcoded expected strings.
pkg/services/services_test.go (1)

158-189: Test relies on package-level defaults initialized at startup.

The test hardcodes expected values (100M, 168h, 5s) that match the default constants. However, since logFileSize, logFileAge, and flushInterval are set in init() from environment variables, if any of these env vars (SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, SERVICES_LOG_FLUSH_INTERVAL) are set in the test environment, this test will fail unexpectedly.

Consider either:

  1. Explicitly setting the env vars in this test using t.Setenv to ensure predictable behavior, or
  2. Documenting that this test assumes default values
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/services_test.go` around lines 158 - 189, The test
TestGetLogRunnerForCmd relies on package-level defaults (logFileSize,
logFileAge, flushInterval) which are initialized from environment variables;
make the test deterministic by setting those env vars at test start (use
t.Setenv for SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE,
SERVICES_LOG_FLUSH_INTERVAL) to the expected default values ("100M", "168h",
"5s") before calling getLogRunnerForCmd so the output always matches the
hardcoded expected strings.

@jrvaldes
Copy link
Contributor Author

/test vsphere-e2e-operator

@jrvaldes
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jrvaldes
Copy link
Contributor Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 19, 2026

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/services/init.go`:
- Around line 12-24: The log.Error calls are using printf-style "%s" which won’t
be interpolated by controller-runtime's logr; update each call (the three uses
of log.Error in this snippet) to pass the variable name as a structured
key-value pair instead of a format string — e.g. replace log.Error(err, "cannot
load environment variable %s", logFileSizeEnvVar) with something like
log.Error(err, "cannot load environment variable", "env", logFileSizeEnvVar); do
the same for the calls around getEnvDuration and the variables logFileAgeEnvVar
and logFlushIntervalEnvVar so the variable names are logged as keys rather than
a literal format string.

In `@README.md`:
- Around line 271-278: The "Not yet supported" list contains a duplicated entry
"csi-proxy"; edit the README's list (the "Not yet supported" section) to remove
the duplicate so "csi-proxy" appears only once, ensuring the other entries
(containerd, windows_exporter, hybrid-overlay-node, azure-cloud-node-manager)
remain unchanged and the list ordering/format stays consistent.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/services/services_test.go (1)

486-494: Consider using strings.Index from the standard library.

This helper reimplements strings.Index. The stdlib version would be more idiomatic.

-// indexOf returns the index of the first occurrence of substr in s, or -1 if not found.
-func indexOf(s, substr string) int {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return i
-		}
-	}
-	return -1
-}
+import "strings"
+// Then use strings.Index(s, substr) directly in the test
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/services_test.go` around lines 486 - 494, The custom indexOf
function duplicates stdlib behavior; replace its usage with strings.Index by
removing or deprecating indexOf and updating callers to call strings.Index(s,
substr) instead (and add an import "strings" where needed); alternatively, if
you must keep a wrapper, implement indexOf to return strings.Index(s, substr)
and add the strings import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/services/init.go`:
- Around line 11-24: The log.Error calls in init.go are using printf-style "%s"
which won't be interpolated by controller-runtime's logr; update each failing
call (the three calls that follow getEnvQuantity and getEnvDuration) to use
structured key/value args instead of "%s" so the error and the environment
variable name are recorded (e.g., call log.Error(err, "cannot load environment
variable", "env", logFileSizeEnvVar) and similarly for logFileAgeEnvVar and
logFlushIntervalEnvVar), keeping the same error variable and message context.

In `@README.md`:
- Around line 271-276: In the "Not yet supported" list titled "Not yet
supported" remove the duplicate csi-proxy entry so it appears only once; locate
the bullet list containing containerd, csi-proxy, windows_exporter,
hybrid-overlay-node and ensure only a single csi-proxy line remains (delete the
repeated csi-proxy bullet).

---

Nitpick comments:
In `@pkg/services/services_test.go`:
- Around line 486-494: The custom indexOf function duplicates stdlib behavior;
replace its usage with strings.Index by removing or deprecating indexOf and
updating callers to call strings.Index(s, substr) instead (and add an import
"strings" where needed); alternatively, if you must keep a wrapper, implement
indexOf to return strings.Index(s, substr) and add the strings import.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2026

@jrvaldes: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from b469a17 to 646672d Compare February 19, 2026 16:38
@jrvaldes
Copy link
Contributor Author

vsphere-e2e-operator passed, and you can see the log rotation configuration applied in the

name: windows-services-10.22.0-b9b2652-dirty:

"name":"kubelet",
"path":"C:\\k\\kube-log-runner.exe
      -log-file=C:\\var\\log\\kubelet\\kubelet.log -log-file-size=1M C:\\k\\kubelet.exe
      --config=C:\\k\\kubelet.conf --bootstrap-kubeconfig=C:\\k\\bootstrap-kubeconfig
      --kubeconfig=C:\\k\\kubeconfig --cert-dir=c:\\var\\lib\\kubelet\\pki\\ --windows-service
      --node-labels=node.openshift.io/os_id=Windows --windows-priorityclass=ABOVE_NORMAL_PRIORITY_CLASS
      --v=4 --cloud-provider=external --hostname-override=HOSTNAME_OVERRIDE --node-ip=NODE_IP"
"name":"kube-proxy",
"path":"C:\\k\\kube-log-runner.exe
      -log-file=C:\\var\\log\\kube-proxy\\kube-proxy.log -log-file-size=1M C:\\k\\kube-proxy.exe
      --config C:\\k\\kube-proxy.conf --windows-service"

Copy link
Member

@mansikulkarni96 mansikulkarni96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, some questions.


#### Patching the subscription using the CLI:
```shell script
oc patch subscription <subscription_name> -n <namespace_name> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will need an operator restart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%, a subs change with spin a new pod. I will add a line make it explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


#### Patching the operator deployment using the CLI:
```shell script
oc set env deployment/windows-machine-config-operator -n <namespace_name> -c manager \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing this change, but have we removed setting default values for these env variables now?

Copy link
Contributor Author

@jrvaldes jrvaldes Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, that was the agreement, the feature is disabled by default. So no default values.


For details on the log rotation naming convention, please refer to the [kube-log-runner documentation](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/component-base/logs/kube-log-runner)

The log rotation functionality is disabled by default, meaning:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit configures kubelet service to use kube-log-runner wrapper to
enable optional log rotation on Windows nodes. This prevents unbounded log
growth that could exhaust disk space.

Log rotation parameters are configurable via environment variables:
- SERVICES_LOG_FILE_SIZE: Size limit before rotation
- SERVICES_LOG_FILE_AGE: Retention period for rotated logs
- SERVICES_LOG_FLUSH_INTERVAL: Flush interval to disk

This approach is necessary because Windows services don't have a native
mechanism for output redirection, and Kubernetes has deprecated the
--log-file flag for components.
ran:
go mod tidy && go mod vendor
@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 646672d to 8533948 Compare February 19, 2026 16:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
README.md (2)

249-316: Consider adding guidance for disabling log rotation and handling configuration changes.

The documentation clearly explains how to enable and configure log rotation, but could be enhanced with:

  1. Instructions for disabling log rotation (reverting to defaults)
  2. Guidance on what happens to existing logs when changing rotation settings
  3. Expected operator/node behavior during configuration changes

These additions would provide a more complete operational picture for cluster administrators.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 249 - 316, The README adds only enable instructions
for log rotation but lacks disable guidance and behavior notes; update the
"Enabling log rotation for managed Windows services" section to include (1) how
to disable rotation by removing or clearing the SERVICES_LOG_FILE_SIZE /
SERVICES_LOG_FILE_AGE / SERVICES_LOG_FLUSH_INTERVAL env vars (or setting them to
empty/0 where applicable) for both Subscription and operator Deployment
examples, (2) a short paragraph describing what happens to existing log files
when rotation settings change (existing rotated files are retained per previous
SERVICES_LOG_FILE_AGE until expiry; new rotation rules apply going forward), and
(3) a brief note on operator/node behavior during config changes (operator
restart when env changes, node-by-node reconfiguration to apply new settings and
no service downtime guarantee), referencing the env var names and the operator
deployment/subscription examples already present so readers can find the
locations to change.

314-316: Consider improving the formatting of parameter explanations.

The "where:" section could be more readable using a markdown table or consistent formatting. For example:

**Parameters:**
- `<namespace_name>`: The namespace where the operator is installed (e.g., `openshift-windows-machine-config-operator`)
- `<subscription_name>`: The name of the subscription used to install the operator (e.g., `windows-machine-config-operator-subscription`)

This improves scannability and consistency with other documentation sections.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 314 - 316, Update the "where:" parameter explanations
to use consistent, more readable formatting: replace the current plain lines for
<namespace_name> and <subscription_name> with either a Markdown bullet list or a
small table, and include inline examples in backticks and parentheses (e.g.,
`<namespace_name>`: The namespace where the operator is installed (e.g.,
`openshift-windows-machine-config-operator`) and `<subscription_name>`: The name
of the subscription used to install the operator (e.g.,
`windows-machine-config-operator-subscription`); ensure the same style is
applied to both parameter lines for consistency with other docs sections.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@README.md`:
- Around line 249-316: The README adds only enable instructions for log rotation
but lacks disable guidance and behavior notes; update the "Enabling log rotation
for managed Windows services" section to include (1) how to disable rotation by
removing or clearing the SERVICES_LOG_FILE_SIZE / SERVICES_LOG_FILE_AGE /
SERVICES_LOG_FLUSH_INTERVAL env vars (or setting them to empty/0 where
applicable) for both Subscription and operator Deployment examples, (2) a short
paragraph describing what happens to existing log files when rotation settings
change (existing rotated files are retained per previous SERVICES_LOG_FILE_AGE
until expiry; new rotation rules apply going forward), and (3) a brief note on
operator/node behavior during config changes (operator restart when env changes,
node-by-node reconfiguration to apply new settings and no service downtime
guarantee), referencing the env var names and the operator
deployment/subscription examples already present so readers can find the
locations to change.
- Around line 314-316: Update the "where:" parameter explanations to use
consistent, more readable formatting: replace the current plain lines for
<namespace_name> and <subscription_name> with either a Markdown bullet list or a
small table, and include inline examples in backticks and parentheses (e.g.,
`<namespace_name>`: The namespace where the operator is installed (e.g.,
`openshift-windows-machine-config-operator`) and `<subscription_name>`: The name
of the subscription used to install the operator (e.g.,
`windows-machine-config-operator-subscription`); ensure the same style is
applied to both parameter lines for consistency with other docs sections.

this commit documents the automatic log rotation for managed
Windows services in the Enabled features section in the README.md
@jrvaldes
Copy link
Contributor Author

/remove-approve

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2026
@mansikulkarni96
Copy link
Member

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 24, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mansikulkarni96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2026
@jrvaldes
Copy link
Contributor Author

@jrvaldes
Copy link
Contributor Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hack/common.sh (1)

393-406: ⚠️ Potential issue | 🟡 Minor

Avoid overwriting existing ARGS/env entries when enabling debug logging.

The OLMv0 merge patch replaces the env array, and the OLMv1 oc set env call overwrites ARGS, which can drop existing flags or env-based config. Consider merging instead of replacing, and append --debugLogging only if missing.

🔧 Example adjustment for OLMv1 ARGS preservation
-    # Add debug env variable and log file limit to the WMCO manager container
-    oc set env deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" \
-      ARGS="--debugLogging" \
-      SERVICES_LOG_FILE_SIZE="1M" \
-      -c manager
+    # Add debug env variable and log file limit to the WMCO manager container
+    current_args=$(oc get deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" \
+      -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].env[?(@.name=="ARGS")].value}')
+    if [[ -z "$current_args" ]]; then
+      current_args="--debugLogging"
+    elif [[ "$current_args" != *"--debugLogging"* ]]; then
+      current_args="$current_args --debugLogging"
+    fi
+    oc set env deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" \
+      ARGS="$current_args" \
+      SERVICES_LOG_FILE_SIZE="1M" \
+      -c manager
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/common.sh` around lines 393 - 406, The current OLMv0 oc patch (using
'{"spec":{"config":{"env":[...]}}}') and the OLMv1 oc set env call overwrite the
existing env/ARGS and can drop flags; update the logic so we merge/update
environment entries instead of replacing them: for the OLMv0 path, patch the
subscription/deployment config by adding/updating only the ARGS and
SERVICES_LOG_FILE_SIZE entries (preserve existing env array and other entries)
and for the OLMv1 path read the current ARGS value from
deployment/windows-machine-config-operator (container "manager"), append
"--debugLogging" only if it is not already present, and set/patch
SERVICES_LOG_FILE_SIZE without clobbering other env vars. Ensure you reference
WMCO_SUB, ARGS, SERVICES_LOG_FILE_SIZE, oc patch and oc set
env/deployment/windows-machine-config-operator when implementing the
merge/update logic.
🧹 Nitpick comments (1)
README.md (1)

329-333: Minor wording tweak for concision.

Consider replacing “in order to” with “to.”

📝 Suggested edit
- will cause the operator to restart in order to load the updated configuration.
+ will cause the operator to restart to load the updated configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 329 - 333, The sentence describing operator restart
uses the phrase "in order to" which is wordy; edit the sentence that mentions
SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, and SERVICES_LOG_FLUSH_INTERVAL
so it reads "...will cause the operator to restart to load the updated
configuration." — replace "in order to" with "to" in that sentence to improve
concision.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@hack/common.sh`:
- Around line 393-406: The current OLMv0 oc patch (using
'{"spec":{"config":{"env":[...]}}}') and the OLMv1 oc set env call overwrite the
existing env/ARGS and can drop flags; update the logic so we merge/update
environment entries instead of replacing them: for the OLMv0 path, patch the
subscription/deployment config by adding/updating only the ARGS and
SERVICES_LOG_FILE_SIZE entries (preserve existing env array and other entries)
and for the OLMv1 path read the current ARGS value from
deployment/windows-machine-config-operator (container "manager"), append
"--debugLogging" only if it is not already present, and set/patch
SERVICES_LOG_FILE_SIZE without clobbering other env vars. Ensure you reference
WMCO_SUB, ARGS, SERVICES_LOG_FILE_SIZE, oc patch and oc set
env/deployment/windows-machine-config-operator when implementing the
merge/update logic.

---

Nitpick comments:
In `@README.md`:
- Around line 329-333: The sentence describing operator restart uses the phrase
"in order to" which is wordy; edit the sentence that mentions
SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, and SERVICES_LOG_FLUSH_INTERVAL
so it reads "...will cause the operator to restart to load the updated
configuration." — replace "in order to" with "to" in that sentence to improve
concision.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0c513b2 and 98ddd01.

📒 Files selected for processing (8)
  • README.md
  • go.mod
  • hack/common.sh
  • pkg/nodeconfig/nodeconfig.go
  • pkg/nodeconfig/nodeconfig_test.go
  • pkg/services/init.go
  • pkg/services/services.go
  • pkg/services/services_test.go

Copy link
Contributor

@sebsoto sebsoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits seem to be out of order, the commit that implements the env var parsing should be first shouldn't it?

Comment on lines +259 to +261
- no rotation, log file grows indefinitely
- no cleanup, old rotated files are never deleted
- no periodic flushing, relies on OS buffer flushing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly verbose, essentially repeating the same thing over again. I recommend trimming to this:

The log rotation functionality is disabled by default, causing log files to grow indefinitely

Comment on lines +280 to +283
To enable and customize the log rotation behavior, add the following environment variables to the subscription
(OLMv0) or operator deployment (if using OLMv1 or installing manually). The operator will restart to load the newly
added environment variables and apply log rotation to the managed services. This will result in a reconfiguration of
the existing Windows nodes, one at a time, until all nodes have been handled, to minimize disruption.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While OLMv1 is in tech preview I recommend not mentioning it, or adding it as a separate section

Comment on lines +263 to +265
> **Warning:** The defaults **are not** recommended for production installations, logs will grow indefinitely and
eventually fill the disk. Unless you have a specific reason to disable log rotation, or have an alternative
log management solution in place, consider enabling log rotation as described below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this statement is believed to be true, then we should not be shipping it with defaulted off.
I recommend creating a new story for 10.23.0 (assuming this feature is in 10.22.0) to default to on.

Comment on lines +317 to +334
### Disabling log rotation for managed Windows services

To disable log rotation, remove the `SERVICES_LOG_FILE_SIZE`, `SERVICES_LOG_FILE_AGE`, and `SERVICES_LOG_FLUSH_INTERVAL`
environment variables from the subscription or operator deployment.

### Behavior when log rotation settings change

**Effect on existing log files:** When rotation settings are changed (enabled, disabled, or updated), any previously
rotated log files are retained according to the `SERVICES_LOG_FILE_AGE` value that was in effect when they were
created. Once that retention period expires, the files are cleaned up automatically. New log files and any future
rotated files will follow the updated rotation rules going forward.

**Operator and node behavior:** Any change to the `SERVICES_LOG_FILE_SIZE`, `SERVICES_LOG_FILE_AGE`, or
`SERVICES_LOG_FLUSH_INTERVAL` environment variables—whether in the subscription (OLMv0) or the operator deployment
(OLMv1 / manual installs)—will cause the operator to restart in order to load the updated configuration. After
restarting, the operator will reconfigure each Windows node one at a time to apply the new log rotation settings,
minimizing disruption. Note that service continuity during reconfiguration is not guaranteed; brief interruptions
to managed services (such as kubelet or kube-proxy) may occur on each node as it is reconfigured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of documentation for log rotation.
Consider moving to a separate file and linking from the README.

Comment on lines +755 to +760
Logging: logsapi.LoggingConfiguration{
FlushFrequency: logsapi.TimeOrMetaDuration{
Duration: meta.Duration{Duration: 5 * time.Second},
SerializeAsString: true,
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is explicitly setting a default, which is fine, but the commit message is misleading as there is no functional change here.

// not present
return "", nil
}
if strings.HasPrefix(value, "-") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't parse the string, if you are already parsing this as a quantity.
https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity.Sign


// getEnvQuantity returns the value of the environment variable with the given key
// if it represents a valid and non-negative quantity, otherwise returns error
func getEnvQuantity(key string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnvQuantity returns a string, not a quantity?
Also why does getEnvQuantity care if it is "non-negative". The caller should check that shouldn't they?


// getEnvDuration returns the value of the environment variable with the given key
// if it represents a valid and non-negative duration, otherwise returns error
func getEnvDuration(key string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as this 7722cda#r2850043734

But time.Duration rather than Quantity

}
}

func TestGetEnvDuration(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this much testing is required for parsing environment variables its a good sign something might need to be tweaked 😉

}

// indexOf returns the index of the first occurrence of substr in s, or -1 if not found.
func indexOf(s, substr string) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants