Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 27 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,31 @@ in a healthy state with no disruptions.

## Enabled features

### Automatic log rotation for managed Windows services

Automatic rotation of the log files for the managed Windows services is available to prevent disk space
exhaustion. Uses [kube-log-runner](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/component-base/logs/kube-log-runner)
as a wrapper binary that executes the service while capturing stdout/stderr, rotates logs based on size
and automatically cleaning up old files based on age.

For details on the log rotation naming convention, please refer to the [kube-log-runner documentation](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/component-base/logs/kube-log-runner)

The log rotation functionality is disabled by default, causing log files to grow indefinitely.

Managed Windows services with log rotation capabilities:
- kubelet
- kube-proxy

Not yet supported:
- containerd
- csi-proxy
- windows_exporter
- hybrid-overlay-node
- azure-cloud-node-manager

For instructions to enable, customize or disable log rotation refer to
[log rotation for managed Windows services documentation](docs/log-rotation-managed-services.md).

### Autoscaling Windows nodes
Cluster autoscaling is supported for Windows instances.

Expand All @@ -257,7 +282,7 @@ Cluster autoscaling is supported for Windows instances.
Windows instances brought up with WMCO are set up with the containerd container runtime. As WMCO installs and manages the container runtime,
it is recommended not to preinstall containerd in MachineSet or BYOH Windows instances.

### Cluster-wide proxy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace nit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will address in separate PR

### Cluster-wide proxy
WMCO supports using a [cluster-wide proxy](https://docs.openshift.com/container-platform/latest/networking/enable-cluster-wide-proxy.html)
to route egress traffic from Windows nodes on OpenShift Container Platform.

Expand Down Expand Up @@ -297,7 +322,7 @@ Some valid values could be: `$mirrorRegistry/oss/kubernetes/pause:3.9`, `$mirror

### Horizontal Pod Autoscaling
Horizontal Pod autoscaling is available for Windows workloads.
Please follow the [Horizontal Pod autoscaling docs](https://docs.openshift.com/container-platform/latest/nodes/pods/nodes-pods-autoscaling.html)
Please follow the [Horizontal Pod autoscaling docs](https://docs.openshift.com/container-platform/latest/nodes/pods/nodes-pods-autoscaling.html)
to create a horizontal pod autoscaler object for CPU and memory utilization of Windows workloads.

## Limitations
Expand Down
63 changes: 63 additions & 0 deletions docs/log-rotation-managed-services.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Log rotation for managed Windows services

Log rotation for managed Windows services is available for WMCO 10.22+. This feature rotates log files based
on configurable size and age thresholds and is configured via environment variables in the operator.

## Enabling log rotation for managed Windows services

To enable and customize the log rotation behavior, add the following environment variables to the subscription (OLMv0).
The operator will restart to load the newly added environment variables and apply log rotation to the
managed services. This will result in a reconfiguration of the existing Windows nodes, one at a time, until all
nodes have been handled, to minimize disruption.

### Setting environment variables in the subscription:
```yaml
kind: Subscription
spec:
config:
env:
- name: SERVICES_LOG_FILE_SIZE
value: "100M" # Rotate when log reaches this size (suggested: 100M)
- name: SERVICES_LOG_FILE_AGE
value: "168h" # Keep rotated logs for this duration (e.g: 168h/7 days)
- name: SERVICES_LOG_FLUSH_INTERVAL
value: "5s" # Flush logs to disk at this interval (suggested: 5s)
```

### Patching the subscription using the CLI:
```shell script
oc patch subscription <subscription_name> -n <namespace_name> \
--type=merge \
-p '{"spec":{"config":{"env":[{"name":"SERVICES_LOG_FILE_SIZE","value":"100M"},{"name":"SERVICES_LOG_FILE_AGE","value":"168h"},{"name":"SERVICES_LOG_FLUSH_INTERVAL","value":"5s"}]}}}'
```

### Patching the operator deployment using the CLI (OLMv1 or manual installs):

```shell script
oc set env deployment/windows-machine-config-operator -n <namespace_name> -c manager \
SERVICES_LOG_FILE_SIZE="100M" \
SERVICES_LOG_FILE_AGE="168h" \
SERVICES_LOG_FLUSH_INTERVAL="5s"
```
where:
- `<namespace_name>`: The namespace where the operator is installed (e.g., `openshift-windows-machine-config-operator`)
- `<subscription_name>`: The name of the subscription used to install the operator (e.g., `windows-machine-config-operator-subscription`)

## Disabling log rotation for managed Windows services

To disable log rotation, remove the `SERVICES_LOG_FILE_SIZE`, `SERVICES_LOG_FILE_AGE`, and `SERVICES_LOG_FLUSH_INTERVAL`
environment variables from the subscription or operator deployment.

## Behavior when log rotation settings change

**Effect on existing log files:** When rotation settings are changed (enabled, disabled, or updated), any previously
rotated log files are retained according to the `SERVICES_LOG_FILE_AGE` value that was in effect when they were
created. Once that retention period expires, the files are cleaned up automatically. New log files and any future
rotated files will follow the updated rotation rules going forward.

**Operator and node behavior:** Any change to the `SERVICES_LOG_FILE_SIZE`, `SERVICES_LOG_FILE_AGE`, or
`SERVICES_LOG_FLUSH_INTERVAL` environment variables—whether in the subscription (OLMv0) or the operator deployment
(OLMv1 / manual installs)—will cause the operator to restart in order to load the updated configuration. After
restarting, the operator will reconfigure each Windows node one at a time to apply the new log rotation settings,
minimizing disruption. Note that service continuity during reconfiguration is not guaranteed; brief interruptions
to managed services (such as kubelet or kube-proxy) may occur on each node as it is reconfigured.
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ require (
k8s.io/apimachinery v0.34.4
k8s.io/client-go v0.34.4
k8s.io/cloud-provider v0.34.4
k8s.io/component-base v0.34.4
k8s.io/klog/v2 v2.130.1
k8s.io/kubectl v0.34.4
k8s.io/kubelet v0.34.4
Expand Down Expand Up @@ -153,7 +154,6 @@ require (
k8s.io/apiextensions-apiserver v0.34.4 // indirect
k8s.io/apiserver v0.34.4 // indirect
k8s.io/cli-runtime v0.34.4 // indirect
k8s.io/component-base v0.34.4 // indirect
k8s.io/controller-manager v0.34.4 // indirect
k8s.io/kube-openapi v0.0.0-20260127142750-a19766b6e2d4 // indirect
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.31.2 // indirect
Expand Down
13 changes: 10 additions & 3 deletions hack/common.sh
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,10 @@ deleteParallelUpgradeCheckerResources() {
}


# Enables debug logging and set smaller size for services log file in the operator pod to make it easier to
# troubleshoot issues in CI.
# The method for patching the deployment depends on the OLM version, which is detected by checking for the presence
# of a subscription (OLMv0) or clusterextension (OLMv1).
enable_debug_logging() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of weird to comment on, but I remember enable_debug_logging causing problems for the submodule upgrade script. Might be good to be sure that these changes don't mess with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont anticipate any issues. @wgahnagl do you recommend creating a specific test to validate this?

if [[ $(oc get -n $WMCO_DEPLOY_NAMESPACE pod -l name=windows-machine-config-operator -ojson) == *"--debugLogging"* ]]; then
echo "Debug logging already enabled"
Expand All @@ -390,13 +394,16 @@ enable_debug_logging() {
WMCO_SUB=$(oc get sub -n "$WMCO_DEPLOY_NAMESPACE" --no-headers 2>/dev/null | awk '{print $1}')
if [[ -n "$WMCO_SUB" ]]; then
echo "Detected OLMv0, patching subscription $WMCO_SUB"
oc patch subscription $WMCO_SUB -n $WMCO_DEPLOY_NAMESPACE --type=merge -p '{"spec":{"config":{"env":[{"name":"ARGS","value":"--debugLogging"}]}}}'
oc patch subscription $WMCO_SUB -n $WMCO_DEPLOY_NAMESPACE --type=merge -p '{"spec":{"config":{"env":[{"name":"ARGS","value":"--debugLogging"},{"name":"SERVICES_LOG_FILE_SIZE","value":"1M"}]}}}'
# delete the deployment to ensure the changes are picked up in a timely matter
oc delete deployment -n $WMCO_DEPLOY_NAMESPACE windows-machine-config-operator
elif oc get clusterextension windows-machine-config-operator &>/dev/null; then
echo "Detected OLMv1, patching deployment directly..."
# Add debug env variable to the WMCO manager container
oc set env deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" ARGS="--debugLogging" -c manager
# Add debug env variable and log file limit to the WMCO manager container
oc set env deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" \
ARGS="--debugLogging" \
SERVICES_LOG_FILE_SIZE="1M" \
-c manager
# force restart to pick up the env variable change
oc scale deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" --replicas=0
oc scale deployment/windows-machine-config-operator -n "$WMCO_DEPLOY_NAMESPACE" --replicas=1
Expand Down
23 changes: 15 additions & 8 deletions pkg/nodeconfig/nodeconfig.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
clientcmdv1 "k8s.io/client-go/tools/clientcmd/api/v1"
logsapi "k8s.io/component-base/logs/api/v1"
"k8s.io/kubectl/pkg/drain"
kubeletconfigv1 "k8s.io/kubelet/config/v1"
kubeletconfig "k8s.io/kubelet/config/v1beta1"
Expand Down Expand Up @@ -743,14 +744,20 @@ func generateKubeletConfiguration(clusterDNS string) kubeletconfig.KubeletConfig
Enabled: &falseBool,
},
},
ClusterDomain: "cluster.local",
ClusterDNS: []string{clusterDNS},
CgroupsPerQOS: &falseBool,
RuntimeRequestTimeout: meta.Duration{Duration: 10 * time.Minute},
MaxPods: 250,
KubeAPIQPS: &kubeAPIQPS,
KubeAPIBurst: 100,
SerializeImagePulls: &falseBool,
ClusterDomain: "cluster.local",
ClusterDNS: []string{clusterDNS},
CgroupsPerQOS: &falseBool,
RuntimeRequestTimeout: meta.Duration{Duration: 10 * time.Minute},
MaxPods: 250,
KubeAPIQPS: &kubeAPIQPS,
KubeAPIBurst: 100,
SerializeImagePulls: &falseBool,
Logging: logsapi.LoggingConfiguration{
FlushFrequency: logsapi.TimeOrMetaDuration{
Duration: meta.Duration{Duration: 5 * time.Second},
SerializeAsString: true,
},
},
Comment on lines +755 to +760
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is explicitly setting a default, which is fine, but the commit message is misleading as there is no functional change here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated commit message

EnableSystemLogHandler: &trueBool,
EnableSystemLogQuery: &trueBool,
FeatureGates: map[string]bool{
Expand Down
2 changes: 1 addition & 1 deletion pkg/nodeconfig/nodeconfig_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ func TestCreateKubeletConf(t *testing.T) {
{
name: "valid cidr",
cidr: "10.0.128.8/24",
expectedSpec: "{\"kind\":\"KubeletConfiguration\",\"apiVersion\":\"kubelet.config.k8s.io/v1beta1\",\"syncFrequency\":\"0s\",\"fileCheckFrequency\":\"0s\",\"httpCheckFrequency\":\"0s\",\"rotateCertificates\":true,\"serverTLSBootstrap\":true,\"authentication\":{\"x509\":{\"clientCAFile\":\"C:\\\\k\\\\kubelet-ca.crt\"},\"webhook\":{\"cacheTTL\":\"0s\"},\"anonymous\":{\"enabled\":false}},\"authorization\":{\"webhook\":{\"cacheAuthorizedTTL\":\"0s\",\"cacheUnauthorizedTTL\":\"0s\"}},\"clusterDomain\":\"cluster.local\",\"clusterDNS\":[\"10.0.128.10\"],\"streamingConnectionIdleTimeout\":\"0s\",\"nodeStatusUpdateFrequency\":\"0s\",\"nodeStatusReportFrequency\":\"0s\",\"imageMinimumGCAge\":\"0s\",\"imageMaximumGCAge\":\"0s\",\"volumeStatsAggPeriod\":\"0s\",\"cgroupsPerQOS\":false,\"cpuManagerReconcilePeriod\":\"0s\",\"runtimeRequestTimeout\":\"10m0s\",\"maxPods\":250,\"resolvConf\":\"\",\"kubeAPIQPS\":50,\"kubeAPIBurst\":100,\"serializeImagePulls\":false,\"evictionHard\":{\"imagefs.available\":\"15%\",\"nodefs.available\":\"10%\"},\"evictionPressureTransitionPeriod\":\"0s\",\"featureGates\":{\"NodeLogQuery\":true,\"RotateKubeletServerCertificate\":true},\"memorySwap\":{},\"containerLogMaxSize\":\"50Mi\",\"systemReserved\":{\"cpu\":\"500m\",\"ephemeral-storage\":\"1Gi\",\"memory\":\"2Gi\"},\"enforceNodeAllocatable\":[\"none\"],\"logging\":{\"flushFrequency\":0,\"verbosity\":0,\"options\":{\"text\":{\"infoBufferSize\":\"0\"},\"json\":{\"infoBufferSize\":\"0\"}}},\"enableSystemLogHandler\":true,\"enableSystemLogQuery\":true,\"shutdownGracePeriod\":\"0s\",\"shutdownGracePeriodCriticalPods\":\"0s\",\"crashLoopBackOff\":{},\"registerWithTaints\":[{\"key\":\"os\",\"value\":\"Windows\",\"effect\":\"NoSchedule\"}],\"registerNode\":true,\"containerRuntimeEndpoint\":\"npipe://./pipe/containerd-containerd\"}",
expectedSpec: "{\"kind\":\"KubeletConfiguration\",\"apiVersion\":\"kubelet.config.k8s.io/v1beta1\",\"syncFrequency\":\"0s\",\"fileCheckFrequency\":\"0s\",\"httpCheckFrequency\":\"0s\",\"rotateCertificates\":true,\"serverTLSBootstrap\":true,\"authentication\":{\"x509\":{\"clientCAFile\":\"C:\\\\k\\\\kubelet-ca.crt\"},\"webhook\":{\"cacheTTL\":\"0s\"},\"anonymous\":{\"enabled\":false}},\"authorization\":{\"webhook\":{\"cacheAuthorizedTTL\":\"0s\",\"cacheUnauthorizedTTL\":\"0s\"}},\"clusterDomain\":\"cluster.local\",\"clusterDNS\":[\"10.0.128.10\"],\"streamingConnectionIdleTimeout\":\"0s\",\"nodeStatusUpdateFrequency\":\"0s\",\"nodeStatusReportFrequency\":\"0s\",\"imageMinimumGCAge\":\"0s\",\"imageMaximumGCAge\":\"0s\",\"volumeStatsAggPeriod\":\"0s\",\"cgroupsPerQOS\":false,\"cpuManagerReconcilePeriod\":\"0s\",\"runtimeRequestTimeout\":\"10m0s\",\"maxPods\":250,\"resolvConf\":\"\",\"kubeAPIQPS\":50,\"kubeAPIBurst\":100,\"serializeImagePulls\":false,\"evictionHard\":{\"imagefs.available\":\"15%\",\"nodefs.available\":\"10%\"},\"evictionPressureTransitionPeriod\":\"0s\",\"featureGates\":{\"NodeLogQuery\":true,\"RotateKubeletServerCertificate\":true},\"memorySwap\":{},\"containerLogMaxSize\":\"50Mi\",\"systemReserved\":{\"cpu\":\"500m\",\"ephemeral-storage\":\"1Gi\",\"memory\":\"2Gi\"},\"enforceNodeAllocatable\":[\"none\"],\"logging\":{\"flushFrequency\":\"5s\",\"verbosity\":0,\"options\":{\"text\":{\"infoBufferSize\":\"0\"},\"json\":{\"infoBufferSize\":\"0\"}}},\"enableSystemLogHandler\":true,\"enableSystemLogQuery\":true,\"shutdownGracePeriod\":\"0s\",\"shutdownGracePeriodCriticalPods\":\"0s\",\"crashLoopBackOff\":{},\"registerWithTaints\":[{\"key\":\"os\",\"value\":\"Windows\",\"effect\":\"NoSchedule\"}],\"registerNode\":true,\"containerRuntimeEndpoint\":\"npipe://./pipe/containerd-containerd\"}",
expectedErr: false,
},
{
Expand Down
25 changes: 25 additions & 0 deletions pkg/services/init.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
package services

import ctrl "sigs.k8s.io/controller-runtime"

var logFileSize, logFileAge, flushInterval string

func init() {
log := ctrl.Log.WithName("services").WithName("init")

var err error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might feel a little better to save these in a config file somewhere instead of setting them with environment variables? And then address Sebastian's comment about the checking the variables being too complicated by verifying the config all in one go when it loads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about introducing a new config file and all the complexity of watching it.

logFileSize, err = getEnvQuantity(logFileSizeEnvVar)
if err != nil {
log.Error(err, "cannot load environment variable", "name", logFileSizeEnvVar)
}

logFileAge, err = getEnvDuration(logFileAgeEnvVar)
if err != nil {
log.Error(err, "cannot load environment variable", "name", logFileAgeEnvVar)
}

flushInterval, err = getEnvDuration(logFlushIntervalEnvVar)
if err != nil {
log.Error(err, "cannot load environment variable", "name", logFlushIntervalEnvVar)
}
}
93 changes: 88 additions & 5 deletions pkg/services/services.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,13 @@ package services

import (
"fmt"
"os"
"path/filepath"
"strings"
"time"

config "github.com/openshift/api/config/v1"
"k8s.io/apimachinery/pkg/api/resource"

"github.com/openshift/windows-machine-config-operator/pkg/cluster"
"github.com/openshift/windows-machine-config-operator/pkg/ignition"
Expand All @@ -22,6 +25,13 @@ const (
// hostnameOverrideVar is the variable that should be replaced with the value of the desired instance hostname
hostnameOverrideVar = "HOSTNAME_OVERRIDE"
NodeIPVar = "NODE_IP"

// logFileSizeEnvVar is the environment variable name for log file size limit
logFileSizeEnvVar = "SERVICES_LOG_FILE_SIZE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrvaldes is this a pattern on linux side to give options for these values to be configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// logFileAgeEnvVar is the environment variable name for log file age retention
logFileAgeEnvVar = "SERVICES_LOG_FILE_AGE"
// logFlushIntervalEnvVar is the environment variable name for log flush interval
logFlushIntervalEnvVar = "SERVICES_LOG_FLUSH_INTERVAL"
)

// GenerateManifest returns the expected state of the Windows service configmap. If debug is true, debug logging
Expand Down Expand Up @@ -143,9 +153,8 @@ func hybridOverlayConfiguration(apiServerEndpoint, vxlanPort string, debug bool)

// kubeProxyConfiguration returns the Service definition for kube-proxy
func kubeProxyConfiguration(debug bool) servicescm.Service {
cmd := fmt.Sprintf("%s -log-file=%s %s --config %s --windows-service", windows.KubeLogRunnerPath, windows.KubeProxyLog,
windows.KubeProxyPath, windows.KubeProxyConfigPath)

cmd := getLogRunnerForCmd(windows.KubeProxyPath, windows.KubeProxyLog)
cmd = fmt.Sprintf("%s --config %s --windows-service", cmd, windows.KubeProxyConfigPath)
verbosity := "0"
if debug {
verbosity = "4"
Expand Down Expand Up @@ -222,8 +231,7 @@ func getKubeletServiceConfiguration(argsFromIginition map[string]string, debug b
preScripts = append(preScripts, hostnameOverridePowershellVar)
}

kubeletServiceCmd := fmt.Sprintf("%s -log-file=%s %s",
windows.KubeLogRunnerPath, windows.KubeletLog, windows.KubeletPath)
kubeletServiceCmd := getLogRunnerForCmd(windows.KubeletPath, windows.KubeletLog)

for _, arg := range kubeletArgs {
kubeletServiceCmd += fmt.Sprintf(" %s", arg)
Expand Down Expand Up @@ -307,3 +315,78 @@ func getHostnameCmd(platformType config.PlatformType) string {
return ""
}
}

// getLogRunnerForCmd returns the command string to run the given commandPath with kube-log-runner
// logging to the given logfilePath. Log rotation parameters can be configured via environment variables.
func getLogRunnerForCmd(commandPath, logfilePath string) string {
cmdBuilder := strings.Builder{}
// log runner path must be first
cmdBuilder.WriteString(windows.KubeLogRunnerPath)

// add log file option
cmdBuilder.WriteString(" -log-file=")
cmdBuilder.WriteString(logfilePath)

if logFileSize != "" {
// log file size limit before creating a backup
cmdBuilder.WriteString(" -log-file-size=")
cmdBuilder.WriteString(logFileSize)
}

if logFileAge != "" {
// log retention for backup files created after the size limit is reached
cmdBuilder.WriteString(" -log-file-age=")
cmdBuilder.WriteString(logFileAge)
}

if flushInterval != "" {
// flush to ensure recent log entries are written to disk in near real-time
cmdBuilder.WriteString(" -flush-interval=")
cmdBuilder.WriteString(flushInterval)
}

// last, add the target command to be run
cmdBuilder.WriteString(" " + commandPath)

return cmdBuilder.String()
}

// getEnvQuantity returns the value of the environment variable with the given key
// if it represents a valid and non-negative quantity, otherwise returns error
func getEnvQuantity(key string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnvQuantity returns a string, not a quantity?
Also why does getEnvQuantity care if it is "non-negative". The caller should check that shouldn't they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnvQuantity returns a string, not a quantity?

correct, returns a string that represents a valid and non-negative quantity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller should check that shouldn't they?

in this case, no. Given that the function encapsulates the validation. I don't anticipate other usages ATM. the func can be adjusted in the future as needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrvaldes can we rename this to getEnvString or getEnvValue, I feel like that will be more inline with what you are returning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to @sebsoto's point here:
"non-negative" validation is business logic that doesn't belong in a generic environment variable getter.

  • getEnv should focus on: retrieving and parsing the value
  • Caller should handle: validating the value meets their requirements

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, have you considered returning resource.ParseQuantity here?
If we do that, we can reduce the testing scenarios to:

  • Empty string returns zero value
  • Whitespace is trimmed
  • Invalid formats return error (will be delegated to ParseQuantity)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to @sebsoto's point here: "non-negative" validation is business logic that doesn't belong in a generic environment variable getter.

  • getEnv should focus on: retrieving and parsing the value
  • Caller should handle: validating the value meets their requirements

correct, and that is the point. I'm not proposing a generic getter; what's the point on having a func as dummy wrapper for value := os.Getenv(key) . the intention here is to introduce a func that validates the env var at startup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, have you considered returning resource.ParseQuantity here? If we do that, we can reduce the testing scenarios to:

  • Empty string returns zero value
  • Whitespace is trimmed
  • Invalid formats return error (will be delegated to ParseQuantity)

correct, unit tests are only covering the specific edge cases, empty, negative, etc. there is not intention to implement test coverge for the ParseQuantity or ParseDuration funcs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intention is not to introduce a generic getter that can be reused in many ways. The naming should reflect what the function is doing to avoid confusion.

value := os.Getenv(key)
value = strings.TrimSpace(value)
if value == "" {
// not present
return "", nil
}
// validate value as quantity
q, err := resource.ParseQuantity(value)
if err != nil {
return "", fmt.Errorf("invalid quantity value for %s: %w", key, err)
}
if q.Sign() < 0 {
return "", fmt.Errorf("quantity cannot be negative for %s", key)
}
return value, nil
}

// getEnvDuration returns the value of the environment variable with the given key
// if it represents a valid and non-negative duration, otherwise returns error
func getEnvDuration(key string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as this 7722cda#r2850043734

But time.Duration rather than Quantity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrvaldes there is a return type "time.Duration" and if this function is not returning that, the naming is confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a better pattern to return time.Duration instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrvaldes there is a return type "time.Duration" and if this function is not returning that, the naming is confusing

I see your point, will change it to getEnvDurationString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a better pattern to return time.Duration instead?

no, the logic need a string.

the only reason I am proposing the validation (duration, quantity, non-negative, etc) is to prevent WMCO from starting with an invalid configuration for the log wrapper, and avoid the kube-log-runner to fail in the background

value := os.Getenv(key)
value = strings.TrimSpace(value)
if value == "" {
// not present
return "", nil
}
if strings.HasPrefix(value, "-") {
return "", fmt.Errorf("duration cannot be negative for %s", key)
}

// validate value as duration
if _, err := time.ParseDuration(value); err != nil {
return "", fmt.Errorf("invalid duration value for %s: %w", key, err)
}
return value, nil
}
Loading