fix: sanitize /etc/sysctl.conf to prevent overriding 999-sysctl-aks.conf#128
fix: sanitize /etc/sysctl.conf to prevent overriding 999-sysctl-aks.conf#128blanquicet wants to merge 1 commit intoAzure:mainfrom
Conversation
sysctl --system loads files in this order: 1. /etc/sysctl.d/*.conf (alphabetical) 2. /etc/sysctl.conf (loaded LAST, overrides everything) On some Linux images, /etc/sysctl.conf has uncommented settings like net.ipv4.conf.*.rp_filter=1 that override the values set in 999-sysctl-aks.conf. This causes rp_filter=1 (strict) to be applied instead of the intended rp_filter=2 (loose), breaking expected network behavior. Add sanitizeSysctlConf() that comments out any lines in /etc/sysctl.conf whose keys conflict with our managed sysctlSettings. Extract the generic commentOutMatchingLines() helper, used by both sanitizeSysctlConf() and commentOutSwapInFstab(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses sysctl configuration drift on some Linux images by preventing /etc/sysctl.conf (loaded last by sysctl --system) from overriding AKSFlexNode-managed settings in /etc/sysctl.d/999-sysctl-aks.conf.
Changes:
- Add
sanitizeSysctlConf()to comment out conflicting keys in/etc/sysctl.confbased on the managedsysctlSettings. - Extract a reusable
commentOutMatchingLines()helper and refactor swap-in-fstab logic to use it. - Add/extend unit tests for sysctl.conf sanitization and the new helper, plus shared test assertions for backup behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| components/linux/v20260301/configure.go | Writes managed sysctl.d config, sanitizes /etc/sysctl.conf, and refactors “comment out” logic via a shared helper. |
| components/linux/v20260301/configure_test.go | Adds tests for sysctl.conf sanitization and the shared helper; introduces assertFileAndBackup test helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if shouldComment(trimmed) { | ||
| lines[i] = "# " + line | ||
| modified = true |
There was a problem hiding this comment.
PR description says conflicting lines should be commented out with a "# commented out by AKSFlexNode:" prefix, but commentOutMatchingLines currently prefixes with just "# ". Either update the implementation to include the stated prefix (and adjust tests accordingly) or update the PR description to match the actual behavior, since this affects auditability/traceability of changes to system config files.
| if bytes.Equal(currentConfig, expectedConfig) { | ||
| // config is already applied, no need to do anything | ||
| return nil | ||
| } |
There was a problem hiding this comment.
ensureSysctlConfig returns early when /etc/sysctl.d/999-sysctl-aks.conf already matches the expected content, which skips sanitizeSysctlConf(/etc/sysctl.conf) and sysctl --system. This means the new sanitization will never run on subsequent (idempotent) executions, and nodes can still end up with conflicting /etc/sysctl.conf overrides. Consider running sanitizeSysctlConf (and applying sysctl settings, if required) even when the managed sysctl.d file is unchanged, or moving the early-return check to after sanitization/apply.
Which linux images show this? I want to understand more on this before taking this change... |
|
I reproduced it with v0.0.17. Following the details of the non-Azure VM where I reproduced it.
/ # uname -a
Linux my-node 6.11.0-1016-nvidia #16-Ubuntu SMP PREEMPT_DYNAMIC Sun Sep 21 17:06:33 UTC 2025 x86_64 GNU/Linux
/ #
/ # grep rp_filter /host/etc/sysctl.conf
#net.ipv4.conf.default.rp_filter=1
#net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
/ #
/ # grep rp_filter /host/etc/sysctl.d/999-sysctl-aks.conf
net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.default.rp_filter=2
/ #
/ # cat /host/proc/sys/net/ipv4/conf/*/rp_filter
1
1
1
1
0 # This is the lo interface
1
1
1Connectivity check: # Pods on Azure and non-Azure nodes to test connectivity
$ k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
azure-pinger 1/1 Running 0 10m 10.100.0.9 aks-system-40737636-vmss000000 <none> <none>
non-azure-pinger 1/1 Running 0 10m 10.244.0.2 my-node <none> <none>
# Connectivity non-Azure -> Azure is OK
$ kubectl exec non-azure-pinger -- ping -c 3 -W 5 10.100.0.9
PING 10.100.0.9 (10.100.0.9): 56 data bytes
64 bytes from 10.100.0.9: seq=0 ttl=61 time=108.433 ms
64 bytes from 10.100.0.9: seq=1 ttl=61 time=108.570 ms
64 bytes from 10.100.0.9: seq=2 ttl=61 time=108.322 ms
--- 10.100.0.9 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 108.322/108.441/108.570 ms
# Connectivity Azure -> non-Azure doesn't work
$ kubectl exec azure-pinger -- ping -c 3 -W 5 10.244.0.2
PING 10.244.0.2 (10.244.0.2): 56 data bytes
--- 10.244.0.2 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
command terminated with exit code 1 |
Problem
sysctl --systemloads configuration files in this order:/etc/sysctl.d/*.conf(alphabetical)/etc/sysctl.conf(loaded last, overrides everything)On some Linux images,
/etc/sysctl.confcontains uncommented settings (e.g.,net.ipv4.conf.*.rp_filter=1) that override the values written by AKSFlexNode to/etc/sysctl.d/999-sysctl-aks.conf. This silently reverts intended sysctl tuning — for example,rp_filterends up as1(strict) instead of the intended2(loose), breaking network connectivity.Fix
sanitizeSysctlConf()which reads/etc/sysctl.conf, identifies lines whose sysctl keys conflict with our managedsysctlSettings, and comments them out (with a# commented out by AKSFlexNode:prefix). A.bakbackup is created before modifying the file.commentOutMatchingLines(path, shouldComment)helper, now shared by bothsanitizeSysctlConf()and the existingcommentOutSwapInFstab(), reducing code duplication.sanitizeSysctlConf()is called at the end ofconfigureSysctls(), after writing999-sysctl-aks.confand beforesysctl --system.Testing
sanitizeSysctlConfcommentOutSwapInFstabtests continue to pass (refactored to use sharedassertFileAndBackuphelper)