Skip to content

fix: sanitize /etc/sysctl.conf to prevent overriding 999-sysctl-aks.conf#128

Open
blanquicet wants to merge 1 commit intoAzure:mainfrom
blanquicet:jose/fix-sysctl-conf-override
Open

fix: sanitize /etc/sysctl.conf to prevent overriding 999-sysctl-aks.conf#128
blanquicet wants to merge 1 commit intoAzure:mainfrom
blanquicet:jose/fix-sysctl-conf-override

Conversation

@blanquicet
Copy link
Member

Problem

sysctl --system loads configuration files in this order:

  1. /etc/sysctl.d/*.conf (alphabetical)
  2. /etc/sysctl.conf (loaded last, overrides everything)

On some Linux images, /etc/sysctl.conf contains uncommented settings (e.g., net.ipv4.conf.*.rp_filter=1) that override the values written by AKSFlexNode to /etc/sysctl.d/999-sysctl-aks.conf. This silently reverts intended sysctl tuning — for example, rp_filter ends up as 1 (strict) instead of the intended 2 (loose), breaking network connectivity.

Fix

  • Add sanitizeSysctlConf() which reads /etc/sysctl.conf, identifies lines whose sysctl keys conflict with our managed sysctlSettings, and comments them out (with a # commented out by AKSFlexNode: prefix). A .bak backup is created before modifying the file.
  • Extract the generic commentOutMatchingLines(path, shouldComment) helper, now shared by both sanitizeSysctlConf() and the existing commentOutSwapInFstab(), reducing code duplication.
  • sanitizeSysctlConf() is called at the end of configureSysctls(), after writing 999-sysctl-aks.conf and before sysctl --system.

Testing

  • Added new tests for sanitizeSysctlConf
  • Existing commentOutSwapInFstab tests continue to pass (refactored to use shared assertFileAndBackup helper)

sysctl --system loads files in this order:
  1. /etc/sysctl.d/*.conf (alphabetical)
  2. /etc/sysctl.conf (loaded LAST, overrides everything)

On some Linux images, /etc/sysctl.conf has uncommented settings like
net.ipv4.conf.*.rp_filter=1 that override the values set in
999-sysctl-aks.conf. This causes rp_filter=1 (strict) to be applied
instead of the intended rp_filter=2 (loose), breaking expected network
behavior.

Add sanitizeSysctlConf() that comments out any lines in /etc/sysctl.conf
whose keys conflict with our managed sysctlSettings. Extract the generic
commentOutMatchingLines() helper, used by both sanitizeSysctlConf() and
commentOutSwapInFstab().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 12, 2026 22:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses sysctl configuration drift on some Linux images by preventing /etc/sysctl.conf (loaded last by sysctl --system) from overriding AKSFlexNode-managed settings in /etc/sysctl.d/999-sysctl-aks.conf.

Changes:

  • Add sanitizeSysctlConf() to comment out conflicting keys in /etc/sysctl.conf based on the managed sysctlSettings.
  • Extract a reusable commentOutMatchingLines() helper and refactor swap-in-fstab logic to use it.
  • Add/extend unit tests for sysctl.conf sanitization and the new helper, plus shared test assertions for backup behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
components/linux/v20260301/configure.go Writes managed sysctl.d config, sanitizes /etc/sysctl.conf, and refactors “comment out” logic via a shared helper.
components/linux/v20260301/configure_test.go Adds tests for sysctl.conf sanitization and the shared helper; introduces assertFileAndBackup test helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +227 to 229
if shouldComment(trimmed) {
lines[i] = "# " + line
modified = true
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says conflicting lines should be commented out with a "# commented out by AKSFlexNode:" prefix, but commentOutMatchingLines currently prefixes with just "# ". Either update the implementation to include the stated prefix (and adjust tests accordingly) or update the PR description to match the actual behavior, since this affects auditability/traceability of changes to system config files.

Copilot uses AI. Check for mistakes.
Comment on lines 136 to 139
if bytes.Equal(currentConfig, expectedConfig) {
// config is already applied, no need to do anything
return nil
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureSysctlConfig returns early when /etc/sysctl.d/999-sysctl-aks.conf already matches the expected content, which skips sanitizeSysctlConf(/etc/sysctl.conf) and sysctl --system. This means the new sanitization will never run on subsequent (idempotent) executions, and nodes can still end up with conflicting /etc/sysctl.conf overrides. Consider running sanitizeSysctlConf (and applying sysctl settings, if required) even when the managed sysctl.d file is unchanged, or moving the early-return check to after sanitization/apply.

Copilot uses AI. Check for mistakes.
@bcho
Copy link
Member

bcho commented Mar 13, 2026

On some Linux images

Which linux images show this? I want to understand more on this before taking this change...

@blanquicet
Copy link
Member Author

blanquicet commented Mar 13, 2026

I reproduced it with v0.0.17. Following the details of the non-Azure VM where I reproduced it.

rp_filter configuration:

/ # uname -a
Linux my-node 6.11.0-1016-nvidia #16-Ubuntu SMP PREEMPT_DYNAMIC Sun Sep 21 17:06:33 UTC 2025 x86_64 GNU/Linux
/ #
/ # grep rp_filter /host/etc/sysctl.conf
#net.ipv4.conf.default.rp_filter=1
#net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
/ #
/ # grep rp_filter /host/etc/sysctl.d/999-sysctl-aks.conf
net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.default.rp_filter=2
/ #
/ # cat /host/proc/sys/net/ipv4/conf/*/rp_filter
1
1
1
1
0  # This is the lo interface
1
1
1

Connectivity check:

# Pods on Azure and non-Azure nodes to test connectivity
$ k get pod -o wide
NAME                READY   STATUS    RESTARTS   AGE     IP            NODE                                 NOMINATED NODE   READINESS GATES
azure-pinger        1/1     Running   0          10m     10.100.0.9    aks-system-40737636-vmss000000       <none>           <none>
non-azure-pinger    1/1     Running   0          10m     10.244.0.2    my-node                              <none>           <none>

# Connectivity non-Azure -> Azure is OK
$ kubectl exec non-azure-pinger -- ping -c 3 -W 5 10.100.0.9
PING 10.100.0.9 (10.100.0.9): 56 data bytes
64 bytes from 10.100.0.9: seq=0 ttl=61 time=108.433 ms
64 bytes from 10.100.0.9: seq=1 ttl=61 time=108.570 ms
64 bytes from 10.100.0.9: seq=2 ttl=61 time=108.322 ms

--- 10.100.0.9 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 108.322/108.441/108.570 ms

# Connectivity Azure -> non-Azure doesn't work
$ kubectl exec azure-pinger -- ping -c 3 -W 5 10.244.0.2
PING 10.244.0.2 (10.244.0.2): 56 data bytes

--- 10.244.0.2 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
command terminated with exit code 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants