
[windows] Intermittent DNS resolution failure where DNS requests go to port 1024 #354

@mauriciopoppe

Description


Background

GKE uses Antrea and OVS on Windows nodes. A customer reported that DNS resolution failed intermittently: from a GKE cluster with Windows nodes they created a Pod that ran while true; do nslookup www.google.com; done and reported that the lookup failed sometimes. We noted that the failing DNS requests went to port 1024 instead of 53.

Reproducible Scenario

The GKE team created the following reproducible scenario. While we reproduced it in a GKE cluster, it can also be reproduced in a vanilla Kubernetes distribution.

  1. Create a Kubernetes cluster with Windows nodes and install Antrea and OVS (non-containerized daemon) following https://github.com/antrea-io/antrea/blob/main/docs/windows.md
  2. Deploy the following Windows Pod, which sets up a basic HTTP server:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: win-webserver
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
     containers:
      - name: windowswebserver
        image: mcr.microsoft.com/windows/servercore:ltsc2019
        command:
        - powershell.exe
        - -command
        - "<#code used from https://gist.github.com/19WAS85/5424431#> ; $$listener = New-Object System.Net.HttpListener ; $$listener.Prefixes.Add('http://*:80/') ; $$listener.Start() ; $$callerCounts = @{} ; Write-Host('Listening at http://*:80/') ; while ($$listener.IsListening) { ;$$context = $$listener.GetContext() ;$$requestUrl = $$context.Request.Url ;$$clientIP = $$context.Request.RemoteEndPoint.Address ;$$response = $$context.Response ;Write-Host '' ;Write-Host('> {0}' -f $$requestUrl) ;  ;$$count = 1 ;$$k=$$callerCounts.Get_Item($$clientIP) ;if ($$k -ne $$null) { $$count += $$k } ;$$callerCounts.Set_Item($$clientIP, $$count) ;$$ip=(Get-NetAdapter | Get-NetIpAddress); $$header='<html><body><H1>Windows Container Web Server</H1>' ;$$callerCountsString='' ;$$callerCounts.Keys | % { $$callerCountsString+='<p>IP {0} callerCount {1} ' -f $$ip[1].IPAddress,$$callerCounts.Item($$_) } ;$$footer='</body></html>' ;$$content='{0}{1}{2}' -f $$header,$$callerCountsString,$$footer ;Write-Output $$content ;$$buffer = [System.Text.Encoding]::UTF8.GetBytes($$content) ;$$response.ContentLength64 = $$buffer.Length ;$$response.OutputStream.Write($$buffer, 0, $$buffer.Length) ;$$response.Close() ;$$responseStatus = $$response.StatusCode ;Write-Host('< {0}' -f $$responseStatus)  } ; "
     nodeSelector:
      kubernetes.io/os: windows
  3. The Pod takes some time to become ready. Once it is, exec into it with kubectl exec -it POD_NAME -- cmd
  4. Create the following batch file:
echo @echo off > infinite_nslookup.bat
echo :loopStart >> infinite_nslookup.bat
echo   nslookup www.google.com >> infinite_nslookup.bat
echo goto loopStart >> infinite_nslookup.bat
  5. Execute the previously created infinite_nslookup.bat. It takes around a minute for the DNS requests to start failing; the failed DNS requests can be seen going to port 1024.

Debugging

With debugging help from a GKE networking expert, we captured a trace, attached as ovs-trace.txt:

  1. Before the issue is observed, there are 1274 open UDP connections. Once the issue is observed, there are 7849. That is a very noticeable increase in connections and likely confirms my theory: we are overflowing some table due to too many UDP connections. The number of TCP connections actually decreases slightly (233 -> 188).

  2. The flow dump reveals the smoking gun. Here we have a packet that clearly shows as being sent to port 53, yet OVS remaps it to port 1024:

recirc_id(0x1502),ct_state(new|trk|dnat),ct_zone(),ct_mark(),ct_label(),ct_tuple4(src=100.78.22.75,dst=100.112.128.74,proto=17,tp_src=55691,tp_dst=53),dp_hash(0x4211aca5),eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:01),in_port(5),eth_type(0x0800),ipv4(src=100.78.22.75,dst=100.78.16.72,proto=17,tos=0,ttl=128,frag=no),udp(src=55691,dst=1024), packets:0, bytes:0, used:never, actions:set(eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:02)),4

Notice how we initially have tp_dst=53, but then udp(src=55691,dst=1024). It is trivial to find more instances with egrep '=53.*=1024' ovs-trace.txt (a small standalone scanner doing the same filtering is sketched after these examples):

recirc_id(0x1502),ct_state(new|trk|dnat),ct_zone(),ct_mark(),ct_label(),ct_tuple4(src=100.78.22.75,dst=100.112.128.74,proto=17,tp_src=55691,tp_dst=53),dp_hash(0x4211aca5),eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:01),in_port(5),eth_type(0x0800),ipv4(src=100.78.22.75,dst=100.78.16.72,proto=17,tos=0,ttl=128,frag=no),udp(src=55691,dst=1024), packets:0, bytes:0, used:never, actions:set(eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:02)),4
recirc_id(0x1505),ct_state(new|trk|dnat),ct_zone(),ct_mark(),ct_label(),ct_tuple4(src=100.78.22.75,dst=100.112.128.74,proto=17,tp_src=55714,tp_dst=53),dp_hash(0xdf6ae13d),eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:01),in_port(5),eth_type(0x0800),ipv4(src=100.78.22.75,dst=100.78.18.76,proto=17,tos=0,ttl=128,frag=no),udp(src=55714,dst=1024), packets:0, bytes:0, used:never, actions:set(eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:02)),4
recirc_id(0x1508),ct_state(new|trk|dnat),ct_zone(),ct_mark(),ct_label(),ct_tuple4(src=100.78.22.75,dst=100.112.128.74,proto=17,tp_src=55693,tp_dst=53),dp_hash(0x125489c6),eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:01),in_port(5),eth_type(0x0800),ipv4(src=100.78.22.75,dst=100.78.16.73,proto=17,tos=0,ttl=128,frag=no),udp(src=55693,dst=1024), packets:0, bytes:0, used:never, actions:set(eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:02)),4
recirc_id(0x1507),ct_state(new|trk|dnat),ct_zone(),ct_mark(),ct_label(),ct_tuple4(src=100.78.22.75,dst=100.112.128.74,proto=17,tp_src=55706,tp_dst=53),dp_hash(0x208a0891),eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:01),in_port(5),eth_type(0x0800),ipv4(src=100.78.22.75,dst=100.78.17.202,proto=17,tos=0,ttl=128,frag=no),udp(src=55706,dst=1024), packets:0, bytes:0, used:never, actions:set(eth(src=00:15:5d:d6:bc:78,dst=00:15:5d:00:3a:02)),4
...
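
To double-check both observations over the attached dump, here is a small standalone scanner (a hypothetical helper, not part of OVS or Antrea) that counts UDP flow lines and the 53 -> 1024 remappings in ovs-trace.txt. The match strings are taken from the lines above and only approximate the connection counts quoted in point 1:

/*
 * Hypothetical helper: scans a saved flow dump such as ovs-trace.txt and
 * reports (a) how many UDP flow lines it contains and (b) how many flows were
 * matched on tp_dst=53 but rewritten to dst=1024. This is just a programmatic
 * version of the egrep shown above.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "ovs-trace.txt";
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }

    char line[8192];
    long udp_flows = 0, remapped = 0;

    while (fgets(line, sizeof line, f)) {
        if (strstr(line, "proto=17")) {          /* UDP flow lines */
            udp_flows++;
        }
        /* Original tuple targeted port 53 but the datapath rewrote it. */
        if (strstr(line, "tp_dst=53") && strstr(line, "dst=1024")) {
            remapped++;
        }
    }
    fclose(f);

    printf("UDP flow lines:        %ld\n", udp_flows);
    printf("53 -> 1024 remappings: %ld\n", remapped);
    return 0;
}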

Source code debugging:

Ok, so I think I found the code path that sets the port to 1024. Here, MIN_NAT_EPHEMERAL_PORT is set to 1024:

https://github.com/openvswitch/ovs/blob/d92003120dd9213de9452dff3fca0b46af79c68a/datapath-windows/ovsext/Conntrack-nat.c#L288

And then here, the port is overridden with minPort, which is set to MIN_NAT_EPHEMERAL_PORT:

https://github.com/openvswitch/ovs/blob/d92003120dd9213de9452dff3fca0b46af79c68a/datapath-windows/ovsext/Conntrack-nat.c#L398

This code seems to cycle through all ports from 1024 to 65535, and once it runs out of ports, the fallback that overrides the port with 1024 is triggered:

https://github.com/openvswitch/ovs/blob/d92003120dd9213de9452dff3fca0b46af79c68a/datapath-windows/ovsext/Conntrack-nat.c#L365
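
As a heavily simplified sketch of how I read that loop (approximate names and structure, not the actual OVS source): the code starts from a hash-derived candidate inside [minPort, maxPort], walks the whole range looking for a free NAT tuple, and falls back to minPort, i.e. MIN_NAT_EPHEMERAL_PORT = 1024, once the range looks exhausted:

/* Heavily simplified sketch (not the real OVS code) of the port-selection
 * loop in datapath-windows/ovsext/Conntrack-nat.c. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MIN_NAT_EPHEMERAL_PORT 1024
#define MAX_NAT_EPHEMERAL_PORT 65535

/* Stand-in for the real conntrack lookup; here we pretend every port is
 * already taken in order to exercise the exhaustion path. */
static bool NatTupleInUse(uint16_t port)
{
    (void)port;
    return true;
}

static uint16_t PickNatPort(uint16_t firstCandidate)
{
    const uint16_t minPort = MIN_NAT_EPHEMERAL_PORT;
    const uint16_t maxPort = MAX_NAT_EPHEMERAL_PORT;
    uint32_t range = (uint32_t)maxPort - minPort + 1;
    uint16_t port = firstCandidate;

    for (uint32_t i = 0; i < range; i++) {
        if (!NatTupleInUse(port)) {
            return port;                           /* free port found */
        }
        /* wrap around inside [minPort, maxPort] */
        port = (port == maxPort) ? minPort : (uint16_t)(port + 1);
    }

    /* Every port in the range appears taken: fall back to minPort, i.e.
     * 1024, which matches the port we see overwriting dst=53. */
    return minPort;
}

int main(void)
{
    printf("selected port: %u\n", (unsigned)PickNatPort(31337));
    return 0;
}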

Also, it seems weird that the destination port is being overridden. In a NAT scenario, I would expect the source port to be overridden, but somehow we end up with the destination port being overridden.
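
For illustration only (not OVS code), here is that mismatch spelled out with the addresses from the flow dump above: the expected DNAT keeps dst port 53 and only rewrites the destination IP, while the observed behaviour also rewrites the destination port to 1024:

/* Illustration of the mismatch described above, using the addresses from the
 * captured flow. An ephemeral-range port would normally only be expected for
 * the source side under SNAT, not for the DNAT'ed destination. */
#include <stdint.h>
#include <stdio.h>

struct udp_tuple {
    const char *src_ip;
    const char *dst_ip;
    uint16_t    src_port;
    uint16_t    dst_port;
};

int main(void)
{
    /* Original packet, as captured in ct_tuple4. */
    struct udp_tuple orig = { "100.78.22.75", "100.112.128.74", 55691, 53 };

    /* Expected DNAT: only the destination IP changes. */
    struct udp_tuple expected = orig;
    expected.dst_ip = "100.78.16.72";

    /* Observed: destination IP changes AND dst port becomes 1024. */
    struct udp_tuple observed = expected;
    observed.dst_port = 1024;

    printf("expected: %s:%u -> %s:%u\n", expected.src_ip,
           (unsigned)expected.src_port, expected.dst_ip,
           (unsigned)expected.dst_port);
    printf("observed: %s:%u -> %s:%u\n", observed.src_ip,
           (unsigned)observed.src_port, observed.dst_ip,
           (unsigned)observed.dst_port);
    return 0;
}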

Platform details

OS: Windows Server LTSC2019 or LTSC2022
OVS Version: 2.14.4.
Attached file: ovs-trace.txt

