I'm digging into a frustrating situation with my application running in a Kubernetes pod. It uses UDP and suffers from significant lag despite having plenty of bandwidth available. The physical hosts and links use the standard MTU of 1500, while Calico defaults to 1450. I tried raising the host MTU to 1550 so I could bump Calico up to 1500, but that change broke host-to-host Kubernetes communication. It's puzzling: why would changing the MTU on the physical host disrupt Kubernetes, when the endpoints should negotiate the largest packet size through ICMP (path MTU discovery)? Any insights would be really appreciated!
4 Answers
If you're uncertain, you could run an "MTU ping test". It helps identify the optimal MTU size for your network setup, and it might clarify where the issue lies.
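As a sketch, this is roughly what that test looks like on Linux (10.0.0.2 is a stand-in for a host on the far side of the path; substitute your own). The `-M do` flag sets the don't-fragment bit, so a probe larger than the path MTU fails instead of being silently fragmented:

```shell
# Probe payload = MTU - 28 (20-byte IP header + 8-byte ICMP header).
TARGET=10.0.0.2                 # hypothetical remote host; replace with yours
for MTU in 1500 1450; do
  PAYLOAD=$((MTU - 28))
  echo "MTU $MTU -> ping -c 3 -M do -s $PAYLOAD $TARGET"
  # ping -c 3 -M do -s "$PAYLOAD" "$TARGET"   # uncomment to actually probe
done
```

If the 1472-byte probe fails but a smaller one succeeds, work downward; the largest payload that gets through, plus 28, is your effective path MTU.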
Keep in mind that PMTUD works at Layer-3, while Layer-2 MTU settings are hidden from the hosts. If your switches or virtual switches can’t handle the MTU you’re trying to set, you won’t get anywhere. It’s usually recommended to set the Layer-2 MTU to the maximum supported value so you can focus on Layer-3 concerns. Make sure all your network devices can manage the larger MTU.
MTU discovery can be tricky because it relies on every hop correctly passing and respecting ICMP messages. In Kubernetes this is especially fragile, since extra layers like the pod's veth interface and the CNI overlay each add their own encapsulation. When you increased the host MTU, Calico likely started sending larger packets internally, but something along the path either couldn't handle those sizes or dropped the "fragmentation needed" ICMP messages, and the resulting packet loss shows up as lag. The default of 1450 exists precisely to leave headroom for encapsulation overhead, so if you want to raise it, every hop in the path has to support the larger size. Otherwise you hit silent failures that manifest as lag.
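A quick way to see the mismatch for yourself is to compare the MTU of every interface on the node: the Calico veths (`cali*`) and any tunnel device (`tunl0` or `vxlan.calico`) should sit at or below the physical NIC's MTU. A minimal sketch, reading straight from sysfs on Linux:

```shell
# List every interface's MTU; on a Calico node compare the cali* veths and
# the tunnel device (tunl0 / vxlan.calico) against the physical NIC.
for ifc in /sys/class/net/*; do
  printf '%-20s mtu %s\n' "$(basename "$ifc")" "$(cat "$ifc/mtu")"
done
# tracepath discovers the actual path MTU hop by hop, no root required:
# tracepath -n <remote-node-ip>
```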
Good point! Just a heads up that classic path MTU discovery is driven by the transport: TCP handles it automatically, but UDP applications have to manage packet sizes themselves, and IPsec tunnels often block the very ICMP messages PMTUD relies on. Unless you can verify every hop, keeping your Ethernet segments at the default 1500 is the safest way to avoid issues.
Yeah, and troubleshooting UDP in Kubernetes doesn't seem to come with a lot of guidance, unfortunately.
You can’t just increase the host MTU without considering every device in the network chain. Alongside raising the host MTU, you need to raise the Layer-2 MTU on your network devices too—enabling jumbo frames on switches, for example. Also remember that TCP performs PMTUD automatically but UDP does not: your application may need to cap its datagram size explicitly. Capturing traffic and looking for dropped or fragmented packets will give you further insight into the underlying problem.
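On the application side, here's a minimal sketch of capping UDP datagram size in Python, assuming Calico's 1450 MTU (the constant names and `chunk` helper are illustrative, not from any particular library). 1450 minus 20 bytes of IP header and 8 bytes of UDP header leaves 1422 bytes of payload:

```python
import socket

CALICO_MTU = 1450
MAX_PAYLOAD = CALICO_MTU - 28   # 20-byte IP header + 8-byte UDP header

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# On Linux, set the DF bit so an oversized datagram errors out immediately
# instead of being silently fragmented along the way.
if hasattr(socket, "IP_MTU_DISCOVER"):
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MTU_DISCOVER,
                    socket.IP_PMTUDISC_DO)

def chunk(data: bytes, size: int = MAX_PAYLOAD) -> list[bytes]:
    """Split data into payloads that each fit in a single datagram."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def send(data: bytes, addr: tuple[str, int]) -> None:
    for payload in chunk(data):
        sock.sendto(payload, addr)
```

This moves fragmentation out of the network and into the application, which is usually what you want for latency-sensitive UDP.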

That’s how I configured my switches too; I’ve got my Nexus 9k ports set to 9216 MTU for the connected devices.
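For reference, on NX-OS that per-port setting looks roughly like this (the interface name is illustrative, and some Nexus platforms set MTU through a network-qos policy rather than per interface):

```
interface Ethernet1/1
  mtu 9216
```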