I'm currently running Rocky Linux 8 and trying to set up a single-node Kafka cluster, which also requires ZooKeeper. ZooKeeper seems to be running fine, but I'm getting a "No route to host" error when Kafka tries to connect to it. I've noticed some problematic logs from CoreDNS during this process. Here's the error I encountered:
[ERROR] plugin/errors: 2 kafka-svc.reddog.microsoft.com. AAAA: read udp 10.244.77.165:56358->172.19.0.126:53: read: no route to host
[ERROR] plugin/errors: 2 kafka-svc.reddog.microsoft.com. A: read udp 10.244.77.165:57820->172.19.0.126:53: i/o timeout
Can anyone help me troubleshoot this?
4 Answers
It looks like there’s a network issue between your Kafka pod and ZooKeeper. CoreDNS is logging errors because it can't reach the DNS server at 172.19.0.126 (the source address in those log lines, 10.244.77.165, is a pod IP), which likely indicates a problem with your CNI. Check that your CNI plugin is properly installed and running. Restarting the pods or the node might help, but this usually points to a deeper network configuration problem in your k8s cluster.
It seems there’s a routing issue with reaching 172.19.0.126. "No route to host" strongly suggests that the host isn't reachable from the pod network. You could try attaching a debug container to the Kafka bootstrap and testing connectivity to that host. If something got rotated, restarting the CoreDNS deployment may help. Also check the kubelet's clusterDomain setting and make sure it matches what your cluster expects.
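For what it's worth, those checks could look roughly like the sketch below. The pod name `kafka-0`, the `default` namespace, and the `busybox:1.36` image are placeholders I made up, and the kubelet config path is the kubeadm default, so adjust all of them for your setup.

```shell
# Placeholders (assumptions, not from the original post); override via env vars.
POD="${POD:-kafka-0}"
NS="${NS:-default}"

# Attach an ephemeral debug container to the pod for interactive testing.
# Once inside, try for example: nslookup kafka-svc 172.19.0.126
debug_pod() {
  kubectl debug -it "$POD" -n "$NS" --image=busybox:1.36 -- sh
}

# Restart CoreDNS (kubeadm deploys it as the "coredns" deployment in kube-system).
restart_coredns() {
  kubectl -n kube-system rollout restart deployment coredns
}

# Print the kubelet's clusterDomain; run this on the node itself.
# /var/lib/kubelet/config.yaml is the kubeadm default location (assumption).
show_cluster_domain() {
  grep clusterDomain "${KUBELET_CONFIG:-/var/lib/kubelet/config.yaml}"
}
```

Run `debug_pod` first. If the lookup against 172.19.0.126 fails from inside the pod as well, restarting CoreDNS probably won't change anything and the problem sits at the routing level.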
Are you running this on AKS? Just curious if that might be the case since setups can vary significantly depending on what you're using.
This type of error generally occurs when the pod network isn't set up correctly, which causes DNS lookups to fail since the pods can’t communicate with the DNS server. Since you’re using kubeadm, make sure your CNI plugin is healthy. You can run `kubectl get pods -n kube-system` to check for any pods in CrashLoopBackOff or another failing state. If that looks okay, try an `nslookup` inside your Kafka pod to see whether you can resolve service names, which would help determine whether the pod network is broken or the issue is isolated to CoreDNS. Also check whether a firewall or misconfigured route is blocking access to 172.19.0.126.
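Here's a rough sketch of those steps as small shell helpers. The pod name `kafka-0` and the `default` namespace are assumptions on my part. `kafka-svc` and 172.19.0.126 come from your log output. The `nslookup` step only works if the Kafka image actually ships that tool.

```shell
# Placeholders (assumptions): pod "kafka-0", namespace "default".
POD="${POD:-kafka-0}"
NS="${NS:-default}"
SVC="${SVC:-kafka-svc}"             # service name seen in the CoreDNS log
UPSTREAM="${UPSTREAM:-172.19.0.126}" # DNS server CoreDNS fails to reach

# 1. Look for CNI or CoreDNS pods that are not Running/Ready.
check_system_pods() {
  kubectl get pods -n kube-system -o wide
}

# 2. Resolve the service name from inside the Kafka pod.
resolve_from_pod() {
  kubectl exec -n "$NS" "$POD" -- nslookup "$SVC"
}

# 3. Show the CoreDNS config; the "forward" line names its upstream resolver.
show_coredns_config() {
  kubectl -n kube-system get configmap coredns -o yaml
}

# 4. On the node: ask the kernel which route it would use to reach the upstream.
check_route() {
  ip route get "$UPSTREAM"
}
```

If step 2 resolves `kafka-svc` fine but CoreDNS still logs errors, the errors may only concern lookups forwarded upstream, such as the `kafka-svc.reddog.microsoft.com` name in your log, rather than in-cluster resolution.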

Yes, I'm using kubeadm on an AKS instance. I thought that might have something to do with it.