Networking Issues on EKS 1.33 with High MQTT Traffic

Asked By TechieTravels42 On

I'm running a high-volume workload on AWS EKS that handles MQTT traffic from devices through a VerneMQ broker. Everything worked smoothly until I upgraded the cluster to version 1.33. The traffic path is: devices send MQTT traffic to an Application Load Balancer (ALB) on the VerneMQ port, which forwards it to the VerneMQ Kubernetes service and on to the VerneMQ pods.

Another pod subscribes to a specific topic to read data from the VerneMQ pods. Since the upgrade, that pod has been unable to connect to the VerneMQ pods under heavy MQTT traffic (hundreds of thousands of requests), and it crashes or times out on its liveness probes. Under lower traffic, everything works fine.

I found a workaround by modifying the container image to connect via the external ALB instead of the VerneMQ Kubernetes service, which resolved the issue, but I'd prefer not to go that route. I haven't changed any infrastructure or container code since starting with EKS 1.27.

I'm also unsure whether the issue stems from the base AMI or from kernel config changes, because the setup worked fine with EKS 1.32 but not with 1.33. I'm using Amazon's VPC CNI plugin for networking. Are there any tools for inspecting the traffic and kernel calls, or for monitoring this situation better?

3 Answers

Answered By NetNinja42 On

Use ethtool on the node to check for packet drops, and look through the VPC CNI logs on that node for warnings. Also check the maximum network throughput for your instance types; using pod IPs with NodePorts could help as well.
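As a rough sketch of the ethtool check, the Python below parses the output of `ethtool -S` and keeps only the counters that are non-zero and look like drops. It assumes the node uses the ENA driver, whose statistics include the `*_allowance_exceeded` counters listed in the code; the interface name and the function names are illustrative, not something from the original post.

```python
import subprocess

# ENA driver counters that increase when the instance exceeds a network
# allowance (assumption: the node's NIC uses the ENA driver).
ENA_ALLOWANCE_COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
)


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # header line such as "NIC statistics:"
    return stats


def nonzero_drop_counters(stats):
    """Keep non-zero counters that are ENA allowances or mention 'drop'."""
    return {
        name: value
        for name, value in stats.items()
        if value > 0 and (name in ENA_ALLOWANCE_COUNTERS or "drop" in name)
    }


def read_interface_stats(interface):
    """Run `ethtool -S <interface>` on the node and parse the result."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ethtool_stats(result.stdout)
```

Calling `nonzero_drop_counters(read_interface_stats("eth0"))` on the node before and after a traffic burst shows which counters moved; replace `eth0` with whatever interface name `ip link` reports on your AMI.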

Answered By CloudyCoder88 On

A good first step is to roll back to the previous version to confirm whether 1.33 is really the cause. Is there any crucial feature in 1.33 that you can't do without? Just out of curiosity!

TechieTravels42 -

I actually reverted to 1.32 and it's working fine now! I looked through the changelog and nothing significant stood out, except for the switch to Amazon Linux 2023, which I'm already using with 1.32.

Answered By DataDiver99 On

For monitoring traffic, I suggest checking out Kubeshark, although it comes with limitations on the free tier for larger clusters. It might be worth looking into other monitoring options as well.
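A lighter-weight check than full traffic capture is to time plain TCP connects against both paths (the in-cluster VerneMQ service and the external ALB) while the load is running. This is only a sketch: the function names are made up for illustration, and it measures TCP connection setup only, not the MQTT handshake.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=3.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return None


def summarize(host, port, attempts=20, timeout=3.0):
    """Try several connects and report failures and the slowest success."""
    samples = [time_tcp_connect(host, port, timeout) for _ in range(attempts)]
    ok = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failures": attempts - len(ok),
        "max_seconds": max(ok) if ok else None,
    }
```

Run `summarize(...)` from inside the subscriber pod, once with the service's DNS name and port and once with the ALB's; if only the service path shows failures under load, that points at the in-cluster routing rather than at the broker itself.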
