Optimizing DNS Performance on Talos K8s Cluster

Asked By CloudyNinja88 On

I'm seeking advice on enhancing DNS performance and scaling for my setup. Here's the architecture I'm working with:

- Kubernetes on Talos OS, consisting of a master and worker nodes (currently 2, with a potential to scale up to 10 nodes)
- Each node has 8 vCPUs and 8 GB of RAM, connected via a 10G network
- I'm using BIND9 for DNS and deploying with horizontal pod autoscaling (HPA)
- Load balancing is handled by MetalLB in L2 mode
- The main use case is an internal or ISP-style DNS resolver, focused solely on DNS workloads, with each DNS pod allocated 4 vCPUs and 4 GB of RAM.

For testing, I used dnsperf from my Linux laptop with the following command: `dnsperf -s -p 53 -d queries.txt -Q 50000 -c 2000 -l 60`, resulting in approximately 2k to 2.5k queries per second (QPS). However, I've noticed latency increases under higher concurrency and occasional timeouts.

I'm curious about a couple of things:
1. Is deploying BIND in this way the best approach for DNS workloads, and what's a good baseline for CPU and memory per DNS pod?
2. Would switching to alternatives like Unbound or Knot DNS provide a significant QPS boost? Any real-world experiences or tuning tips would be greatly appreciated!

2 Answers

Answered By PerformanceGuru On

Before making any changes, pinpoint where the bottleneck lies: CPU, networking, or the upstream resolvers. Keep in mind that the CNI's pod networking generally adds some latency of its own. If you can, testing alternatives like Knot, Unbound, or PowerDNS Recursor might be worthwhile; they're known for their performance. Just be clear on what QPS and latency you ultimately need to achieve with your configuration!

TechScout -

Exactly! Make sure you're clear on your performance goals before switching anything up.

NodeNerd45 -

Yeah, understanding the expected performance will guide your tuning efforts much better.

Answered By TechieTinkerbell On

MetalLB might not be necessary if you're already using host networking. Focus on your network and CPU first to identify any bottlenecks, and consider running your DNS service as a DaemonSet with a local traffic policy. You might also want to try other load balancing methods, as MetalLB can complicate things when scaling. As for resource allocation, you could start with 0.5 CPU cores and 128 MB of RAM per pod and adjust based on performance monitoring.
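To make the DaemonSet suggestion concrete, here is a rough sketch of what it could look like. Treat it as a starting point under stated assumptions, not a tested config: the `dns-resolver` name, the `dns` namespace, and the image reference are all placeholders I made up, and the requests simply mirror the 0.5 CPU / 128 MB starting point above.

```yaml
# Sketch only: names, namespace, and image are hypothetical placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dns-resolver
  namespace: dns
spec:
  selector:
    matchLabels:
      app: dns-resolver
  template:
    metadata:
      labels:
        app: dns-resolver
    spec:
      containers:
        - name: resolver
          # Placeholder image; substitute your own BIND9/Unbound/etc. build.
          image: registry.example.com/dns-resolver:latest
          ports:
            - name: dns-udp
              containerPort: 53
              protocol: UDP
            - name: dns-tcp
              containerPort: 53
              protocol: TCP
          resources:
            requests:
              cpu: 500m        # 0.5 CPU cores as a starting point
              memory: 128Mi    # roughly the 128 MB starting point
---
apiVersion: v1
kind: Service
metadata:
  name: dns-resolver
  namespace: dns
spec:
  type: LoadBalancer
  # Deliver traffic only to pods on the node that received it,
  # avoiding an extra hop to another node.
  externalTrafficPolicy: Local
  selector:
    app: dns-resolver
  ports:
    - name: dns-udp
      port: 53
      targetPort: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      targetPort: 53
      protocol: TCP
```

Two caveats: a LoadBalancer Service that mixes UDP and TCP needs a reasonably recent Kubernetes version, and with MetalLB in L2 mode a single node answers for the service IP at any given time, so this layout alone will not spread queries across nodes.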

K8sWhisperer -

Good point! It might also help to monitor your network traffic to see where the latency is being added.

LogicalGamer42 -

Definitely a good idea to examine the upstream resolver too! Overloading it can cause timeouts.
