Hey everyone, I'm working on a custom inbound mail server that will run in ECS Fargate behind a Network Load Balancer. Handling incoming email involves a lot of DNS lookups. To give you an idea: a typical email requires a PTR check, several SPF queries, DKIM lookups, DMARC checks, and RBL/DNSBL queries. These can easily total 10 to 20 DNS queries per email, which means that in high-volume scenarios I could quickly hit the limit of 1024 packets per second per network interface that the AWS VPC DNS resolver imposes. My current approach is to use Unbound as an L1 cache and pair it with ElastiCache for L2 caching. The setup is designed so that if a record isn't found in either cache, Unbound queries the AWS DNS resolver and the answer is written back to both caches. I'm really curious whether I'm on the right track here, or if there's a more effective way to handle this scale.
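To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every uncached lookup costs exactly one packet against the limit (retries and TCP fallback would cost more), and the 90% cache hit ratio is a made-up figure for illustration, not a measurement. The function name is hypothetical.

```python
def max_emails_per_second(packet_limit, queries_per_email, cache_hit_ratio=0.0):
    """Estimate how many emails per second one network interface can
    process before its DNS packet budget is used up.

    Assumption: one uncached DNS query == one packet counted by the resolver.
    """
    if not 0.0 <= cache_hit_ratio < 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    if queries_per_email <= 0:
        raise ValueError("queries_per_email must be positive")
    # Only cache misses reach the VPC resolver.
    upstream_queries_per_email = queries_per_email * (1.0 - cache_hit_ratio)
    return packet_limit / upstream_queries_per_email


if __name__ == "__main__":
    LIMIT = 1024  # packets per second per network interface
    for queries in (10, 20):
        for hit_ratio in (0.0, 0.9):  # 0.9 is a hypothetical hit ratio
            rate = max_emails_per_second(LIMIT, queries, hit_ratio)
            print(f"{queries} queries/email, {hit_ratio:.0%} cache hits: "
                  f"~{rate:.0f} emails/s")
```

With no caching at all and 20 queries per email, that works out to roughly 51 emails per second per network interface, so the cache hit ratio is what really decides how much headroom there is.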
3 Answers
Why not consider using AWS SES for handling inbound emails?
Plus, getting permission for SES can be a real hassle!
If you're just pointing Unbound at the VPC resolver (`.2`), every cache miss still counts against that cap of 1024 packets per second per network interface. A better approach would be to run Unbound (or a similar full resolver) in recursive mode on an EC2 instance. That way it doesn't forward queries to `.2`; it queries the root and authoritative servers directly, which avoids the VPC limit and scales better. You can still configure it to forward to `.2` for private Route 53 zones and use normal recursion for everything else. With a properly set up fleet of recursive resolvers, you can scale without maintaining a complex caching layer in Redis or ElastiCache, which could introduce more issues than it solves.
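The split described above boils down to a suffix match on the query name. Here is a minimal sketch of that decision only, not of how Unbound implements it; in Unbound itself this would normally be expressed with a `forward-zone` clause per private zone. The resolver address `10.0.0.2` and the zone `internal.example.com.` are hypothetical placeholders.

```python
VPC_RESOLVER = "10.0.0.2"                   # hypothetical: VPC base address + 2
PRIVATE_ZONES = ("internal.example.com.",)  # hypothetical private hosted zone


def normalize(name):
    """Lower-case a DNS name and make sure it ends with the root dot."""
    name = name.lower()
    return name if name.endswith(".") else name + "."


def pick_upstream(qname, private_zones=PRIVATE_ZONES):
    """Return the VPC resolver address for names inside a private zone,
    or None to signal 'resolve recursively from the root servers'."""
    q = normalize(qname)
    for zone in private_zones:
        z = normalize(zone)
        # Match the zone apex itself or any name below it, on a label boundary.
        if q == z or q.endswith("." + z):
            return VPC_RESOLVER
    return None


if __name__ == "__main__":
    for name in ("db.internal.example.com", "_dmarc.gmail.com"):
        target = pick_upstream(name) or "full recursion"
        print(f"{name} -> {target}")
```

Only the private-zone lookups touch `.2` here, so the mail-related queries (SPF, DKIM, DMARC, DNSBL) never count against the VPC limit.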
This is really helpful advice, thanks!
Have you thought about applying for a quota increase? Many of these limits can be raised with a solid justification.
Unfortunately, 1024 is a hard limit.
My product can't really manage the costs of SES at scale, though I do use it for outbound transactional emails like verification and password resets.