I've noticed a significant increase in our NAT Gateway costs over the past few days, and I'm struggling to pinpoint the source. We have resources operating in private subnets that route their traffic through the NAT Gateway, but since I don't have VPC Flow Logs enabled, I can't track where that traffic is heading.

Here's what I've established:
1) The NAT Gateway traffic has drastically increased.
2) This change started a few days ago.
3) We are using EC2 spot instances in the private subnets.
4) There haven't been any recent deployments or changes to our environment.

I have a few questions to help me resolve this issue quickly:
- How can I identify which instance might be causing this spike without VPC Flow Logs?
- Are there any specific CloudWatch metrics or tools I should be monitoring?
- Is there a quick way to diagnose this?

I've started enabling VPC Flow Logs, but I need to figure this out today!
4 Answers
Consider switching to fck-nat, a self-managed NAT instance, to help reduce NAT costs in the long run. It's a more cost-effective option if you have constant traffic flowing through your NAT Gateway. It won't tell you what is causing the spike, though.
Have you started sending a lot of traffic to S3 without an S3 gateway VPC endpoint? Without one, S3 traffic from private subnets goes through the NAT Gateway and is billed per gigabyte, which could cause a cost spike. It's strange that you've been operating normally for a while and this only started recently, so double-check your recent traffic patterns.
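To check the S3 theory quickly, here is a minimal sketch that lists a VPC's endpoints with boto3 and reports whether an S3 gateway endpoint is present. The function names, the default region, and the VPC ID in the usage note are placeholders I made up. It ignores pagination, and it does not verify that the endpoint is attached to the route tables your private subnets actually use, so treat a True result as a starting point only.

```python
def has_s3_gateway_endpoint(endpoints):
    """Return True if any endpoint dict is an available S3 gateway endpoint.

    `endpoints` is the "VpcEndpoints" list from describe_vpc_endpoints.
    """
    return any(
        ep.get("VpcEndpointType") == "Gateway"
        and ep.get("ServiceName", "").endswith(".s3")
        and ep.get("State", "").lower() == "available"
        for ep in endpoints
    )


def fetch_endpoints(vpc_id, region="us-east-1"):
    """Fetch the VPC's endpoints (first page only; region is an assumption)."""
    import boto3  # imported lazily so the check above works without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return resp["VpcEndpoints"]
```

Usage would look like `has_s3_gateway_endpoint(fetch_endpoints("vpc-0123456789abcdef0"))`, with your own VPC ID substituted.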
Enable VPC Flow Logs right away; they take about 15 minutes to start delivering records. After that, you can see where the traffic is going. It sounds like you might have a service polling something on the internet. NAT Gateways are billed both per hour and per gigabyte of data processed, so look at any automated processes that might be generating this traffic. There's also a chance that something internal is failing and falling back to the internet instead. Just a thought.
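Once flow log records start arriving, something like the following can total the bytes per source/destination pair. This is a rough sketch that assumes the default flow log format (version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status); the function name is my own, and a custom log format would need different field positions.

```python
from collections import Counter


def top_talkers(lines, n=10):
    """Sum bytes per (srcaddr, dstaddr) from default-format flow log lines."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip header rows and anything that is not a full 14-field record.
        if len(fields) < 14 or fields[0] == "version":
            continue
        src, dst, nbytes = fields[3], fields[4], fields[9]
        # NODATA and SKIPDATA records carry "-" instead of numbers.
        if not nbytes.isdigit():
            continue
        totals[(src, dst)] += int(nbytes)
    return totals.most_common(n)
```

Feeding it the records captured on the NAT Gateway's network interface should surface the private IPs sending the most data and the destinations they are talking to.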
Start by checking the CloudWatch network metrics (NetworkOut and NetworkIn) for every instance in the private subnets. These are available without Flow Logs and will give you a clearer picture of which instance is moving the most data. Note, though, that spot instances are often tied to services like EMR, and that could contribute to unusual traffic patterns. Keep an eye on them!
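To compare instances without clicking through the console, here is a minimal boto3 sketch that sums the EC2 NetworkOut metric per instance and ranks the results. The function names, the 24-hour window, and the default region are my own assumptions. Keep in mind that NetworkOut counts all outbound traffic from an instance, not only what goes through the NAT Gateway, so the top entry is a suspect and not proof.

```python
from datetime import datetime, timedelta, timezone


def rank_by_bytes(totals):
    """Sort {instance_id: bytes} pairs, largest sender first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def network_out_totals(instance_ids, hours=24, region="us-east-1"):
    """Sum the NetworkOut metric per instance over the last `hours`."""
    import boto3  # imported lazily so rank_by_bytes works without AWS access

    cw = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    totals = {}
    for iid in instance_ids:
        resp = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="NetworkOut",
            Dimensions=[{"Name": "InstanceId", "Value": iid}],
            StartTime=start,
            EndTime=end,
            Period=3600,  # one datapoint per hour
            Statistics=["Sum"],
        )
        totals[iid] = sum(dp["Sum"] for dp in resp["Datapoints"])
    return totals
```

Calling `rank_by_bytes(network_out_totals([...]))` with the instance IDs from your private subnets puts the heaviest senders at the top. Widening `hours` to cover the days before the spike makes it easier to see which instance changed behavior.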
