I'm trying to tackle some performance and cost challenges with our AWS setup in a large enterprise with over 350 accounts. Each account has its own VPC, and we mainly use Transit Gateway (TGW) to connect them, which has become expensive. The extra network hops from centralized inspection are also causing performance problems for many of our applications. We're considering adding VPC peering to relieve congestion and possibly grouping some apps into shared VPCs, but I'm worried this could complicate our networking further. I'm also considering centralized multi-tenant VPCs with microsegmentation for better organization. What's the best approach here? Any insights on the advantages and downsides of these configurations?
4 Answers
It sounds like you've got a lot on your plate with those accounts. I recommend a centralized approach with shared VPCs for better cost management. Multi-tenant VPCs where each application has its own subnet can work well if you maintain good security policies. Avoid excessive subnet segmentation so you don't waste IP space. Consider using security groups to manage access, since they apply per resource rather than per subnet and so give you more flexibility. And definitely reach out to your AWS account manager; they can provide tailored advice given your setup and scale!
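To put rough numbers on the IP-space point, here's a minimal sketch. It relies on the fact that AWS reserves five addresses in every subnet; the `10.0.0.0/20` VPC range and the `usable_addresses` helper are just made-up examples, not anything from the original setup.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(vpc_cidr: str, subnet_prefix: int) -> tuple[int, int, int]:
    """Split a VPC CIDR into equal subnets.

    Returns (subnet_count, usable_per_subnet, total_usable).
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=subnet_prefix))
    usable_each = subnets[0].num_addresses - AWS_RESERVED_PER_SUBNET
    return len(subnets), usable_each, len(subnets) * usable_each


if __name__ == "__main__":
    # Hypothetical VPC range; compare coarse (/24) and fine (/28) segmentation.
    for prefix in (24, 28):
        count, each, total = usable_addresses("10.0.0.0/20", prefix)
        print(f"/{prefix}: {count} subnets x {each} usable = {total} of 4096 addresses")
```

With these example inputs, splitting into /28s loses roughly a third of the range to reserved addresses, while /24s lose about 2%. That's the trade-off behind leaning on security groups instead of one tiny subnet per app.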
No problem, happy to help out! Always good to explore all options, especially when it comes to managing costs effectively.
I’ve migrated my setup away from TGW and found that using PrivateLink for service-to-service communication lowered costs significantly. It reduces the traffic that goes through TGW, and it tightens security because you expose only the specific services you publish rather than whole networks. You may want to consider moving towards models that incorporate AWS PrivateLink, or even Cloud WAN, for a more scalable setup. Let's face it, reducing dependency on TGW can have a huge impact on your overall bill!
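If you want to sanity-check this for your own traffic, here's a rough back-of-the-envelope sketch. All rates are placeholder assumptions (they look like typical list prices, but check the current pricing page for your region), and the function names are made up for illustration. It ignores the provider-side load balancer and cross-AZ transfer charges. The `passes` parameter reflects my assumption that traffic routed through a central inspection VPC gets charged TGW data processing more than once.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing estimates


def tgw_monthly_cost(attachments: int, gb: float, passes: int = 1,
                     attachment_hourly: float = 0.05, per_gb: float = 0.02) -> float:
    """Estimate TGW cost: hourly charge per attachment plus per-GB processing.

    `passes` is how many times each GB is processed by the TGW
    (assumed to be 2 when traffic detours through an inspection VPC).
    All rates are assumed placeholders, not quoted prices.
    """
    return attachments * attachment_hourly * HOURS_PER_MONTH + gb * passes * per_gb


def privatelink_monthly_cost(endpoints: int, azs_per_endpoint: int, gb: float,
                             endpoint_az_hourly: float = 0.01,
                             per_gb: float = 0.01) -> float:
    """Estimate interface endpoint cost: hourly per endpoint per AZ plus per-GB.

    Rates are assumed placeholders; provider-side load balancer cost is omitted.
    """
    return endpoints * azs_per_endpoint * endpoint_az_hourly * HOURS_PER_MONTH + gb * per_gb


if __name__ == "__main__":
    gb = 10_000  # hypothetical monthly traffic between two services
    print(f"TGW, 2 attachments, inspected: ${tgw_monthly_cost(2, gb, passes=2):.2f}")
    print(f"PrivateLink, 1 endpoint x 3 AZs: ${privatelink_monthly_cost(1, 3, gb):.2f}")
```

The point is less the exact dollars and more seeing which term dominates for a given flow: the hourly charges or the per-GB processing.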
Thanks for sharing that! PrivateLink is something we haven't fully explored yet but definitely sounds worth considering.
For sure! It offers a good mix of security, performance, and lower costs, especially for service interactions.
With the number of VPCs you're managing, adding VPC peering could quickly turn into a confusing network mess. Remember, VPC peering doesn’t support transitive routing, which means you'd have to set up a separate peering for every pair of VPCs that need to talk to each other. TGW is designed to handle larger networks and may end up being more cost-effective than you'd think once you factor in the operational complexity of many peering connections. Instead of creating numerous peerings, consider consolidating your VPCs strategically. You can still implement micro-segmentation through security groups and subnets without needing a separate VPC for each app. Also, look into tools like Cisco Secure Workload for finer-grained segmentation if you go that route.
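To show how fast peering grows without transitive routing, here's a quick sketch of the worst case where every VPC needs to reach every other one. The per-VPC peering limit of 125 is what I understand the maximum adjustable quota to be, so treat it as an assumption and verify it for your accounts; the function names are just illustrative.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed so every VPC pair is directly peered."""
    return vpc_count * (vpc_count - 1) // 2


def peerings_per_vpc(vpc_count: int) -> int:
    """Peering connections each individual VPC holds in a full mesh."""
    return vpc_count - 1


# Assumed upper bound on active peerings per VPC; check your own quotas.
ASSUMED_MAX_PEERINGS_PER_VPC = 125

if __name__ == "__main__":
    n = 350  # VPC count from the question
    print(f"Full mesh: {full_mesh_peerings(n)} peerings, {peerings_per_vpc(n)} per VPC")
    print(f"Hub-and-spoke via TGW: {n} attachments")
    if peerings_per_vpc(n) > ASSUMED_MAX_PEERINGS_PER_VPC:
        print("A full mesh would exceed the assumed per-VPC peering limit.")
```

In practice you'd only peer the handful of chatty pairs and leave everything else on TGW, which keeps the count manageable.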
Having gone through a similar transition, I can say that moving to a hub-and-spoke model with TGW has been really beneficial for us. It simplifies the overall architecture and allows for better traffic management. Although TGW has its costs, it prevents the chaos that can come with a lot of peering connections. As for the performance issues, are you tracking where your bottlenecks are? It may be worth investigating the specific cases where workloads are severely impacted. If you're routing traffic through inspection for certain VPCs, consider limiting that to where it's absolutely necessary to reduce latency and cost.
Great point on tracking bottlenecks! I need to dive into our metrics to better understand where the issues are stemming from.
Absolutely, it can be enlightening to see the data. Once you spot the trouble areas, you can make more informed decisions about your network layout.

Thanks for the tips! I'm definitely looking into how security groups can help us out. And I will keep pushing my account manager for more specific guidance.