I'm running ECS on the EC2 launch type, managed with Terraform, and I use a capacity provider to handle scaling for the Auto Scaling Group. Whenever I run terraform destroy to tear down the environment, it hangs, and I often have to kill the process manually. The error is a timeout while waiting for the state to become 'INACTIVE': the resource stays in 'DRAINING' until the 20-minute timeout expires. I've tried several fixes, but none has worked so far. Has anyone else run into this, especially when using capacity providers with Terraform? P.S. Interestingly, if I run terraform destroy again after it fails, it seems to work the second time around.
4 Answers
Just a thought, but make sure you don't have scale-in or termination protection enabled on your EC2 instances. If those settings are active, they can stop the instances from being terminated, which would keep the destroy operation from completing.
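As a rough illustration of where those protection settings live in Terraform, here is a minimal sketch. The resource names, variables, and sizes are hypothetical and are not taken from the question; only the arguments protect_from_scale_in, force_delete, and managed_termination_protection are the point. Treat it as something to compare against your own config, not as a confirmed fix.

```hcl
# Hypothetical inputs; replace with your own values.
variable "subnet_ids" { type = list(string) }
variable "launch_template_id" { type = string }

resource "aws_autoscaling_group" "ecs" {
  name                = "example-ecs-asg" # hypothetical name
  min_size            = 0
  max_size            = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  # With scale-in protection on, the ASG will not terminate protected instances.
  protect_from_scale_in = false

  # Allows the group to be deleted without waiting for every instance to terminate.
  force_delete = true
}

resource "aws_ecs_capacity_provider" "ec2" {
  name = "example-ec2" # hypothetical name

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn

    # "ENABLED" requires protect_from_scale_in = true on the ASG.
    managed_termination_protection = "DISABLED"
  }
}
```

If you need managed termination protection during normal operation, one option is to turn it off in a separate apply shortly before destroying, though I have not verified that this resolves the hang in your setup.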
The 'DRAINING' state could indicate that a load balancer is involved. When targets are deregistered, the load balancer keeps existing connections open for the deregistration delay (connection draining) before it lets them go. If the health checks are misconfigured or connections are lingering, that can block the operation from completing.
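If a load balancer is in the picture, the relevant knobs sit on the target group. This is a minimal sketch with hypothetical names, port, and health check path; deregistration_delay defaults to 300 seconds, and shortening it is an assumption about what might help here, not a known cure.

```hcl
# Hypothetical input; replace with your VPC ID.
variable "vpc_id" { type = string }

resource "aws_lb_target_group" "app" {
  name     = "example-app" # hypothetical name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Seconds the load balancer waits for in-flight requests before a
  # deregistering target is fully removed (default is 300).
  deregistration_delay = 30

  health_check {
    path                = "/health" # hypothetical endpoint
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}
```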
Your security groups might be the culprit. A security group can't be deleted while it is still associated with any resource, such as an Elastic Network Interface (ENI). Sometimes those ENIs take a while to be fully released, which can cause the hang you're seeing during the destroy.
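If slow ENI release turns out to be the cause, one thing to try is giving the security group more time to delete. A minimal sketch, assuming a hypothetical group name and VPC variable; the 30-minute value is arbitrary.

```hcl
# Hypothetical input; replace with your VPC ID.
variable "network_vpc_id" { type = string }

resource "aws_security_group" "ecs_instances" {
  name   = "example-ecs-instances" # hypothetical name
  vpc_id = var.network_vpc_id

  # Give lingering ENIs longer to detach before Terraform gives up on the delete.
  timeouts {
    delete = "30m"
  }
}
```

This only extends the wait. It does not release the ENIs any faster, so it helps only if they do detach eventually.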
You could also check your CloudTrail logs to see whether any API calls returned errors during the destroy. Alternatively, enabling Terraform's debug logging (for example, by setting the TF_LOG environment variable to DEBUG) might give you more insight into what's going wrong.

I'm using Gatus for monitoring, and I'm not sure whether it could be affecting the state changes.