Hey everyone! We're working on a robust recovery strategy for several EKS clusters. In the past, we treated these clusters as pets, which made it tough to recreate them from scratch with consistent configurations.
Recently, we've started using ArgoCD alongside two ApplicationSets to help streamline this process: one for bootstrapping core services and another for business applications. We're managing the clusters and these ApplicationSets through Terraform, keeping everything under source control. This allows us to pass OIDC IAM roles and other Terraform-based values directly from the source.
At the moment, provisioning a new EKS cluster takes three `terraform apply` commands:
1. Create the EKS cluster
2. Bootstrap core services
3. Bootstrap application services
I think we could combine steps 2 and 3 by setting up sync waves properly, but I've run into limitations with the Kubernetes and Helm providers in Terraform. For example, even when resource creation is disabled via booleans, the Helm provider throws errors during state refresh about resources that don't exist.
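To make the sync-wave idea concrete, here's a rough sketch in Python that just builds the manifests as dicts. It assumes the two groups are modeled as plain Argo CD Applications under a parent App (not our actual ApplicationSets), and the app names, repo URL, and paths are hypothetical placeholders. The only real Argo CD piece it relies on is the `argocd.argoproj.io/sync-wave` annotation, where lower waves sync first.

```python
import json

# Real Argo CD annotation; resources in lower waves are synced first.
SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def application(name, path, wave, repo_url="https://example.com/org/gitops.git"):
    """Build an Argo CD Application manifest as a dict.

    name, path, and repo_url are hypothetical placeholders for illustration.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # Annotation values must be strings in Kubernetes manifests.
            "annotations": {SYNC_WAVE: str(wave)},
        },
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": "argocd",
            },
        },
    }


def by_wave(apps):
    """Return the apps in the order their waves would be applied."""
    return sorted(
        apps, key=lambda a: int(a["metadata"]["annotations"].get(SYNC_WAVE, "0"))
    )


if __name__ == "__main__":
    apps = [
        application("business-apps", "apps/business", wave=1),
        application("core-services", "apps/core", wave=0),
    ]
    for app in by_wave(apps):
        print(json.dumps(app, indent=2))
```

With core services in wave 0 and business apps in wave 1, a single parent App could in principle replace the separate step 2 and step 3 applies.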
I'd love to hear how others create clusters from a template. Are there better alternatives to Terraform for managing this workflow?
1 Answer
A common approach is to run Terraform locally in two passes: first to set up the backend (for example, creating the remote S3 bucket), then to create the cluster, Argo, and the credentials it needs to access your Git repository. After that initial bootstrap, Argo can handle everything else, as long as it has a parent App to manage. Be careful about provisioning the cluster and consuming its outputs in the same Terraform run; that can get quite tricky. Also, don’t let Argo manage its own resources from within, since that leads to circular dependency issues. Stick with Terraform for the Argo setup to avoid headaches later on.
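
If you want something close to one-click, the passes can be wrapped in a small script. This is only a sketch: the stage directory names are hypothetical, and it assumes each pass lives in its own Terraform root module with its own state. The Terraform flags used (`-chdir`, `-input=false`, `-auto-approve`) are standard CLI options.

```python
import subprocess

# Hypothetical layout: one Terraform root module per pass, applied in order.
STAGES = ["stages/01-backend", "stages/02-cluster-argocd"]


def stage_commands(stage_dir):
    """Return the terraform commands (as argv lists) for one stage."""
    return [
        ["terraform", f"-chdir={stage_dir}", "init", "-input=false"],
        ["terraform", f"-chdir={stage_dir}", "apply", "-input=false", "-auto-approve"],
    ]


def run_stages(stages, runner=subprocess.run):
    """Run every stage in order, stopping at the first failing command."""
    for stage in stages:
        for cmd in stage_commands(stage):
            # check=True makes subprocess.run raise CalledProcessError on failure.
            runner(cmd, check=True)


if __name__ == "__main__":
    run_stages(STAGES)
```

Keeping each pass in a separate state means the second pass only reads the finished outputs of the first, which sidesteps the problem of using cluster outputs in the same run that creates the cluster.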

Thanks for the advice! We're aiming for a one-click cluster provision, and it sounds like sticking to Terraform for the bootstrap is the way to go.