I've noticed that in many small teams and startups, most production incidents occur during infrastructure changes rather than application code changes. Even with Infrastructure as Code (IaC) tools like Terraform, issues still slip through: incorrect variables, missing dependencies, or last-minute console changes that bypass the review process. For teams without a dedicated DevOps engineer, what processes or safeguards have you found effective in minimizing the risks of infrastructure changes on AWS? I'm eager to hear about real-world experiences, including what has and hasn't worked.
6 Answers
I've noticed it often comes down to having too much development focus with not enough operational oversight. Balancing those two is key to reducing risks.
We find that posting plans in Pull Request comments helps everyone stay aligned. The GitHub repo for the setup-terraform action includes a solid example of how to do this effectively.
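For anyone wiring this up outside that action, here is a minimal sketch of the step that turns plan output into a comment body. The function name, the heading format, and the 60000-character cap are my own assumptions for illustration (platforms limit comment size, so check yours); actually posting the comment is left to whatever CI integration you use.

```python
# Assumed cap on comment length; tune it for your code-hosting platform.
MAX_COMMENT_CHARS = 60000

# Built this way only to avoid writing a literal code fence inside this snippet.
FENCE = "`" * 3


def build_plan_comment(plan_text: str, workspace: str,
                       max_chars: int = MAX_COMMENT_CHARS) -> str:
    """Wrap plan output in a fenced block suitable for a pull request comment."""
    truncated = len(plan_text) > max_chars
    body = plan_text[:max_chars]
    lines = [f"### Terraform plan for `{workspace}`", "", FENCE, body, FENCE]
    if truncated:
        # Tell reviewers they are not seeing the whole plan.
        lines.append("")
        lines.append("_Plan truncated; see the pipeline logs for the full output._")
    return "\n".join(lines)
```

Feeding it the text of `terraform plan -no-color` gives reviewers the exact diff they are approving, with a visible warning if it was cut short.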
One crucial step is to have someone who understands the potential impact of the change watching the environment while it rolls out. Developers often focus on Terraform and the infrastructure itself without fully grasping the broader implications of their changes. That kind of collaboration, true DevOps, is essential for mitigating risk.
To tackle last-minute console changes, we've started using Crossplane. With provider-opentofu, it runs a reconciliation loop: any change that hasn't been approved as IaC in the main branch gets reverted. The aim is to eliminate ClickOps altogether.
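If a full reconciliation loop is more than a small team wants to run, a lighter variant is to detect drift on a schedule instead of reverting it. This is not the Crossplane setup described above, just a sketch of that swapped-in idea. It relies on `terraform plan -detailed-exitcode`, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The function names and status strings are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in_sync"   # real infrastructure matches the code
    if code == 2:
        return "drift"     # the plan is non-empty
    return "error"         # 1 (or anything unexpected) means the plan failed


def check_drift(workdir: str) -> str:
    """Run a read-only plan in workdir and report whether anything differs."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Run against the main branch from a scheduled job, a "drift" result means either someone changed something in the console or merged code has not been applied yet. Either way it is worth an alert.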
If you're fully on IaC, keeping it under version control is extremely helpful. Rollbacks become much simpler, since you can redeploy the last version that worked. Making sure changes only go through a structured pipeline also helps avoid issues.
A structured checklist and a constraint checker can keep changes from drifting away from the original settings in ways that break existing services. Together they work as a guideline for what needs to be preserved.
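As a rough idea of what a constraint checker can look like, here is a small sketch that reads the JSON form of a plan (from `terraform show -json`) and flags deletions or replacements of resources you have marked as protected. The protected list and the function name are assumptions for illustration, not a standard tool; the `resource_changes` and `change.actions` fields come from Terraform's JSON plan format.

```python
import json

# Hypothetical list of resource types that must not be deleted or replaced
# without explicit sign-off; adjust it to your own stack.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket", "aws_route53_zone"}


def find_risky_changes(plan: dict, protected_types=PROTECTED_TYPES) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as a delete paired with a create,
        # so checking for "delete" covers both cases.
        if rc.get("type") in protected_types and "delete" in actions:
            risky.append(rc["address"])
    return risky


def check_plan_file(path: str) -> list[str]:
    """Load a plan JSON file from disk and run the check on it."""
    with open(path) as fh:
        return find_risky_changes(json.load(fh))
```

In a pipeline you would run `terraform plan -out=plan.out`, then `terraform show -json plan.out > plan.json`, and fail the job if `check_plan_file("plan.json")` returns anything.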

Exactly! In more stringent environments, it's beneficial to restrict developers' rights to make infrastructure changes except through a pipeline. This creates a natural check on changes being made.