I've been struggling with a perplexing issue for about a week now while using ArgoCD alongside Crossplane, and it's driving me nuts! ArgoCD keeps reporting my resources as "Healthy" and "Synced", but in reality Crossplane is failing to provision the AWS resources and is getting 400 errors back. I have Lambda functions that won't update, RDS instances stuck mid-provisioning, and IAM roles that never get created, all while the ArgoCD dashboard shows everything is fine.
What's baffling is that I've searched high and low online, and I can't find anyone discussing this issue. It feels like I'm the only one encountering a broken health check with ArgoCD and Crossplane. My best guess is that the Lua health check evaluates the resource's status conditions in an order that lets a healthy condition win before an error condition is ever considered, so failing resources are reported as healthy.
I made a workaround by reordering the checks, prioritizing error conditions over healthy ones. Still, I'm left wondering: is anyone else experiencing this? Are you all just monitoring AWS directly and ignoring ArgoCD? Or am I just the unluckiest person in this situation?
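To make the reordering idea concrete, here is a minimal sketch of "error conditions first" logic. It is written in Python purely for illustration (a real ArgoCD health check is a Lua script), and it is not the actual built-in check or the exact workaround from the post. It assumes a Crossplane managed resource that reports `Ready` and `Synced` entries under `status.conditions`; the function name `crossplane_health` is hypothetical.

```python
from typing import Any


def crossplane_health(resource: dict[str, Any]) -> dict[str, str]:
    """Illustrative health evaluation: error conditions win over healthy ones.

    Assumes `resource` looks like a Crossplane managed resource with a list
    of conditions under status.conditions (an assumption for this sketch).
    """
    conditions = resource.get("status", {}).get("conditions", [])
    if not conditions:
        return {"status": "Progressing", "message": "Waiting for conditions"}

    # Pass 1: look for a failed sync first, regardless of list order.
    for cond in conditions:
        if cond.get("type") == "Synced" and cond.get("status") == "False":
            return {"status": "Degraded", "message": cond.get("message", "")}

    # Pass 2: only report Healthy once no error condition was found.
    for cond in conditions:
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            return {"status": "Healthy", "message": "Resource is ready"}

    return {"status": "Progressing", "message": "Waiting for Ready condition"}


# Example: Ready=True is listed before Synced=False, yet the error still wins.
example = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True", "reason": "Available"},
            {
                "type": "Synced",
                "status": "False",
                "reason": "ReconcileError",
                "message": "update failed: 400 Bad Request",
            },
        ]
    }
}
result = crossplane_health(example)
```

The point of the two passes is that the outcome no longer depends on the order in which conditions appear in the list. In ArgoCD the equivalent Lua would be registered as a custom health check under `resource.customizations` in the `argocd-cm` ConfigMap.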
**UPDATE:** I wrote up the solution to my issue here to help anyone else dealing with a similar problem. I also opened a GitHub issue after chatting with others about this. I really hope we can find a fix for this!
5 Answers
Why go for Medium when you could just as well open an issue directly on GitHub? It seems like a more effective way to address the problem!
I faced this issue a while back too. Thankfully, I had already learned how ArgoCD's health checks behave. Anyone using Argo should write and test custom health checks for scenarios that aren’t covered by the built-in ones. It’s a real shame that many people don't seem to be aware of this.
I agree—this knowledge definitely needs to be shared more widely!
Thanks for sharing your experience! We're considering migrating to a similar stack, so this has been super helpful. Have you thought about filing a GitHub issue for this? It sounds like it might affect many people unknowingly.
I've considered that, but after talking with maintainers, they said this might be more of a corner case. Still, I hope my article helps someone!
It's great that you found a workaround! But honestly, posting it on Medium behind a paywall isn't the best way to help others who might be facing the same issue. Think about putting it on GitHub instead where more people can access it.
Yeah, putting it behind a paywall kind of defeats the purpose, right?
Totally agree! Medium just isn't user-friendly for troubleshooting.
I feel you on this. It’s easy to misunderstand how GitOps operates. ArgoCD is right in saying the objects are synced because that's its job. The issue with Crossplane failing afterward is a separate concern. Proper monitoring tools should have alerted you to the actual system health.
In GitOps, the focus is on ensuring the objects in your cluster match what’s expected, but that doesn't mean everything is running perfectly. You might want to look into monitoring solutions like Grafana or Prometheus for a clearer health overview of your system.
This is the real answer! ArgoCD and GitOps don't completely cover health checks.
Exactly! It's crucial to have different tools for different monitoring needs.
Yeah, I keep hearing about people avoiding Medium for this reason.