I've been dealing with what feels like a major anomaly while using ArgoCD with Crossplane, and it's driving me nuts! For about a week now, I've been trying to figure out why ArgoCD reports everything as "Healthy" and "Synced" when Crossplane is failing to provision AWS resources. AWS is returning errors all over the place, including HTTP 400s, while ArgoCD keeps giving us the green light as if nothing's wrong.
I've noticed this strange behavior where resources like Lambda functions and RDS instances are stuck or not being updated, while ArgoCD's dashboard is blissfully showing everything is fine. What's even more perplexing is that after searching tirelessly online, I couldn't find anyone else reporting this exact issue. It seems like I'm the only one experiencing these broken health checks between ArgoCD and Crossplane!
It turns out the issue lies in the health check Lua logic: it evaluates the status conditions in an order where "Ready: True" is checked before "Synced: False," so a resource that was Ready once but is now failing to sync still gets reported as healthy. I've worked around this by changing the order of the condition checks, but I'm baffled that no one else seems to have hit this before. So, am I truly alone, or has anyone else out there seen this issue? Are others ignoring ArgoCD health checks completely?
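For anyone who wants to see the ordering problem concretely, here is a minimal sketch. ArgoCD's real health checks are written in Lua; this is only a Python model of the behavior described above, not the shipped script. The function names and the sample condition list are made up for illustration, and the mapping to "Healthy" / "Degraded" / "Progressing" is an assumption about how such a check would typically report.

```python
from typing import Dict, List

Condition = Dict[str, str]


def health_ready_first(conditions: List[Condition]) -> str:
    """Hypothetical model of the problematic ordering: Ready=True wins outright."""
    for c in conditions:
        if c["type"] == "Ready" and c["status"] == "True":
            return "Healthy"  # returns before Synced is ever inspected
    for c in conditions:
        if c["type"] == "Synced" and c["status"] == "False":
            return "Degraded"
    return "Progressing"


def health_synced_first(conditions: List[Condition]) -> str:
    """Hypothetical model of the workaround: a failed sync is checked first."""
    for c in conditions:
        if c["type"] == "Synced" and c["status"] == "False":
            return "Degraded"  # a reconcile failure overrides a stale Ready=True
    for c in conditions:
        if c["type"] == "Ready" and c["status"] == "True":
            return "Healthy"
    return "Progressing"


# Illustrative status: the resource became Ready earlier, but the latest
# reconcile against AWS failed (for example with an HTTP 400).
stale_resource = [
    {"type": "Ready", "status": "True"},
    {"type": "Synced", "status": "False", "reason": "ReconcileError"},
]
```

With `stale_resource`, `health_ready_first` returns "Healthy" while `health_synced_first` returns "Degraded", which is the mismatch between the green dashboard and the failing AWS calls. A resource with both conditions "True" is "Healthy" under either ordering.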
I've even documented my findings and solution, hoping it helps someone else who stumbles upon this problem in the future.
6 Answers
Thanks for taking the time to write this up! We're considering migrating to the stack you're using, so your insights are invaluable. Have you thought about filing a bug with GitHub for ArgoCD or Crossplane? This seems like it could impact many people.
This sounds like a real issue with the status reporting logic! Instead of a workaround, why not just submit a patch to modify that logic? That could solve the problem for everyone experiencing this issue with Crossplane.
Sounds like you're mixing up what GitOps entails. ArgoCD is showing that your configurations are synced to the cluster; Crossplane failing after that is a different issue altogether. In GitOps, the key is that the cluster matches the declared state of your resources, while actual resource health falls under monitoring tools like Prometheus or Grafana. You might want to set up monitoring to catch those failures instead!
Definitely agree. You should leverage proper observability tools.
Exactly! ArgoCD being green doesn't necessarily mean everything's running smoothly.
I get your frustration! I think more people need to talk about this on GitHub and share experiences. It would be helpful if more users reported similar issues to push for attention from maintainers.
Agreed! This might help raise awareness of the issue within the community.
You’re definitely not alone; I've faced this exact issue in the past. I think a lot of folks might just be blissfully unaware of how Argo's health checks work. If you’re using Crossplane, it's crucial to write custom health checks that suit your resources!
I had no idea! Thanks for the tip, will keep that in mind.
Glad to see you managed to find a workaround! But honestly, posting it on Medium behind a paywall isn't the best idea. Maybe just share the details directly on GitHub or a forum where they're accessible to everyone?
100% agree with you on that!
I avoid Medium for that reason too!

I've considered it, but the maintainers said it’s more of an edge case... that’s why I went public with this to help others who might stumble here.