Hey everyone! I'm in a bit of a bind here and could really use some insight. I've been struggling with a pretty frustrating issue involving ArgoCD and Crossplane. Specifically, ArgoCD is showing all my resources as "Healthy" and "Synced," but in reality, Crossplane is failing to provision AWS resources left and right. I'm seeing 400 errors from AWS, yet my ArgoCD dashboard makes it look like everything is just peachy.
I've been trying to debug this for days—Lambda functions aren't updating, RDS instances are stuck, and IAM roles aren't being created. The bizarre part? I can't find anyone online who's mentioned this problem; it's like I'm the only one in this situation.
The glitch seems to stem from the health check Lua logic, which walks the resource's status conditions in the order they are listed and seems to settle on the first one it recognizes. Because of this, if `Ready: True` is listed before `Synced: False`, ArgoCD just assumes everything is fine and completely ignores the failed provisioning.
I did fix it by reordering those checks, but I can't help but wonder if I'm the only one experiencing this—or if others are just monitoring AWS directly instead of relying on ArgoCD for health checks. Did I mess something up in my configuration?
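To make the ordering problem concrete, here's a rough model of it in Python. This is not the actual ArgoCD Lua health check, just a simplified sketch; the function names are made up for illustration, and the status strings only mirror the health values Argo shows in the UI.

```python
# Simplified model of an order-dependent health check.
# NOT the real ArgoCD Lua script; function names are hypothetical.
from typing import Dict, List

Condition = Dict[str, str]


def order_dependent_health(conditions: List[Condition]) -> str:
    """Return on the first recognized condition, so list order decides the result."""
    for cond in conditions:
        if cond["type"] == "Ready" and cond["status"] == "True":
            return "Healthy"
        if cond["type"] == "Synced" and cond["status"] == "False":
            return "Degraded"
    return "Progressing"


def failure_first_health(conditions: List[Condition]) -> str:
    """Look for a failed Synced condition first, and only then consider Ready."""
    for cond in conditions:
        if cond["type"] == "Synced" and cond["status"] == "False":
            return "Degraded"
    for cond in conditions:
        if cond["type"] == "Ready" and cond["status"] == "True":
            return "Healthy"
    return "Progressing"


# Conditions in the problematic order: Ready comes before the failed Synced.
conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "Synced", "status": "False", "reason": "ReconcileError"},
]

print(order_dependent_health(conditions))  # Healthy
print(failure_first_health(conditions))    # Degraded
```

The second function gives the same answer no matter how the conditions are ordered, which is the whole point of checking for the failure first.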
If anyone else has faced this, your experiences would really help me feel less alone in this chaos!
4 Answers
Appreciate your write-up! We're thinking about transitioning to this setup, so it's super helpful. Have you considered raising a GitHub issue? This seems like a potential problem for many users down the line.
Honestly, I hit a similar issue a while back, but I had already learned about Argo's health check quirks before diving into Crossplane. I assumed most users would know to customize their health checks if the defaults don’t cut it.
I don’t think enough people know about these quirks, and it can lead to issues like yours.
Glad to hear you found a workaround! However, it's a bit frustrating that you've shared the details on Medium as a "Member-only story." It makes it tough for others to access that info easily, you know?
Yeah, Medium can be a pain for that. A public GitHub issue would definitely be better.
Totally agree! Member-only stories really limit the audience.
You might be misunderstanding how GitOps works with Argo. The resources might be synced in the cluster, but health checks are separate. Argo verifies that the declared state matches, but it doesn’t account for operational health. For that, you'd need to set up proper monitoring tools like Datadog or Prometheus. Argo isn’t the end-all for resource health; it’s more about deployment success.
Exactly! Proper observability is key. Crossplane should provide logs you can monitor to catch those issues.
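Along those lines, here's a minimal sketch of catching this outside Argo by looking at the conditions Crossplane itself reports. It assumes you feed it the JSON from something like `kubectl get managed -o json`; the helper name and the sample data are made up for illustration, and the sample is hand-written, not real cluster output.

```python
import json
from typing import Any, Dict, List


def unsynced_resources(managed_list: Dict[str, Any]) -> List[str]:
    """Return 'Kind/name: message' for every item whose Synced condition is False."""
    problems = []
    for item in managed_list.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Synced" and cond.get("status") == "False":
                kind = item.get("kind", "Unknown")
                name = item.get("metadata", {}).get("name", "unknown")
                problems.append(f"{kind}/{name}: {cond.get('message', '')}")
    return problems


# Hand-written sample shaped like a Kubernetes list response (an assumption,
# not captured from a real cluster).
sample = json.loads("""
{
  "items": [
    {
      "kind": "Role",
      "metadata": {"name": "app-role"},
      "status": {"conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "Synced", "status": "False", "message": "create failed: 400"}
      ]}
    },
    {
      "kind": "Bucket",
      "metadata": {"name": "assets"},
      "status": {"conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "Synced", "status": "True"}
      ]}
    }
  ]
}
""")

print(unsynced_resources(sample))
```

You could run something like this on a schedule and alert on a non-empty result, so a failed provision gets noticed even when the dashboard looks green.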
Very true. Relying on Argo for everything can be misleading.
I did consider raising a GitHub issue, but the maintainers said it’s not a pressing issue. That's why I wrote up my experience, hoping to save someone else from the same headaches.