I've been dealing with a really frustrating problem using ArgoCD with Crossplane. ArgoCD shows resources as 'Healthy' and 'Synced' even though Crossplane is failing to provision the AWS resources and getting 400 errors. This has led to issues like Lambda functions not updating and RDS instances getting stuck. I feel like I'm the only one experiencing this, since there's hardly any information online about it. The problem seems to lie in the health check logic: the order in which the status conditions are checked causes ArgoCD to misreport the health of resources and ignore actual errors. Has no one else encountered this? Are others not using health checks, or is everyone just keeping an eye on AWS directly? It's baffling, especially since I figured out a workaround by reordering the checks. Am I alone in this?
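To make the ordering problem concrete, here is a rough sketch in Python. It is only an illustration: real ArgoCD health checks are written in Lua, and the function names and sample data below are made up. The only things taken from Crossplane are the `Ready` and `Synced` condition types that managed resources report. The first function shows one possible bad ordering (return Healthy as soon as `Ready=True` is seen); the second checks for a failed sync first, which is the kind of reordering that worked around the problem for me.

```python
from typing import Dict, List

# A condition as it might appear under status.conditions (simplified).
Condition = Dict[str, str]


def health_ready_first(conditions: List[Condition]) -> str:
    """Hypothetical problematic order: Ready=True wins before errors are checked."""
    for c in conditions:
        if c["type"] == "Ready" and c["status"] == "True":
            return "Healthy"
    for c in conditions:
        if c["type"] == "Synced" and c["status"] == "False":
            return "Degraded"
    return "Progressing"


def health_errors_first(conditions: List[Condition]) -> str:
    """Hypothetical reordered check: a failed sync is reported before Ready."""
    for c in conditions:
        if c["type"] == "Synced" and c["status"] == "False":
            return "Degraded"
    for c in conditions:
        if c["type"] == "Ready" and c["status"] == "True":
            return "Healthy"
    return "Progressing"


# Assumed sample data: the resource became Ready earlier, but the latest
# update is being rejected by the provider, so Synced is now False.
stale_ready = [
    {"type": "Ready", "status": "True", "reason": "Available"},
    {"type": "Synced", "status": "False", "reason": "ReconcileError"},
]

print(health_ready_first(stale_ready))   # reports Healthy despite the error
print(health_errors_first(stale_ready))  # reports Degraded
```

With the first ordering, a resource that was provisioned once and then starts failing on updates keeps showing as Healthy, which matches what I was seeing with the Lambda functions.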
5 Answers
It's great that you found a workaround for the issue! But seriously, Medium? A member-only article? That's not very helpful for the community.
Totally agree! I'd prefer to see it discussed openly where everyone can benefit.
I think you're not alone in this at all! It's a tricky issue, and I've faced this behavior from Argo before too. There's definitely a gap in documentation when it comes to how these health checks should work together. Just keep sharing your findings; you might help someone else in the trenches!
My sentiments exactly! The more we share these experiences, the better it gets for everyone.
Exactly! Community contributions can lead to genuine improvements in these tools.
I hear you! But it seems that your misunderstanding of GitOps principles is making it seem worse than it is. ArgoCD is doing its job by making sure your resource definitions match the cluster’s state. The fact that Crossplane is having issues afterward isn't something Argo can handle directly. It’s more about monitoring and alerting being set up outside of ArgoCD itself. You might want to consider using monitoring tools like Datadog or Grafana for better insight into your resource health.
Couldn't agree more. It's not Argo's job to keep track of every underlying system's health.
Exactly! ArgoCD is just for deployment; for monitoring health, you'll need to set up proper observability.
Seems to me you should put that explanation on GitHub as an issue. A lot of us are migrating to the same stack and might run into this. It would be more effective than just posting about it on Medium.
True! Might be worth trying again, especially if it affects others.
I get that you talked to maintainers, but an official issue might still prompt some action.
I've run into this before, and honestly, it's not surprising. Many people assume Argo will just handle everything, but you usually need to write custom health checks for things like Crossplane. It seems like you have a good grasp on that now, though. Just remember to share your findings with the community for others who may run into this!
Yep, this should definitely be spread around more. It'll save a lot of future headaches!
I think raising awareness about custom health checks is vital for anyone using these tools.
Right? If it's going to be behind a paywall, maybe share it somewhere more accessible.