I currently manage multiple self-managed RKE2 clusters in an air-gapped environment, and I think we have too many Kubernetes CronJobs. A teammate previously migrated a bunch of Java-based Quartz jobs to CronJobs. They run anywhere from once a day to once a month and transfer large datasets, sometimes hundreds of GB. The problem is that many of these jobs fail frequently, and because they run as CronJobs, the logging is poor and inconsistent. Ideally, I'd switch to a step function model instead, but the team insists on sticking with RKE2. We are also on Oracle Cloud, which adds further complications. Does anyone have suggestions for a more effective deployment model for this kind of workload?
4 Answers
I hear ya! We faced similar issues with CronJobs too. The key fix for us wasn't changing the cron logic but improving our visibility into job logs. Centralizing job logs and setting up alerts for failed or missed runs worked wonders: if a job doesn't log a success within a certain timeframe, it triggers an alert. You could look into open-source solutions like Fluent Bit + Loki or even ELK for log aggregation, as they help catch 'silent failures' really well!
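To make the "no success within a certain timeframe" idea concrete, here is a minimal sketch of that check in Python. It assumes you can already get each job's last successful run time from somewhere (your log store, or the CronJob status); the `JobStatus` class, the `find_stale_jobs` function, and the job names are all made up for illustration, and the alerting itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class JobStatus:
    """Hypothetical record for one scheduled job."""
    name: str
    max_silence: timedelta             # longest acceptable gap between successes
    last_success: Optional[datetime]   # None if the job has never succeeded


def find_stale_jobs(jobs: List[JobStatus], now: datetime) -> List[str]:
    """Return names of jobs with no success inside their allowed window."""
    stale = []
    for job in jobs:
        never_succeeded = job.last_success is None
        if never_succeeded or now - job.last_success > job.max_silence:
            stale.append(job.name)
    return stale


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
    jobs = [
        # Daily job that last succeeded 3 hours ago: fine.
        JobStatus("daily-export", timedelta(hours=26), now - timedelta(hours=3)),
        # Daily job that last succeeded 2 days ago: a silent failure.
        JobStatus("daily-sync", timedelta(hours=26), now - timedelta(days=2)),
        # Monthly job that has never reported a success.
        JobStatus("monthly-archive", timedelta(days=32), None),
    ]
    for name in find_stale_jobs(jobs, now):
        print(f"ALERT: no recent success for {name}")
```

The window is set a bit longer than the schedule interval (26 hours for a daily job) so a slow but healthy run doesn't page anyone. The same check catches both failed runs and runs that never started, which is the case plain failure alerts tend to miss.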
I initially misread your post and thought you meant too many KubeCons! But seriously, there are plenty of options out there for tackling the issues with CronJobs; you've got this!
It really sounds like you need a dedicated job management system to run these processes, rather than just patching the individual CronJobs.
What you're dealing with sounds like a classic issue with Kubernetes CronJobs. One option is to use Argo Workflows to orchestrate your tasks. It gives you more visibility and control over each workflow, which might help with the failure rates you're seeing.
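For a rough idea of what that could look like, here is a sketch that builds an Argo `CronWorkflow` manifest as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). The field names follow the Argo Workflows CronWorkflow spec as I understand it, so check them against the version you install. The job name, schedule, image, and command are placeholders I made up, and `build_cron_workflow` is just a helper for this example.

```python
import json
from typing import Any, Dict, List


def build_cron_workflow(name: str, schedule: str, image: str,
                        command: List[str]) -> Dict[str, Any]:
    """Build a single-step CronWorkflow manifest with retries and backoff."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Don't start a new run while a long transfer is still going.
            "concurrencyPolicy": "Forbid",
            "workflowSpec": {
                "entrypoint": "transfer",
                "templates": [
                    {
                        "name": "transfer",
                        # Retry failed attempts with exponential backoff.
                        "retryStrategy": {
                            "limit": "3",
                            "retryPolicy": "OnFailure",
                            "backoff": {
                                "duration": "1m",
                                "factor": "2",
                                "maxDuration": "1h",
                            },
                        },
                        "container": {"image": image, "command": command},
                    }
                ],
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cron_workflow(
        name="nightly-transfer",                           # placeholder name
        schedule="0 2 * * *",                              # every day at 02:00
        image="registry.internal.example/transfer:1.0",   # placeholder image
        command=["/app/run-transfer"],                     # placeholder command
    )
    print(json.dumps(manifest, indent=2))
```

Since you're air-gapped, the image would have to come from your internal registry, same as the Argo controller images themselves. Big transfers would realistically be split into several steps so a retry doesn't restart the whole thing; this only shows the single-step case.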
