I'm currently developing a system that helps diagnose alerts in Kubernetes using runbooks which are stored in a vector database, specifically ChromaDB. My existing setup is a code-driven retrieval augmented generation (RAG) model that processes alerts like 'PodOOMKilled' using a predefined list of keywords. Based on the similarity scores gathered from this list, I determine whether to reuse, adapt, or create new runbook guidance. However, I'm considering a shift to an agentic approach where the agent would have tools to generate its own queries and make decisions about the runbooks. I'm really unsure whether this change is genuinely beneficial or just complicates things. I'd like to know more about how others manage issues of non-determinism in agentic RAG approaches and whether sticking with code-driven RAG might actually be better in some cases. Any insights or experiences would be greatly appreciated!
3 Answers
Great points raised here! When handling potential partial results or mixed data, I've found that it's essential to first identify relevant runbooks using small search fragments, but then retrieve the complete runbook for clarity. That way, you mitigate confusion from incomplete information. Non-determinism in agentic systems can be a double-edged sword, so ensuring robust monitoring and defined fallback rules is crucial.
It's natural to question whether going for an agentic approach is worth it, especially since teams often shift from deterministic systems to more AI-driven solutions. My take is to stick with code-driven RAG for now since its predictability is its strength. You should only switch if it's clear that your alerts need more nuanced reasoning. Maybe run a parallel test with both systems using historical alerts to see how they perform side by side before making any big changes.
Switching to an agentic RAG can definitely offer more flexibility since the agent can craft queries better than just a static keyword list. But you have to be aware that with that flexibility comes a higher degree of unpredictability. The non-determinism is a valid concern—sometimes, outputs might not match your expectations based on the scoring system you relied on. For now, if your current code-driven system works and gives you repeatable results, I’d be cautious about making the jump without a thorough test first.

Related Questions
Can't Load PhpMyadmin On After Server Update
Redirect www to non-www in Apache Conf
How To Check If Your SSL Cert Is SHA 1
Windows TrackPad Gestures