Hey everyone! I'm an MBA student looking into the challenges junior engineers face when resolving K8s incidents. I'm specifically considering a situation where a pod keeps restarting: the junior engineer on call could either wake up a senior engineer or spend a lot of time debugging on their own. I'm exploring the idea of an AI system that could provide step-by-step guidance, like checking pod logs and suggesting actions based on the findings.
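To make the restarting-pod case concrete, here's a rough sketch of what rule-based guidance could look like before any AI is involved. It assumes the input is the parsed output of `kubectl get pod <name> -o json`; the function name `suggest_next_steps` and the rules themselves are hypothetical and far from exhaustive, and it only ever proposes read-only checks.

```python
def suggest_next_steps(pod: dict) -> list[str]:
    """Suggest read-only next steps for a restarting pod.

    Assumption: `pod` is the parsed JSON from `kubectl get pod <name> -o json`.
    The rules are illustrative only, not a complete triage procedure.
    """
    name = pod["metadata"]["name"]
    steps = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        container = cs["name"]
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        last = cs.get("lastState", {}).get("terminated", {})
        if last.get("reason") == "OOMKilled":
            # The previous run was killed for running out of memory.
            steps.append(
                f"{container}: last run was OOMKilled; "
                "compare memory usage with its memory limit"
            )
        elif waiting in ("ImagePullBackOff", "ErrImagePull"):
            steps.append(
                f"{container}: image pull is failing; run "
                f"`kubectl describe pod {name}` and check the image name and pull secrets"
            )
        elif waiting == "CrashLoopBackOff" or cs.get("restartCount", 0) > 0:
            # --previous shows logs from the container instance that crashed.
            steps.append(
                f"{container}: run `kubectl logs {name} -c {container} --previous` "
                "to read the crashed run's logs"
            )
    if not steps:
        steps.append(
            f"no restart signal found; run `kubectl describe pod {name}` and review events"
        )
    return steps


if __name__ == "__main__":
    # Hypothetical pod status, trimmed to the fields the sketch reads.
    example = {
        "metadata": {"name": "web-0"},
        "status": {
            "containerStatuses": [
                {
                    "name": "app",
                    "restartCount": 7,
                    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    "lastState": {"terminated": {"reason": "Error", "exitCode": 1}},
                }
            ]
        },
    }
    for step in suggest_next_steps(example):
        print(step)
```

Anything beyond fixed rules like these (reading the logs, proposing a fix) is where the AI part would come in, and that is the part the survey is really asking about.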
For my research at Kelley, I'm interested in how effective AI-driven guidance could be for junior engineers, specifically in K8s environments that use open-source monitoring tools. I've created a brief survey to gauge what percentage of incidents these junior engineers think they could resolve with detailed guidance. The average response I'm seeing is around 68%. Any thoughts on this?
4 Answers
I think your survey might be a bit misleading. It seems to imply that an AI tool would work perfectly for every incident, but in reality, AI suggestions are more of a guideline. Respondents might be rating how confidently they’d follow an AI's advice rather than how often that advice would actually resolve the incident.
This sounds a lot like a runbook with non-deterministic output. I’m not convinced that’s the best approach.
I believe AI should be reserved for more experienced engineers. Novice staff should go through the learning process themselves; they tend to retain information better that way. Having AI do the work for them might lead them to skip important fundamentals, which can backfire.
I see your point! Especially in the situation OP described, a junior engineer might end up making some risky moves based on AI suggestions, like deleting critical resources.
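One way to limit that risk is to gate what a junior engineer can run straight from a suggestion. Here's a minimal sketch, assuming suggestions arrive as plain `kubectl` command strings: read-only verbs pass, and everything else is flagged for a second pair of eyes. The names `READ_ONLY_VERBS`, `FLAGS_WITH_VALUE`, and `needs_approval` are made up for illustration, and the lists are deliberately short rather than complete.

```python
import shlex

# Assumption: a small allowlist of kubectl verbs that only read state.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain"}
# Global flags that take a separate value, so the value is not mistaken for the verb.
FLAGS_WITH_VALUE = {"-n", "--namespace", "--context"}


def needs_approval(command: str) -> bool:
    """Return True unless the command is a kubectl call with a read-only verb."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return True  # not kubectl: be conservative
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in FLAGS_WITH_VALUE:
            i += 2  # skip the flag and its value
        elif tok.startswith("-"):
            i += 1  # skip a standalone flag such as --namespace=prod
        else:
            # First non-flag token is treated as the verb.
            return tok not in READ_ONLY_VERBS
    return True  # no verb found


if __name__ == "__main__":
    for cmd in (
        "kubectl logs web-0 --previous",
        "kubectl -n prod delete pvc data-web-0",
    ):
        print(cmd, "->", "needs approval" if needs_approval(cmd) else "ok to run")
```

An allowlist is used rather than a blocklist so that unfamiliar or new verbs fall on the safe side by default.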
You might want to check out Context7 MCP. I think it could be really beneficial for helping junior SREs troubleshoot by connecting them to relevant documentation and resources.
I appreciate the suggestion! Context7 is actually one of my inspirations for this project, but I’m looking to expand beyond just vendor documentation.
Thanks for the feedback! I didn’t mean to suggest perfection. The goal is more about aspiration than assumption. We do need tools that are effective, or else they just become more clutter in our toolkit.