Hey everyone! I'm diving into building AI agents, and I want to make one that genuinely serves my team in the DevOps/SRE space. My goal is a bot that helps troubleshoot issues, remembers past incidents, and spots patterns we might overlook: essentially a second brain that never forgets those tricky root causes.
Right now, here's what I'm working on:
- Parsing incident documentation and creating embeddings for semantic searches, which isn't too tough.
- Letting people chat with the bot to troubleshoot or recall prior issues (currently only while the app is running).
- Starting with a local CLI while planning to evolve it into a Slack bot or web interface in the future.
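The "find similar past incidents" piece from the list above can be sketched roughly as follows. This is a minimal, stdlib-only sketch under stated assumptions: it uses bag-of-words term-frequency vectors with cosine similarity as a stand-in for real embeddings (a swapped-in technique, not necessarily what you'd ship), and the incident IDs, texts, and function names (`vectorize`, `cosine`, `similar_incidents`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical incident notes; in practice these would come from parsed postmortems.
INCIDENTS = {
    "INC-APR-01": "Database connection pool exhausted after deploy, API returned 503 errors",
    "INC-MAY-07": "Kubernetes pod stuck in CrashLoopBackOff due to missing config map",
    "INC-JUN-12": "TLS certificate expired on load balancer, clients saw handshake failures",
}


def vectorize(text: str) -> Counter:
    """Term-frequency vector: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_incidents(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank stored incidents by similarity to a new symptom description."""
    q = vectorize(query)
    scored = [(inc_id, cosine(q, vectorize(text))) for inc_id, text in INCIDENTS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for inc_id, score in similar_incidents("API throwing 503 errors, connection pool looks exhausted"):
        print(f"{inc_id}: {score:.2f}")
```

Swapping `vectorize` for calls to an actual embedding model (and storing the vectors instead of recomputing them) would keep the ranking logic the same; the term-frequency version just makes the idea easy to try without any dependencies.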
Now, I'd love to hear from you: if you had a tool like this, what features would add genuine value for you and your team? Would you want it to automatically highlight similar past incidents, suggest known fixes, explain tricky Terraform or Kubernetes configurations, help triage alerts and logs, or even say things like, 'Hey, this looks like that outage back in April'?
Also, are any of you already using tools like this? I'm interested in scripts, platforms, or vendor solutions, and whether they're worth the investment. I'm not pitching anything; I'm just hoping to learn from those of you who are building or using AI in this field. I appreciate all feedback and suggestions!
5 Answers
To be honest, it sounds good at first, but generative models often add unnecessary fluff and complexity. Complexity is usually where things start to fall apart, you know? Maybe keep it simple.
I think it's going to be tough to make it genuinely useful. DevOps involves so many complex systems and business logic that it’s hard to pin down. But maybe starting with something simple, like basic documentation and issue tracking, could be a better approach.
Focus on improving your documentation first. Having well-structured docs and accessible logs is critical for any automation to work effectively. Trust me, good documentation makes all the difference.
Seriously, it's more challenging than you think. Instead of building a complex AI, why not improve search over your existing documentation? Tools like Copilot already exist to help explain code, which could simplify things. You might just be getting ahead of yourself.
I get where you're coming from, but I really wouldn't trust an AI agent to not hallucinate and throw out random info that could waste a lot of time. It's a bit of a gamble, honestly.