Hey everyone! I'm currently working on a new project in the AI agent space, specifically related to DevOps and site reliability engineering (SRE). I'm curious about your thoughts on how AI agents might impact observability. With observability being one of the major expenses in infrastructure alongside cloud services and salaries, I'm wondering if these agents, which can analyze logs, perform cross-referencing, and handle initial investigations, will ultimately reduce our reliance on traditional observability tools. Do you think that as AI takes on more of these tasks, we might see a decrease in spending on observability tools, maybe just requiring basic logging and dashboards? I'd love to hear your insights!
6 Answers
In the future, AI might lower costs, but we're not there yet. It'll take a few more years for that to materialize.
If your services aren't observable to begin with, AI can't do much to help you observe them. It really depends on how well your setup was designed for observability in the first place.
I see how you'd think AI could replace parts of observability, but the solution lies in doing observability right. Effective logging and monitoring strategies can significantly lower costs. There are plenty of existing smart detection features in tools already, and I fear your idea might just add to a crowded market without offering something truly innovative.
Thanks for the input! Just to clarify, I’m not aiming to replace observability. I'm interested in how the interface and exposure of observability data might shift with agents assisting in operations.
Observability isn't going anywhere; it will evolve. While AI agents can handle some of the initial grunt work, the added automation will actually create a need for more robust observability. I've yet to see an AI effectively debug issues in a complex system. So far it just adds more factors to consider.
That's what I'm working on right now! If agents can handle the initial triage, do you think that enables DevOps teams to analyze data on a higher level, or will we still need to dig through everything?
I don't think AI will entirely replace observability tools anytime soon. Sure, you can use large language models to assist with diagnostics, but they'll complement existing processes rather than replace them. You'll still want to monitor what the AI is doing to ensure proper oversight.
Agreed. It's crucial to store and audit agent activities. If we can trust these agents, do we need to keep the same extensive logs, or can we start evaluating based on what the agents provide?
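To make "store and audit agent activities" a bit more concrete, here is a minimal sketch of an append-only audit log for agent actions. Everything in it is an assumption for illustration: the `AgentAuditLog` class, its field names, and the hash-chaining approach are hypothetical and not taken from any particular tool.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    # Stable hash of an entry body (keys sorted so ordering cannot change it).
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AgentAuditLog:
    """Hypothetical in-memory, hash-chained log of agent actions."""
    entries: list = field(default_factory=list)

    def record(self, agent_id: str, action: str, evidence: list) -> dict:
        # Each entry embeds the previous entry's hash, so editing an old
        # entry breaks the chain and shows up in verify().
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {
            "agent_id": agent_id,
            "action": action,
            "evidence": list(evidence),  # e.g. log query IDs the agent relied on
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that the chain links line up.
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

The point of keeping the `evidence` references is that an agent's summary can be checked against the raw data it claims to have used, which is one reason the underlying logs may still need to be retained even when the agent is trusted.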
A lot of observability costs stem from the need to obtain, store, and process data. Even with AI, if we still need extensive information, processing costs will rise, not fall. AI might give us fresh insights or chat interfaces, but it won't really lower expenses significantly.
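As a rough illustration of that argument, here is a back-of-the-envelope cost model. All names and prices are made-up assumptions (the `TelemetryCost` class, the per-GB rates, the scan volumes); real vendor pricing differs, so treat it as a sketch of the arithmetic only.

```python
from dataclasses import dataclass


@dataclass
class TelemetryCost:
    """Hypothetical monthly cost model; every rate here is an assumed placeholder."""
    gb_per_day: float            # telemetry volume ingested per day
    ingest_per_gb: float         # assumed price to ingest one GB
    storage_per_gb_month: float  # assumed price to keep one GB for a month
    retention_days: int          # how long data is kept

    def monthly_cost(self, days_in_month: int = 30) -> float:
        ingest = self.gb_per_day * days_in_month * self.ingest_per_gb
        # Steady-state stored volume is daily volume times retention window.
        stored_gb = self.gb_per_day * self.retention_days
        storage = stored_gb * self.storage_per_gb_month
        return ingest + storage


def agent_overhead(investigations: int, gb_scanned_each: float, scan_per_gb: float) -> float:
    # Extra query/processing cost from agents scanning data during triage.
    return investigations * gb_scanned_each * scan_per_gb


baseline = TelemetryCost(gb_per_day=100, ingest_per_gb=0.10,
                         storage_per_gb_month=0.02, retention_days=30)
with_agents = baseline.monthly_cost() + agent_overhead(
    investigations=200, gb_scanned_each=5, scan_per_gb=0.005)
```

Under these assumed numbers the agent adds cost on top of an unchanged ingest and storage bill. The total only drops if the agent lets you reduce `gb_per_day` or `retention_days`, which is exactly the open question in this thread.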

What do you think would need to happen for that shift to actually occur? Maybe observability tools will move to an 'agentic interface' that narrows the focus to the most relevant data?