I'm curious about why so many teams building AI solutions like co-pilots and chatbots seem to neglect proper observability. It raises some important questions: When an AI assistant fails to provide a correct answer, how do we figure out where it went wrong? And if a business misses out on a sale because a bot failed to transfer the conversation to a human, how can we trace that issue?
For observability to be effective in AI, we should have (a rough sketch follows the list):
- Detailed traces for every step the AI takes, including model calls and tool actions,
- Structured logs that we can actually query for insights,
- Metrics that tie back to ROI, such as the ratio of good responses to errors, and
- Dashboards that are easy for business owners to understand.
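To make the first three points concrete, here's a minimal sketch in Python using the OpenTelemetry API. OpenTelemetry is my assumption as a vendor-neutral choice, not something the question prescribes, and `call_llm` is a hypothetical stand-in for whatever model client you use. Without an SDK and exporters configured, the OpenTelemetry calls are no-ops, so the sketch runs as-is:

```python
import json
import logging
import time
import uuid

from opentelemetry import metrics, trace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("assistant")

# Without an SDK configured these return no-op implementations;
# wire up exporters (OTLP, console, etc.) in real use.
tracer = trace.get_tracer("assistant")
meter = metrics.get_meter("assistant")
response_counter = meter.create_counter(
    "assistant.responses",
    description="Assistant responses by outcome (good vs. error)",
)

FALLBACK_MESSAGE = "Sorry, something went wrong. Transferring you to a human."


def call_llm(question: str) -> str:
    # Hypothetical stand-in for your real model client.
    return f"(model answer to: {question})"


def handle_question(question: str) -> str:
    request_id = str(uuid.uuid4())
    # One span per request; child spans could wrap each tool call.
    with tracer.start_as_current_span("assistant.request") as span:
        span.set_attribute("request.id", request_id)
        started = time.monotonic()
        try:
            answer = call_llm(question)
            outcome = "success"
        except Exception as exc:
            span.record_exception(exc)
            outcome = "error"
            answer = FALLBACK_MESSAGE
        # Counter feeding the good-vs-error ratio mentioned above.
        response_counter.add(1, {"outcome": outcome})
        # One queryable JSON log line per request.
        logger.info(json.dumps({
            "request_id": request_id,
            "outcome": outcome,
            "latency_ms": round((time.monotonic() - started) * 1000),
        }))
        return answer


print(handle_question("Where is my order?"))
```

The point of the shared `request_id` is that a failed handoff to a human can be traced from the dashboard back through the log line to the exact span where things went wrong.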
This matters at every scale: small and medium businesses need it to build trust, developers need it to troubleshoot, and enterprises need it for audit trails. Yet many projects treat AI systems like a black box with no real visibility. If you work on an AI product, I'd like to know: what do you currently trace in your project? What do you feel is missing from your logs? And what would a complete end-to-end observability setup look like for your use case? I'm working on this topic now, and I've shared more details in a longer post if you're interested.
3 Answers
I've noticed a few mindsets holding back AI observability in some teams. Some think AI is so complex that regular observability doesn't apply. Others believe that if their AI is observable, it must not be powerful enough. And there's a blame game: observability surfaces issues that some people would rather deflect. My team ran into exactly this mindset problem, and it significantly slowed our project's progress.
It's worth mentioning that many non-AI projects also suffer from a lack of good observability. It's more a matter of team culture than of the technology itself.
True, but with the rise of automation, especially in AI-to-AI interactions, we need to take observability way more seriously. Unlike deterministic code, model behavior can drift across versions and even vary between identical prompts, which makes it much harder to predict. Now feels like the time to embrace better observability!
You bring up great points! In my video, I emphasized logging evaluations like thumbs up/thumbs down alongside AI responses. It's not just about what you see, but questioning what you don't see and why. We need to meet the AI where it is to enhance observability.
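If it helps, here's a minimal sketch of what that feedback logging could look like, assuming each answer already carries a `request_id` (like the one in the tracing example above) so ratings can be joined back to traces and logs. The function name and fields are illustrative, not from any particular library:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("assistant.feedback")


def record_feedback(request_id: str, thumbs_up: bool, comment: str | None = None) -> None:
    """Log one structured feedback event, joinable to the original request by ID."""
    logger.info(json.dumps({
        "event": "user_feedback",
        "request_id": request_id,
        "rating": "up" if thumbs_up else "down",
        "comment": comment,
        "ts": datetime.now(timezone.utc).isoformat(),
    }))


# Example: a user thumbs-down on a response served earlier (ID is illustrative).
record_feedback("3f2b-example-id", thumbs_up=False, comment="Answer was outdated")
```

Being able to query these events next to the trace data is what turns thumbs up/down from a vanity metric into a debugging signal.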