I recently started working at a large product company, and I'm finding onboarding quite challenging. The documentation is outdated, the inventory is unclear, and no one seems to know how many clusters we have, except maybe the CTO. Our virtual machines are spread across OCI, AWS, and Azure, and there are hundreds of TeamCity build configurations for various purposes. It's taken me months to get a handle on this infrastructure, and I'm still discovering things I never knew existed. I'm curious whether a tool, possibly an AI assistant, could answer questions like "How many Windows ARM64 VMs do we have?" or "Which Kubernetes clusters are still running a version below 1.30?" Would a tool like that be useful for your team? Would it relieve some of the operational stress for you, the way I think it would for me?
5 Answers
That sounds like exactly the kind of challenge a CMDB (Configuration Management Database) system is meant to solve. It can help keep track of all your resources in one place.
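As a rough illustration of what having everything in one place buys you, here is a minimal Python sketch that answers both of your example questions. It assumes the VMs and clusters from OCI, AWS, and Azure have already been exported into a single normalized list; the `Resource` fields, the helper names, and the sample records are all hypothetical, not the schema of any real CMDB product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical normalized inventory record; field names are illustrative only.
    name: str
    kind: str           # "vm" or "k8s_cluster"
    cloud: str          # "oci", "aws", or "azure"
    os: str = ""        # e.g. "windows" (VMs only)
    arch: str = ""      # e.g. "arm64" (VMs only)
    version: str = ""   # e.g. "v1.29.4" (clusters only)


def parse_minor(version: str) -> tuple[int, int]:
    """Turn 'v1.29.4', '1.30.2', or '1.29.4-eks-abc' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def count_windows_arm64_vms(inventory: list[Resource]) -> int:
    """Count VMs that run Windows on ARM64, across every cloud."""
    return sum(
        1 for r in inventory
        if r.kind == "vm" and r.os == "windows" and r.arch == "arm64"
    )


def clusters_below(inventory: list[Resource], minimum=(1, 30)) -> list[str]:
    """Names of Kubernetes clusters whose major.minor is older than `minimum`."""
    return [
        r.name for r in inventory
        if r.kind == "k8s_cluster" and parse_minor(r.version) < minimum
    ]


# Made-up sample data standing in for a real multi-cloud export.
inventory = [
    Resource("build-win-01", "vm", "azure", os="windows", arch="arm64"),
    Resource("build-win-02", "vm", "aws", os="windows", arch="x86_64"),
    Resource("ci-linux-01", "vm", "oci", os="linux", arch="arm64"),
    Resource("prod-eu", "k8s_cluster", "aws", version="v1.29.4"),
    Resource("staging", "k8s_cluster", "azure", version="1.30.2"),
    Resource("legacy", "k8s_cluster", "oci", version="v1.27.9"),
]

print(count_windows_arm64_vms(inventory))  # 1
print(clusters_below(inventory))           # ['prod-eu', 'legacy']
```

Comparing `(major, minor)` tuples rather than raw strings matters here: as plain strings, "1.9" would sort after "1.30". The hard part in practice is not the query but keeping the exported list current, which is the maintenance cost other replies point out.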
Without Infrastructure as Code (IaC), you're in a tough spot. If your infrastructure were defined in code, you could just check the repository, and an AI agent could help you navigate even a large setup.
What you're describing sounds like a breakdown in management. High turnover often results in half-finished projects and chaos. Even large tech companies like Microsoft deal with similar messy situations.
CMDBs can be useful, but they're often static and can become outdated without constant maintenance. It's a lot of work to keep them current.
At my last job, we built an AI tool that queried our multi-cloud setup for infrastructure details such as VM usage and cluster status. It made onboarding much easier and cut down on operational overhead. An "infra-aware GPT" could really transform how teams handle messy multi-cloud environments!

That sounds impressive! I'd love to hear more about how that worked in practice.