How to Manage Infrastructure Audits with Multiple Monitoring Tools?

Asked By TechExplorer98 On

Our team recently completed the annual audit of our internal monitoring tools, and I wanted to share some of what we do. We audit alerts across platforms such as CloudWatch, Splunk, Chronosphere, Grafana, and custom cron jobs to determine whether they're still necessary and accurate. We also review our AWS Auto Scaling Groups (ASGs) to confirm they have the right resources and are still owned by our team. This is only a small part of the audit, which usually means pulling data from several systems to assess the current state of our infrastructure and tools. We compile everything into a spreadsheet and assign tasks to different team members. I'm interested in knowing:
- How often are you auditing your infrastructure and tools?
- Do you have any advanced tools for this process beyond just spreadsheets?
- What is the typical time frame for your audits?
I'd love to hear what strategies work well for others!
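
For anyone who wants to take some manual work out of the spreadsheet step, here is a minimal sketch of the "compile everything and flag what needs review" idea. It assumes the alerts have already been exported from each tool into plain records; the field names (`source`, `name`, `owner`, `last_fired`), the sample data, and the 180-day staleness threshold are all hypothetical and not taken from any of the tools' real export formats.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical alert records, as if already exported from each tool.
ALERTS = [
    {"source": "CloudWatch", "name": "high-cpu", "owner": "platform",
     "last_fired": date(2024, 5, 1)},
    {"source": "Splunk", "name": "auth-errors", "owner": "",
     "last_fired": date(2024, 5, 20)},
    {"source": "Grafana", "name": "disk-latency", "owner": "platform",
     "last_fired": None},
]


def review_reasons(alert, today, stale_after_days=180):
    """Return the reasons (possibly none) an alert should be reviewed."""
    reasons = []
    if not alert["owner"]:
        reasons.append("no owner")
    last = alert["last_fired"]
    if last is None:
        reasons.append("never fired")
    elif today - last > timedelta(days=stale_after_days):
        reasons.append("stale")
    return reasons


def audit_csv(alerts, today):
    """Render one CSV row per alert, ready to paste into the audit sheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "name", "owner", "review_reasons"])
    for a in alerts:
        reasons = "; ".join(review_reasons(a, today))
        writer.writerow([a["source"], a["name"], a["owner"], reasons])
    return buf.getvalue()


if __name__ == "__main__":
    print(audit_csv(ALERTS, today=date(2024, 6, 1)))
```

The output is still a spreadsheet-friendly CSV, so the existing task-assignment workflow would not have to change; only the "which rows need a human" column is precomputed.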

5 Answers

Answered By FutureBuilder01 On

We're still in the early stages, but I’m developing a context layer that maps dependencies between our tools. We're collaborating with larger teams to clarify service ownership, aiming to uncover any unknown gaps in our setup.

Answered By DevOpsGuru84 On

We recently started using Drata for compliance management, and it has been helpful. For alerts, we add new ones when incidents occur that our existing setup didn’t catch. It can be a problem if alerts go unanswered, but we tackle that separately!

Answered By SimplicitySeeker42 On

We streamlined our monitoring by consolidating multiple tools. We moved from Icinga, Munin, and Graphite to Prometheus, which makes it much simpler to pull in data from CloudWatch and report on everything from a single system.

Answered By NFRAnalyst54 On

We rely on a detailed Excel sheet that lists every system in play alongside its non-functional requirements (NFRs). We start with one reliable system as a benchmark and identify gaps in the other systems by looking for empty cells. It's a straightforward way to pinpoint issues.
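
The empty-cell check can be sketched in a few lines if the sheet is exported to something like a dictionary per system. This is only an illustration under assumptions: the system names, NFR column names, and values below are made up, and the benchmark is simply whichever system you trust to be fully filled in.

```python
# Hypothetical NFR columns and systems; "" stands for an empty cell.
NFRS = ["availability_target", "alerting", "backup", "owner"]

SYSTEMS = {
    "billing-api": {"availability_target": "99.9%", "alerting": "paged",
                    "backup": "hourly", "owner": "payments"},
    "report-worker": {"availability_target": "99.5%", "alerting": "",
                      "backup": "daily", "owner": "data"},
    "legacy-cron": {"availability_target": "", "alerting": "",
                    "backup": "", "owner": "platform"},
}


def find_gaps(systems, nfrs, benchmark):
    """Map each system to the NFRs it leaves empty but the benchmark fills."""
    # Only NFRs the benchmark actually documents count as required.
    required = [n for n in nfrs if systems[benchmark].get(n, "").strip()]
    gaps = {}
    for name, row in systems.items():
        missing = [n for n in required if not row.get(n, "").strip()]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    for system, missing in find_gaps(SYSTEMS, NFRS, "billing-api").items():
        print(f"{system}: missing {', '.join(missing)}")
```

Treating the benchmark's filled cells as the required set keeps the check honest: a column nobody has defined yet does not show up as a gap everywhere.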

Answered By CuriousDev123 On

In the world of DevOps, there are always unknowns that an audit might not reveal. For example, how can you tell if something wasn’t logged at all? Sometimes it feels like a chaos monkey is the only real solution to manage outages effectively.
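
One partial answer to the "wasn't logged at all" problem is to keep a list of sources you expect to hear from and flag any that have gone quiet. The sketch below assumes you can get a last-seen timestamp per source from your logging system; the source names, timestamps, and the one-hour silence limit are hypothetical.

```python
from datetime import datetime, timedelta


def silent_sources(expected, last_seen, now, max_silence):
    """Return expected sources with no log line within max_silence.

    A source that has never been seen counts as silent too.
    """
    silent = []
    for src in sorted(expected):
        seen = last_seen.get(src)
        if seen is None or now - seen > max_silence:
            silent.append(src)
    return silent


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    expected = {"api-gateway", "batch-job", "auth-service"}
    last_seen = {
        "api-gateway": datetime(2024, 6, 1, 11, 55),
        "batch-job": datetime(2024, 5, 31, 2, 0),
        # auth-service has never logged anything
    }
    print(silent_sources(expected, last_seen, now, timedelta(hours=1)))
```

This only catches silence from sources you already know about, so it complements an audit rather than replacing the kind of failure testing mentioned above.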
