I've been digging into some messy, inherited cloud accounts and I'm facing a common challenge: figuring out which resources (VMs, databases, and so on) are actually important and which are just costing us money. The original creators of these resources are long gone and there's zero documentation, so it feels like a risky game of guesswork. A given VM might be quietly running a crucial cron job for HR, or it might be completely unnecessary: shutting it down could cause chaos, but keeping it wastes money.
I know that strategies like tagging and strict controls can help prevent this situation, but unfortunately, I'm often dealing with environments where that was never set up. I'm developing a tool designed to automate the discovery of these undocumented resources by analyzing things like network connectivity and relationships, instead of relying solely on tagging. I'm looking to validate whether this is a common issue among other sysadmins and would love to chat with a few folks about how you manage this dilemma. In exchange for your time, I'm offering unlimited lifetime access to the tool when it launches, or a $20 gift card if it doesn't align with your needs. I'd appreciate any insights or methods you currently use to handle this issue!
4 Answers
One approach is to gradually turn off resources and see if anything breaks, a technique sometimes called the "Scream Test": stop a VM or service, wait to see whether anyone screams, and only then decide whether to bring it back. Just make sure to document each shutdown (what you stopped, when, and how to restore it) so nothing crucial slips through the cracks. That way you learn which resources are actually needed and can plan migration or decommissioning without risking unexpected downtime.
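For what it's worth, the bookkeeping side of a scream test is easy to sketch in code. This is a minimal, hypothetical example (made-up resource IDs, an assumed 30-day wait period, plain Python dates instead of real cloud API calls); the actual stop/start would go through your provider's API:

```python
from datetime import date, timedelta

# Hypothetical scream-test ledger: when each resource was stopped.
# In practice you'd also record who to notify and how to restore it.
WAIT_DAYS = 30  # assumed quiet period before decommissioning

stopped = {
    "vm-hr-batch":   date(2024, 1, 5),   # hypothetical resource IDs
    "db-legacy-crm": date(2024, 2, 20),
}

def safe_to_decommission(ledger, today, wait_days=WAIT_DAYS):
    """Resources stopped at least wait_days ago with no complaints."""
    cutoff = today - timedelta(days=wait_days)
    return sorted(rid for rid, stopped_on in ledger.items()
                  if stopped_on <= cutoff)

print(safe_to_decommission(stopped, date(2024, 2, 25)))
# → ['vm-hr-batch']
```

The point is just to make the waiting period explicit instead of relying on memory for which machine you turned off three weeks ago.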
One key step is to use a configuration management database (CMDB). It helps link resources to IT owners, and if nobody claims a resource, it's much easier to justify shutting it down. I also recommend monitoring network traffic for a while to see what's actually in use; that data can guide your decisions significantly, though it's a time-consuming process.
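On the traffic-monitoring point: if you're on AWS, VPC Flow Logs are one way to see which interfaces are actually talking to anything. Here's a rough sketch that totals accepted bytes per network interface; the sample records and interface IDs are made up, but the field layout follows the default flow-log format (interface-id is field 3, bytes is field 10, action is field 13):

```python
from collections import defaultdict

# Hypothetical records in the default VPC Flow Logs format:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
SAMPLE_LOGS = """\
2 123456789012 eni-aaa 10.0.0.5 10.0.1.7 443 49152 6 10 8400 1600000000 1600000060 ACCEPT OK
2 123456789012 eni-aaa 10.0.1.7 10.0.0.5 49152 443 6 8 1200 1600000000 1600000060 ACCEPT OK
2 123456789012 eni-bbb 10.0.2.9 10.0.0.5 22 50000 6 1 40 1600000000 1600000060 REJECT OK
"""

def bytes_per_interface(log_text):
    """Total ACCEPTed bytes per interface -- a rough 'is anyone
    actually talking to this thing?' signal."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[12] == "ACCEPT":
            totals[fields[2]] += int(fields[9])
    return dict(totals)

print(bytes_per_interface(SAMPLE_LOGS))
# → {'eni-aaa': 9600}
```

An interface that shows near-zero accepted traffic over a few weeks is a good scream-test candidate; one with steady traffic clearly has *someone* depending on it, even if nobody claims it in the CMDB.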
Exactly! And a tool that can automate that assessment would save a ton of time. If I could categorize resources by traffic and usage, I’d jump at the chance!
Definitely turn off what you can and see who complains! And while you're at it, start documenting everything. It's essential to keep track of critical business processes, because you won't have a good recovery plan if you don't even know a resource exists. Plus, those undocumented systems can be major security risks.
Right? It's so important to know who 'owns' each resource, or else it just becomes chaos when issues pop up. I'd recommend using a configuration management tool if possible.
When I dealt with this a few years ago, I did a deep monitoring stint for a couple of weeks, logging running processes and capturing netstat output, just to figure out which resources were actually being used. After that I was able to systematically shut down the unnecessary ones without causing many issues.
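To give a flavor of what that looks like: I'd periodically capture socket snapshots (e.g. from `ss -tn` or `netstat` via cron) and then ask which local ports ever had real remote peers. The snapshot below is a made-up example, and real `ss` output has header lines and more columns, but the aggregation idea is the same:

```python
import re

# Hypothetical snapshot of established connections, in the shape of
# `ss -tn` output: State Recv-Q Send-Q Local:Port Peer:Port
SNAPSHOT = """\
ESTAB 0 0 10.0.0.5:5432 10.0.3.14:51234
ESTAB 0 0 10.0.0.5:5432 10.0.3.15:49822
ESTAB 0 0 10.0.0.5:22   10.0.9.1:55000
"""

def peers_by_local_port(snapshot):
    """Map each local port to the set of remote peers seen using it."""
    peers = {}
    for line in snapshot.splitlines():
        m = re.match(r"ESTAB\s+\d+\s+\d+\s+\S+:(\d+)\s+(\S+):\d+", line)
        if m:
            port, peer = m.groups()
            peers.setdefault(int(port), set()).add(peer)
    return peers

print(peers_by_local_port(SNAPSHOT))
```

After a couple of weeks, a service whose port never accumulates any peers (or only your own monitoring host) is a strong candidate for shutdown; a port with a steady set of peers tells you exactly who to talk to first.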
That’s a great method! The key is to assess before making any moves. If a tool could simplify that approach, I’d definitely be interested!

Yeah, that method works but it’s tricky in larger companies. You might not know who to contact when someone starts yelling about a missing service!