I'm developing a system monitoring app that lets users set their own custom alerts, but I'm unsure which metrics are actually worth notifying users about. For example, I suspect that alerting on the load of a single CPU core isn't very useful. What system metrics do you consider important enough to alert on? The app already monitors the usual ones (CPU, RAM, disk space, and network usage), but I'd love to hear your insights!
Every user's needs differ quite a bit. There's no universal list of alerts, which is exactly the gap user-defined alerts are meant to fill. It's essential to tailor alerts to different user scenarios.
Start by monitoring everything, but don't alert on everything. Identify the critical situations that require immediate action and set alerts for those events. For instance, if a system normally sits at 70% memory use and suddenly drops to 30%, that may well be worth an alert; but if nobody can act on it at 4 AM, it probably belongs on a dashboard instead.
Absolutely, focusing on what's actionable is crucial! We should avoid unnecessary noise.
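The "monitor everything, but don't alert on everything" advice can be sketched in Python roughly as follows. The `Rule`/`Monitor` classes, the `actionable` flag, and the 30-point and 95% thresholds are hypothetical, chosen only for illustration: every sample is stored, every breach is flagged for the dashboard, and only rules marked actionable return a notification.

```python
from dataclasses import dataclass, field
from typing import Callable

# Sketch only: Rule, Monitor, the actionable flag and the thresholds are hypothetical.
Sample = dict[str, float]


@dataclass
class Rule:
    name: str
    # Receives the full history (oldest first) and returns True when the rule fires.
    condition: Callable[[list[Sample]], bool]
    # True: someone can act on this right now, so notify them.
    # False: the breach is only recorded for the dashboard.
    actionable: bool


@dataclass
class Monitor:
    rules: list[Rule]
    history: list[Sample] = field(default_factory=list)
    dashboard_flags: list[str] = field(default_factory=list)

    def process(self, sample: Sample) -> list[str]:
        """Record every sample, flag every breach, but return only actionable ones."""
        self.history.append(sample)
        notify = []
        for rule in self.rules:
            if rule.condition(self.history):
                self.dashboard_flags.append(rule.name)  # always visible on the dashboard
                if rule.actionable:
                    notify.append(rule.name)  # only these reach a human
        return notify


def memory_drop(history: list[Sample]) -> bool:
    # Fires when memory use falls by 30 or more percentage points between samples.
    return len(history) >= 2 and history[-2]["mem_pct"] - history[-1]["mem_pct"] >= 30


def disk_nearly_full(history: list[Sample]) -> bool:
    # Fires when the latest sample shows the disk at least 95% full.
    return history[-1]["disk_pct"] >= 95


monitor = Monitor(rules=[
    Rule("memory dropped sharply", memory_drop, actionable=False),
    Rule("disk nearly full", disk_nearly_full, actionable=True),
])
print(monitor.process({"mem_pct": 70, "disk_pct": 50}))  # []
print(monitor.process({"mem_pct": 30, "disk_pct": 96}))  # ['disk nearly full']
print(monitor.dashboard_flags)  # ['memory dropped sharply', 'disk nearly full']
```

Deciding actionability per rule rather than per metric lets the same memory metric feed a quiet dashboard flag and, if needed, a separate paging rule with a stricter condition.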
Raw metrics are only part of the picture; what you really want to detect are anomalies. For example, if a server usually runs at 90% CPU load and suddenly drops to 10%, that could indicate a stopped service. Alerts should be triggered by unusual fluctuations rather than raw thresholds. Also, watch backup statuses closely: if a backup's size deviates noticeably from its usual volume, that's definitely a red flag.
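One common way to detect "unusual fluctuations" is a rolling z-score: compare each new value with the mean and standard deviation of a recent window. This is a minimal sketch of that swapped-in technique, not necessarily what the answer had in mind; the `ZScoreDetector` name, the window size, the threshold of 3 standard deviations, and the sample readings are all illustrative assumptions. The same detector can watch backup sizes.

```python
from collections import deque
from statistics import mean, stdev


class ZScoreDetector:
    """Flags values that sit far from a metric's recent baseline (rolling z-score).

    Sketch only: the window, threshold and min_samples defaults are illustrative.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)  # rolling baseline of recent values
        self.threshold = threshold  # how many standard deviations count as unusual
        self.min_samples = min_samples  # don't judge until the baseline has this many points

    def is_anomaly(self, value: float) -> bool:
        anomalous = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma == 0:
                # Perfectly flat baseline: any change at all is unusual.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        # The new value joins the baseline, so a lasting shift gradually
        # becomes the new normal and stops alerting.
        self.values.append(value)
        return anomalous


cpu = ZScoreDetector()
for load in [88, 91, 90, 89, 92, 90, 91, 89, 90, 88, 91, 90]:
    cpu.is_anomaly(load)  # builds the baseline; none of these are flagged
print(cpu.is_anomaly(10))  # True: a sudden drop, e.g. a stopped service

backups = ZScoreDetector(window=14, min_samples=5)  # one value per nightly backup (GB)
for size in [50.1, 49.8, 50.3, 50.0, 49.9]:
    backups.is_anomaly(size)
print(backups.is_anomaly(5.2))  # True: far smaller than the usual backup
```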
I recommend mapping your services (what depends on what) to pinpoint your most critical metrics. That helps identify what needs urgent monitoring and alerting.
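A service map can start out as a plain dependency dictionary. In this sketch the service names and the `critical_components` helper are hypothetical: it walks the map from the services you consider critical and collects everything they rely on, directly or indirectly. Those components are the first candidates for alerts, while the rest can stay on the dashboard.

```python
def critical_components(deps: dict[str, list[str]], critical: set[str]) -> set[str]:
    """Return the critical services plus everything they depend on, transitively.

    `deps` maps each service or host to the things it needs (a hypothetical service map).
    """
    seen: set[str] = set()
    stack = list(critical)
    while stack:  # iterative depth-first walk; `seen` also guards against cycles
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return seen


service_map = {
    "checkout": ["api", "payments"],
    "api": ["db-primary", "cache"],
    "payments": ["db-primary"],
    "reporting": ["db-replica"],
    "db-primary": ["disk:/var/lib/db"],
}
print(sorted(critical_components(service_map, {"checkout"})))
# ['api', 'cache', 'checkout', 'db-primary', 'disk:/var/lib/db', 'payments']
```

Here `reporting` and `db-replica` fall outside the result, so their metrics might be dashboard-only.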
If users can set custom alerts, they should define the conditions for them too! Just because you think a metric isn't useful doesn't mean your users feel the same way. Custom alerts are all about giving users that flexibility.
Exactly! Each setup is unique, so the alerts should be just as personalized.
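Letting users define their own conditions could look something like this minimal sketch. The `UserAlert` class, the metric names, and the operator set are hypothetical. The `for_samples` parameter requires a condition to hold for several consecutive samples, so a brief spike (such as one core maxing out for a moment) does not trigger a notification.

```python
import operator
from dataclasses import dataclass, field

# Comparators a user may pick from in the UI (hypothetical set).
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class UserAlert:
    metric: str           # e.g. "cpu_total_pct"; metric names are hypothetical
    op: str               # one of the keys in OPERATORS
    threshold: float
    for_samples: int = 1  # condition must hold this many consecutive samples
    _streak: int = field(default=0, init=False, repr=False)

    def __post_init__(self):
        if self.op not in OPERATORS:
            raise ValueError(f"unsupported operator: {self.op!r}")
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")

    def check(self, sample: dict[str, float]) -> bool:
        """Return True only on the sample where the condition first becomes sustained."""
        value = sample.get(self.metric)
        if value is not None and OPERATORS[self.op](value, self.threshold):
            self._streak += 1
        else:
            self._streak = 0  # a missing or normal reading resets the streak
        return self._streak == self.for_samples


alert = UserAlert("cpu_total_pct", ">", 90, for_samples=3)
readings = [95, 97, 60, 92, 93, 94, 96]
print([alert.check({"cpu_total_pct": r}) for r in readings])
# [False, False, False, False, False, True, False]
```

Firing only on the sample where the streak reaches `for_samples` avoids repeating the same notification every interval while the condition persists.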