I've been struggling to monitor SMART data for the hard drives on a Linux machine. I have six drives whose device names (/dev/sda, /dev/sdb, ...) change every time the system reboots, and that breaks the monitoring tools I've tried: they keep recording SMART data under the kernel device name rather than the physical disk, so the history of one disk gets mixed with another's. Here is what I've tried. First, Zabbix with the "SMART by Zabbix agent 2" template, but it raises warnings after every reboot because it identifies disks by their /dev/sd* names. Next, Prometheus with a collection script, but it relies on the same names and has the same problem. smartd works better when I configure each disk manually in smartd.conf by its /dev/disk/by-id/ path, but I'm still not sure what the best way is to get reliable long-term SMART history. What should I be doing differently?
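The /dev/disk/by-id/ approach works because udev recreates those symlinks at every boot, naming each one after the drive's own identifiers (model and serial, or WWN) and pointing it at whatever kernel name the disk received this time. A minimal Python sketch of resolving them, assuming the standard /dev/disk/by-id layout; `by_id_to_kernel_name` is a hypothetical helper for illustration only:

```python
import os
import re
from pathlib import Path

# Partition links look like "<disk-name>-part1"; we only want whole disks.
_PARTITION_SUFFIX = re.compile(r"-part\d+$")


def by_id_to_kernel_name(by_id_dir: str = "/dev/disk/by-id") -> dict[str, str]:
    """Map each stable by-id name to the kernel name it points at right now.

    The left-hand side (e.g. "ata-<model>_<serial>") is derived from the drive
    itself; the right-hand side ("sda", "sdb", ...) is whatever the kernel
    assigned on this boot and may differ after the next reboot.
    """
    mapping: dict[str, str] = {}
    for entry in sorted(Path(by_id_dir).iterdir()):
        # Ignore anything that is not a symlink, and skip partition links.
        if not entry.is_symlink() or _PARTITION_SUFFIX.search(entry.name):
            continue
        # udev creates these as relative links (../../sdc); realpath resolves them.
        mapping[entry.name] = os.path.basename(os.path.realpath(entry))
    return mapping


if __name__ == "__main__" and Path("/dev/disk/by-id").is_dir():
    for stable, kernel in by_id_to_kernel_name().items():
        print(f"{stable} -> /dev/{kernel}")
```

Run before and after a reboot, the left-hand names should stay the same while the kernel names on the right may shuffle. One physical disk usually appears more than once (for example under both an ata-* and a wwn-* name), so pick one naming scheme when keying history on it.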
I recommend the smartctl_exporter for Prometheus. Alongside the SMART metrics it exports a device info metric that carries the disk's identifying metadata (such as model and serial number) together with the current device name. By joining your other metrics against that info metric, you can track each physical disk by its serial number instead of its /dev/sd* name, so its history is not lost or mixed up when the names change after a reboot.
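The join the answer describes matches each metric's `device` label against the info series, which also carries the serial number. A minimal Python sketch of that logic, assuming the info series is smartctl_exporter's `smartctl_device` with `device` and `serial_number` labels (check the label names against your exporter version); `rekey_by_serial` is a hypothetical helper, not part of the exporter:

```python
from typing import Iterable

Labels = dict[str, str]
Sample = tuple[Labels, float]


def rekey_by_serial(samples: Iterable[Sample], device_info: Iterable[Labels]) -> dict[str, float]:
    """Re-key per-device samples by disk serial number.

    `samples` are (labels, value) pairs carrying a `device` label such as "sda".
    `device_info` are label sets from the info series, assumed to carry both
    `device` and `serial_number` for the same scrape.
    """
    # Build the device -> serial lookup for this scrape only, since the
    # device -> serial pairing can change after a reboot.
    serial_for_device = {
        info["device"]: info["serial_number"]
        for info in device_info
        if "device" in info and "serial_number" in info
    }
    by_serial: dict[str, float] = {}
    for labels, value in samples:
        serial = serial_for_device.get(labels.get("device", ""))
        if serial is None:
            continue  # no info series for this device; skip rather than guess
        by_serial[serial] = value
    return by_serial
```

Even if sda and sdb swap disks between boots, each value stays attached to the same serial. In Prometheus itself this is usually written as a PromQL `group_left` join, for example `smartctl_device_temperature * on(device) group_left(serial_number) smartctl_device` (metric name assumed), which preserves the original value as long as the info series is exported with the value 1, as info-style metrics conventionally are.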
That's interesting! Where do you send this data to get alerts for any faults?