I have a Lenovo Storage system packed with 24 10K SAS hard drives from 2019, and it's running three RAID pools 24/7 as a VMware data store. The usage isn't extremely high—most VMs are Windows servers with some Linux handling things like domain control, file serving, SQL, and network logging. While there are peak times, especially during updates and maintenance sessions, I'm mainly focused on the drives' health.
Recently, I checked these drives using SSH, and thankfully, none showed any signs of issues like bad sectors or concerning metrics. However, I recall that a study from Backblaze indicated a significant rise in failure rates for hard drives after they reach 7 years of age. Since all my drives are identical, I'm a bit worried they might fail around the same time due to being from the same production batch. This overlap could complicate replacing drives and rebuilding the RAID, potentially leading to data loss.
Is my concern justified, and how should I properly assess the situation?
5 Answers
I’d recommend starting to phase out the older drives now. Maybe replace a quarter of them, wipe them, and keep them as backups in case something fails. Then plan to replace more each year. It’s a proactive approach that could save you from a lot of hassle in case of a drive failure later on.
7 years is a good benchmark to consider for HDDs, but for high-RPM drives like yours, I’d suggest being cautious at around 5 years. If you can, think about gradually replacing some of the drives now to avoid any RAID nightmare scenarios down the line. If you’ve got solid backups, you can afford more risk and replace drives only as they start to need it.
The worst thing you can do for mechanical drives is let them spin down and then spin back up. If your drives have been running 24/7 without any power cycles, you’re in a good spot. You can check the SMART counters to confirm this. As drives get older, especially past 7 years, keep an eye out for the reallocated sector count climbing, and if you notice any of them starting to fail, proactively swap them out.
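If you want to catch a climbing count rather than eyeball it, log the counter per drive on a schedule and compare samples. Here's a minimal sketch, assuming you've already pulled the defect/reallocation counts out of the array yourself (how you get them depends on your controller's CLI, so that part isn't shown). The slot names, the sample numbers, and the `drives_with_growing_defects` helper are all made up for illustration.

```python
from typing import Dict, List


def drives_with_growing_defects(
    history: Dict[str, List[int]], min_increase: int = 1
) -> List[str]:
    """Return drive IDs whose defect count rose by at least
    min_increase between the first and the last sample."""
    flagged = []
    for drive_id, counts in history.items():
        # Need at least two samples to see a trend.
        if len(counts) >= 2 and counts[-1] - counts[0] >= min_increase:
            flagged.append(drive_id)
    return sorted(flagged)


# Hypothetical samples, oldest first (e.g. one reading per month).
history = {
    "slot-01": [0, 0, 0],  # clean
    "slot-02": [0, 2, 5],  # climbing: candidate for a proactive swap
    "slot-03": [3, 3, 3],  # nonzero but stable
}

print(drives_with_growing_defects(history))
```

The point is that a stable nonzero count is less worrying than one that keeps moving, so the check compares samples instead of using a fixed threshold.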
Agreed! Those gradual failures can sneak up on you, so better to be safe than sorry.
Your worries about batch failures are spot on. If drives come from the same manufacturer, share the same firmware, and experience similar workloads, they age together. Consider starting a staggered replacement plan now while everything's still functioning well. Waiting for signs could lead to a bad scenario during a RAID rebuild.
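To put rough numbers on the rebuild risk, here's a back-of-the-envelope sketch. It assumes drives fail independently at a constant rate, which is exactly the assumption same-batch drives may break, so read the result as optimistic. The failure rates, the 8-drive pool (7 survivors), and the 24-hour rebuild window are illustrative guesses, not measurements from your array.

```python
import math


def p_second_failure(afr: float, remaining_drives: int, rebuild_hours: float) -> float:
    """Probability that at least one of the remaining drives fails
    during the rebuild window, under an independent, constant-rate
    (exponential) failure model."""
    # Convert the annualized failure rate into a per-hour hazard rate.
    hourly_rate = -math.log(1.0 - afr) / 8760.0
    return 1.0 - math.exp(-hourly_rate * remaining_drives * rebuild_hours)


# Assumed scenario: one drive of an 8-drive pool has failed,
# and the rebuild takes about a day.
for afr in (0.01, 0.05, 0.10):
    p = p_second_failure(afr, remaining_drives=7, rebuild_hours=24)
    print(f"AFR {afr:.0%}: chance of another failure during rebuild = {p:.3%}")
```

Under independence the per-rebuild risk stays small even at a high failure rate. What makes the batch scenario dangerous is correlation: if the drives wear out together, the real number can be much higher than this model suggests, which is the argument for staggering replacements.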
You’ve definitely got a valid concern about your drives potentially failing close together, especially since they were all installed around the same time. If your system is approaching the end of its lifespan, it’s a good idea to start looking at replacement quotes. In the meantime, consider keeping a stock of spare drives ready to go, just in case you need them.

Totally get where you're coming from! I’ve seen a few drives die suddenly after a power outage, so I always recommend keeping backups ready if you can.