In our research group, we run a workstation for large language models. It currently has one enterprise SSD (Micron 5210) that is nearing the end of its service life: it has been in use for about 4.3 years, and smartctl reports 31% of its life remaining. The machine isn't heavily used; it mostly holds years of unused data, with around 10 TB written in total over that time. Given that our models take up about 500 GB, I'm wondering whether we could safely switch to a consumer-grade SSD, possibly in RAID 1, instead of spending $600 on a new 3.8 TB enterprise SSD. We have a UPS that provides at least 10 minutes of backup power, but I'm concerned about what happens if the drive fails: potential downtime, or loss of research data (which is easily regenerable). The system's criticality is high and it should run 24/7, although some outages are acceptable. It's our organization's money, and anything we save could be allocated elsewhere.
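For a rough sense of how long the current drive might last, here is a minimal sketch that extrapolates linearly from the two figures above (4.3 years in service, 31% life remaining). It assumes wear keeps accumulating at the same average rate, which may not hold, and the function name is just something I made up for illustration.

```python
def estimate_remaining_years(years_in_service: float, percent_life_left: float) -> float:
    """Linearly extrapolate remaining service life from SMART wear data.

    Assumption: the drive keeps wearing at its historical average rate.
    """
    percent_used = 100.0 - percent_life_left
    if percent_used <= 0:
        raise ValueError("no wear recorded yet; cannot extrapolate")
    wear_per_year = percent_used / years_in_service  # percentage points per year
    return percent_life_left / wear_per_year


# Figures from the question: 4.3 years in use, 31% life left per smartctl.
remaining = estimate_remaining_years(4.3, 31.0)
print(f"Estimated remaining life: {remaining:.1f} years")  # prints 1.9 years
```

So on a straight-line estimate there would be a bit under two years left, which is only a ballpark and not a prediction of when the drive will actually fail.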
5 Answers
While there's no technical barrier to using a consumer SSD, keep in mind that they generally offer less reliability. If you have a support contract for your enterprise SSD, switching could mean losing that coverage. Evaluate whether the potential savings are worth it for your setup.
If you choose consumer SSDs, I recommend keeping spare drives on hand. The RMA process for a faulty drive can take days, and you don't want that delay interrupting your team's work!
Enterprise SSDs often come with capacitors that help protect data during unexpected power failures by allowing the SSD to flush its cache to flash memory. Consumer drives typically lack this, which means there's a risk of bricking them if power goes out while they're writing. It's not common, but it can happen. Since you have a UPS and don't usually deal with major outages, you might be okay.
That's good to know! The UPS should help mitigate that risk.
You might want to consider sticking with enterprise SSDs. They may cost more upfront, but they come with features like power-loss protection (onboard capacitors) and extra reserved space (over-provisioning) that help with longevity and reliability, especially on critical systems. It's important to weigh the risk of drive failure against whatever you would save with consumer options.
Given the light usage, a consumer SSD like the Samsung 870 EVO might be worth going for; it has good TBW ratings. If your workstation has NVMe slots available, consider an NVMe drive, which would perform better than any SATA option, but it sounds like you have to work within your current budget.
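To put the TBW point in numbers, here is a small sketch that divides a drive's rated endurance by the write rate implied by the question (about 10 TB over 4.3 years). The 600 TBW figure is a placeholder I am assuming for illustration, not a quoted spec, so check the datasheet for the exact model and capacity you buy.

```python
def years_to_reach_tbw(rated_tbw: float, tb_written: float, years_observed: float) -> float:
    """Years until the rated TBW is reached at the observed average write rate."""
    if tb_written <= 0 or years_observed <= 0:
        raise ValueError("need a positive write volume and observation period")
    tb_per_year = tb_written / years_observed
    return rated_tbw / tb_per_year


# Assumption: placeholder endurance rating; replace with the datasheet value.
ASSUMED_RATED_TBW = 600.0

# Figures from the question: about 10 TB written over 4.3 years.
years = years_to_reach_tbw(ASSUMED_RATED_TBW, tb_written=10.0, years_observed=4.3)
print(f"Years to reach rated TBW at the current write rate: {years:.0f}")
```

At that write rate, endurance is unlikely to be the limiting factor; sudden failure and power-loss behaviour matter more for the decision.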
We do have free NVMe slots, so that’s definitely an option to explore!

I appreciate that advice! Considering the situation, I can manage to clone the drive quickly if needed.