I'm managing a four-node Hyper-V cluster running Windows Server 2019 with S2D storage. Each node has two dedicated 25GbE NICs for storage replication. Resync times after maintenance have been steadily increasing, and the first node alone now takes up to 15 hours. I used to be able to patch all four nodes over a weekend, but I'm now pulling late nights just to get the third node patched in time for the fourth to finish before Monday. Microsoft told me the delays are caused by having only 1TB free on a 117TB pool, but that puzzles me, because the pool has consistently shown 116TB used for as long as I've been watching it. I purged over 10TB from a CSV and the resync times still climb, which suggests the system is processing a large amount of used data during each resync. I can't replace the system until next year's budget, so I'm exploring options: would a full cluster shutdown to patch all nodes at once work, or should I back everything up and recreate the storage pool to optimize it? Any other suggestions?
3 Answers
First off, can you confirm the exact OS version and build? You mentioned Windows Server 2019, and I had similar trouble on that version with large, active workloads during storage resync. Switching to Server 2022 made things much smoother for us, since it handles those resync jobs more efficiently. Also, is your setup hyperconverged, with Hyper-V and S2D on the same nodes? That could influence some aspects here.
Microsoft has a documented offline patching process for S2D that was once their go-to recommendation. It's still suggested because, with the whole cluster shut down, you avoid the long per-node resyncs that rolling patching triggers. Just a heads up though: even offline patching can take a few hours, but that still beats spending an entire weekend sleep-deprived just to patch four nodes. I'd take those few hours of downtime any day!
I recently switched from S2D to VMware, but if I remember right, you should make sure your volumes are sized properly. One massive volume won't cut it; it's better to create multiple smaller volumes. Ideally, with four nodes, you should set up three volumes along with the automatic health/performance volume. This balances the I/O across the cluster and can improve your resync times. Plus, with Server 2022, you can tweak the resync performance settings, which might help if fixing the volumes isn't enough.
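To put the numbers from the question in perspective, here is a rough back-of-the-envelope sketch in Python. The pool size, free space, node count, and the 15-hour figure come from the question; applying 15 hours to every node is my own worst-case assumption, and the function names are just illustrative.

```python
# Figures taken from the question; the per-node worst case is an assumption.
POOL_TB = 117.0          # total pool size reported in the question
FREE_TB = 1.0            # free pool capacity quoted by Microsoft support
NODES = 4                # nodes in the cluster
WORST_RESYNC_HOURS = 15  # "up to 15 hours" for the first node, assumed for all


def free_percent(pool_tb: float, free_tb: float) -> float:
    """Return free pool capacity as a percentage of the total pool."""
    if pool_tb <= 0:
        raise ValueError("pool_tb must be positive")
    return 100.0 * free_tb / pool_tb


def rolling_patch_hours(nodes: int, resync_hours: float) -> float:
    """Total resync wait when nodes are patched one at a time."""
    return nodes * resync_hours


if __name__ == "__main__":
    print(f"Free pool capacity: {free_percent(POOL_TB, FREE_TB):.2f}%")
    total = rolling_patch_hours(NODES, WORST_RESYNC_HOURS)
    print(f"Rolling patch resync wait: {total:.0f} h")
```

That works out to about 0.85% free in the pool and roughly 60 hours of resync waiting for a rolling patch if every node behaves like the first one, which is why the offline approach with a few hours of planned downtime starts to look attractive.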
