I'm running Rook Ceph version 1.18.1 on K3s with Reef 18.2.4 using 3 nodes, each with an Intel Xeon W-2145 CPU (16 threads) and 64GB RAM. I've got Samsung 990 EVO Plus 1TB NVMe drives in each node, which should be capable of 868,000 IOPS, but I'm only achieving around 3,260 IOPS—just 0.4% of what this hardware can do. I've ruled out network issues since I'm using a dual bonded 25GbE setup and validated saturation with iperf3. I'm looking for specific tuning advice for Rook and Ceph to get better performance, particularly any settings that might be causing a bottleneck or limiting IOPS in a Kubernetes environment. Any insights would be super helpful!
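In case anyone wants to reproduce the workload on similar hardware: a plain 4k random-write fio pod writing to an RBD-backed PVC is the kind of test I mean. The manifest below is a generic sketch with placeholder image and claim names, not my exact job file.

```yaml
# Generic 4k random-write test against a Ceph RBD PVC; image and claim names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: fio-rbd-test
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: docker.io/xridge/fio:latest   # placeholder: any image that ships fio
      command: ["fio"]
      args:
        - --name=randwrite
        - --filename=/data/fio.test
        - --rw=randwrite
        - --bs=4k
        - --size=4G
        - --direct=1
        - --iodepth=16
        - --numjobs=1
        - --runtime=60
        - --time_based
        - --group_reporting
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ceph-block-pvc   # placeholder: any PVC from the Rook RBD storage class
```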
2 Answers
It's odd to see IOPS that low, but a single 1TB NVMe per node is itself a limiting configuration: you only have three OSDs in the whole cluster, and with the default 3x replication every write has to touch all three nodes, so you shouldn't expect to get anywhere near the drives' rated specs. You could add more or larger drives, or look at OSD-level settings in your Ceph configuration (for example, how many OSDs you run per device) to improve your results.
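If you do want to poke at the OSD side, the lowest-effort experiment in Rook is running more than one OSD per NVMe, since a single OSD daemon usually can't keep a fast NVMe busy on its own. The snippet below is just the relevant fragment of the CephCluster spec; the field names come from the Rook CRD and the value of 2 is a starting point, not a tuned recommendation:

```yaml
# Fragment of a Rook CephCluster spec; only the storage section is shown.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "2"   # carve two OSDs out of each NVMe; adjust after testing
```

Keep in mind this only affects how new OSDs are provisioned, so existing OSDs would have to be purged and re-created for it to take effect.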
You might want to consider enterprise-grade SSDs instead of consumer models like the Samsung 990 EVO Plus. For decent performance with Ceph, enterprise drives with power loss protection are close to essential: PLP lets the drive safely acknowledge the constant sync writes that Ceph's OSDs issue straight from its onboard cache, while consumer drives have to flush each one to flash and their IOPS collapse. They're also built for sustained mixed I/O and much higher write endurance. Check out Ceph's hardware recommendations for more details!
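If you want to quantify how much the missing power loss protection is costing you before spending any money, compare the drive's plain random-write IOPS against the same job with a flush after every write; on consumer NVMe the flushed number usually drops to a few thousand. Below is a rough sketch of such a test run as a pod pinned to one node, writing to a scratch directory on the NVMe-backed filesystem; the node name, image, and host path are placeholders:

```yaml
# Rough per-node sync-write check; node name, image, and path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: fio-sync-test
spec:
  nodeName: node1                        # pin to one node so you know which drive is being hit
  restartPolicy: Never
  containers:
    - name: fio
      image: docker.io/xridge/fio:latest # placeholder: any image that ships fio
      command: ["fio"]
      args:
        - --name=syncwrite
        - --directory=/scratch
        - --rw=randwrite
        - --bs=4k
        - --size=1G
        - --direct=1
        - --iodepth=1
        - --numjobs=1
        - --fdatasync=1                  # flush after every write, roughly what Ceph's OSDs demand
        - --runtime=60
        - --time_based
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      hostPath:
        path: /var/lib/fio-test          # a directory on the NVMe-backed filesystem
        type: DirectoryOrCreate
```

If the fdatasync numbers land in the same few-thousand range you're getting through Ceph, that points at the drives rather than at Rook or Ceph settings.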
Thanks for the feedback! I wasn't aware of the importance of enterprise drives for this kind of workload. I'll definitely look into upgrading!

I get that! I'm not hoping for the full 900k IOPS, but even hitting around 30-40% of it would be great. Any specific settings you suggest tweaking?