Hey everyone! I'm looking for advice on setting up a distributed file system. Right now we have an NFS server that serves shared libraries and application files (such as images). It works fine, but it's a single point of failure. I need a solution that is POSIX-compliant, offers a single namespace, can be accessed via NFS, and has geo-replication that is not snapshot-based. Ideally the geo-replication would be synchronous, although that's not a deal-breaker. I've primarily looked at Ceph, but I heard that CephFS only supports snapshot-based replication. I also tried Ceph RGW with NFS exposed through Ganesha but ran into some issues. Any recommendations would be fantastic, thank you!
5 Answers
Ceph is great because it inherently keeps multiple copies of data across different servers. By default, a replicated pool keeps three copies of each object, so CephFS could be exactly what you need!
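To put a number on what three copies cost in disk space, here is a rough back-of-the-envelope sketch. The function name and the 30 TB figure are made up for illustration, and it ignores metadata and any free-space headroom the cluster needs, so treat it as an estimate rather than a sizing tool.

```python
def usable_capacity_tb(raw_tb: float, replicas: int = 3) -> float:
    """Approximate usable capacity when every object is stored `replicas` times.

    Simplified assumption: no metadata overhead and no reserved headroom.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


if __name__ == "__main__":
    # Hypothetical cluster with 30 TB of raw disk and three copies per object.
    print(usable_capacity_tb(30, replicas=3))
```

So with three copies you should expect roughly a third of the raw disk to be usable.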
Check out GlusterFS and Hadoop too! Depending on your hardware and network speed, you might even consider mirroring the block device behind your NFS share with DRBD, which replicates block devices over the network. A small HA cluster with seamless failover to a secondary server could be viable as well.
I’d suggest giving Hadoop a shot too.
GlusterFS could be your best bet. Just a caution: I wouldn’t recommend synchronous replication across a WAN because of the potential performance hit. However, its async geo-replication is not snapshot-based, which fits your needs better.
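To illustrate the performance hit from synchronous replication over a WAN, here is a deliberately simplified model. It assumes a synchronous write is only acknowledged after the remote site confirms it, so each write pays at least one WAN round trip on top of the local commit. The function names and the latency numbers are hypothetical, not GlusterFS measurements.

```python
def sync_write_latency_ms(local_commit_ms: float, wan_rtt_ms: float) -> float:
    """Lower bound on one synchronous write: local commit plus one WAN round trip."""
    return local_commit_ms + wan_rtt_ms


def max_serial_writes_per_sec(latency_ms: float) -> float:
    """Upper bound on back-to-back writes from a single client issuing one write at a time."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    local_ms = 1.0   # hypothetical local commit time
    rtt_ms = 40.0    # hypothetical round trip between two sites
    sync_ms = sync_write_latency_ms(local_ms, rtt_ms)
    print("local only:", max_serial_writes_per_sec(local_ms), "writes/s")
    print("sync over WAN:", round(max_serial_writes_per_sec(sync_ms), 1), "writes/s")
```

Under these assumed numbers a single serial writer drops from about 1000 writes per second to roughly 24. Async replication avoids this because the local write is acknowledged without waiting for the remote site.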
You might want to consider Microsoft Windows DFS. Setting up a solution with two or more servers could be pretty straightforward.
Just a heads up: DFS shares are CIFS/SMB, and exposing them over NFS has historically caused a lot of oplock (opportunistic locking) issues. Might want to keep that in mind.