How to Speed Up Rsync for Large File Transfers?

Asked By CuriousCat93 On

Hey everyone! I'm in the process of moving a large environment from on-premises to Azure, and I've been doing delta syncs with rsync every few days to prepare for the cutover. It's a pretty rough situation: 22 million files totaling about 5 TB, and each delta sync is taking over 3 days to complete, which seems excessive.

I'm running the sync on an Azure VM with the on-premises Isilon share and an Azure NFS share both mounted. I've tried tweaking mount options like nconnect and noatime, along with the other common recommendations I could find. Splitting the directories so I can run multiple syncs in parallel hasn't helped much either, since they're deeply nested and unbalanced in file counts. Bandwidth and VM resources aren't the limit; it simply takes an enormous amount of time to compare the metadata of 22 million files. Any suggestions, even hacky ones, to speed this up?

6 Answers

Answered By RcloneFanatic On

If you're open to alternatives, rclone might work better for you. You can use 'parallel' to run multiple instances of rclone to speed things up significantly. Though gathering the file lists takes time, the actual transfer is pretty snappy with the right setup!
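Something along these lines could work as a starting point, assuming the Isilon share is mounted at /mnt/isilon/data and you've set up an rclone remote called azure: (both names are placeholders, adjust to your setup):

```bash
# Rough sketch: run one rclone copy per top-level directory, 8 jobs at a time.
# /mnt/isilon/data and the "azure:" remote are placeholder names.
ls /mnt/isilon/data | parallel -j 8 \
  'rclone copy --transfers 16 --checkers 32 "/mnt/isilon/data/{}" "azure:data/{}"'
```

Keep in mind rclone already parallelizes within one instance via --transfers and --checkers, so tune the parallel -j count against those.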

Answered By FileTransferWizard On

Rsync can be pretty sluggish for large file transfers, especially from network shares like NFS. Have you considered using azcopy instead? It's specifically designed for cloud transfers and might serve you better.
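Note that azcopy talks to a Blob or Files endpoint rather than an NFS mount, so this only fits if a blob container can be the destination. A minimal sketch, with the storage account, container, and SAS token as placeholders:

```bash
# Hedged sketch: azcopy sync from the locally mounted share to a blob container.
# Account, container, and SAS token below are placeholders.
# azcopy sync compares last-modified times, so unchanged files are skipped.
azcopy sync "/mnt/isilon/data" \
  "https://mystorageacct.blob.core.windows.net/mycontainer?<SAS-token>" \
  --recursive=true
```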

Answered By AlternativeEnthusiast On

If preserving file permissions isn't crucial, Resilio Sync could be a good alternative. After the initial hash is calculated, it monitors for changes and only transfers new data, which saves a ton of time. It uses a BitTorrent-style protocol to manage this efficiently.

Answered By ArchiveMaster On

If the initial copy of 5TB is all that matters right now, consider creating a compressed archive for seeding, unpacking it in the cloud, and then just running rsync for any delta updates. If you need to do the transfer directly, I suggest using tar for the initial copy: `(cd /src/dir && tar cf - .) | (cd /dst/dir && tar xvf -)`. You could pipe it over SSH or add mbuffer to improve throughput, as in the sketch below.
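A rough sketch of the SSH + mbuffer variant; the host name, paths, and buffer sizes are placeholders:

```bash
# Rough sketch: stream a tar of the source over SSH, buffered on both ends.
# "azure-vm", /src/dir, /dst/dir, and the 4G buffers are placeholders.
tar -C /src/dir -cf - . \
  | mbuffer -m 4G \
  | ssh azure-vm 'mbuffer -m 4G | tar -C /dst/dir -xf -'
```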

CuriousCat93 -

I've actually completed the initial copies and I'm on the rsync deltas now. I thought nconnect might help with the metadata handling, but it didn't improve anything.

Answered By SpeedySyncGuru On

The main bottleneck with rsync is definitely its handling of metadata, especially with such a massive number of files. A simpler alternative could be using tar combined with mbuffer for maximum performance; however, that's a one-shot copy and won't handle incremental updates. You might want to run something like this: `tar -cf - -C /src/folder . | mbuffer -m 8G | tar -xf - -C /dst/folder`. You can also drop `pv` into the pipeline to track throughput if you go this route. For updated files, you could use the `find` command to only copy files modified in the last week.
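For that last idea, a hedged sketch that feeds find's output into rsync so only recently changed files get copied; the 7-day window and mount paths are placeholders:

```bash
# Hedged sketch: sync only files modified in the last 7 days.
# /mnt/isilon/data and /mnt/azure/data are placeholder mount points.
cd /mnt/isilon/data \
  && find . -type f -mtime -7 -print0 \
  | rsync -a --files-from=- --from0 . /mnt/azure/data/
```

The find still has to walk the full source tree, but the destination side only needs to stat the files in the list rather than all 22 million.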

Answered By MetaDataMaverick On

When I last checked, rsync runs as a single process by default. As soon as I split the job across several rsync instances running in parallel, I was able to fully utilize my bandwidth. Perhaps give that a go!
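A rough sketch of that approach, running one rsync per top-level directory via GNU parallel; the mount paths and job count are placeholders, and as the asker notes, deeply nested or unbalanced trees will limit how evenly this splits the work:

```bash
# Rough sketch: one rsync per top-level directory, 8 running at once.
# /mnt/isilon/data and /mnt/azure/data are placeholder mount points;
# assumes the top level contains only directories.
ls /mnt/isilon/data | parallel -j 8 \
  'rsync -a /mnt/isilon/data/{}/ /mnt/azure/data/{}/'
```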
