Improving AWS EFS Performance with Rsync for Real-time Sync

Asked By CuriousPenguin42 On

I'm trying to improve my current setup, which feels both over-provisioned and under-optimized. Right now, I have a single large EC2 instance with a 5TB gp3 EBS volume that acts as a central sync node. Several smaller machines sync their data to a specific subfolder on the central node's disk using rsync every 5 minutes. This instance also reads data from the disk and sends it to an external API, acting as a middle layer. The EC2 instance is sized for peak usage, so it is often underutilized during off-peak times, which leads to high costs.

I switched to EFS to enable autoscaling with multiple smaller instances, but found that EFS performed poorly with rsync because of the many small files and metadata operations, which delayed syncing. I tried different EFS modes but ran into IOPS bottlenecks, and I faced similar issues when considering FSx and EBS Multi-Attach.

I need a setup that allows real-time syncing and independent scaling of compute and storage without compromising on performance, especially for the API forwarding process. Has anyone faced a similar architecture challenge?

6 Answers

Answered By OverthinkerOwl On

You might be approaching the problem the wrong way. A filesystem doesn't seem like the best fit here; perhaps you need a transaction-based system instead. It would help to dig into why you're handling files at all if the data ends up being published to an API afterward.

Answered By UnifiedDataguy On

Something's not adding up in your scenario. Who exactly is writing the data? If all writes come to this central EC2, it seems like you might want a change data capture approach. Plus, using a database as your single source of truth could make the entire process smoother.

Answered By FileSizeSleuth On

How small are the files you’re syncing? If they’re really tiny, that might be part of the problem with performance.
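One way to answer this question is to measure the size distribution of the synced directory. The following is a minimal sketch, not part of the original thread: the bucket boundaries and the function names are arbitrary choices made for illustration.

```python
import os
from collections import Counter

# Upper bounds in bytes for each bucket. These boundaries are illustrative
# assumptions, not thresholds taken from any AWS documentation.
BUCKETS = [
    (4 * 1024, "<4KiB"),
    (128 * 1024, "<128KiB"),
    (1024 * 1024, "<1MiB"),
]


def bucket_for(size):
    """Return the label of the first bucket whose upper bound exceeds size."""
    for limit, label in BUCKETS:
        if size < limit:
            return label
    return ">=1MiB"


def size_histogram(root):
    """Walk root recursively and count regular files per size bucket."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so targets are not counted twice
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            counts[bucket_for(size)] += 1
    return counts
```

Calling size_histogram on the sync root (for example, the subfolder the smaller machines write to) returns a Counter such as one entry per bucket label, which shows at a glance whether most files are tiny.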

Answered By SyncSpecialist On

Have you looked into AWS DataSync? It simplifies transferring files between your storage and AWS, which could help with performance.

Answered By TechieTiger91 On

Consider making your architecture event-driven by moving your data to S3. You can use an S3 event notification to trigger a Lambda function that sends the data to the external API automatically. Just keep in mind that S3 event notifications are not instantaneous: they usually arrive within seconds but can occasionally take a minute or longer, so this may not meet your real-time requirements.

PracticalPanda77 -

Yeah, that delay could be a deal-breaker if you need instant syncing. Might be worth exploring alternatives if real-time is a must.
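The S3-triggered Lambda suggested by TechieTiger91 could look roughly like the sketch below. It is only an illustration under stated assumptions: API_URL is a hypothetical endpoint, the JSON payload shape is invented, and the handler forwards only the bucket and key rather than the object contents. The event layout (Records, s3.bucket.name, s3.object.key) follows the standard S3 notification format, in which object keys arrive URL-encoded.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the real external API.
API_URL = "https://api.example.com/ingest"


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Keys in S3 notifications are URL-encoded (spaces become '+').
            objects.append((bucket, urllib.parse.unquote_plus(key)))
    return objects


def build_request(bucket, key):
    """Build a POST request describing one new object (assumed payload shape)."""
    body = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handler(event, context):
    """Lambda entry point: forward every object in the event to the API."""
    objects = extract_objects(event)
    for bucket, key in objects:
        with urllib.request.urlopen(build_request(bucket, key), timeout=10) as resp:
            resp.read()  # drain the response; errors raise and trigger a retry
    return {"forwarded": len(objects)}
```

Because S3 notifications are delivered at least once, the receiving API would need to tolerate duplicate deliveries of the same key.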

Answered By DataEngineerDude On

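EFS with rsync and a lot of small files can overwhelm metadata handling: rsync checks every file on each run, and on EFS each of those metadata operations is a network round trip. If possible, switching to S3 could help with scalability, but be mindful of the small-object tax: some storage classes (Standard-IA, for example) bill each object as if it were at least 128 KB. Also, treat EBS Multi-Attach with caution, since a standard filesystem such as ext4 or XFS is not designed for simultaneous access from several instances and you would need a cluster-aware filesystem to handle locking. If you do stay on EBS, consider using RAID across the volumes.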
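To get a feel for the small-object tax, the arithmetic can be sketched as below. This assumes a storage class with a 128 KiB minimum billable object size; S3 Standard has no such minimum, so the numbers apply only if one of the infrequent-access style classes is chosen. The function names are illustrative.

```python
# Minimum billable size per object, assumed to be 128 KiB for this sketch.
MIN_BILLABLE_BYTES = 128 * 1024


def billable_bytes(sizes, minimum=MIN_BILLABLE_BYTES):
    """Total bytes billed when every object is rounded up to the minimum."""
    return sum(max(size, minimum) for size in sizes)


def overhead_ratio(sizes, minimum=MIN_BILLABLE_BYTES):
    """Billed bytes divided by actual bytes; 1.0 means no small-object tax."""
    actual = sum(sizes)
    if actual == 0:
        return 0.0
    return billable_bytes(sizes, minimum) / actual
```

For example, four objects of 64 KiB each give an overhead_ratio of 2.0, meaning storage is billed at twice the actual data size.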
