I'm looking to build a straightforward file upload API where customers can send their files to a single endpoint at `https:///upload`. The requirements are that the files can be up to 100 MB in size, and I need to ensure some server-side validation during the upload. This includes computing a hash of the file to check against another service before I accept it. I also want to give the client immediate feedback on whether the upload was successful or failed, along with returning an ID for the file. My goal is to keep the process as simple as possible – just one request to `/upload`, without needing any presigned URL round trips.
I've checked out AWS's guidelines on S3 upload patterns and have already dismissed a few options:
1. API Gateway as a direct proxy due to its 10 MB payload limit and the challenge of implementing custom validation.
2. API Gateway with presigned URLs since that would require multiple client requests and wouldn't let me compute/validate a hash in the same request.
3. CloudFront with Lambda@Edge because it has a 1 MB body limit for Lambda@Edge, preventing full upload validation.
Given these constraints, what AWS services and architecture do you recommend? I'm currently leaning towards using an ALB and ECS Fargate. Also, the validation needs to check if the exact file already exists; if it does, I want to return the existing file ID rather than generating a new one. I'm a bit perplexed about how to handle the notifications to users in an async manner after the upload. Any insights would be appreciated!
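For context, the dedup check I have in mind would look roughly like this: hash the bytes while receiving them, look the digest up in an index, and either return the existing ID or mint a new one. This is just a sketch of the logic, not the real handler; the dict stands in for whatever store (DynamoDB or similar) would actually hold the hash-to-ID mapping, and `handle_upload` is a hypothetical name.

```python
import hashlib
import uuid

def handle_upload(data: bytes, index: dict) -> tuple[str, bool]:
    """Compute the file's SHA-256 and deduplicate against an index.

    `index` maps hex digest -> file ID. In production the lookup would
    hit a real store (e.g. DynamoDB) and the bytes would stream to S3;
    a plain dict keeps the sketch self-contained and testable.
    Returns (file_id, created): created is False for duplicates.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest in index:
        return index[digest], False   # exact file already exists
    file_id = str(uuid.uuid4())       # new file: mint a fresh ID
    index[digest] = file_id
    return file_id, True
```

Uploading the same bytes twice returns the same ID with `created=False`, which is exactly the client feedback I want from the single `/upload` request.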
3 Answers
It sounds like you've got a solid approach with ALB and ECS Fargate! Still, I'd suggest reconsidering presigned URLs; that's how most apps handle S3 uploads, and they keep large request bodies off your servers entirely. It might save you a lot of extra coding and effort down the line!
If you absolutely must do your own hash validation and can't use S3's built-in checksum headers, you could look into a two-stage upload: have clients get a presigned URL and upload to a staging bucket, then trigger a Lambda on that upload (via S3 event notifications) to compute the hash and copy the object to a final bucket if it passes. You can track IDs with S3 object metadata, which keeps the whole pipeline traceable.
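A minimal sketch of that validation Lambda, assuming the staging bucket fires it via S3 event notifications. The S3 client is injected as `s3` (it would be a boto3 client in the real thing), `is_valid` stands in for your external hash-check service, and the bucket name is made up, so treat all of it as illustrative rather than a drop-in handler.

```python
import hashlib
import urllib.parse

FINAL_BUCKET = "final-uploads"  # hypothetical destination bucket

def validate_and_promote(event, s3, is_valid):
    """S3-event-triggered sketch: hash the staged object, then copy it
    to the final bucket if `is_valid(digest)` approves it.

    `s3` is injected (a boto3 S3 client in production) so this stays
    self-contained; `is_valid` represents the external hash check.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    digest = hashlib.sha256(body).hexdigest()

    if is_valid(digest):
        s3.copy_object(
            Bucket=FINAL_BUCKET,
            CopySource={"Bucket": bucket, "Key": key},
            Key=key,
        )
        return {"status": "accepted", "sha256": digest}
    return {"status": "rejected", "sha256": digest}
```

Note this buffers the whole object in memory to hash it; at 100 MB that's fine for Lambda, but you'd want to size the function's memory accordingly (or stream the body in chunks into the hash).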
How strict is your requirement for the 100 MB limit and avoiding presigned URLs? It seems like you're setting up a lot of extra complexity for something S3 is quite capable of handling natively. Presigned URLs would let you offload some of that burden and still protect uploads. Just a thought!
That's a clever workaround! Using Lambda to handle validations post-upload might be a great way to keep your endpoint simple while ensuring security and data integrity. Good thinking!