I'm working on an API that triggers a long-running job in ECS to generate artifacts and upload them to S3. Currently, the workflow I have set up is:
1. API Gateway receives a request with a Cognito access token and invokes a Lambda function.
2. The Lambda sets up the request and starts a standalone ECS task.
3. The ECS container runs for about 7 to 8 minutes, generating artifacts and uploading them to S3.
4. After the upload, the Lambda retrieves metadata from S3 and returns it in the API response.
I'm concerned about potential API/Lambda timeouts since the ECS task might take longer due to EC2 scaling or image download times. I've considered a couple of alternatives:
1. **Step Functions** - I'm not too familiar with these and want to know if they're a good fit for my case.
2. **Asynchronous Approach** - The API would start the ECS task and return the task details to the user, who would then wait to retrieve the artifact metadata later. This seems easier to implement, but I'm unsure how to handle around 10-15 concurrent requests.
A few extra details:
- The job can't be migrated to Lambda as it runs third-party software for artifact generation.
- Expected API traffic is low (around 20-30 requests a day).
- I'm using EC2 instead of Fargate because the container images are quite large (7-8 GB) and can be pre-cached on EC2 as they change infrequently.
- EKS isn't an option since the rest of the team isn't familiar with it and isn't keen on learning.
I'd greatly appreciate any recommendations or best practices for refining this workflow! Thanks!
4 Answers
You might want to consider hooking EventBridge to run your ECS task directly without involving Lambda, if that's suitable. The API could create an event, and then you could emit another event when the job's done to notify the appropriate parties about where the artifacts are stored.
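To make the EventBridge idea a bit more concrete, here is a minimal sketch of the "API creates an event" half. The bus name, source, and detail-type strings are all made up for illustration, and it assumes a rule on that bus with the ECS task as its target has already been configured separately. The client is passed in so the function can be exercised without AWS credentials.

```python
import json


def build_job_event(job_id, params, bus_name="artifact-jobs"):
    """Build one PutEvents entry requesting an artifact job.

    "artifact-jobs", "myapp.artifacts" and "ArtifactJobRequested" are
    hypothetical names, not anything from the original setup.
    """
    return {
        "EventBusName": bus_name,
        "Source": "myapp.artifacts",
        "DetailType": "ArtifactJobRequested",
        # EventBridge expects Detail to be a JSON string, not a dict.
        "Detail": json.dumps({"jobId": job_id, "params": params}),
    }


def request_job(events_client, job_id, params):
    """Publish the event; return True if EventBridge accepted the entry."""
    response = events_client.put_events(Entries=[build_job_event(job_id, params)])
    return response.get("FailedEntryCount", 0) == 0
```

In a real Lambda you would pass `boto3.client("events")` as `events_client`. The "job's done" event could be published the same way from the container, or picked up from the task state change events that ECS already sends to EventBridge.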
Since you're only handling about 30 requests a day, incorporating a polling mechanism could make sense. Let's say you return a UUID from the Lambda that tracks the job. The frontend can periodically poll the API to check if the job has succeeded or failed. It keeps things simple and user-friendly!
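A rough sketch of what the polling flow could look like with boto3's ECS client. As a simplification it uses the ECS task ARN as the job identifier instead of a separate UUID, and the status names (`RUNNING`, `SUCCEEDED`, `FAILED`, `UNKNOWN`) are my own labels. The client is passed in as a parameter, so nothing here needs AWS access to try out.

```python
def start_job(ecs, cluster, task_definition):
    """Start one standalone ECS task and return its ARN as the job ID.

    Launch type / capacity provider settings are left to the cluster
    defaults here; adjust for your own setup.
    """
    response = ecs.run_task(cluster=cluster, taskDefinition=task_definition, count=1)
    if response.get("failures") or not response.get("tasks"):
        raise RuntimeError(f"ECS could not start the task: {response.get('failures')}")
    return response["tasks"][0]["taskArn"]


def job_status(ecs, cluster, task_arn):
    """Map the ECS task state to a simple status for the polling endpoint."""
    response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])
    if not tasks:
        # ECS no longer knows the task (or never did).
        return "UNKNOWN"
    task = tasks[0]
    if task.get("lastStatus") != "STOPPED":
        # PROVISIONING, PENDING, RUNNING, etc. all count as "still going".
        return "RUNNING"
    exit_codes = [c.get("exitCode") for c in task.get("containers", [])]
    succeeded = bool(exit_codes) and all(code == 0 for code in exit_codes)
    return "SUCCEEDED" if succeeded else "FAILED"
```

One Lambda would call `start_job(boto3.client("ecs"), ...)` and return the ID right away; a second endpoint would call `job_status` on each poll. ECS only keeps stopped tasks visible for a limited time, so if users might come back much later, you'd probably want to record the final status somewhere durable instead of relying on `describe_tasks`.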
Keep in mind that API Gateway has a default integration timeout of 29 seconds, which is far too short to wait on an ECS container that runs for 7-8 minutes. Instead, trigger a Lambda that starts your ECS task and returns as soon as the job has kicked off. Then, once the artifacts are uploaded to S3, you can have another Lambda (triggered by the upload, for example) push an update to your frontend, so the user knows when it's done.
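For the second Lambda, one option is to trigger it from an S3 event notification on the artifact bucket. A minimal sketch of such a handler, assuming the standard S3 notification payload shape; the `notify` callback is a placeholder for whatever actually reaches your frontend (a WebSocket push, an SNS message, and so on):

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    results = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 notifications.
            results.append((bucket, unquote_plus(key)))
    return results


def handler(event, context, notify=print):
    """Lambda entry point: announce every uploaded artifact.

    `notify` is a hypothetical hook; it defaults to print for local testing.
    """
    objects = uploaded_objects(event)
    for bucket, key in objects:
        notify(f"artifact ready: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

If the job uploads several files, you may want the container to write a single "done" marker object last and filter the notification on that key, so the user is only told once.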
I usually go with the async approach. You could set up WebSockets to push updates when the process is complete. Also, don't feel like you have to choose between Step Functions and going async; you can do both! Just avoid having a Lambda sit idle waiting on a long-running task.
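If you do try Step Functions, the state machine can wait on the ECS task itself, so no Lambda has to sit around. Below is a small sketch that builds such a definition as a Python dict using the `ecs:runTask.sync` service integration. The state names, the 30-minute timeout, and the error name are placeholders I picked, and the ARNs are whatever your cluster and task definition actually are.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, timeout_seconds=1800):
    """Return an Amazon States Language definition that runs one ECS task.

    The ".sync" suffix makes Step Functions wait until the task stops
    instead of returning as soon as it has been started.
    """
    return {
        "Comment": "Run the artifact job and wait for it to finish",
        "StartAt": "RunArtifactTask",
        "States": {
            "RunArtifactTask": {
                "Type": "Task",
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "LaunchType": "EC2",
                },
                # Guard against a task that hangs; the value is an assumption.
                "TimeoutSeconds": timeout_seconds,
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "JobFailed"}],
                "Next": "JobSucceeded",
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "ArtifactJobFailed"},
        },
    }


def definition_json(cluster_arn, task_definition_arn):
    """Serialize the definition for use when creating the state machine."""
    return json.dumps(build_state_machine(cluster_arn, task_definition_arn), indent=2)
```

The API Lambda would then just start an execution and hand back the execution ARN, which fits the async/polling pattern: the client can check the execution status later, or a final state can send the notification.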