I'm designing an API that triggers a workflow to process large folders containing codebases, typically around 1GB each. The workflow isn't heavily compute-driven, but I need fast regex searches across files. I want to keep costs low and the architecture simple since this will be used infrequently on-demand. Here's my current setup:
- I plan to store each project folder as a zipped file in S3.
- When a request comes in, I'll use a Lambda function to:
- Download and unzip the folder
- Perform regex searches and run some tasks with an LLM (using the OpenAI API). A rough sketch of this handler is at the end of the post.
More details:
1. Total size: 1GB per project.
2. Expected use: 10-20 requests/day for one specific project, with plans to expand.
3. Response time isn't critical; the entire workflow averages 15-20 seconds.
4. The regex requirement is client-specific: patterns are generated from various inputs per request.
5. Semantic or symbol-aware search isn't necessary.
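For concreteness, here's a rough sketch of the handler I have in mind. The bucket name and event fields are placeholders, the archive is read into memory for simplicity, and the OpenAI step is stubbed out:

```python
import io
import json
import os
import re
import zipfile

import boto3

s3 = boto3.client("s3")

# Placeholder; the real bucket/key layout is still undecided.
BUCKET = os.environ.get("PROJECT_BUCKET", "my-project-bucket")


def lambda_handler(event, context):
    project_key = event["project_key"]      # e.g. "projects/acme.zip"
    pattern = re.compile(event["pattern"])  # regex generated per request

    # Download the zipped project into memory (a 1 GB archive also fits on
    # /tmp if ephemeral storage is raised; memory settings to be tuned).
    body = s3.get_object(Bucket=BUCKET, Key=project_key)["Body"].read()

    matches = []
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as fh:
                for lineno, line in enumerate(fh, start=1):
                    text = line.decode("utf-8", errors="replace")
                    if pattern.search(text):
                        matches.append({"file": name, "line": lineno, "text": text.rstrip()})

    # The OpenAI call would go here (e.g. summarising `matches`);
    # omitted to keep the sketch focused on the S3 + regex part.
    return {"statusCode": 200, "body": json.dumps({"matches": matches[:1000]})}
```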
6 Answers
Since you're getting about 10-20 requests daily with 1GB files, don't overcomplicate it! What you have in mind works fine. Consider whether you can stream the whole process for efficiency: stream from S3, decompress, search, and collect results as you go. If streaming isn't feasible, run AWS Lambda Power Tuning to find the memory setting with the best price-performance balance.
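To show what I mean by streaming, here's a rough sketch. One caveat: a plain .zip isn't really streamable (its index sits at the end of the file), so this assumes you'd store the project as a .tar.gz instead; bucket, key and function names are placeholders:

```python
import re
import tarfile

import boto3

s3 = boto3.client("s3")


def stream_search(bucket: str, key: str, pattern: str):
    """Stream a .tar.gz straight from S3 and regex-search it without
    writing the archive to disk. Bucket/key are placeholders."""
    regex = re.compile(pattern)
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]  # file-like stream

    hits = []
    # mode="r|gz" is tarfile's pure streaming mode: members are read
    # sequentially, which is all a single regex pass needs.
    with tarfile.open(fileobj=body, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line.decode("utf-8", errors="replace")):
                    hits.append((member.name, lineno))
    return hits
```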
One consideration is how long it takes to process that 1GB file: Lambda has a hard timeout of 15 minutes. What's your workflow once you get the results? You mentioned producing a text report using regex. If the workflow isn't resource-heavy and the regex part is the most intensive step, then exposing this via an API sounds reasonable.
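If it's useful: the Lambda context object exposes the remaining execution time, so you can stop and return partial results before hitting that hard limit. A minimal sketch; `iter_project_files` and `search_chunk` are hypothetical stand-ins for your own unzip/search steps:

```python
def lambda_handler(event, context):
    results = []
    for chunk in iter_project_files(event):   # hypothetical iterator over files
        results.extend(search_chunk(chunk))   # hypothetical regex step
        # Leave a 30-second safety margin before Lambda's hard timeout
        # (15 minutes maximum) so we can still return what we have.
        if context.get_remaining_time_in_millis() < 30_000:
            return {"status": "partial", "results": results}
    return {"status": "complete", "results": results}
```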
I do something similar with AWS Step Functions. My process goes like this: when the API request comes in, I create a presigned URL for the user to upload their file. An S3 event then triggers an SQS queue that starts the Step Function. Using the Map state, you can process many items in parallel, which is helpful if you're handling lots of files. This approach is serverless and integrates easily with other AWS services. You can also return results through an SNS topic or a presigned URL the user downloads from.
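The presigned-URL part is just a couple of boto3 calls; roughly like this (bucket name and expiry are placeholders, and the S3 → SQS → Step Functions wiring is all configuration rather than code):

```python
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "upload-bucket"  # placeholder


def lambda_handler(event, context):
    # Give the caller a one-off key so uploads never collide.
    key = f"uploads/{uuid.uuid4()}.zip"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=900,  # URL valid for 15 minutes
    )
    # The client PUTs the file to `url`; the resulting S3 event then
    # feeds SQS, which kicks off the Step Function.
    return {"upload_url": url, "key": key}
```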
Cool. Thanks for the input! We’re aiming to keep things straightforward and serverless since we're a small startup and only have one client using this workflow for now.
I think you're on the right track, but I need a few more details to really help. What's the total size of your data? How often do the files change, and do you have any specific access-control requirements? It might also be worth storing the data locally on a server and just running grep over it, if that's feasible for you.
I've updated the post with more details; please take a look!
Here are a couple of suggestions: zipping might not provide enough savings to justify the overhead, so you could store a tar instead and stream the content for regex processing. Additionally, consider a shared EFS volume to cache unpacked projects; evicting projects on an LRU basis would keep the space under control (rough sketch below).
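For the EFS idea, a rough sketch of LRU eviction. It assumes the filesystem is mounted at /mnt/projects and that each cached project gets a `.last_used` marker touched on every access; the 50 GB budget is arbitrary:

```python
import os
import shutil

CACHE_ROOT = "/mnt/projects"        # assumed EFS mount path
MAX_CACHE_BYTES = 50 * 1024 ** 3    # arbitrary size budget


def dir_size(path: str) -> int:
    return sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(path)
        for f in files
    )


def touch(project_path: str) -> None:
    """Record 'last used' by bumping a marker file's mtime on each access."""
    marker = os.path.join(project_path, ".last_used")
    with open(marker, "a"):
        os.utime(marker, None)


def evict_lru_until_under_budget() -> None:
    projects = [
        os.path.join(CACHE_ROOT, d)
        for d in os.listdir(CACHE_ROOT)
        if os.path.isdir(os.path.join(CACHE_ROOT, d))
    ]
    # Oldest marker first = least recently used first.
    projects.sort(key=lambda p: os.path.getmtime(os.path.join(p, ".last_used")))
    sizes = {p: dir_size(p) for p in projects}
    total = sum(sizes.values())
    for victim in projects:
        if total <= MAX_CACHE_BYTES:
            break
        shutil.rmtree(victim)
        total -= sizes[victim]
```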
Thanks for the suggestion!
AWS Athena might be worth a look. It can query your files in S3 directly and handles common compression formats such as gzip transparently (it doesn't read .zip archives, though, so you'd store the data gzip-compressed). This could save you some processing time.
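To give a feel for it, running a query from code would look roughly like this. The database, table and output location are placeholders, and it assumes your source lines have already been exposed as an Athena table over gzip-compressed text:

```python
import time

import boto3

athena = boto3.client("athena")


def run_regex_query(pattern: str) -> list:
    # Placeholder table: "codebase" maps each source line to a row.
    # Don't interpolate untrusted patterns like this outside a sketch.
    query = f"SELECT file_path, line FROM codebase WHERE regexp_like(line, '{pattern}')"
    qid = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "projects_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query finishes (fine for a low-volume, on-demand API).
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return rows[1:]  # first row is the header
```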
The whole workflow takes about 20-30 seconds, and I want to trigger this via an API and return the generated output for use by different services.