Struggling with Bedrock for My Synthetic Data Project – Any Tips?

0
0
Asked By WanderingCoder42 On

Hey everyone, I participated in a hackathon and earned $300 in credits to create a synthetic data generator. I'm feeling a bit stuck, so I could really use your help! I'm trying to generate thousands of dataset rows. I initially attempted using Claude 3.7 on Bedrock but found it could only produce about 100 rows at a time. I ended up batching my requests at 80 rows each, which got me to 1000 rows, but it took around 13 minutes. Is there a faster way to do this, maybe through an async method or a different model? I tried using aioboto3, but it didn't seem to work, possibly due to limitations with Claude 3.7 or something else. Also, after generating those 1000 rows earlier today, I'm now getting read timeout errors with the same code. Why is that happening? Any advice would be greatly appreciated!

2 Answers

Answered By CodeSmith2021 On

Typically, models struggle with large outputs at once. You might want to switch to a smaller and less expensive model that allows a longer context length, like Llama 4. Keep your output limit low and generate in multiple iterations. Batching your requests is a better approach since larger outputs tend to be ineffective all at once.

Answered By SynthDataPro On

Could you clarify what your final goals for the project are? You're creating a synthetic data generator – is that essential for the $300 grant? If you already secured the grant, what's your end goal? A simpler method might be to get the LLM to generate code for a library designed for fake data generation based on your data specifications. Running that code could lead to thousands of rows in seconds!

JuniorDev2023 -

I'm aiming for realistic datasets to help train models, not just fake data. I did get the $300 for this project, and I want to build something that can generate complex datasets that existing libraries can't manage. The only hiccup is the high cost and slow generation time. The process starts with generating a schema first, and then I can create the dataset if the schema works.

Related Questions

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.