How to Vectorize Text for AWS OpenSearch in a CRM System?

Asked By CuriousCoder42 On

Hey everyone! I'm building a search engine for our CRM that needs to handle text search. I'm planning to vectorize the text before inserting it into OpenSearch, but I'm not sure how to approach this. We have a large backlog of historical text messages, about 300 million, and we receive around 500,000 new messages daily. I'll be using the HTTP API for data insertion. Any advice on how to handle this effectively would be greatly appreciated! Thanks!

4 Answers

Answered By EngineerEager On

It might be beneficial to start with some software engineering courses if you're new to this. Building software requires a solid understanding of the fundamentals, so get those skills down and then tackle the project!

Answered By CloudGuru99 On

You might want to look into Amazon Kendra or Amazon Bedrock Knowledge Bases for your needs. They automate the vectorization process when you upload your data. While OpenSearch is powerful, S3 Vectors can be a cheaper alternative for storage, although it might have slower retrieval times than OpenSearch. Just keep your project's latency requirements in mind!

Answered By TechSavvyWizard On

For a project of this scale, it's better to handle the vectorization outside of OpenSearch. Consider using a dedicated embedding model hosted on a service like Bedrock or SageMaker to generate your vectors before indexing them. Here's a game plan:

1. Use an external model to vectorize your text.
2. Store these vectors in a knn_vector field alongside your original text.
3. Leverage OpenSearch's k-NN or vector search features for similarity searches.
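Steps 2 and 3 can be sketched as the request bodies you would send over the HTTP API. This is a minimal sketch, not a definitive setup: the index name `crm-messages`, the field name `embedding`, and the dimension of 384 are assumptions for illustration, and the dimension has to match whatever embedding model you actually choose. The code only builds the JSON; sending it (with authentication) is left out.

```python
import json

# Hypothetical names for illustration; none of these come from the thread.
INDEX_NAME = "crm-messages"
VECTOR_FIELD = "embedding"
EMBEDDING_DIM = 384  # assumption: must equal your embedding model's output size


def build_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Body for PUT /<index>: enables k-NN and maps the text and vector fields."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                VECTOR_FIELD: {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def build_knn_query(query_vector: list[float], k: int = 10) -> dict:
    """Body for POST /<index>/_search: asks for the k nearest stored vectors."""
    return {
        "size": k,
        "query": {"knn": {VECTOR_FIELD: {"vector": query_vector, "k": k}}},
    }


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(build_index_body(), indent=2))
    print(f"POST /{INDEX_NAME}/_search")
    print(json.dumps(build_knn_query([0.0] * EMBEDDING_DIM, k=5))[:120], "...")
```

The query vector must come from the same external model that produced the stored vectors, otherwise the distances are meaningless.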

A few tips:
- Don't try to stream all the data—instead, backfill in batches.
- Make use of bulk APIs instead of individual HTTP inserts.
- If possible, opt for a smaller embedding size; smaller vectors reduce memory use, index size, and query latency.
- Be mindful of costs and indexing time; 300 million documents is a big task, so consider splitting the data across indices or shards by time or CRM entity for efficiency.
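To make the batching and embedding-size tips concrete, here is a rough sketch that groups documents into batches, serializes each batch in the newline-delimited format the `_bulk` endpoint expects, and estimates raw vector storage. The document shape (`id`, `text`, `embedding`), the index name, and the batch size are assumptions for illustration; the estimate covers float32 values only and ignores k-NN graph overhead and replicas.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items without loading everything."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def to_bulk_ndjson(index: str, docs: list[dict]) -> str:
    """Serialize docs as a _bulk body: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps({"text": doc["text"], "embedding": doc["embedding"]}))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def raw_vector_bytes(num_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage only; excludes index overhead and replicas."""
    return num_docs * dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical documents with tiny placeholder vectors.
    docs = (
        {"id": str(i), "text": f"message {i}", "embedding": [0.0, 0.0, 0.0]}
        for i in range(5)
    )
    for batch in batched(docs, 2):
        body = to_bulk_ndjson("crm-messages-2024-01", batch)
        # POST this body to /_bulk with Content-Type: application/x-ndjson
        print(len(batch), "docs,", len(body), "bytes")
    for dim in (384, 1536):
        print(dim, "dims:", raw_vector_bytes(300_000_000, dim) / 1e9, "GB")
```

For 300 million messages, raw float32 vectors alone come to about 460.8 GB at 384 dimensions and about 1.84 TB at 1536, which is why the embedding size matters so much at this scale. A good batch size depends on document size and cluster capacity, so treat any fixed number as a starting point to tune.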

Answered By DataNerdDave On

OpenSearch does include some machine learning features that you could experiment with, but results can be unpredictable. It might be worth trying them out to see if you can get a prototype working. If the ML features don’t cut it, a solid book on OpenSearch could level up your skills, or consider getting some professional help if needed.
