Hey everyone! I'm currently looking into the semantic search features offered by AWS OpenSearch, specifically using the vectorsearch collection type. From what I gather in the documentation, it seems I need to generate the embeddings for a field myself before I can even ingest my documents. I was hoping there might be an automatic way to generate these embeddings when I set up a knn_vector field. I've also read about integrating with SageMaker or Bedrock, but I'm not seeing that option available for serverless collections. Any insights or guidance would be really helpful, thanks!
5 Answers
Right? Pinecone definitely seems like a better deal when you look at AWS OpenSearch pricing!
You could also use Bedrock knowledge bases to generate embeddings automatically when you sync a data source such as S3. It involves mapping the fields properly, but it can be done against your existing OpenSearch setup.
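To give a rough idea of the field mapping involved: a Bedrock knowledge base backed by an OpenSearch Serverless collection expects an index with a vector field, a text field, and a metadata field, and you tell Bedrock which name is which. A minimal sketch of such an index (field names, dimension, and engine are assumptions; 1536 matches Titan text embeddings v1, and your collection must be the vectorsearch type):

```json
PUT /my-kb-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "faiss" }
      },
      "text":     { "type": "text" },
      "metadata": { "type": "text", "index": false }
    }
  }
}
```

You then enter `embedding`, `text`, and `metadata` as the vector, text, and metadata field names in the knowledge base configuration.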
Automatic embedding generation doesn't look like it's supported out of the box. However, you can set it up with the ML plugin plus an ingest pipeline. This is based on regular (managed) AWS OpenSearch, so I can't say how it would play out in the serverless scenario! Check out this link for more details.
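For anyone curious what that setup looks like: with the neural-search/ML plugin you register a model, create an ingest pipeline with a `text_embedding` processor, and set it as the index's default pipeline so embeddings are generated on ingest. A sketch, assuming a registered model ID and a 768-dimension model (both placeholders):

```json
PUT /_ingest/pipeline/text-embedding-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your-registered-model-id>",
        "field_map": { "body": "body_embedding" }
      }
    }
  ]
}

PUT /my-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "text-embedding-pipeline"
  },
  "mappings": {
    "properties": {
      "body":           { "type": "text" },
      "body_embedding": { "type": "knn_vector", "dimension": 768 }
    }
  }
}
```

After that, indexing a document with a `body` field populates `body_embedding` automatically.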
Honestly, I'd recommend considering Pinecone instead. It's much more budget-friendly compared to AWS OpenSearch's costs.
Yep, you'll need to create the embeddings yourself. You can use AWS Bedrock for this, particularly the Titan embedding model. Remember, embeddings are simply vectors that represent your text (or other data) in some space; OpenSearch can't know what you're trying to represent, whether that's a single document field, the whole document, or something like an image. Worth checking out!
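A minimal sketch of generating an embedding with Bedrock's Titan model via boto3 (assumes AWS credentials and Bedrock model access are already configured; the helper function names are my own):

```python
import json


MODEL_ID = "amazon.titan-embed-text-v1"  # Titan text-embedding model


def titan_request_body(text: str) -> str:
    # Titan embedding models take a JSON body with an "inputText" field.
    return json.dumps({"inputText": text})


def parse_embedding(response_body: bytes) -> list:
    # The response JSON carries the vector under the "embedding" key.
    return json.loads(response_body)["embedding"]


def embed(text: str) -> list:
    # boto3 imported here so the helpers above work without it installed.
    import boto3

    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=titan_request_body(text),
    )
    return parse_embedding(resp["body"].read())
```

You'd call `embed("some document text")` for each field you want to index, then store the returned vector in your knn_vector field.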
Got it, I'm actually looking to use the pre-trained models mentioned in the OpenSearch docs. Thanks for the tip!