I've created an embedding model using a Hugging Face transformer, and it's working well for accuracy. However, I'm running into performance and latency problems when processing large batches of text data. Since I'm already using AWS for hosting, I'm wondering if there's an AWS-native service that can help me generate embeddings easily, similar to what OpenAI or Cohere offer via API. I'd prefer a solution that doesn't require me to manage model inference myself or deploy any models on AWS. Any suggestions would be appreciated!
2 Answers
Consider using Amazon Bedrock, which gives you API access to Titan Embeddings as well as models from Cohere and Anthropic. It's a fully managed service, so you don't have to provision or manage any inference infrastructure, and it also supports batch inference for large workloads. If you'd rather stay within SageMaker, check out SageMaker JumpStart: it offers pre-trained text embedding models you can deploy to real-time inference endpoints, which typically gives you lower latency than running Hugging Face transformers yourself.
Quick tip: Titan Embeddings on Bedrock is fully serverless. You just enable model access in your AWS account and call it directly via the SDK or CLI, with no endpoints to deploy or manage.
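To make that concrete, here's a minimal sketch of calling Titan Embeddings through the Bedrock runtime with boto3. The model ID and request fields follow the Titan Text Embeddings V2 format; double-check the current Bedrock docs for your region and model version:

```python
import json


def build_request(text, dimensions=1024):
    # Titan Text Embeddings V2 request body; "dimensions" and
    # "normalize" are V2-specific options.
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})


def parse_embedding(raw_body):
    # The response body is JSON containing an "embedding" list of floats.
    return json.loads(raw_body)["embedding"]


def embed(text, client):
    resp = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=build_request(text),
        contentType="application/json",
        accept="application/json",
    )
    return parse_embedding(resp["body"].read())


# Usage (requires AWS credentials and Bedrock model access enabled):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# vector = embed("hello world", client)
```

No endpoint lifecycle to manage here: you pay per invocation, and the only setup is enabling model access in the Bedrock console.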
You might want to check out Amazon Bedrock. It provides AWS's own Titan embedding models as well as access to Cohere's. Cohere may offer better performance for some workloads, but it can be pricier; the AWS documentation has current model and pricing details. Also note that Bedrock Knowledge Bases and S3 Vectors can help you build a cost-effective retrieval pipeline on top of those embeddings.
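If you go the Cohere route, the request shape differs slightly from Titan: Cohere Embed on Bedrock accepts a list of texts per call, which helps with the batch-processing problem in the question. A hedged sketch (the model ID follows the Cohere v3 naming on Bedrock, and the per-call batch limit is a service detail to verify against current docs):

```python
import json


def build_cohere_request(texts, input_type="search_document"):
    # Cohere Embed takes a batch of texts per request; "input_type"
    # distinguishes documents from queries for retrieval use cases.
    return json.dumps({"texts": texts, "input_type": input_type})


def parse_embeddings(raw_body):
    # The response body is JSON with an "embeddings" list, one vector
    # per input text, in order.
    return json.loads(raw_body)["embeddings"]


# Usage (assumes Bedrock model access for "cohere.embed-english-v3"):
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(
#     modelId="cohere.embed-english-v3",
#     body=build_cohere_request(["doc one", "doc two"]),
#     contentType="application/json",
#     accept="application/json",
# )
# vectors = parse_embeddings(resp["body"].read())
```

Batching multiple texts per invocation like this cuts down on round trips, which is usually where the latency goes when embedding large datasets.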
This approach sounds solid! Plus, I've heard AgentCore has resources that ease deploying a Retrieval-Augmented Generation (RAG) setup with embeddings.
Sounds good! I’ll definitely give this a shot.

I’m looking into something similar, so I’d love to hear how it goes for you!