I'm working on a project where I use a Hugging Face transformer model to generate text embeddings. While the accuracy is great, I'm facing some performance and latency challenges, especially when handling large batches of data. Since I'm already using AWS for hosting, I'm curious if there are any AWS-native or managed services that can generate embeddings directly via API, like the APIs from OpenAI or Cohere. Ideally, I'd prefer a solution that doesn't require me to deploy any models myself. Any suggestions?
2 Answers
You might want to consider Amazon Bedrock for this. It offers Titan Embeddings along with embedding models from Cohere, all accessible via a single API, and it scales without any infrastructure for you to manage. (Bedrock also hosts Anthropic's Claude models, but those are generative models, not embedding models.) If you'd rather stay within SageMaker, SageMaker JumpStart provides pre-built text-embedding models you can deploy to real-time endpoints, which can give lower latency than serving a raw Hugging Face model yourself. I'd recommend starting with Titan Embeddings on Bedrock since it's serverless and integrates well with other AWS services.
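A minimal sketch of calling Titan Embeddings through the Bedrock runtime API with boto3. The model ID and region here are assumptions; check which models are enabled in your account and region:

```python
import json

# Assumed model ID -- verify the Titan embedding model enabled in your account.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_titan_request(text: str, dimensions: int = 1024) -> str:
    """Build the JSON request body Titan Text Embeddings V2 expects."""
    return json.dumps({"inputText": text, "dimensions": dimensions})

def titan_embed(text: str, region: str = "us-east-1") -> list:
    """Invoke the model; requires AWS credentials with bedrock:InvokeModel."""
    import boto3  # imported lazily so the request helper works without the SDK
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=build_titan_request(text),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a streaming object; "embedding" is a list of floats.
    return json.loads(resp["body"].read())["embedding"]
```

Since there's no model to deploy, this is the whole integration: credentials, a model ID, and one `invoke_model` call per text.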
Have you checked out Amazon Bedrock? It offers embedding models from both Amazon (Titan) and Cohere. Some users report better quality from Cohere's models, though usually at a higher cost. Here are a couple of links to get you started: [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed-v4.html) and [Titan Models](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html). You could also explore Bedrock Knowledge Bases, which manage the embedding and retrieval pipeline for you, as a cheaper way to build retrieval-augmented generation (RAG) setups.
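If you go the Cohere-on-Bedrock route, a rough sketch of a batched call looks like the following. The model ID is an assumption (verify what's enabled in your account), and batching many texts per request is what helps with the latency problem you described:

```python
import json

# Assumed model ID -- Cohere's English embed model on Bedrock; verify availability.
COHERE_MODEL_ID = "cohere.embed-english-v3"

def build_cohere_request(texts: list, input_type: str = "search_document") -> str:
    """Cohere embed models on Bedrock take a batch of texts plus an input_type
    ("search_document" when indexing, "search_query" when embedding queries)."""
    return json.dumps({"texts": texts, "input_type": input_type})

def cohere_embed(texts: list, region: str = "us-east-1") -> list:
    """Embed a whole batch in one API call; needs bedrock:InvokeModel permission."""
    import boto3  # lazy import: the request builder above works without the SDK
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(
        modelId=COHERE_MODEL_ID,
        body=build_cohere_request(texts),
        contentType="application/json",
        accept="application/json",
    )
    # "embeddings" is a list of vectors, one per input text, in order.
    return json.loads(resp["body"].read())["embeddings"]
```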
