I'm interested in deploying an OCR model found on Hugging Face (specifically, the rednote-hilab/dots.ocr model). I have experience using SageMaker real-time endpoints, but I've found them to be quite expensive. I'm looking for cheaper alternatives that also minimize cold start times, so any suggestions for deploying on AWS that fit these criteria would be greatly appreciated!
1 Answer
Serving models can get pricey. If you're looking to cut costs, you might want to check out Bedrock: it's more serverless-centric, so you're billed for what you use instead of for an always-on endpoint, which can work out cheaper at low usage. Just keep in mind that it could still be costly per token at higher volumes. If that doesn't suit your budget, options outside of AWS like DigitalOcean, Lambda Labs, or RunPod might be worth exploring.
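To make the trade-off concrete, here's a rough back-of-the-envelope sketch in Python for finding the request volume at which an always-on endpoint and per-token billing cost the same. Every number in it (the hourly rate, the tokens per page, the per-token price) is a made-up placeholder, not a real AWS price, and the function names are just for illustration; plug in the figures from the current pricing pages before drawing any conclusion.

```python
HOURS_PER_MONTH = 730.0  # average hours in a month


def monthly_cost_always_on(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of an endpoint that is billed for every hour, busy or idle."""
    return hourly_rate_usd * hours


def monthly_cost_per_token(
    requests_per_month: int, tokens_per_request: int, usd_per_1k_tokens: float
) -> float:
    """Cost of a usage-billed service that charges per 1,000 tokens."""
    return requests_per_month * tokens_per_request / 1000 * usd_per_1k_tokens


def break_even_requests(
    hourly_rate_usd: float,
    tokens_per_request: int,
    usd_per_1k_tokens: float,
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Requests per month at which both billing models cost the same."""
    cost_per_request = tokens_per_request / 1000 * usd_per_1k_tokens
    return monthly_cost_always_on(hourly_rate_usd, hours) / cost_per_request


if __name__ == "__main__":
    # Placeholder inputs (assumptions, NOT real prices): replace with actual quotes.
    hourly_rate = 1.00     # USD per hour for a hypothetical GPU instance
    tokens_per_page = 2000  # hypothetical tokens consumed per OCR'd page
    price_per_1k = 0.01    # hypothetical USD per 1,000 tokens

    pages = break_even_requests(hourly_rate, tokens_per_page, price_per_1k)
    print(f"Break-even at about {pages:,.0f} pages per month")
```

Below the break-even volume, usage-based billing is the cheaper of the two; above it, the always-on endpoint wins. This ignores cold starts, data transfer, and storage, so treat it as a first filter only.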

I agree, but I heard Bedrock has some limitations on model compatibility. It's always a bit tricky to find something that meets both budget and functionality if you're tied to AWS.