I'm currently scaling up our use of Amazon Textract to process around 50,000 documents a month, including invoices, contracts, and forms. I'm hoping to gather some insights from others using Textract at this scale. Specifically, I'm curious about a few areas:
1. Accuracy rates: What levels are you achieving based on document types? We're seeing about 92% accuracy on structured forms and 85% on semi-structured documents—is this typical, or do you see room for improvement?
2. Cost management: What strategies do you have for keeping costs stable? We've noticed significant variability depending on document complexity.
3. Queries feature: Is the extra cost justified when compared to doing custom post-processing?
4. Handling exceptions: What's your approach for human reviews? Do you use in-house tools or off-the-shelf solutions?
5. Alternatives: Has anyone compared Textract with other AWS AI services like Comprehend or Bedrock for document processing?
Overall, I'm satisfied with Textract but looking to optimize our implementation and learn from the community's experiences.
4 Answers
Hey there! I’ve worked with Textract for similar workflows handling various languages. My accuracy was a bit lower—much closer to 80% on some documents. What we did was to use free OCR tools like EasyOCR when our accuracy fell below a certain threshold. If that didn't cut it, we resorted to a human review process to make sure everything was right.
I actually recommend steering clear of Textract. It seems outdated and can be quite pricey and inaccurate, especially with foreign languages. There are better alternatives—consider something like ChatGPT or Claude for your needs!
I had an interesting take with Textract—we used it on documents that were handwritten and hard to read. After extracting the data with Textract, we’d run it through Bedrock to fix any typos based on context. This helped improve accuracy significantly without needing perfect input. You might want to test this approach as well.
Wow, processing 50,000 documents a month is some serious volume! When I was at AWS, we had clients with even larger workloads. They were typically from industries like legal and healthcare, which have tons of paperwork. Definitely manageable as long as you plan for it!

Related Questions
Neural Network Simulation Tool
xAI Grok Token Calculator
DeepSeek Token Calculator
Google Gemini Token Calculator
Meta LLaMA Token Calculator
OpenAI Token Calculator