Looking for Modern Alternatives to Textract for Document Parsing

Asked By TechSavvyGiraffe23 On

I've been using Textract for document parsing and text extraction, and while it works well, I'm looking for alternatives that handle table layouts better and can save results as markdown strings. I've heard good things about IBM's Docling and Meta's Nougat, but I'm really interested in hearing about people's real-world experiences with different tools in production environments. Any suggestions? Update: thanks to a recommendation, I just found a fork called MarkItDown API that seems to fit my needs perfectly!

3 Answers

Answered By ParserExtraordinaire On

Another option to look into is Marker by Vik Paruchuri. It might have the functionality you're after.

ThankfulUser01 -

Awesome, thanks!

Answered By DataDynamo76 On

I initially thought you were asking about AWS Textract, which is great for handling tables too, by the way. I've been using it for a few years now, and it really does a good job for various document types.

OnPremiseNinja -

In my case, everything has to run on-premise, unfortunately.

Answered By CodeWizard99 On

You might want to check out Microsoft's MarkItDown for your needs. It seems to produce markdown output quite well!
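
For anyone who wants to try it, here is a minimal sketch of getting a markdown string out of MarkItDown from Python. It assumes the `markitdown` package is installed (`pip install markitdown`; some file types may need optional extras depending on the version). The helper name `convert_to_markdown` and its `converter` parameter are made up for illustration; only `MarkItDown().convert(...)` and the result's `.text_content` attribute come from the library.

```python
def convert_to_markdown(path, converter=None):
    """Return the contents of the document at `path` as a markdown string.

    `converter` is a hypothetical hook for this sketch: any object with a
    `.convert(path)` method whose result has a `.text_content` attribute.
    When it is omitted, a MarkItDown instance is created.
    """
    if converter is None:
        # Imported lazily so the helper can be exercised without the package.
        from markitdown import MarkItDown
        converter = MarkItDown()

    result = converter.convert(str(path))
    return result.text_content


# Example usage (assumes a local file named report.pdf exists):
# markdown = convert_to_markdown("report.pdf")
# print(markdown)
```

How well tables come through depends on the source document, so it is worth checking the output on a few of your own files before relying on it.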

CuriousCat88 -

Awesome! Thank you! Are you using it right now?
