Applications

Has anyone used Textract for extracting images and tables from PDFs?

April 19, 2025

Asked By CloudyNinja47 On April 19, 2025

I'm looking to extract images, tables, and figures from research papers. I've tried a few Python libraries like pymupdf and pdffigures2, but I've found them either too slow or their extraction quality is pretty poor. For instance, pymupdf doesn't handle tables at all. I'm curious if Textract or similar paid tools are worth considering for this task.

5 Answers

Answered By DataWhiz99 On April 20, 2025

I've been using Textract for about two years, but I've recently switched to the Anthropic API. I find it cheaper and more accurate, plus it has a full LLM feature that I really enjoy.

Answered By QuestionMasterX On April 20, 2025

The best way to know if it's right for you is to give it a try. Textract is a managed OCR platform and has a solid set of features to work with.

Answered By TechyGuru21 On April 20, 2025

I created a repo specifically for extracting fields and tables from images using vision language models. You can check it out [here](https://github.com/NanoNets/docext), and it should help with your table extraction needs. Plus, you can run the whole setup in a Colab notebook linked in the repo!

CloudyNinja47 - April 20, 2025

Thanks, I’ll check it out!

Answered By SkyHighCoder On April 20, 2025

Textract definitely has content extraction capabilities. You can test it out in their web console; all you need is an AWS account. It’ll cost a few cents, but I think the demo features are free!

Answered By PixelPioneer42 On April 20, 2025

I’ve heard that Sonnet 3.5 performs better when it comes to extracting information from images compared to Textract, so that might be worth looking into.

Has anyone used Textract for extracting images and tables from PDFs?

5 Answers

Related Questions

Fix Not Being Able To Add New Categories With Intuitive Category Checklist For Wordpress

Get Real User IP Without Installing Cloudflare Apache Module

How to Get Total Line Count In Visual Studio 2013 Without Addons

Install and Configure PhpMyAdmin on Centos 7

How To Setup PostfixAdmin With Dovecot and Postfix Virtual Mailbox

Dovecot Error Unknown database driver mysql

LEAVE A REPLY Cancel reply