Best Way to Scan Hybrid PDFs on Linux Mint?

Asked By CuriousCat92 On

I'm fairly new to Linux, and I'm currently on Linux Mint Cinnamon (also dabbling with XFCE and Debian). I have a collection of old papers that I want to scan into hybrid PDFs, meaning each file should contain both the original page images and an OCR text layer. I've tried gImageReader with the Tesseract OCR engine, and I've tried gt5, but one was super slow, while the other produced low-quality results. I experimented with scan resolutions from 300 to 1200 DPI, but that didn't make a difference either. I also used OCRFeeder, but it kept hanging when I tried to save my work as a hybrid PDF. Does anyone have a better solution or an alternative tool? My setup is a dual-core i7 (4 threads), 16GB of DDR3 RAM, a SATA SSD, and a fresh install of Linux Mint Cinnamon 22.1.

1 Answer

Answered By TechWhiz78 On

Have you tried using OCRmyPDF? It's pretty reliable and works well with my setup. I use it alongside paperless-ngx, and it hasn't given me any trouble. Maybe that could be a solution for your hybrid PDF needs!
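
For anyone who wants to try this on a whole folder of scans, here is a minimal sketch, assuming OCRmyPDF is installed and on your PATH (for example from the distribution's `ocrmypdf` package) and that your scanning software has already saved each document as a PDF. The helper names `build_command` and `ocr_folder`, the folder layout, and the choice of flags are illustrative assumptions, not a description of TechWhiz78's actual setup.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path, language: str = "eng", jobs: int = 4) -> list[str]:
    """Build an ocrmypdf command line (hypothetical helper for this sketch)."""
    return [
        "ocrmypdf",
        "--language", language,  # Tesseract language pack to use
        "--deskew",              # straighten slightly crooked pages
        "--rotate-pages",        # fix pages that were scanned sideways
        "--jobs", str(jobs),     # parallel workers; 4 assumed for a 4-thread CPU
        str(src),
        str(dst),
    ]


def ocr_folder(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run ocrmypdf on every PDF in in_dir; return the inputs that failed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        # A non-zero exit code means ocrmypdf reported a problem with this file.
        result = subprocess.run(build_command(src, dst))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling `ocr_folder(Path("scans"), Path("scans-ocr"))` would write one output PDF per input into `scans-ocr` (both folder names are placeholders). Each output keeps the page images and adds the text layer on top. If the scans are in another language, pass a different `language` code and make sure the matching Tesseract language pack is installed.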

CuriousCat92 -

Thanks for the suggestion! I’m not sure what went wrong with gImageReader, but it kept misreading spaces as characters. Some processing times were ridiculous too.

GadgetGuy42 -

I installed gImageReader with Tesseract on a totally different machine, and while it was fast, the output was just garbage. Maybe it's a scanner issue?
