I'm trying to convert a PDF into a text file so I can run it through speech synthesis for an audiobook. However, when I copy and paste the text from the PDF, some letters are missing in various words. I tried downloading Adobe Reader from the Windows App Store, hoping to use its conversion feature, but it turned out to be a 1.3GB download that requires a subscription just to access that function. I'm considering a workaround: should I convert the PDF into images and then use an OCR program? I'm open to any suggestions. Also, I run Ubuntu Linux, so options for that platform would be great!
1 Answer
If the text you copy out of the PDF comes out with missing letters, the PDF's embedded text layer is probably unreliable, so OCR is a good way to go. Render each page of the PDF to an image, then run those images through an OCR tool. Because OCR reads the rendered page rather than the embedded text, it should give you cleaner output.

Do you know how to convert the pages of a PDF into images?
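
In case it helps, here is a rough sketch of the whole pipeline in Python. It assumes you have the `pdftoppm` and `tesseract` command-line tools installed (on Ubuntu they should come from the `poppler-utils` and `tesseract-ocr` packages). The function names, the `pages` working directory, the 300 DPI setting and the English language code are just my choices for illustration, so adjust them to your document.

```python
import subprocess
import sys
from pathlib import Path


def render_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def ocr_command(image_path, lang="eng"):
    """Build the tesseract call that prints the recognised text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def page_number(image_path):
    """Read the page number pdftoppm appends, e.g. 'page-07.png' -> 7."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def pdf_to_text(pdf_path, out_path, workdir="pages", lang="eng"):
    """Render the PDF to images, OCR each page, and write one text file."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Step 1: PDF -> one PNG per page (page-1.png, page-2.png, ...).
    subprocess.run(render_command(pdf_path, workdir / "page"), check=True)

    # Step 2: OCR the pages in numeric order and append the text.
    images = sorted(workdir.glob("page-*.png"), key=page_number)
    with open(out_path, "w", encoding="utf-8") as out:
        for image in images:
            result = subprocess.run(
                ocr_command(image, lang),
                check=True,
                capture_output=True,
                encoding="utf-8",
            )
            out.write(result.stdout)
            out.write("\n")


# Only runs when called as: python3 pdf_ocr.py book.pdf book.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    pdf_to_text(sys.argv[1], sys.argv[2])
```

Save it as something like `pdf_ocr.py` (a made-up name) and run `python3 pdf_ocr.py book.pdf book.txt`. A higher DPI usually helps OCR accuracy at the cost of speed, and for a book in another language you would need the matching Tesseract language pack and `lang` code.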