Hi everyone! I'm looking for a way to convert a PDF into HTML in Python; both open-source and paid options are fine. The PDF I have includes formatting such as bold and italic text, font sizes, new lines, tab spaces, etc. I want to preserve all of this formatting so I can render the PDF content directly in the UI. Additionally, I'd like to know if there's a way to create a new PDF based on any updates made in the UI. Any suggestions?
3 Answers
Check out pandoc; it's about as close as you can get to what you're looking for. Just a heads up, though—it might not be 100% accurate in the conversion.
Honestly, PDF format is kind of a pain to work with; I'd recommend exploring other options if possible. It usually leads to more hassle than it’s worth.
You should try pdf2htmlEX! It does a great job converting PDFs to HTML while retaining the original styles. Plus, you can use PyMuPDF for text extraction and formatting adjustments.
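To make the PyMuPDF suggestion a bit more concrete, here's a rough sketch of how the bold/italic/font-size info could be carried over into HTML. It assumes the span layout that PyMuPDF's `page.get_text("dict")` returns (blocks, then lines, then spans with `text`, `size`, and `flags`) and the documented flag bits for italic and bold. The helper names (`span_to_html`, `page_dict_to_html`, `pdf_to_html`) are made up for this example, and the output markup is just one possible choice, not what pdf2htmlEX produces.

```python
from html import escape

# Bit positions of the span "flags" field as documented by PyMuPDF.
ITALIC_FLAG = 1 << 1
BOLD_FLAG = 1 << 4


def span_to_html(span):
    """Wrap one text span in tags that reflect its bold/italic/size info."""
    text = escape(span["text"])
    if span["flags"] & BOLD_FLAG:
        text = f"<b>{text}</b>"
    if span["flags"] & ITALIC_FLAG:
        text = f"<i>{text}</i>"
    return f'<span style="font-size:{span["size"]:.1f}pt">{text}</span>'


def page_dict_to_html(page_dict):
    """Turn one page dict into <p> blocks, with <br> between lines."""
    paragraphs = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # skip image blocks, keep text blocks
            continue
        lines = [
            "".join(span_to_html(span) for span in line["spans"])
            for line in block["lines"]
        ]
        paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(paragraphs)


def pdf_to_html(path):
    """Hypothetical entry point: needs PyMuPDF installed (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(path) as doc:
        return "\n".join(page_dict_to_html(page.get_text("dict")) for page in doc)
```

Calling `pdf_to_html("input.pdf")` (the file name is a placeholder) would give you one HTML string for the whole document. This only keeps inline styling and line breaks; tab spacing and exact positioning would need the span `bbox` coordinates as well, which is where pdf2htmlEX tends to do better.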