Looking for a Good Word .docx to HTML Parser

Asked By SillyMonkey47

I'm struggling to find an effective way to convert Word .docx files to HTML. I've tried tools like LibreOffice, but they don't handle Word headers properly or capture images well. Ideally, I'm looking for a server-side solution that I can integrate into my backend. The workflow I have in mind is: parse the Word document, convert it to HTML, edit the HTML, and then generate a PDF from it. If anyone has recommendations, especially for tools that support SVGs, I'd really appreciate the help!

2 Answers

Answered By TechSavvy81

There are a ton of ways to handle this! Do you need maximum fidelity? Are you leaning toward a desktop or server solution? If you're after server options, consider exploring cloud conversion services. There's also COM automation if you want Word itself to do the conversion, though that needs a Windows machine with Word installed. For a library you can call directly from your own code, you could check out Aspose.

SillyMonkey47 -

I'm set on a server-based solution, ideally looking for something that can parse the Word docs and turn them into editable HTML. Any specific tools you recommend that fit that workflow?

Answered By HappyCoder92

You might want to check out Pandoc. It does a pretty solid job, although it's not perfect. Still, it's a reliable option for converting documents.
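
If you want to try it from a backend, here's a rough sketch of calling the Pandoc command-line tool from Python. It assumes `pandoc` is installed and on the server's PATH; the function names and the file names (`report.docx`, `report.html`, `media`) are just placeholders I made up for illustration. The `--extract-media` option asks Pandoc to write embedded images to a folder and point the HTML at them, which may help with the image problem mentioned in the question.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, html_path, media_dir):
    """Build the argument list for a .docx -> standalone HTML conversion.

    The paths are whatever the caller passes in; nothing here is specific
    to any particular project layout.
    """
    return [
        "pandoc",
        str(docx_path),
        "--from=docx",
        "--to=html",
        "--standalone",                    # emit a full HTML document, not a fragment
        f"--extract-media={media_dir}",    # save embedded images into this folder
        "-o",
        str(html_path),
    ]


def convert_docx_to_html(docx_path, html_path, media_dir="media"):
    """Run Pandoc and return the path of the generated HTML file."""
    # Assumption: pandoc is installed on the server and reachable via PATH.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    cmd = build_pandoc_command(docx_path, html_path, media_dir)
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(cmd, check=True)
    return Path(html_path)
```

Calling `convert_docx_to_html("report.docx", "report.html")` should leave you with an HTML file you can edit before handing it to whatever PDF step you choose. Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.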

CuriousDev11 -

Yeah, it's not flawless, but for many use cases, it works well enough!
