I'm looking for a parser that can convert .docx files into HTML effectively. I've tried a few options like LibreOffice, but they struggle with some elements, particularly headers and images. Ideally, I want a server-based solution since I'm looking to integrate it into a backend workflow. Any recommendations?
2 Answers
Have you checked out Pandoc? It’s not flawless but a solid option for this task!
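For a backend workflow, one way to try this is to shell out to the `pandoc` CLI from your service. Below is a minimal sketch, assuming `pandoc` is installed and on the PATH; the function names and the output layout are made up for illustration. `--extract-media` makes Pandoc write embedded images to a folder and link to them from the HTML, which is worth checking against the documents where images went missing.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, html_path, media_dir):
    """Build the argument list for a docx -> HTML conversion (hypothetical helper)."""
    return [
        "pandoc",
        str(docx_path),
        "--from", "docx",
        "--to", "html",
        "--standalone",                      # emit a full HTML document, not a fragment
        f"--extract-media={media_dir}",      # write embedded images here and link to them
        "--output", str(html_path),
    ]


def docx_to_html(docx_path, out_dir):
    """Run pandoc and return the path of the generated HTML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    html_path = out_dir / (Path(docx_path).stem + ".html")
    cmd = build_pandoc_command(docx_path, html_path, out_dir / "media")
    # check=True raises CalledProcessError if pandoc exits with a non-zero status
    subprocess.run(cmd, check=True)
    return html_path
```

Calling `docx_to_html("report.docx", "out")` would leave `out/report.html` plus an `out/media` folder, which you can then edit before the PDF step.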
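There are several ways to do this; which one fits depends on your priority. If fidelity to the original layout matters most, look at a commercial library such as Aspose or a cloud conversion service. COM automation with Word gives you a native conversion, but it needs a Windows machine with Word installed, and Microsoft does not recommend running Office unattended on a server. Since you want this server-side, a library or a cloud API will be the easier thing to integrate into a backend.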
Look into Mammoth. It's built specifically for converting .docx to HTML, so it's worth testing against your documents that contain SVGs!
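If you try the Python port of Mammoth, a rough sketch could look like this. It assumes the third-party `mammoth` package is installed (`pip install mammoth`); the function name is made up. The conversion result carries a list of messages alongside the HTML, which is a quick way to see what Mammoth could not map cleanly in a given file.

```python
def docx_to_html_with_warnings(docx_path):
    """Convert a .docx file to an HTML fragment and collect Mammoth's messages."""
    # Imported here so the module still loads where mammoth is not installed.
    import mammoth  # third-party package, assumed to be installed

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)

    # result.value is the generated HTML; result.messages lists warnings/errors
    warnings = [message.message for message in result.messages]
    return result.value, warnings
```

Log the warnings for each upload while you evaluate it, so you can tell whether headers or images in your files are being dropped before you build the edit and PDF steps on top.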

I need a server-side solution. I'm planning to parse the Word files into HTML, edit the HTML, and then convert to PDF. Also, the documents may include SVGs; do you have anything specific in mind?