Hi everyone! I'm working on a project called SecurePages, a privacy-focused printing platform, and I'm encountering a significant issue. The process is straightforward: users upload a document from their devices, we detect how many pages it has, and then they pay before we print it. Since this project is based in Ghana, we primarily accept Mobile Money for payments instead of traditional credit and debit cards, making it crucial to get the page count right.
The problem is that I've had a hard time finding a dependable method to accurately determine the number of pages in .docx files. Many of the tools I've tried either miscount pages or struggle with complex document formatting, and their page counts often don't align with what Microsoft Word shows. Because .docx is the main format that our users upload, this has become a major hurdle.
My tech stack includes HTML, CSS, and JavaScript for the frontend and Node.js for the backend. Unfortunately, none of the Node.js libraries I've tested provide consistent or accurate page counts for .docx files.
I would greatly appreciate any recommendations on libraries, rendering engines, or best practices for figuring out the number of pages in .docx files—whether that's through direct parsing, server-side rendering, or converting to PDF first. Thanks for your help!
4 Answers
I hear you! The .docx format can be quite tricky. Have you considered only allowing PDF files for printing? PDF page counting is much simpler, and it typically leads to fewer formatting issues during the printing process.
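To show why PDF counting is the easier half, here is a rough sketch of a page counter that scans the raw bytes for page objects. The function name `countPdfPagesHeuristic` is made up for illustration. It is only a heuristic: it misses pages stored inside compressed object streams, so a proper PDF parsing library is the safer choice for anything you charge money on.

```javascript
// Rough page counter for PDFs: counts page objects by scanning the raw bytes.
// Heuristic only: pages stored inside compressed object streams are not seen.
function countPdfPagesHeuristic(pdfBytes) {
  // latin1 maps each byte to exactly one character, so binary data survives.
  const text = Buffer.from(pdfBytes).toString("latin1");
  // Match "/Type /Page" but not "/Type /Pages" (the page-tree node).
  const matches = text.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}
```

You would call it with the uploaded file's bytes, for example the result of `fs.readFileSync` on the upload. Treat a result of 0 as "could not tell" rather than as an empty document.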
A possible solution could be to use a headless browser like Puppeteer to get the page counts. Check out this link for more info: [How to Get Number of Pages Using Puppeteer](https://stackoverflow.com/questions/53294512/how-to-get-number-of-pages-using-puppeteer). It could work for your needs!
A .docx file has no fixed page count: pagination is computed when the document is rendered, so it depends on the layout engine, the installed fonts, the paper size, and even printer settings. It might be best to find a reliable .docx-to-PDF renderer and count pages on the resulting PDF. This way, you can estimate page counts more accurately and allow a percentage for variance in quotes.
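A minimal sketch of that idea in Node.js, assuming LibreOffice is installed on the server and its `soffice` binary is on the PATH. The helper names (`buildSofficeArgs`, `convertDocxToPdf`, `quoteRange`) and the choice to price in pesewas are my own assumptions, not anything from your codebase. LibreOffice's layout can still differ from Word's, especially when fonts are missing, which is where the variance allowance comes in.

```javascript
const { execFile } = require("node:child_process");
const path = require("node:path");

// Arguments for a headless LibreOffice conversion to PDF.
function buildSofficeArgs(docxPath, outDir) {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, docxPath];
}

// Runs the conversion and resolves with the path of the generated PDF.
// Assumes the `soffice` binary is on PATH.
function convertDocxToPdf(docxPath, outDir) {
  return new Promise((resolve, reject) => {
    execFile("soffice", buildSofficeArgs(docxPath, outDir), (err) => {
      if (err) return reject(err);
      // LibreOffice names the output after the input, with a .pdf extension.
      const pdfName = path.basename(docxPath, path.extname(docxPath)) + ".pdf";
      resolve(path.join(outDir, pdfName));
    });
  });
}

// Quote range around an estimated page count, allowing a percentage variance.
// Prices are integers in the smallest currency unit to avoid float rounding.
function quoteRange(estimatedPages, pricePerPage, variancePct) {
  const extraPages = Math.ceil((estimatedPages * variancePct) / 100);
  return {
    pages: estimatedPages,
    maxPages: estimatedPages + extraPages,
    price: estimatedPages * pricePerPage,
    maxPrice: (estimatedPages + extraPages) * pricePerPage,
  };
}
```

If you then print from the converted PDF rather than from the original .docx, the count the user paid for and the pages that come out of the printer are the same by construction, and the variance only matters when comparing against what Word showed them.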
Word documents reflow at render time and weren't designed around fixed pages the way PDFs are, so sticking to PDFs can simplify everything. Just my two cents!
That’s a fair point, but I'd worry about losing potential customers who find file conversion a hassle.

I understand where you're coming from, but limiting to PDFs might alienate some customers. Many people aren't comfortable converting their files, so we still want to support .docx uploads.