Best Practices for Safely Converting Untrusted HTML Invoices to PDF

November 24, 2025

Asked By CuriousCoder42 On November 24, 2025

Hey everyone! I'm working on a background worker that processes invoice emails. If an email doesn't have a PDF attached, we grab the HTML content, sanitize it with DOMPurify, and convert it to PDF with Puppeteer so our users can view it.

We already have several security measures in place: JavaScript is disabled in Puppeteer, network requests are intercepted so that only data URLs are allowed, and the HTML is sanitized to remove harmful tags and attributes. I'm considering more restrictions, like limiting inline image sizes and blocking file URIs. We're also thinking about switching to an API service like DocRaptor or API2PDF to lower operational risk and improve security.

How does everyone else handle converting untrusted HTML to PDF? Do you prefer an API or a self-hosted solution? How do you tackle SSRF, inline-image DoS, and other security threats? For those using an API, which ones have you found reliable in terms of security, cost, and overall performance? I'd appreciate any real-world experiences or insights. Thanks!
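For anyone following along, the "data URLs only" rule described above can be sketched as a small pure function. This is only an illustration, not the asker's actual code: the names `decideRequest` and `MAX_DATA_URL_LENGTH` and the 2 MiB cap are assumptions made for the example.

```typescript
// Hypothetical helper: decide whether a URL requested during rendering may load.
// Only inline data: URLs pass; file:, http:, https: and everything else is blocked.

type Decision = "allow" | "block";

// Assumed cap on the length of one data URL, in characters. Tune for real invoices.
const MAX_DATA_URL_LENGTH = 2 * 1024 * 1024;

function decideRequest(rawUrl: string, maxLength: number = MAX_DATA_URL_LENGTH): Decision {
  const url = rawUrl.trim();
  // URL schemes are case-insensitive, so compare the prefix in lower case.
  const scheme = url.slice(0, 5).toLowerCase();
  if (scheme !== "data:") {
    return "block";
  }
  // Oversized inline payloads are rejected as a cheap first DoS guard.
  return url.length <= maxLength ? "allow" : "block";
}
```

In a Puppeteer `request` handler (after `page.setRequestInterception(true)`), a "block" result would map to `request.abort()` and an "allow" result to `request.continue()`. The same function can also be run over `src` attributes during sanitization, so the check does not depend on the browser alone.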

5 Answers

Answered By TechSavvyDude On November 25, 2025

Instead of converting the HTML to PDF directly, consider taking a screenshot of it, converting that to PDF, and then running OCR on it. This way, you can avoid some of the risks associated with loading untrusted HTML in a browser.

QuickQuery - November 25, 2025

That sounds interesting! But how do you capture the HTML without potentially executing untrusted scripts? I'd be wary of that.

InquisitiveMind - November 25, 2025

Doesn't that come with similar challenges? You still need a method to ensure it's rendered safely.

Answered By PDFTransitions On November 25, 2025

I recently moved from API2PDF to a self-hosted setup using Gotenberg. It's faster, cheaper, and more reliable. I just send it a URL of the page I want to convert along with an authorization token. Plus, it can handle linked Word and Excel documents, merging them into the final PDF seamlessly.

Answered By DockerHater99 On November 24, 2025

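Running in an isolated Docker container might sound appealing, but Docker isn't designed as a secure sandbox: containers share the host kernel, so a browser exploit that reaches a kernel bug can break out. To truly harden your setup, consider stronger isolation, for example a sandboxed runtime such as gVisor or a microVM such as Firecracker.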

CautiousCoder - November 25, 2025

I agree! Docker can be risky for this sort of application if not managed properly.

Answered By PixelPusher On November 24, 2025

When it comes to inline images, limit their pixel dimensions, not just their file sizes. I once had a small PNG that expanded to nearly 1 GB when uncompressed! Also, make sure your worker only has local network access, so that if something does escape the sandbox, it can't reach the internet.
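A rough sketch of a dimension check, assuming the inline image has already been base64-decoded into bytes. It reads the width and height from the PNG `IHDR` chunk without decompressing anything. The function names and the 16-megapixel budget are made up for illustration, and only PNG is covered here; other formats would need their own header parsing or should be rejected.

```typescript
// Assumed pixel budget (hypothetical); pick a value that suits real invoices.
const MAX_PIXELS = 4000 * 4000;

// Every PNG file starts with these 8 bytes.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface Dimensions {
  width: number;
  height: number;
}

// Reads width and height from the IHDR chunk, which must be the first chunk.
// Returns null if the bytes do not look like a PNG.
function readPngDimensions(bytes: Uint8Array): Dimensions | null {
  // 8-byte signature + 4-byte length + 4-byte type + 4-byte width + 4-byte height
  if (bytes.length < 24) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (view.getUint8(i) !== PNG_SIGNATURE[i]) return null;
  }
  // Bytes 12..15 hold the chunk type and must spell "IHDR".
  const type = String.fromCharCode(
    view.getUint8(12), view.getUint8(13), view.getUint8(14), view.getUint8(15)
  );
  if (type !== "IHDR") return null;
  // Width and height are big-endian 32-bit integers (DataView's default).
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

function isPngWithinPixelBudget(bytes: Uint8Array, maxPixels: number = MAX_PIXELS): boolean {
  const dims = readPngDimensions(bytes);
  if (dims === null) return false; // reject anything that cannot be parsed
  if (dims.width === 0 || dims.height === 0) return false;
  return dims.width * dims.height <= maxPixels;
}
```

Because the check only touches the first 24 bytes, it can run before the image ever reaches the renderer, which is where a decompression bomb would otherwise blow up.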

Answered By SecureDev01 On November 24, 2025

If I were really concerned about security, I'd avoid directly converting the raw HTML of an email to PDF. Instead, I'd use sanitized text to produce clean HTML. Additionally, using a rendering engine that operates in a controlled JavaScript environment is crucial. I've had good luck with wkhtmltopdf for over a decade, though I'm not sure if there's a Node wrapper for it.
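As a minimal sketch of the "sanitized text to clean HTML" idea, assuming the plain text has already been extracted from the email (for example from its text part): every character with special meaning in HTML is escaped, and the result is wrapped in a fixed template, so nothing from the original markup survives. The names `escapeHtml` and `textToHtmlDocument` are hypothetical.

```typescript
// Characters that have special meaning in HTML and their entity replacements.
const HTML_ESCAPES: { [ch: string]: string } = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] || ch);
}

// Builds a complete HTML document from untrusted plain text.
// Blank lines separate paragraphs; single newlines become <br>.
function textToHtmlDocument(text: string): string {
  const normalized = text.replace(/\r\n?/g, "\n");
  const paragraphs = normalized
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => "<p>" + escapeHtml(p).replace(/\n/g, "<br>") + "</p>");
  return (
    '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>' +
    paragraphs.join("") +
    "</body></html>"
  );
}
```

The trade-off is that tables and layout from the original invoice are lost; in exchange, the renderer only ever sees markup the worker generated itself.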
