How do you convert untrusted invoice HTML to PDF securely?

Asked By CreativeStorm42 On

Hey everyone! I'm currently developing a background worker that processes invoice emails. When there's no PDF attached, we extract the HTML from the email, clean it up with DOMPurify, and convert it to a PDF using Puppeteer. This PDF is then displayed to users on the frontend, so they can send us their invoices via email.

To keep this process secure, we've implemented several safety measures: disabling JavaScript in Puppeteer, intercepting all network requests to allow only data URLs, and sanitizing the HTML to remove risky tags and attributes. I'm also considering further limits, like maximum sizes for inline images and blocking file URIs. Additionally, I'm exploring the idea of switching to an API service like DocRaptor or API2PDF to lower operational risk and help with security.

My main questions are:

- If you're converting untrusted HTML to PDF, do you use a service or host it yourself?
- How do you handle issues such as SSRF, inline image denial-of-service, or other potential attack vectors?
- For those who've used an API, which service do you prefer or regret, especially in terms of security, cost, and reliability?

I appreciate any insights or real-world experiences you can share!
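For readers who want to see the shape of the setup described in the question, here is a minimal sketch of the "JavaScript off, data URLs only" policy. It is not the poster's actual code. The function names `isAllowedRequestUrl` and `lockDownPage` are made up for illustration, and the sketch assumes the HTML is loaded with `page.setContent(...)`, so the document itself does not need a network request.

```javascript
// Policy: only inline data: URLs may load. http(s):, file:, ftp: and anything
// that does not parse as an absolute URL is refused.
function isAllowedRequestUrl(rawUrl) {
  try {
    return new URL(rawUrl).protocol === 'data:';
  } catch {
    return false; // unparseable or relative URLs are refused
  }
}

// `page` is expected to be a Puppeteer Page. Only setJavaScriptEnabled,
// setRequestInterception and on('request', ...) are used here.
async function lockDownPage(page) {
  await page.setJavaScriptEnabled(false);
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (isAllowedRequestUrl(request.url())) {
      request.continue();
    } else {
      request.abort();
    }
  });
}
```

Keeping the decision in a small pure function makes it easy to unit test separately from the browser. Because the check is an allowlist on the scheme, file URIs are refused along with everything else that is not `data:`.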

5 Answers

Answered By PDFGuru22 On

If I were really concerned about security, I might not convert HTML from an email directly to a PDF. I’d probably take the plain text instead and generate my own HTML for the PDF from it. Also, using a rendering engine that executes JavaScript outside of a secure browser setup could be risky. That's why I've been sticking with wkhtmltopdf for years, though I'm not sure whether there's a Node wrapper.
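A rough sketch of the plain-text route suggested above, assuming the email has a usable text/plain part (many invoice emails may not). The helper names `escapeHtml` and `plainTextToHtml` are hypothetical. The idea is that every character from the email is escaped, so the only markup in the page is markup you wrote yourself.

```javascript
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape the five characters that can change HTML structure.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Wrap the escaped text in a fixed template that contains no email-supplied markup.
function plainTextToHtml(text) {
  return (
    '<!doctype html><html><head><meta charset="utf-8"></head>' +
    '<body><pre style="white-space: pre-wrap; font-family: sans-serif">' +
    escapeHtml(text) +
    '</pre></body></html>'
  );
}
```

The trade-off is that the invoice's original layout, tables and images are lost, which may or may not be acceptable for your users.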

Answered By GotenbergFan On

I switched from API2PDF to a self-hosted Docker setup with Gotenberg. It's faster, cheaper, and way more reliable. I just send a link to the page I want converted, along with an authorization token. Plus, it can handle linked Word and Excel documents too, merging everything into one PDF seamlessly!

Answered By SafeDockerDude On

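Running the conversion inside an isolated Docker container seems like a solid idea, but be careful: Docker shouldn't be relied upon as a security measure by itself. Containers share the host kernel, so if the renderer is compromised and the attacker escapes the container, that single layer won't have protected you.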

SecurityNerd -

Exactly! Docker's isolation isn't foolproof; you should seek out other security layers.

Answered By PrintWizard99 On

One alternative you could consider is taking a screenshot of the HTML and then converting that image to a PDF. Afterward, you could run OCR for text extraction.

ImageMuncher -

Right? I feel like you'd end up in a similar predicament!

SnapToPDF -

True, but how would you do the screenshot without rendering it in a browser? Doesn't that pose some of the same risks?

Answered By ImageManager33 On

Make sure to limit image dimensions, not just file sizes. I once had a small 45 KB PNG that expanded to nearly 1 GB when decompressed. It's wild! Also, consider restricting the background worker to local network access only; if something escapes your sandbox, having no external access limits the damage.
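One way to apply the dimension limit described above is to read the width and height from the PNG header before the image is ever decoded. The sketch below covers PNG only; JPEG, GIF, WebP and SVG would each need their own check. The 20-megapixel budget and the function names are assumptions for illustration. Decoded size is roughly width × height × 4 bytes, so 20 million pixels is about 80 MB.

```javascript
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const MAX_PIXELS = 20000000; // assumed budget, tune for your worker's memory

// Read width/height from the IHDR chunk, which must be the first chunk in a PNG.
// Layout: 8-byte signature, 4-byte chunk length, "IHDR", 4-byte width, 4-byte height.
function readPngDimensions(buf) {
  if (buf.length < 24) return null;
  if (!buf.subarray(0, 8).equals(PNG_SIGNATURE)) return null;
  if (buf.toString('latin1', 12, 16) !== 'IHDR') return null;
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}

// Refuse anything that is not a PNG or whose declared pixel count exceeds the budget.
function isPngWithinPixelBudget(buf, maxPixels = MAX_PIXELS) {
  const dims = readPngDimensions(buf);
  if (dims === null) return false;
  return dims.width > 0 && dims.height > 0 && dims.width * dims.height <= maxPixels;
}
```

For inline images, you would first decode the base64 payload of the data URL into a `Buffer` and then run this check on it, dropping the image if it fails.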
