I'm considering PayloadCMS for a new project and will need to migrate an old "database", which is essentially a collection of HTML files, into Payload. How well does it perform with around 850,000 records? Will that volume noticeably slow down the admin UI or the pages it serves? Also, has anyone migrated a large number of posts like this before? What was the process like for converting the files into a more manageable format and then importing them into the database?
4 Answers
850,000 files sounds like a lot, but for a modern database, that's manageable. You’ll want to ensure your database is well-indexed to keep performance smooth, especially with Payload's admin UI, which can get a little sluggish without pagination. The actual speed of queries will depend on whether you’re using PostgreSQL, MongoDB, or something else, and how you structure your data.
For migrating those HTML files, I recommend scripting the extraction to JSON first, validating the structure, and then importing in batches through Payload's API. Throughput depends heavily on your hooks, validation, and database, but in my experience a few thousand records per minute is realistic if everything is set up correctly. Keep in mind that if your HTML files reference images or link to other posts, you'll have to map those references carefully during the migration. Testing with 1,000 records first can help catch issues before the big move.
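To make the extract, validate, and batch steps concrete, here is a minimal sketch using only Python's standard library. The field names (`title`, `body`, `sourcePath`), the `posts` collection slug, the base URL, and the `users API-Key` header format are all assumptions for illustration; check them against your own Payload config before relying on them. The request is only built here, not sent.

```python
import json
import urllib.request
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the <title> text and the visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self._stack = []  # currently open tags, innermost last
        self.title_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching tag; this tolerates unclosed tags
        # such as <br> or <img> in old markup.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._stack:
            self.title_parts.append(text)
        elif "body" in self._stack and not {"script", "style"} & set(self._stack):
            self.body_parts.append(text)


def extract_post(html, source_path):
    """Turn one HTML document into a JSON-ready dict (hypothetical field names)."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.body_parts),
        "sourcePath": source_path,
    }


def validate(record):
    """Return a list of problems; an empty list means the record looks importable."""
    return [f"missing {field}" for field in ("title", "body") if not record[field]]


def batches(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_create_request(record, base_url, collection, api_key):
    """Build (but do not send) a POST to the collection's REST endpoint.

    Assumes API keys are enabled on a `users` collection.
    """
    return urllib.request.Request(
        f"{base_url}/api/{collection}",
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"users API-Key {api_key}",
        },
        method="POST",
    )
```

Records that fail `validate` are best written to a separate file for manual review rather than dropped, so the 1,000-record trial run tells you how messy the full set is likely to be.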
Definitely parse those HTML files to JSON before importing! You'll save yourself a lot of hassle. Also, make sure every media file a post references actually gets picked up during the migration. It sounds like in most cases there aren't links between posts, which is good. Just confirm that each HTML file sits in its own directory alongside its assets; it'll save you time if that holds true. But if things are inconsistent after 20 years of content, you're right to expect some headaches.
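One way to check the "each file sits next to its assets" assumption before migrating is a quick audit pass. The sketch below is only an illustration: `audit_post` and its return shape are made up for this example, and it only looks at `img src` and `a href` attributes.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class AssetCollector(HTMLParser):
    """Records every <img src> and <a href> value in document order."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ((tag == "img" and name == "src") or (tag == "a" and name == "href")):
                self.refs.append(value)


def audit_post(html, post_dir):
    """Split references into relative ones missing on disk and non-relative ones."""
    collector = AssetCollector()
    collector.feed(html)
    collector.close()
    missing, external = [], []
    for ref in collector.refs:
        parsed = urlparse(ref)
        if parsed.scheme or parsed.netloc or ref.startswith("/"):
            # Absolute URL or site-root path: cannot be resolved next to the file.
            external.append(ref)
        elif parsed.path and not (Path(post_dir) / unquote(parsed.path)).exists():
            # Relative reference whose target is not where the post expects it.
            missing.append(ref)
    return {"missing": missing, "external": external}
```

Running this over the whole archive and counting posts with a non-empty `missing` list gives a rough measure of how consistent those 20 years of content really are.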
Honestly, just write a parser for those HTML files and load the data straight into a SQL database. Writing the loader is the quick part, maybe an hour, and it beats the nightmare of dealing with 850,000 separate files. You're on the right track by planning to parse first. Parsing can be tedious, but once you get it down, the rest of the process becomes much smoother.
The size isn't really the problem; it's more about how your data is structured and what kind of queries you’ll be running. Just ensure your queries can utilize your indexed columns efficiently. Also, consider whether using an existing CMS suits your needs or if it might be better to create a custom solution. Sometimes, existing systems can complicate things if the data models don’t align.
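To see what "queries that use your indexed columns" looks like in practice, here is a small experiment with Python's built-in `sqlite3`. SQLite is only a stand-in here, and the `posts` table and `slug` column are invented for the example; PostgreSQL and MongoDB have their own `EXPLAIN` tooling, but the idea is the same.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (slug, title) VALUES (?, ?)",
    ((f"post-{i}", f"Post {i}") for i in range(10_000)),
)

lookup = "SELECT title FROM posts WHERE slug = ?"

# Without an index on slug, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, ("post-42",))

conn.execute("CREATE INDEX idx_posts_slug ON posts (slug)")

# With the index in place, the planner searches via idx_posts_slug instead.
plan_after = query_plan(conn, lookup, ("post-42",))

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan and the second names the index. Running the same kind of check against the queries your site actually makes is a cheap way to find missing indexes before loading all 850,000 records.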