I'm working on a large project that will manage thousands of documents, with role-based permissions controlling who can upload and who can access each document. I'm curious about the best technologies for managing that many documents efficiently, especially how to handle tagging and metadata so that lookups stay fast. The documents will vary in size, from small files to large PDFs with many pages, and I'll need to generate thumbnails for each. I've looked into Paperless-ngx, but it seems primarily geared towards personal use. Are there better options or architectural patterns for document management at this scale?
5 Answers
You might explore enterprise options like Mayan EDMS, which is designed for larger setups, or you could look into OpenText Content Server for archival needs, although it tends to be on the pricier side.
Paperless-ngx is more for personal setups. For your project, a custom solution would be ideal as it gives you better control over performance and user management.
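To make the "control over user management" part concrete, here is a minimal sketch of role-based checks for uploads and per-document access. Everything in it is an assumption for illustration: the role names, the `ROLE_ACTIONS` table, and the `can_upload` / `can_view` helpers are hypothetical, and a real system would keep this in a database instead of in memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    VIEW = "view"
    UPLOAD = "upload"


# Hypothetical role table: which actions each role may perform.
ROLE_ACTIONS = {
    "viewer": {Action.VIEW},
    "editor": {Action.VIEW, Action.UPLOAD},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Document:
    doc_id: str
    # Roles that are allowed to see this particular document.
    allowed_roles: set = field(default_factory=set)


def can_upload(user: User) -> bool:
    """A user may upload if any of their roles grants UPLOAD."""
    return any(Action.UPLOAD in ROLE_ACTIONS.get(r, set()) for r in user.roles)


def can_view(user: User, doc: Document) -> bool:
    """A user may view a document if they hold a role that is both
    listed on the document and grants VIEW."""
    matching = user.roles & doc.allowed_roles
    return any(Action.VIEW in ROLE_ACTIONS.get(r, set()) for r in matching)
```

Keeping the upload check global and the view check per document mirrors the split in the question: upload rights come from the role alone, while access depends on both the role and the document.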
For the kind of scale you're talking about, AWS S3 is widely recommended for its reliability and cost efficiency, though some argue it's pricey. You could consider other options too: Backblaze B2 is popular for being cheaper.
When it comes to storage, think about how large your documents will be. A range of 2 to 600 MB per file means you need ample disk space. Fast access is crucial, so organizing documents by how often they are used might be a good strategy: keep frequently read files on fast storage and move rarely read ones to cheaper storage.
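As a rough sketch of organizing by usage frequency, the snippet below sorts documents into a "hot" and a "cold" tier based on recent read counts. The `DocStats` fields, the tier names, and the threshold of 10 reads in 30 days are all assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class DocStats:
    doc_id: str
    size_mb: float
    reads_last_30_days: int


# Assumed cut-off: documents read at least this often stay on fast storage.
HOT_READS_THRESHOLD = 10


def choose_tier(stats: DocStats, hot_threshold: int = HOT_READS_THRESHOLD) -> str:
    """Return "hot" for frequently read documents, "cold" otherwise."""
    return "hot" if stats.reads_last_30_days >= hot_threshold else "cold"


def plan_tiers(docs):
    """Group document ids by the tier they should live in."""
    plan = {"hot": [], "cold": []}
    for d in docs:
        plan[choose_tier(d)].append(d.doc_id)
    return plan


docs = [
    DocStats("contract.pdf", 600.0, 2),
    DocStats("handbook.pdf", 12.5, 40),
]
print(plan_tiers(docs))
```

You would run something like this periodically and move files between tiers based on the result.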
You’ll want to consider document types: are they all text, or do they include images and charts? For searching documents, Elasticsearch is a solid choice because it handles various formats well. For storage, something like AWS S3 or Backblaze B2 would be appropriate, as both are designed for large data volumes.
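To show why tag lookups stay fast with an index, here is a toy in-memory inverted index that maps each tag to the set of documents carrying it. This is only an illustration of the idea and is not the Elasticsearch API; the `TagIndex` class and its methods are hypothetical names.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: tag -> set of document ids."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, doc_id, tags):
        # Normalize tags to lowercase so lookups are case-insensitive.
        for tag in tags:
            self._by_tag[tag.lower()].add(doc_id)

    def search(self, *tags):
        """Return ids of documents that carry ALL of the given tags."""
        if not tags:
            return set()
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.add("doc-1", ["invoice", "2023"])
index.add("doc-2", ["invoice", "2024"])
index.add("doc-3", ["contract", "2024"])
print(index.search("invoice", "2024"))
```

A search only touches the sets for the requested tags instead of scanning every document, which is the same basic reason a dedicated search engine scales better than filtering rows one by one.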