I'm designing a web-based OCR system that will handle document uploads and manage OCR results. I need to set up the frontend, backend, database, and deployment environment.

There will be two types of users: general users who upload documents and view the OCR results, and admins who manage users and documents. I'm dealing with five types of documents. Two have different layouts and need OCR for specific details like names and document types, while another follows a two-column key-value format (e.g., 'First Name: John') and should allow manual corrections of the OCR results.

I'm leaning towards React.js with shadcn/ui for the frontend, as that's what I'm most familiar with. For the backend, I'm considering FastAPI for handling file uploads, authentication, and OCR processing, potentially using PaddleOCR.

I have a few questions: Is React.js with shadcn/ui a suitable choice, or does Next.js offer distinct advantages? Is FastAPI good for an OCR-heavy workflow? Are there any known issues with deploying Next.js or React alongside FastAPI? And what database would be best for storing user info, document metadata, OCR results, and any corrections?

I want to avoid any architectural mistakes that could hinder scaling or deployment. Thanks!
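To make the key-value requirement concrete, here is a minimal sketch of one way the 'First Name: John' lines could be parsed and corrected. It assumes the OCR step already yields one text line per row; the names `OcrDocument`, `parse_key_value_lines`, `correct`, and `effective_fields` are hypothetical and not part of PaddleOCR or FastAPI. Manual corrections are kept separate from the raw OCR output so the original result is never overwritten.

```python
from dataclasses import dataclass, field


def parse_key_value_lines(lines: list[str]) -> dict[str, str]:
    """Turn OCR lines such as 'First Name: John' into a dict.

    Lines without a colon or with an empty key are skipped (assumed to be noise).
    """
    fields: dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        fields[key.strip()] = value.strip()
    return fields


@dataclass
class OcrDocument:
    """Hypothetical model: raw OCR fields plus a separate layer of corrections."""

    raw_fields: dict[str, str]
    corrections: dict[str, str] = field(default_factory=dict)

    def correct(self, key: str, value: str) -> None:
        """Record a manual correction without touching the raw OCR value."""
        if key not in self.raw_fields:
            raise KeyError(f"unknown field: {key}")
        self.corrections[key] = value

    def effective_fields(self) -> dict[str, str]:
        """Raw values with any manual corrections applied on top."""
        return {**self.raw_fields, **self.corrections}


doc = OcrDocument(parse_key_value_lines(["First Name: Jhon", "Document Type: Passport"]))
doc.correct("First Name", "John")
```

Whatever database ends up being used, the same split (raw result in one place, corrections in another) would let an admin see what the OCR originally produced and what a user changed.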
2 Answers
I've worked on several OCR projects for clients using AWS Textract. I typically upload documents to S3 buckets for processing. It streamlines things and takes care of some heavy lifting for you.
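For readers who haven't used it, the S3-plus-Textract flow can be sketched roughly as below. This is an illustration under assumptions, not necessarily how the projects above were set up: it assumes the third-party `boto3` package and configured AWS credentials, the helper names `extract_lines` and `ocr_s3_document` are hypothetical, and the bucket and key are whatever you uploaded. As far as I know, the synchronous `detect_document_text` call is meant for single images or single-page documents; multi-page files go through Textract's asynchronous API instead.

```python
def extract_lines(response: dict) -> list[str]:
    """Collect the text of every LINE block from a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_s3_document(bucket: str, key: str) -> list[str]:
    """Run synchronous text detection on a document already stored in S3."""
    import boto3  # third-party; imported here so extract_lines works without it

    client = boto3.client("textract")
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)
```

Splitting the response parsing from the AWS call keeps the parsing testable without network access.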
Just a heads up from my recent experience with OCR: large language models (LLMs) can be surprisingly effective for this. They have performed very well for me, even on text that is difficult to read.
What LLM did you use for your project? Did you need to fine-tune it?

Did you have to fine-tune or train the model for specific documents?