I'm working for the Italian government, and we have a huge backlog of unorganized police reports that make it impossible to find what I'm looking for. I'm considering proposing a solution to my supervisor that involves scanning all these reports into a massive PDF and converting the images to searchable text. Is this feasible? What programs would be best for text recognition? Just to clarify, the documents mainly have printed dates and protocol numbers, and while there's some handwriting, I'm only focused on the printed text for searching purposes. Any advice would be appreciated!
5 Answers
I'd recommend looking into Kofax software along with some reliable scanner hardware. They specialize in document management and can streamline the entire process.
DocuWare is another solution that might suit your needs for organizing and searching through documents.
You definitely want to use OCR (Optical Character Recognition) for this. Think about how you plan to store the data after scanning. Are you looking at searchable PDFs, or would Word documents work better for you? I would lean towards searchable PDFs myself, especially for those historical police reports.
The simplest route is to scan directly to searchable (full-text) PDF, so no separate conversion step is needed. Check with your scanner provider or IT team whether your hardware supports this!
Make sure to confirm how much of each report is handwritten, since handwriting lowers OCR accuracy significantly. From what you've said, recognizing the printed text should be manageable, but the handwritten parts could throw a wrench in the works.
We're really only interested in those printed items like dates and protocol numbers, so I think we can manage without focusing on the handwriting.
Just a heads up: OCR can introduce errors when converting images to text. It works much better on printed text than on handwriting, but stray spaces and misread characters still happen. From what you've described, though, it sounds totally doable!
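To make the point about stray spaces and misread characters concrete, here is a minimal Python sketch of cleaning up OCR output before searching it. It assumes the printed dates follow a dd/mm/yyyy layout and that the OCR engine sometimes swaps look-alike letters for digits (O for 0, l for 1, and so on). Both are assumptions on my part, and the `extract_dates` name and the substitution table are purely illustrative, so adjust them to what your real scans produce.

```python
import re

# Assumed OCR confusions inside numeric fields (letter -> digit). Illustrative only.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# Assumed date layout: dd/mm/yyyy, tolerating stray spaces around the separators
# and look-alike letters where digits should be.
DATE_RE = re.compile(
    r"\b([0-9OolISB]{1,2})\s*[/.\-]\s*([0-9OolISB]{1,2})\s*[/.\-]\s*([0-9OolISB]{4})\b"
)

def extract_dates(ocr_text):
    """Return the dates found in OCR text as normalized YYYY-MM-DD strings."""
    dates = []
    for day, month, year in DATE_RE.findall(ocr_text):
        # Map look-alike letters back to digits before interpreting the numbers.
        day, month, year = (part.translate(DIGIT_FIXES) for part in (day, month, year))
        d, m = int(day), int(month)
        # Discard matches that cannot be a calendar date (cheap sanity check).
        if 1 <= d <= 31 and 1 <= m <= 12:
            dates.append(f"{int(year):04d}-{m:02d}-{d:02d}")
    return dates
```

For example, `extract_dates("Data: 12 / O3 / 2Ol9")` returns `["2019-03-12"]`. Normalizing to one format means a search for a date still hits even when the OCR text itself is slightly off.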
As long as you're just focusing on protocol numbers and dates, the printed elements should come through much more clearly for searching!
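If protocol numbers are the main thing you search on, a small lookup table built from the OCR text can point you straight to the right file and page. The sketch below is only an illustration: the "Prot. n. 12345/2019" layout is a guess at what a protocol number might look like, and `build_index` and the sample file names are hypothetical, so swap in the pattern your reports actually use.

```python
import re
from collections import defaultdict

# Hypothetical protocol layout, e.g. "Prot. n. 12345/2019". Adjust to the real forms.
PROTOCOL_RE = re.compile(r"Prot\.?\s*n\.?\s*(\d+)\s*/\s*(\d{4})", re.IGNORECASE)

def build_index(pages):
    """Map each protocol number to the (file_name, page_number) pairs where it appears.

    pages: iterable of (file_name, page_number, ocr_text) tuples.
    """
    index = defaultdict(list)
    for file_name, page_number, text in pages:
        for number, year in PROTOCOL_RE.findall(text):
            # Drop leading zeros so "00123/2019" and "123/2019" share one key.
            index[f"{int(number)}/{year}"].append((file_name, page_number))
    return index

# Made-up sample data standing in for OCR output.
sample_pages = [
    ("box1.pdf", 1, "Prot. n. 00123 / 2019 Oggetto: ..."),
    ("box1.pdf", 7, "prot n 123/2019"),
    ("box2.pdf", 3, "no protocol on this page"),
]
index = build_index(sample_pages)
```

Here `index["123/2019"]` lists both pages in box1.pdf. Keying the index by file and page also works if you end up with many smaller PDFs rather than one massive one.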
For your needs, I think searchable PDFs would be the safest bet. They retain the look of the reports while still allowing for quick searches.