How to Handle Repeating Groups in Azure Document Intelligence?

Asked By RandomUser983 On

I'm working with Azure Document Intelligence to extract data from long scanned documents that contain sequences of letters. Each letter can vary in length, from half a page to several pages, and can start or end anywhere on a page. Some pages hold multiple letters, while others hold only part of a letter that continues onto another page. Each letter has a similar structure, with some consistent fields and others that are optional or may span pages.

I've heard about something called "repeating groups" in this service, but I've been told to use dynamic tables instead, and I find that approach lacking. Some suggest pre-processing or post-processing. Pre-processing isn't an option for me, since I can't read the content of scanned images. I can do post-processing, but the letters' layout is so fluid that it's tricky to identify the parts accurately without AI assistance. Can anyone help with this?

2 Answers

Answered By CuriousCoder81 On

I can definitely relate to the struggle with fluid document structures. Post-processing can be complex if the layout changes often. Have you considered breaking the documents into smaller segments beforehand? That might help in identifying and isolating those repeating groups better.
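One way to read "segments" here is to split the OCR output rather than the scanned pages, so a letter that crosses a page break stays in one piece. Below is a rough sketch of that idea, not a tested recipe. It assumes you have already flattened the layout result into `(page_number, text)` pairs in reading order. The `split_letters` helper, the `Letter` class, and the salutation pattern are all hypothetical and would need tuning to your actual letters.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue for "a new letter starts here"; adjust to the real documents.
SALUTATION = re.compile(r"^\s*(dear|to whom it may concern)\b", re.IGNORECASE)


@dataclass
class Letter:
    start_page: int
    end_page: int
    paragraphs: list = field(default_factory=list)


def split_letters(paragraphs):
    """Group (page_number, text) pairs, given in reading order, into letters.

    A new letter begins at every paragraph matching SALUTATION. Text that
    appears before the first salutation is kept as its own leading group,
    since it may be the tail of a letter from an earlier file.
    """
    letters = []
    current = None
    for page, text in paragraphs:
        if current is None or SALUTATION.match(text):
            current = Letter(start_page=page, end_page=page)
            letters.append(current)
        current.paragraphs.append(text)
        current.end_page = page  # a letter may run across several pages
    return letters


if __name__ == "__main__":
    sample = [
        (1, "Dear Anna,"),
        (1, "Thank you for your note."),
        (1, "Dear Ben,"),
        (2, "This one continues on the next page."),
        (2, "Dear Dana,"),
        (2, "A short one."),
    ]
    for letter in split_letters(sample):
        print(letter.start_page, letter.end_page, len(letter.paragraphs))
```

Because the grouping runs over paragraphs instead of pages, each group keeps its own start and end page, and you can run field extraction per group afterwards. If the salutation alone is too weak a signal, the same loop could take a different boundary test without changing the rest.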

RandomUser983 -

Yeah, I've thought about splitting them up, but then I run into issues with context. Each letter connects to the previous one in some ways, so it's tough to manage.

Answered By TechSavvy25 On

Have you tried the new Azure AI Content Understanding service? It might be more suitable for your scenario compared to the Document Intelligence service. It's designed to extract data from various formats, including PDFs. I found it impressive in my tests and I'm hopeful it will improve even further once it's generally available, though it does have some limitations right now.

DocumentDude33 -

Interesting! I haven't used that service yet. So, it can extract content from PDFs? What do you think are its main advantages over Document Intelligence?
