I'm curious if anyone has experience with comparing two unstructured documents, like a Purchase Order and an Invoice. I know that we can extract text from both documents, but I'm trying to figure out if it's technically feasible to compare the data within them and spot any discrepancies. Any insights or experiences with this?
6 Answers
You should check out this solution accelerator on GitHub that combines Document Intelligence and Azure OpenAI: [GitHub Repo](https://github.com/microsoft/azurechat). It could give you a good starting point for your comparison task.
You can definitely tackle this with a mix of Document Intelligence and the Azure OpenAI service. The idea is to extract the document data using Document Intelligence first, then use something like GPT-4 on Azure to compare the contents. If coding isn't your thing, you could also consider orchestrating the process with Logic Apps. Here's a tutorial that might help you get started: [Logic Apps Tutorial](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/tutorial/logic-apps?view=doc-intel-4.0.0&pivots=workflow-onedrive). Let me know if you have questions—I’m happy to help!
Absolutely, it’s possible using SearchAI Assist! Here’s a link for more info: [SearchAI Assist](https://www.searchblox.com/products/searchai-assist). It should help you get started with comparing your PDFs.
Have you looked into the document comparison feature in Egnyte? It allows you to compare different versions of a file or even two separate files to highlight differences. This might be exactly what you need for your use case! Here's a couple of links: [Using Document Comparison](https://egnyte-university.egnyte.com/using-document-comparison-159/2153922) and a guide on the feature: [Document Comparison Help](https://helpdesk.egnyte.com/hc/en-us/articles/29138059200141-Document-Comparison). Let me know if you're interested, and we can discuss this further!
Thanks for the tips! I'll definitely check out Egnyte.
Honestly, the short answer might be no. To effectively compare those documents, you'd have to create a program that can truly understand the content, and even then, it might not be 100% reliable. Just something to consider!
You might want to look into the new agent orchestration feature in Foundry. It allows you to have two agents extract data, one for each document type, and then a third agent to compare the extracted data. It sounds promising for your needs!
That sounds like a solid approach! I might check out Logic Apps for my project as well. Thanks!