I recently started using Azure Document Intelligence and I'm impressed with how accurately it extracts data from tables in PDFs and Excel files. However, the resulting JSON output is huge, around 30,000 lines, and contains a lot of fields I don't need. Ideally, I want JSON that contains only the table data, without losing the column relationships. Does anyone have solutions or suggestions for handling this?
2 Answers
You might want to consider training a custom model instead. Tailoring it to extract just the fields you need can improve accuracy significantly. Let me know if you need any guidance!
It sounds like the service is giving you a separate entry for each table in the JSON. You can definitely parse the output to pull out just the table data you need. It's a bit of extra work, but it should get you down to the specifics you want!
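To make the parsing idea concrete, here is a rough sketch in Python. It assumes the layout-style output, where tables sit under `analyzeResult.tables` and each cell carries `rowIndex`, `columnIndex`, `content` and an optional `kind` such as `columnHeader`. Check those field names against your own output, since I can't see your JSON. The `extract_tables` helper and the sample data are made up for illustration.

```python
import json


def extract_tables(result):
    """Reduce an analysis result to just its tables, as columns plus rows."""
    # Accept either the full response or the inner analyzeResult object.
    analyze = result.get("analyzeResult", result)
    slim = []
    for table in analyze.get("tables", []):
        n_rows, n_cols = table["rowCount"], table["columnCount"]
        # Rebuild the grid so every value stays at its row/column position.
        grid = [[""] * n_cols for _ in range(n_rows)]
        header_rows = set()
        for cell in table.get("cells", []):
            r, c = cell["rowIndex"], cell["columnIndex"]
            grid[r][c] = cell.get("content", "")
            if cell.get("kind") == "columnHeader":
                header_rows.add(r)
        if 0 in header_rows:
            # First row was tagged as a header: use it as the column names.
            columns, rows = grid[0], grid[1:]
        else:
            # No header detected: fall back to positional placeholder names.
            columns, rows = [f"col_{i}" for i in range(n_cols)], grid
        slim.append({"columns": columns, "rows": rows})
    return slim


# Tiny hand-written sample in the assumed shape (not real service output).
sample = {
    "analyzeResult": {
        "tables": [
            {
                "rowCount": 2,
                "columnCount": 2,
                "cells": [
                    {"rowIndex": 0, "columnIndex": 0, "content": "Item", "kind": "columnHeader"},
                    {"rowIndex": 0, "columnIndex": 1, "content": "Qty", "kind": "columnHeader"},
                    {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
                    {"rowIndex": 1, "columnIndex": 1, "content": "3"},
                ],
            }
        ]
    }
}

print(json.dumps(extract_tables(sample), indent=2))
```

Keeping `columns` and `rows` as parallel lists preserves the column relationships by position, and it still works when two headers share the same name. Merged cells only fill their top-left position here, and only the first row is treated as a header, so multi-row headers would need extra handling.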
