Tesseract or AI for Transcribing Menus: What Should I Use?

Asked By TechExplorer99 On

Hey everyone! I'm an IT student working on a project to transcribe a restaurant's menu into a JSON file. I've been using an AI called Healer Alpha, which is free but consumes a lot of tokens (around 6000 to 9000 per request). I've heard that uploading the menu file to the database first could help with this. I've also tried OCR, but the results were disappointing. Am I using Tesseract incorrectly (I uploaded the image to the website), or are there better alternatives for this task? Any recommendations are welcome. P.S. English isn't my first language, so I apologize if I'm not expressing myself as clearly as I'd like!

3 Answers

Answered By AIenthusiast2023 On

Have you tried using ChatGPT? It could help with transcribing, though the models aren't free. I understand your frustration with the token count after uploading the image to the database; it can be pretty annoying!

TechExplorer99 -

That’s true, but I really want to avoid running into those token issues with the models!

Answered By AskMeAnything_12 On

You could also post your question on Stack Overflow. There might be some insightful replies there!

TechExplorer99 -

Sounds like a good plan! I’ll share the link here once I do.

Answered By WebDevWiz84 On

Have you considered using BeautifulSoup? It’s a great tool for scraping HTML. Instead of going the OCR route, you might be able to directly grab the text or HTML from the restaurant's website.
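To make that concrete, here is a minimal sketch of the scraping route. It assumes the menu sits in an HTML list; the markup, the class names (`menu`, `item`, `name`, `price`) and the `menu_to_records` helper are all made up for illustration, so a real restaurant page will need different selectors.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup: a real menu page will use different tags and classes.
SAMPLE_HTML = """
<ul class="menu">
  <li class="item"><span class="name">Margherita</span><span class="price">8.50</span></li>
  <li class="item"><span class="name">Carbonara</span><span class="price">11.00</span></li>
</ul>
"""


def menu_to_records(html):
    """Return one {"name", "price"} dict per menu item found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("li.item"):
        name = item.select_one(".name")
        price = item.select_one(".price")
        if name is None or price is None:
            continue  # skip entries that lack a name or a price
        records.append(
            {
                "name": name.get_text(strip=True),
                "price": float(price.get_text(strip=True)),
            }
        )
    return records


# Serialize the extracted items to the JSON the project needs.
menu_json = json.dumps(menu_to_records(SAMPLE_HTML), indent=2)
print(menu_json)
```

If the site exposes the menu as text like this, no OCR or AI tokens are needed at all; the only work is finding the right selectors in the page source.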

CuriousCoder21 -

Which HTML? I’ve never heard of BeautifulSoup, but I’ll check it out!

FoodieFinder37 -

Just a heads up: if the PDF you have is a raster image, it might not contain any text at all. For your project, I’d start with text or HTML extraction, then try OCR, and use AI only as a last resort.
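A rough sketch of that ordering in Python, assuming each step can be wrapped in a function that returns the extracted text (or an empty string when it finds nothing). The three extractor functions are hypothetical stand-ins, not real library calls, and the "name price" line format is only an assumption about what the menu text looks like.

```python
import json


def first_nonempty(source, extractors):
    """Try each (label, function) pair in order; return the first non-blank result."""
    for label, extract in extractors:
        text = extract(source)
        if text and text.strip():
            return label, text.strip()
    return None, ""


# Hypothetical stand-ins: replace the bodies with a real text extractor,
# a real OCR call and a real model call.
def extract_text_layer(source):
    return ""  # pretend the PDF is a raster image with no text layer


def run_ocr(source):
    return "Margherita 8.50\nCarbonara 11.00"


def ask_ai_model(source):
    raise RuntimeError("not reached when an earlier step succeeds")


def lines_to_items(text):
    """Parse 'name price' lines into dicts, skipping lines that do not fit."""
    items = []
    for line in text.splitlines():
        name, sep, price = line.rpartition(" ")
        if not sep:
            continue  # no space in the line, so no separate price field
        try:
            items.append({"name": name.strip(), "price": float(price)})
        except ValueError:
            continue  # last word is not a number
    return items


method, text = first_nonempty(
    "menu.pdf",  # hypothetical file name
    [("text", extract_text_layer), ("ocr", run_ocr), ("ai", ask_ai_model)],
)
menu_json = json.dumps({"method": method, "items": lines_to_items(text)}, indent=2)
print(menu_json)
```

Because the chain stops at the first step that returns text, the token-hungry model call only runs when both cheaper steps come back empty.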
