I want to convert informal Persian user input that represents a monetary amount into a single integer value in rials, using Python. The input varies a lot: it may contain Persian number words (like 'یک' or 'پنجاه'), informal spellings (like 'یه' or 'پونسد'), mixed digits and words, magnitude units (like 'هزار', 'میلیون', 'میلیارد'), and even mixed currency units ('تومان' and 'ریال') in the same sentence. My goal is to parse the text and compute the numeric value, not just extract the digits. For example, how would I turn a phrase like 'صد و پنجاه و دو تومان' into an integer number of rials? Read literally that is 152 tomans, or 1520 rials at 10 rials per toman; read colloquially with an implied 'هزار' it is 1520000 ریال. What is the best approach, and are there existing libraries that can help with this?
3 Answers
A large language model (LLM) can be a decent option when the input needs contextual interpretation, but LLMs are weak at precise arithmetic and at applying rules consistently. I tried this with a few models and accuracy on the numeric conversion was poor, only about 40-62%. You are probably better off with a structured, rule-based algorithm.
Start with a dictionary that maps Persian number words (including the informal spellings) to their numeric values, plus separate lookups for the magnitude units and the currency units. With those in place you can walk the tokens and accumulate the amount deterministically. Regular expressions help with the extraction step, for example to split the input into digit runs and words before the lookup.
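The dictionary-plus-regex idea above could be sketched roughly as follows. This is a minimal sketch, not a complete solution: the names `SMALL`, `SCALES`, `CURRENCY`, `parse_rials` and the `default_unit` parameter are made up for illustration, the word lists are deliberately incomplete, and the code reads amounts literally (it does not guess an implied 'هزار'). It assumes 1 toman = 10 rials.

```python
import re

# Number words, including a few colloquial spellings (incomplete; extend as needed).
SMALL = {
    "صفر": 0, "یک": 1, "یه": 1, "دو": 2, "سه": 3, "چهار": 4, "پنج": 5,
    "شش": 6, "شیش": 6, "هفت": 7, "هشت": 8, "نه": 9, "ده": 10,
    "یازده": 11, "دوازده": 12, "سیزده": 13, "چهارده": 14,
    "پانزده": 15, "پونزده": 15, "شانزده": 16, "شونزده": 16,
    "هفده": 17, "هجده": 18, "نوزده": 19,
    "بیست": 20, "سی": 30, "چهل": 40, "پنجاه": 50, "شصت": 60,
    "هفتاد": 70, "هشتاد": 80, "نود": 90,
    "صد": 100, "دویست": 200, "سیصد": 300, "چهارصد": 400,
    "پانصد": 500, "پونصد": 500, "پونسد": 500, "ششصد": 600,
    "هفتصد": 700, "هشتصد": 800, "نهصد": 900,
}
SCALES = {"هزار": 10**3, "میلیون": 10**6, "میلیارد": 10**9}
CURRENCY = {"ریال": 1, "تومان": 10, "تومن": 10}  # factor to rials


def parse_rials(text, default_unit="ریال"):
    """Return the total value of `text` in rials (hypothetical helper)."""
    text = text.replace("\u200c", " ")              # zero-width non-joiner -> space
    text = re.sub(r"(?<=\d)[,٬](?=\d)", "", text)   # drop thousands separators
    tokens = re.findall(r"\d+|[^\W\d_]+", text)     # digit runs or letter runs

    total = 0                   # rials from finished currency segments
    amount = 0                  # completed magnitude groups in this segment
    group = 0                   # value waiting for the next magnitude word
    last_scale = float("inf")   # last magnitude seen in this segment

    for tok in tokens:
        if tok == "و":                      # the conjunction "and"
            continue
        if tok.isdecimal():                 # ASCII, Persian or Arabic-Indic digits
            group += int(tok)
        elif tok in SMALL:
            group += SMALL[tok]
        elif tok in SCALES:
            scale = SCALES[tok]
            if scale > last_scale:          # e.g. "هزار میلیارد": rescale everything
                amount = (amount + group) * scale
            else:                           # a bare "هزار" counts as one thousand
                amount += (group or 1) * scale
            group, last_scale = 0, scale
        elif tok in CURRENCY:               # close the segment, convert to rials
            total += (amount + group) * CURRENCY[tok]
            amount = group = 0
            last_scale = float("inf")
        else:
            raise ValueError(f"unrecognized token: {tok!r}")

    # Any trailing amount without a currency word uses the default unit.
    return total + (amount + group) * CURRENCY[default_unit]


print(parse_rials("صد و پنجاه و دو هزار تومان"))
```

The sketch raises on any word it does not know, which is strict for free-form sentences; whether to skip unknown words instead is a design choice that depends on how noisy the input is.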
Instead of relying solely on LLMs, look at NLP libraries built specifically for Persian. A library like Hazm could help with normalizing and tokenizing colloquial Persian text, so that the result is easier to convert into structured data.
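To show what such a normalization step contributes before any number parsing, here is a small sketch using only the standard library, so it runs without extra dependencies. Hazm provides a `Normalizer` class for broader cleanup of this kind; the code below is not Hazm's implementation, and the names `CHAR_MAP`, `DIGIT_MAP` and `normalize_fa` are hypothetical.

```python
import re

# Arabic code points that often appear in place of their Persian forms.
CHAR_MAP = {
    0x064A: "\u06cc",  # Arabic yeh -> Persian yeh
    0x0649: "\u06cc",  # alef maksura -> Persian yeh
    0x0643: "\u06a9",  # Arabic kaf -> Persian kaf
    0x200C: " ",       # zero-width non-joiner -> space
    0x0640: None,      # tatweel (elongation mark) is removed
}

# Persian and Arabic-Indic digits -> ASCII digits.
DIGIT_MAP = {}
for i in range(10):
    DIGIT_MAP[0x06F0 + i] = str(i)
    DIGIT_MAP[0x0660 + i] = str(i)


def normalize_fa(text):
    """Clean up Persian text so later dictionary lookups match reliably."""
    text = text.translate(CHAR_MAP).translate(DIGIT_MAP)
    text = re.sub(r"[\u064B-\u0652]", "", text)   # strip short-vowel diacritics
    # Insert a space wherever a digit touches a letter (either order).
    text = re.sub(r"(?<=\d)(?=[^\W\d_])|(?<=[^\W\d_])(?=\d)", " ", text)
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


print(normalize_fa("\u06f2\u0645\u064a\u0644\u064a\u0648\u0646"))
```

Running a step like this first means the word-to-number lookup only has to know one spelling of each character, instead of both the Arabic and Persian variants.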
