Looking for a Universal Transliteration Library

Asked By CreativeCactus88 On

I'm working on a fun project: an AI model that recognizes the language of words and sentences. I've got everything sorted out except transliteration. I want to keep the original script, since it makes it easy to identify which language a word belongs to, but I also need to convert the input into a cleaned, romanized form (without spaces), and that's where existing libraries fall short.

The unidecode library does what it's supposed to, but it handles vowels in Indic and Arabic scripts poorly. I also tried aksharamukha, which is great for Semitic languages but lacks support for Asian scripts. On top of that, I need a library that detects the script automatically, so I don't have to specify the source script every time. In short, I'm looking for a comprehensive transliteration library that covers all major scripts and converts them to Latin script.
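One piece of the question, automatic script detection, can be approximated with Python's standard library alone. This is a minimal sketch that assumes the first word of a character's Unicode name (e.g. `DEVANAGARI LETTER NA`) is a good enough proxy for its script. That is a heuristic, not the Unicode Script property, which `unicodedata` does not expose. `char_script` and `dominant_script` are hypothetical helper names.

```python
import unicodedata
from collections import Counter
from typing import Optional


def char_script(ch: str) -> Optional[str]:
    """Guess the script of one character from its Unicode name.

    Heuristic: names such as 'DEVANAGARI LETTER NA' or 'ARABIC LETTER MEEM'
    begin with the script name. This is not the Unicode Script property.
    """
    if not ch.isalpha():
        return None  # skip digits, punctuation, spaces and combining vowel signs
    try:
        return unicodedata.name(ch).split(" ", 1)[0]
    except ValueError:
        return None  # code point has no name in this Python's Unicode database


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent script among the letters of `text`, if any."""
    counts = Counter(s for s in map(char_script, text) if s)
    return counts.most_common(1)[0][0] if counts else None


for sample in ["नमस्ते", "مرحبا", "Привет", "中文", "hello"]:
    print(sample, dominant_script(sample))
```

Counting letters and taking the majority tolerates stray punctuation or a borrowed Latin letter. For mixed-script sentences you would run it per word rather than per sentence.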

2 Answers

Answered By TransliterationExpert77 On

Have you checked out translitcodec? I've been using it for over a decade. I'm not sure it covers everything you need, but it might be worth a look.

CuriousCoder45 -

Sounds interesting! Which part of my use case do you think it might not cover?

LanguageLover33 -

I remember that one stopped working with the newer versions of Python a while back, unfortunately.

Answered By LinguisticGuru22 On

Transliteration is tricky because each language can have its own transliteration rules, even when it shares a script with other languages, so it's hard to find a one-size-fits-all solution. Hebrew, languages written in Cyrillic, and CJK characters all have their quirks. Even for Chinese characters alone, you have to choose between Mandarin and Cantonese pronunciations!
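To illustrate why a single table can't serve every language: Russian and Ukrainian both use Cyrillic, yet common romanization schemes render `г` as *g* in Russian and *h* in Ukrainian, and `и` as *i* versus *y*. The sketch below uses tiny, hypothetical tables covering only a handful of lowercase letters; it is not a complete or official scheme, and `romanize_cyrillic` is an illustrative name.

```python
# Hypothetical, deliberately tiny tables (a few lowercase letters only),
# meant to show per-language differences, not a complete or official scheme.
SHARED = {"а": "a", "в": "v", "к": "k", "м": "m", "о": "o", "р": "r"}

OVERRIDES = {
    "ru": {"г": "g", "и": "i"},                      # Russian: г -> g, и -> i
    "uk": {"г": "h", "ґ": "g", "и": "y", "і": "i"},  # Ukrainian: г -> h, и -> y
}


def romanize_cyrillic(word: str, lang: str) -> str:
    """Romanize a Cyrillic word with the table for `lang`.

    Characters missing from the table are passed through unchanged.
    """
    table = {**SHARED, **OVERRIDES[lang]}
    return "".join(table.get(ch, ch) for ch in word.lower())


for lang in ("ru", "uk"):
    print(lang, romanize_cyrillic("гора", lang), romanize_cyrillic("мир", lang))
```

The same spelling comes out as "gora"/"mir" under the Russian table and "hora"/"myr" under the Ukrainian one, which is why knowing the language (not just the script) matters before romanizing.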

TriviaWizard99 -

I'd have thought unidecode is decent enough for Asian languages, so combining it with aksharamukha for the other scripts could be a solid strategy.
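One way to act on this suggestion is a small router that detects each word's script and hands the word to whichever library handles that script best. This sketch assumes the same Unicode-name heuristic for script detection. `romanize_mixed`, `toy_cyrillic` and the `BACKENDS` registry are hypothetical names, and the toy Cyrillic table exists only so the example runs without third-party packages.

```python
import unicodedata
from collections import Counter
from typing import Callable, Dict, Optional

Backend = Callable[[str], str]


def char_script(ch: str) -> Optional[str]:
    # Heuristic: first word of the Unicode name, e.g. 'CYRILLIC SMALL LETTER EM'.
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split(" ", 1)[0]
    except ValueError:
        return None


def word_script(word: str) -> Optional[str]:
    counts = Counter(s for s in map(char_script, word) if s)
    return counts.most_common(1)[0][0] if counts else None


def strip_accents(word: str) -> str:
    # NFKD splits 'é' into 'e' + U+0301; dropping combining marks leaves 'e'.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical toy backend standing in for a real transliteration library.
_TOY_CYRILLIC = {"м": "m", "и": "i", "р": "r"}


def toy_cyrillic(word: str) -> str:
    return "".join(_TOY_CYRILLIC.get(c, c) for c in word.lower())


def romanize_mixed(text: str, backends: Dict[str, Backend],
                   fallback: Backend = lambda w: w) -> str:
    """Send each whitespace-separated word to the backend registered for its
    dominant script, then join the results without spaces."""
    parts = []
    for word in text.split():
        backend = backends.get(word_script(word) or "", fallback)
        parts.append(backend(word))
    return "".join(parts)


BACKENDS: Dict[str, Backend] = {
    "LATIN": strip_accents,
    "CYRILLIC": toy_cyrillic,
    # A real setup might register library calls here instead, for example
    # unidecode.unidecode for CJK, or a wrapper around aksharamukha's
    # transliterate.process(source_script, target_script, text) for Indic
    # scripts. Check each library's docs for the exact script names.
}

print(romanize_mixed("Café мир", BACKENDS))
```

Words whose script has no registered backend pass through unchanged via `fallback`, so gaps in coverage stay visible instead of being silently mangled.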
