Transliteration of Named Entity: Bengali and English as Case Study

Authors

Asif Ekbal

Sivaji Bandyopadhyay

Track:

All Papers

Abstract:

This paper presents a modified joint source-channel model that is used to transliterate a Named Entity (NE) from the source language to the target language and vice versa. As a case study, Bengali and English have been chosen as the source and target language pair. A number of alternatives to the modified joint source-channel model have also been considered. A Bengali NE is divided into Transliteration Units (TUs) with the pattern C+M, where C represents a consonant, a vowel or a conjunct and M represents the vowel modifier or matra. An English NE is divided into TUs with the pattern C*V*, where C represents a consonant and V represents a vowel. The system learns the mappings automatically from bilingual training sets of person and location names. Aligned transliteration units, along with their contexts, are derived automatically from these training sets to generate the collocational statistics. The system also considers linguistic features in the form of possible conjuncts and diphthongs in Bengali and their corresponding representations in English. Experimental results of 10-fold open tests show that the modified joint source-channel model performs best. For Bengali to English transliteration it achieves a Word Agreement Ratio (WAR) of 74.4% for person names and 72.6% for location names, and a Transliteration Unit Agreement Ratio (TUAR) of 91.7% for person names and 89.3% for location names. For back transliteration the same model achieves a WAR of 72.3% for person names and 70.5% for location names, and a TUAR of 90.8% for person names and 87.6% for location names.
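The C*V* pattern for English TUs can be illustrated with a short sketch. This is not the authors' implementation: the function name `segment_english`, the regular expression, and the choice to treat only a, e, i, o, u as vowels (so that y counts as a consonant) are assumptions made for illustration, and the abstract does not say how trailing consonants or non-letter characters are handled.

```python
import re

# Assumption: vowels are a, e, i, o, u; every other Latin letter (including y)
# is treated as a consonant. The paper may classify letters differently.
_TU_PATTERN = re.compile(r"[b-df-hj-np-tv-z]*[aeiou]+|[b-df-hj-np-tv-z]+")


def segment_english(name: str) -> list[str]:
    """Split an English name into C*V*-style transliteration units.

    Each unit is a run of consonants followed by a run of vowels.
    A consonant run with no following vowel (e.g. at the end of the
    name) becomes a unit of its own. Non-letter characters are skipped.
    """
    return _TU_PATTERN.findall(name.lower())


if __name__ == "__main__":
    for example in ["sachin", "sivaji", "asif"]:
        print(example, segment_english(example))
```

Under these assumptions, "sachin" splits into the units sa, chi and n. A Bengali segmenter for the C+M pattern would need a comparable rule over Bengali consonants, conjuncts and matras, which the abstract does not specify in enough detail to sketch here.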
