An Analysis of Automatic Phone Recognition and Identification of a Few Languages from North Eastern India


  • Department of Electronics and Communication Engineering, North Eastern Hill University, Shillong – 793022, Meghalaya, India


Objective: Phones are the basic sound units of spoken data, and languages differ from one another in the phone sets they use. This paper analyzes some aspects of automatic phone recognition and the subsequent identification of a few languages from the North Eastern region of India using Phonetic Engines (PEs). Methods and Statistical Analysis: A PE is a system that converts a speech sample into a symbolic form such that the symbols capture all the information carried by the speech sample. In developing the PEs, International Phonetic Alphabet (IPA) symbols are used to transcribe the data, and Hidden Markov Models (HMMs) are used to model the phonetic units in the training phase. The trained HMMs are then used for phone recognition, leading to identification of the language(s) of unknown test utterances. Findings: PEs are built for three Indian languages and one dialect: Manipuri, Assamese, Bengali, and the Kakching dialect of Manipuri. These languages are widely spoken across the North Eastern region of India. The overall phone recognition accuracies reported by the PEs are 62.11% for standard Manipuri, 59.0% for the Kakching dialect of Manipuri, 43.28% for standard Assamese, and 48.58% for Bengali. Application: Because these systems are language-biased, automatic language identification (LID) is possible by testing unknown utterances against a set of PEs. The identification rates obtained in several LID tasks carried out with PEs are discussed in order to analyze the issues involved.
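As an illustration of how a set of language-biased PEs could be used for LID, the following minimal Python sketch picks the language whose PE scores a test utterance highest. The decision rule (maximum score), the function name identify_language, and the example scores are assumptions made for illustration; the abstract does not specify the decision procedure or report such scores.

```python
from typing import Dict


def identify_language(scores: Dict[str, float]) -> str:
    """Return the language whose PE assigned the highest score.

    `scores` maps a language name to the score (for example, a
    log-likelihood) that the corresponding PE gave the utterance.
    Choosing the maximum is an assumed decision rule, not one
    stated in the abstract.
    """
    if not scores:
        raise ValueError("no phonetic-engine scores supplied")
    return max(scores, key=scores.get)


# Hypothetical log-likelihoods for one test utterance (illustrative only).
example_scores = {
    "Manipuri": -4120.5,
    "Assamese": -4388.2,
    "Bengali": -4301.7,
}
best_language = identify_language(example_scores)
```

Under this assumed rule, an utterance is attributed to the language whose PE models it best, which is one simple way to exploit the language bias described above.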


Hidden Markov Models (HMM), International Phonetic Alphabet (IPA), Language Dependency, Language Identification (LID), Phone Recognition, Phonetic Engine (PE).

