
Confusion Matrix Analysis of Syllable-Like Unit Extracted from Hindi Continuous Speech


  • Maharaja Surajmal Institute of Technology, Affiliated to GGSIPU, Janak Puri − 110058, New Delhi, India


Objectives: Speech segmentation is an important pre-processing task for improving the quality of text-to-speech (TTS) synthesis, and is therefore an important field of research. The basic requirement is to build segmented and labeled speech corpora. This paper describes the database, the segmentation technique, and a confusion matrix analysis of the segmentation of the syllable structures occurring in the database.

Methods: A set of 115 sentences, chosen for providing information to passengers of the Delhi metro rail and spoken by a single male speaker, was recorded. Analysis of the speech database shows that the syllable structures CV and CVC are the most frequent, covering 57.38% and 33.8% respectively, while the structures CCVC, CṼC and CVCC (where C = consonant, V = vowel and Ṽ represents a nasalized vowel sound) cover less than 2% of our database. The recorded sentences are segmented into syllable-like units using the group-delay algorithm. The segmentation technique was tested on speech data comprising 704 syllable tokens occurring in our database. The evaluation was carried out in two parts: the segmentation algorithm and the human perception approach.

Findings: The quality of the segmentation, evaluated using a syllabic confusion matrix for the various syllable structures, shows segmentation accuracy rates of 79% for CV and 88.5% for CVC. The overall segmentation accuracy achieved for the metro rail passenger information system (MRPIS) task was 80.6%.

Applications: The database can be used in speech synthesizers and speech recognizers.
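The per-structure and overall accuracy figures in the Findings are the kind of quantity that can be read off a confusion matrix. The following is a minimal Python sketch of that calculation, not the authors' evaluation code. The function names and the toy label sequences are invented for illustration and are not the paper's 704 tokens; rows are taken to be the reference (hand-labeled) syllable structure and columns the structure produced by the segmenter, which is an assumption about the matrix layout.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis, labels):
    """Rows = reference structure, columns = structure output by the segmenter."""
    counts = Counter(zip(reference, hypothesis))
    return [[counts[(r, h)] for h in labels] for r in labels]


def per_class_accuracy(matrix, labels):
    """Fraction of each reference class that landed on the diagonal."""
    acc = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        acc[label] = matrix[i][i] / row_total if row_total else float("nan")
    return acc


def overall_accuracy(matrix):
    """Diagonal sum divided by the total number of tokens."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else float("nan")


# Toy data, invented purely for illustration (hypothetical, not from the paper).
labels = ["CV", "CVC", "CVCC"]
reference = ["CV", "CV", "CV", "CV", "CVC", "CVC", "CVC", "CVCC"]
hypothesis = ["CV", "CV", "CV", "CVC", "CVC", "CVC", "CV", "CVCC"]

matrix = confusion_matrix(reference, hypothesis, labels)
class_acc = per_class_accuracy(matrix, labels)
total_acc = overall_accuracy(matrix)
```

Off-diagonal cells show which structures are confused with which, for example a CV token segmented as CVC, which is the information a single accuracy figure hides.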


Keywords: Database, Delhi Metro Rail Corporation, Group-Delay Algorithm, Metro Rail Passenger Information Corpus, Segmentation, Syllable






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.