Speaker Adaptation on Hidden Markov Model using MFCC and Rasta-PLP and Comparative Study

Affiliations

  • KIIT College of Engineering, Gurgaon - 122102, Haryana, India
  • Ansal University, Gurgaon - 122003, Haryana, India

Abstract


This work compares Mel-Frequency Cepstral Coefficient (MFCC) and Perceptual Linear Prediction (PLP) features for building a text-dependent speaker identification system. HMMs were trained separately for each speaker on continuously spoken Hindi sentences using the HTK toolkit. The experiments used a set of 200 continuously spoken sentences, with a vocabulary of 20000 isolated words, from a database of 100 speakers. Recognition accuracy was 92.26% with PLP features and 91.18% with MFCC features. A confusion matrix was built for all 20 test speakers from the recognition scores obtained for each speaker and their confusion with other speakers. Performance was compared under closed-set and open-set testing conditions and, as expected, was far better in the closed-set condition. We suggest that using PLP features in place of MFCC may improve speaker identification accuracy by reducing false acceptances.
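The evaluation described above (choosing the best-scoring speaker model, tallying a confusion matrix, and counting false acceptances in open-set testing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the rejection threshold, and the toy log-likelihood values are hypothetical and are not taken from the paper or from HTK. In practice the scores would come from decoding each test utterance against each speaker's HMM.

```python
from collections import Counter


def identify(scores, threshold=None):
    """Pick the speaker whose model gives the highest log-likelihood.

    scores: dict of speaker id -> log-likelihood (hypothetical values).
    threshold: None for closed-set testing; in open-set testing the
    utterance is rejected (None is returned) when the best score is
    below this assumed threshold.
    """
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best


def confusion_counts(trials, threshold=None):
    """Count (true speaker, decided speaker) pairs over all trials."""
    counts = Counter()
    for true_speaker, scores in trials:
        counts[(true_speaker, identify(scores, threshold))] += 1
    return counts


def accuracy(counts):
    """Fraction of trials whose decision matches the true label."""
    total = sum(counts.values())
    correct = sum(n for (true, decided), n in counts.items() if true == decided)
    return correct / total if total else 0.0


def false_acceptances(counts):
    """Trials where an unenrolled speaker (true label None) was accepted."""
    return sum(n for (true, decided), n in counts.items()
               if true is None and decided is not None)


# Toy scores only; they are not meant to reproduce the reported results.
enrolled_trials = [
    ("spk1", {"spk1": -410.0, "spk2": -455.0, "spk3": -470.0}),
    ("spk2", {"spk1": -430.0, "spk2": -425.0, "spk3": -480.0}),
    ("spk3", {"spk1": -440.0, "spk2": -460.0, "spk3": -445.0}),  # confused with spk1
]
impostor_trial = (None, {"spk1": -520.0, "spk2": -515.0, "spk3": -530.0})

closed = confusion_counts(enrolled_trials)
open_no_reject = confusion_counts(enrolled_trials + [impostor_trial])
open_reject = confusion_counts(enrolled_trials + [impostor_trial], threshold=-500.0)

print("closed-set accuracy:", accuracy(closed))
print("open-set false acceptances without rejection:", false_acceptances(open_no_reject))
print("open-set false acceptances with rejection:", false_acceptances(open_reject))
```

Each key of the counter is one cell of the confusion matrix, so off-diagonal cells such as ("spk3", "spk1") show which speakers are mistaken for one another. The same tally could be run once per feature type to compare MFCC against PLP.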

Keywords

Hindi Speech, HMM, MFCC, RASTA-PLP, Speaker Identification.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.