
Speaker Identification using a Novel Prosody with Fuzzy based Hierarchical Decision Tree Approach


  • Department of Computer Science, PSG College of Arts and Science, Coimbatore – 641015, Tamil Nadu, India
  • Department of Computer Science, Bharathiar University, Coimbatore – 641046, India


Objectives: This work proposes a speaker identification approach that combines novel prosody features with a fuzzy-based hierarchical decision tree to address the limitations of existing traditional methods. It aims to improve speaker identification performance for a given population under noisy environments. Methods/Statistics: The key idea is to form speaker cluster groups more efficiently by applying fuzzy clustering to prosody features at each level of a hierarchical decision tree. At each level, speakers that belong to the same group are clustered together. The novelty of the proposed method lies in using the prosody features pitch and loudness together with fuzzy clustering. Findings: Experimental results show that the proposed model using prosody features achieves a speaker identification accuracy of 93.75%, compared with 81.25% for vocal source features, under noisy environments. Applications: Gender and age identification, banking, and smart voice-based technology operation.
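The grouping step described under Methods can be illustrated with a minimal sketch of one tree level. The abstract does not state which fuzzy clustering variant is used, so the code below assumes standard fuzzy c-means with fuzzifier m = 2. The speaker names, the (pitch in Hz, loudness in dB) values, the membership threshold, and the function names `fuzzy_c_means` and `groups_from_memberships` are all hypothetical and chosen only for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (assumed variant). Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Memberships are recomputed from relative distances to the centers.
        shift = 0.0
        for i in range(n):
            dists = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dists[k] / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def groups_from_memberships(names, u, threshold=0.3):
    """Put a speaker in every group where its membership reaches the threshold.

    Allowing one speaker in several groups is an assumption of this sketch.
    """
    groups = [[] for _ in u[0]]
    for name, row in zip(names, u):
        for k, value in enumerate(row):
            if value >= threshold:
                groups[k].append(name)
    return groups


# Hypothetical per-speaker prosody features: (mean pitch in Hz, loudness in dB).
speakers = {
    "s1": (110.0, 60.0),
    "s2": (118.0, 62.0),
    "s3": (125.0, 58.0),
    "s4": (210.0, 66.0),
    "s5": (225.0, 70.0),
    "s6": (218.0, 64.0),
}

names = list(speakers)
centers, memberships = fuzzy_c_means([speakers[n] for n in names], n_clusters=2)
groups = groups_from_memberships(names, memberships)
print(sorted(groups))
```

Applying the same split again inside each resulting group would give the next level of the tree. Features on different scales, such as Hz and dB here, would normally be normalised before clustering; the sketch skips that step for brevity.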


Fuzzy Clustering, Large Population Speaker Identification, Prosody Feature Extraction, Prosody with Fuzzy based Hierarchical Decision Tree.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.