Automatic Speech Recognition of Pathological Voice
Abstract
Background/Objectives: Automatic speech recognition (ASR) supports many useful applications. Various ASR systems with good performance have been developed for normal speakers. Speech produced by a patient with a voice disorder differs from that of a normal speaker because of irregular vibration and incomplete closure of the vocal folds. Therefore, different speech features need to be investigated in order to develop an ASR system that performs well for both pathological and normal speakers. Methods: In this paper, we propose an automatic speech recognition system for normal and pathological voices, built with the Hidden Markov Model Toolkit (HTK). Four techniques are applied for feature extraction: Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), RelAtive SpecTrAl Perceptual Linear Prediction (RASTA-PLP), and Linear Prediction Coefficients (LPC). The database used to evaluate the performance of the developed system includes a total of 297 speakers; 121 of them are normal speakers, and the remaining speakers cover five types of vocal fold disorders. Findings: Experimental results show that the developed system gives good accuracy for both normal and pathological voices. The highest accuracy is 94.44% with a word error rate of 5.55% for normal voices, and 88.63% with a word error rate of 11.63% for pathological voices. A fuzzy logic controller is proposed to automatically segment normal and disordered voices.
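The findings are reported as word-level accuracy and word error rate (WER). As a minimal sketch of how WER is commonly computed, the Python snippet below aligns a reference transcript with a recognizer hypothesis by word-level edit distance and divides the number of substitutions, deletions, and insertions by the reference length. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the paper's HTK scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one of four reference words is deleted.
    print(word_error_rate("open the door please", "open door please"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.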
Keywords
Automatic Speech Recognition, Fuzzy Logic Control, HTK, Voice Pathology.
This work is licensed under a Creative Commons Attribution 3.0 License.










