
Backend Tools for Speech Synthesis in Speech Processing


  • Electronics Engineering, Jain University, Bengaluru - 560069, Karnataka, India
  • Department of ECE, K.E.C, Kuppam - 517425, Andhra Pradesh, India
  • A.I.T., Tumkur - 572106, Karnataka, India


Speech synthesis is the artificial conversion of written text into machine-generated speech. The main theme of this paper is an overview of popular and advanced speech synthesis techniques. Speech synthesis generates human speech artificially from mathematical models and machines, driven by a set of control parameters. The device that performs the synthesis is called a synthesizer, and it may be implemented in hardware or software. So far, research has not been successful in generating a speech signal, so researchers usually extract speech parameters from recorded speech and resynthesize the original signal from them. A synthesizer can be viewed as a mathematical model of the vocal tract that uses extracted acoustic and vocal features to produce artificially generated speech output. The quality of the generated speech is described by its naturalness and its intelligibility. Naturalness indicates how closely the generated output resembles the sounds of human speech production. Intelligibility describes how well a listener can understand the output after perceiving it.


HMM, Modelling, Speech Synthesis, TTS, Vocal Tract
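
As an illustration of the idea in the abstract, that speech can be generated from a mathematical model of the vocal tract driven by a few control parameters, the following is a minimal source-filter sketch. It is not the method of this paper. It assumes an impulse-train excitation at the fundamental frequency passed through a cascade of second-order resonators, in the style of a Klatt-type formant synthesizer. The function names (`resonator`, `impulse_train`, `synthesize_vowel`) and the formant frequencies and bandwidths are hypothetical values chosen only for the example.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator modelling one formant.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2],
    with coefficients chosen so that the gain at 0 Hz is 1.
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced excitation: one unit impulse per pitch period."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Filter the excitation through a cascade of formant resonators.

    `formants` is a list of (centre frequency, bandwidth) pairs in Hz.
    The result is normalised to a peak amplitude of 1.
    """
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]


# Assumed control parameters for a roughly /a/-like vowel (illustrative only).
samples = synthesize_vowel(120.0, [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)])
```

Changing the fundamental frequency or the formant list changes the pitch and the vowel quality, which is the sense in which a few control parameters drive the model. The resulting samples could be scaled to 16-bit integers and written to a file with Python's standard `wave` module for listening.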






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.