Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition

Affiliations

  • Department of Information Technology Engineering, MET’s Institute of Engineering, Nashik - 422207, Maharashtra, India
  • Department of IT Engineering, SSBT’s College of Engineering and Technology Bhambori, Jalgaon – 425001, Maharashtra, India

Abstract


Background/Objectives: Our research aims to identify the language of a spoken utterance from visual speech cues, that is, from the movement of the lips. The first step towards this task is to detect the lips in a face image and then extract geometric features of the lip shape that help identify the utterance. Methods/Statistical Analysis: This paper presents a methodology for detecting lips in face images using a constrained local model (CLM) and then extracting geometric features of the lip shape. Lip detection involves two steps: CLM model building and CLM search. For feature extraction, twenty feature points are defined on the lips, and lip height, width, and area are computed from these twenty points. Findings: The CLM model is built using images from the FGnet Talking Face video database and tested on images from the same database as well as on other images. Detection accuracy is higher for FGnet images than for other images. The feature vector describing the lip shape consists of geometric parameters such as the height, width, and area of the inner and outer lip contours. It is calculated for every test image after the lips have been detected, so any error in lip detection propagates to the feature vector. Together with the lower accuracy on images from outside the training database, this indicates the speaker dependency of visual speech recognition systems. Application/Improvements: The proposed approach is useful for lip detection and feature extraction in visual speech recognition. Further work should focus on minimizing speaker dependency and generalizing the approach.
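The geometric feature extraction described in the abstract can be illustrated with a minimal sketch. It assumes the twenty lip points are already available as (x, y) pixel coordinates from the CLM search. The split into 12 outer and 8 inner contour points, the point ordering, the bounding-box definition of width and height, and all function names are assumptions made for illustration, not the authors' stated definitions.

```python
import math

# Assumption: points 0..11 trace the outer lip contour in order,
# points 12..19 trace the inner lip contour in order.
N_OUTER = 12
N_INNER = 8


def polygon_area(points):
    """Area of a simple closed polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def contour_features(points):
    """Return (width, height, area) of one contour.

    Width and height are taken as the axis-aligned extent of the
    points, which is only one possible definition.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs), max(ys) - min(ys), polygon_area(points))


def lip_feature_vector(landmarks):
    """Six-element vector: outer (w, h, area) followed by inner (w, h, area)."""
    if len(landmarks) != N_OUTER + N_INNER:
        raise ValueError("expected 20 lip feature points")
    outer = landmarks[:N_OUTER]
    inner = landmarks[N_OUTER:]
    return contour_features(outer) + contour_features(inner)


def ellipse(a, b, n):
    """Synthetic contour: n points on an ellipse with semi-axes a and b."""
    return [(a * math.cos(2 * math.pi * k / n),
             b * math.sin(2 * math.pi * k / n)) for k in range(n)]


# Synthetic stand-in for detected lip points (not data from the paper).
landmarks = ellipse(30, 12, N_OUTER) + ellipse(20, 5, N_INNER)
features = lip_feature_vector(landmarks)
print(features)
```

Because every feature is derived directly from the detected point positions, a misplaced point changes the width, height, or area, which is how a detection error carries over into the feature vector.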

Keywords

CLM, Lip Detection, Language Identification, Visual Speech.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.