Supervisor: Ing. Jan Čech, Ph.D.
Lip reading (also called speech reading) is the ability to understand human speech by watching the speaker's mouth and lips, without actually hearing the audio. This ability is sometimes present in deaf people. There is evidence in the literature that audio-visual speech recognition, i.e. a fusion of classical auditory and visual recognition methods, can improve speech recognition algorithms, especially in the case of low-quality audio or overlapping speech.
There are works that employ the video channel to recognize isolated spoken phrases, uttered digits, or simple commands, e.g. to control mobile devices or an audio system in a car [2,3].
First, the target will be isolated word recognition: the design of a basic method, feature extraction, a classifier, and a ground-truth experiment, with emphasis on studying the effect of camera resolution, viewing angle, and subject distance. We will provide code for precise estimation of facial landmarks (nose tip, eye corners, mouth corners) from images or videos. Then it should be straightforward to extract representative features from the mouth region.
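As a minimal sketch of the feature-extraction step, assume a landmark detector has already returned 2D positions for the nose tip, eye corners, and mouth corners. The function and dictionary-key names below are hypothetical, not part of any provided code; the sketch only illustrates cropping a mouth region of interest and computing one simple geometric feature (mouth width normalized by inter-ocular distance, which makes it roughly invariant to subject distance).

```python
import numpy as np

# Hypothetical landmark layout: each entry is a 2D point (x, y) in pixels,
# as could be produced by a facial landmark detector.
# Keys: "nose_tip", "l_eye", "r_eye", "l_mouth", "r_mouth".

def mouth_roi(points, scale=1.8):
    """Square crop around the mouth, sized relative to the mouth width.

    Returns (x0, y0, x1, y1) pixel coordinates of the bounding box.
    The `scale` margin factor is an illustrative choice, not a fixed value.
    """
    l, r = points["l_mouth"], points["r_mouth"]
    center = (l + r) / 2.0
    half = scale * np.linalg.norm(r - l) / 2.0  # half-side of the square crop
    x0, y0 = center - half
    x1, y1 = center + half
    return int(x0), int(y0), int(x1), int(y1)

def mouth_features(points):
    """One toy geometric feature: mouth width / inter-ocular distance."""
    iod = np.linalg.norm(points["r_eye"] - points["l_eye"])
    width = np.linalg.norm(points["r_mouth"] - points["l_mouth"])
    return np.array([width / iod])
```

In practice the cropped region would feed an appearance descriptor per frame, and the per-frame features would form the sequence passed to the classifier; the geometric feature here is only the simplest example.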
If the work is successful, it can be further extended and integrated with automatic speech recognizers to disambiguate overlapping speech. The topic can therefore be chosen as a semester, bachelor, or master project.
The image is a reproduction from .
[1] Petr Cisar, Milos Zelezny. Using of Lip-Reading for Speech Recognition in Noisy Environments. In Speech Processing, 2003. http://musslap.zcu.cz/en/audio-visual-speech-recognition/
[2] Kate Saenko, Karen Livescu, Michael Siracusa, Kevin Wilson, James Glass, and Trevor Darrell. Visual Speech Recognition with Loosely Synchronized Feature Streams. In ICCV, 2005.
[3] Dana Segev, Yoav Y. Schechner, Michael Elad. Example-based Cross-Modal Denoising. In CVPR, 2012.
[4] Jan Cech, Vojtech Franc, Jiri Matas. A 3D Approach to Facial Landmarks: Detection, Refinement, and Tracking. In Proc. ICPR, 2014.