Publication year: 1385 (Solar Hijri calendar)

Published in: 14th Iranian Conference on Electrical Engineering

Number of pages: 6

Author(s):

Vahid Asadpour – Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran
Farzad Towhidkhah – Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
Mehdi Homayoun poor – Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Abstract:

Biometry is the science of identifying humans by their specific physical characteristics. Biometric techniques have used features such as fingerprints, voice, iris and many others. Dynamic audio-visual features, however, have the advantage of improving robustness to environmental noise. Because these parameters depend exclusively on the neuromuscular properties of the speaker, imitation of valid speakers and false acceptance can be reduced to a large extent. We focus on visual feature extraction and propose a dynamic lip model to extract the intrinsic features of moving limbs, such as viscosity, elasticity, damping and mass, from speaker recordings. These features complement the lip motion vectors and their first- and second-order derivatives. Audio features are extracted using noise-robust relative spectra perceptual linear prediction (RASTA-PLP), and audio and video features are combined using a multistream pseudo-synchronized hidden Markov model. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits and a phonetically rich sentence. On a recognition task at 15 dB acoustic signal-to-noise ratio (SNR), the noise-robust acoustic features alone yield a 9% error rate, while the combination of noise-robust acoustic features and dynamic muscle features yields a 1.5% error rate. For the audio-visual system, false rejection is reduced to as low as 0.5% and true identification is increased up to 98.5% at SNRs as low as 3 dB.
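
The abstract mentions lip motion vectors together with their first- and second-order derivatives. As a rough illustration only, the sketch below shows one common way such dynamic features could be formed, assuming the lip motion is available as one coordinate vector per video frame sampled at a fixed interval. The function names (`derivative`, `dynamic_features`), the finite-difference scheme and the frame interval `dt` are assumptions made for this example; the abstract does not specify how the authors computed the derivatives.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one lip-motion vector per video frame (assumed layout)


def derivative(frames: Sequence[Frame], dt: float) -> List[List[float]]:
    """Approximate the time derivative of each coordinate.

    Uses central differences for interior frames and one-sided
    differences at the first and last frame (an assumed scheme).
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    out: List[List[float]] = []
    for t in range(n):
        lo = max(t - 1, 0)
        hi = min(t + 1, n - 1)
        span = (hi - lo) * dt  # time between the two frames being differenced
        out.append([(b - a) / span for a, b in zip(frames[lo], frames[hi])])
    return out


def dynamic_features(frames: Sequence[Frame], dt: float) -> List[List[float]]:
    """Concatenate position, first derivative and second derivative per frame."""
    velocity = derivative(frames, dt)
    acceleration = derivative(velocity, dt)
    return [list(p) + v + a for p, v, a in zip(frames, velocity, acceleration)]


if __name__ == "__main__":
    # Hypothetical one-dimensional lip opening over four frames, dt = 1 time unit.
    toy_frames = [[0.0], [1.0], [4.0], [9.0]]
    for row in dynamic_features(toy_frames, dt=1.0):
        print(row)
```

Each output row holds the original vector followed by its estimated first and second derivatives, so a frame with d coordinates yields a feature vector of length 3d. Such a vector could then be combined with model-based parameters like the viscosity, elasticity, damping and mass described above, although how the paper does this is not stated in the abstract.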