Title: An invariant structure approach for media representation and recognition
Speaker: Dr. Yu QIAO (乔宇)
Multimedia Center, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Time: August 10, 2012, 10:30 AM-12:00 PM
Venue: Conference Room 307, Building 5, tyc86太阳集团
Abstract:
This talk is divided into two parts. In the first part, I will explain our recent work on the structural representation of media, using speech as an example. One of the major challenges in speech engineering is dealing with the non-linguistic variations contained in speech signals. These variations are caused by differences among speakers, communication channels, environmental noise, and so on. Modern speech approaches rely mainly on statistical methods (such as GMM and HMM) to model the distributions of acoustic features. These methods typically require a large amount of training data, and it is well known that recognizer performance drops significantly when training and testing conditions are mismatched. We proposed an invariant structural representation of speech that aims to remove the non-linguistic factors from speech signals. Unlike classical speech models, the structural representation uses globally contrastive features to model the global and dynamic aspects of speech and discards the local and static features. It can be proved that these contrastive features (f-divergences) are invariant to any invertible transformation and are thus robust to non-linguistic variations. Experimental results on connected Japanese vowel utterances show that the structural approach achieves better recognition rates than HMMs. In the second part, I will review several ongoing projects at the Multimedia Laboratory, Shenzhen Institutes of Advanced Technology, including image retrieval, activity classification, 3D reconstruction, and face recognition.
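The invariance property mentioned in the abstract can be checked numerically in a simple special case. The sketch below is not the speaker's method: it assumes two 1-D Gaussian distributions and an affine (hence invertible) map x -> a*x + b, and uses closed-form expressions for two f-divergences, the KL divergence and the squared Hellinger distance. All function names and parameter values are illustrative choices.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL divergence KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def hellinger2_gauss(m1, s1, m2, s2):
    """Squared Hellinger distance between two 1-D Gaussians."""
    v = s1**2 + s2**2
    # Bhattacharyya coefficient of the two densities
    bc = math.sqrt(2 * s1 * s2 / v) * math.exp(-(m1 - m2)**2 / (4 * v))
    return 1.0 - bc

def affine(m, s, a, b):
    """Mean and std of a*X + b when X ~ N(m, s^2); a must be nonzero."""
    return a * m + b, abs(a) * s

# Two illustrative distributions (mean, std); values are arbitrary assumptions.
p = (0.0, 1.0)
q = (2.0, 0.5)

# An arbitrary invertible affine map, standing in for a non-linguistic distortion.
a, b = -3.0, 7.0
p_t = affine(*p, a, b)
q_t = affine(*q, a, b)

kl_before = kl_gauss(*p, *q)
kl_after = kl_gauss(*p_t, *q_t)
h_before = hellinger2_gauss(*p, *q)
h_after = hellinger2_gauss(*p_t, *q_t)

print("KL before/after:", kl_before, kl_after)
print("Hellinger^2 before/after:", h_before, h_after)
```

Both divergences agree before and after the transformation up to floating-point error, while the means and variances of the individual distributions change. This is the sense in which contrasts between distributions, rather than the distributions themselves, can be robust to such distortions.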
Biography:
Yu QIAO is a research professor and doctoral supervisor.