The Peking University (PKU) team works extensively in the research area of machine perception and has made major breakthroughs in visual and speech perception.

1) Auditory Perception. One research emphasis in speech perception is processing speech in complex scenes, in particular binaural processing. Listeners gain a significant benefit from binaural hearing when sound sources are spatially separated, an effect termed spatial release from masking (SRM). A series of studies has been conducted to explore SRM. Research has also been conducted to reveal the factors and underlying mechanisms that affect speech perception in noisy environments, from human behavioral performance to neurophysiological responses. New speech processing algorithms have been developed for hearing aids and cochlear implants. In the field of automatic speech recognition (ASR), researchers at PKU introduced deep neural networks (DNNs) and have made significant progress in recent years. A series of new models has been developed, including a GMM-free method for training DNN acoustic models, a deep recurrent neural network based acoustic model, and a multi-scale convolutional neural network based acoustic model.

2) Visual Perception. In image processing, the inexact augmented Lagrangian multiplier method was the first highly efficient algorithm for solving Robust PCA (1000X speedup). The method is widely applied to various low-rank problems, such as 3D shape recovery, underwater image recovery, and video stabilization. PKU also pioneered the study of the multiple-block Alternating Direction Method of Multipliers (ADMM) and gave provably convergent algorithms. Fast algorithms for non-convex problems that enhance low-rankness and sparsity were also developed, significantly advancing the optimization techniques.
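To make the Robust PCA step concrete, here is a minimal NumPy sketch of an inexact augmented Lagrangian multiplier iteration. It assumes the standard Principal Component Pursuit formulation, minimize ||L||_* + lam * ||S||_1 subject to D = L + S, and it uses commonly cited default choices for lam, the initial penalty mu, and the growth factor rho. These values and the function names are illustrative assumptions, not details taken from this text, and the sketch is not the team's reference implementation.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split D into low-rank L and sparse S with an inexact ALM loop.

    Hypothetical helper for illustration; D is assumed to be a nonzero
    2-D real matrix. Defaults are assumed, commonly used settings.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # assumed default weight on ||S||_1

    norm_fro = np.linalg.norm(D, "fro")
    norm_two = np.linalg.norm(D, 2)  # largest singular value of D

    # Initialise the multiplier Y by scaling D, and the penalty mu.
    Y = D / max(norm_two, np.abs(D).max() / lam)
    mu = 1.25 / norm_two
    mu_max = mu * 1e7

    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # L-update: singular value thresholding of D - S + Y / mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt

        # S-update: elementwise shrinkage of D - L + Y / mu.
        S = soft_threshold(D - L + Y / mu, lam / mu)

        # Multiplier update on the constraint residual, then grow mu.
        R = D - L - S
        Y = Y + mu * R
        mu = min(rho * mu, mu_max)

        if np.linalg.norm(R, "fro") <= tol * norm_fro:
            break
    return L, S
```

"Inexact" here refers to updating L and S only once per multiplier update instead of solving the inner subproblem to convergence. Each pass costs one SVD, so for large matrices a partial SVD would normally replace the full decomposition used in this sketch.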

3) Scene Analysis and Modeling. The team at PKU also exploits structured prediction and deep learning techniques to reveal the interplay between semantic recognition and 3D reconstruction, and to establish the interdependency between semantic and geometric entities at different levels of understanding, from low-level elements like primitives and parts to high-level descriptions like scene layout and object constitution. Key contributions in the field of scene analysis and modeling include structure-sensitive over-segmentation, global-structure-based salient object detection, supervised kernel descriptors for visual recognition, similarity-aware depth image super-resolution, and fast semantic segmentation via structured patch prediction.

4) Machine Perception in Driving. On the application side of machine perception, the team models and reasons about driving behaviors in real-world traffic for autonomous driving and advanced driving assistance systems (ADAS). An instrumented vehicle with embedded multimodal perception software has been developed to collect naturalistic driving data. It simultaneously monitors the driver, the ego vehicle, and the environment, and has covered more than 7000 km on public motorways in Beijing since 2013. Some of the data (sole-authored by PKU) has been published online and, to the best of our knowledge, is the first dataset containing large sets of vehicle trajectories collected through naturalistic on-road driving. A major feature of these studies is that the behavior of the driver and ego vehicle is analyzed within the context of surrounding traffic, so that driving is modeled as social behavior.

The SRM work has been published in top academic journals, such as JASA and IEEE-ASL. Additionally, a new head-related transfer function (HRTF) database was built to facilitate our studies and has been widely cited internationally. Two papers on automatic speech recognition with DNNs have received "Best Paper" awards at important international conferences. The intelligent system equipped with these technologies ranked first in the international competition organized by the International Speech Communication Association in 2013. Patents on algorithms for speech perception in noisy environments have been issued, and one of them has been applied in the first cochlear implant product in China. The research results on scene analysis and modeling have been provided as core algorithms in many real products.

Research results in machine perception have been published in top journals and conferences, including IEEE TPAMI, IEEE TIT, IEEE TIP, IEEE TNNLS, IJCV, ICCV, CVPR, ICML, NIPS, AAAI, and IJCAI. The technical report on the inexact augmented Lagrangian multiplier method and its formal publication at NIPS 2011 have together received 1900+ Google Scholar citations. The work on the multiple-block Alternating Direction Method of Multipliers and its provably convergent algorithms was published in premier journals and conferences and has received a further 900+ Google Scholar citations.