Ma, Yufei

Assistant Professor 

Research Interests: FPGA Hardware System, Deep Learning Acceleration, Energy-efficient VLSI Design

Email: yufei.ma@pku.edu.cn

Yufei Ma received the B.S. degree from Nanjing University of Aeronautics and Astronautics in 2011, the M.S.E. degree from the University of Pennsylvania in 2013, and the Ph.D. degree from Arizona State University in 2018. From 2018 to 2019, he was with FABU America Inc., where he worked on high-performance ASIC chips for ADAS/AD applications. From 2019 to 2020, he was with Nanjing University as a Research Associate Professor.

In 2020, he joined the School of EECS and Institute of Artificial Intelligence at Peking University as an Assistant Professor. His current research focuses on energy-efficient algorithm-hardware co-design for deep learning with FPGA and ASIC acceleration.

Dr. Ma's research achievements include the following.

1) FPGA Acceleration of Deep Learning Algorithms: Designed an FPGA-based inference accelerator that performs real-time image recognition and object detection using deep learning algorithms. Proposed optimizations to the detection algorithm for efficient hardware design. The proposed CNN accelerator on an Intel Stratix 10 FPGA achieved 2.1 TOPS throughput and 2.8× better energy efficiency than a high-end GPU.

2) Automatic Compilation of Diverse CNNs onto FPGAs: Developed an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs. The compiler enables fast, high-level prototyping of CNNs from software to FPGA while keeping the benefits of low-level hardware optimization. The proposed methodology was demonstrated on Intel Stratix V, Arria 10, and Stratix 10 FPGAs for various well-known CNN models, e.g. AlexNet, NiN, VGG, GoogLeNet, and ResNet.

3) Stochastic Computing: Designed a digital stochastic computing circuit in 65 nm CMOS to reconstruct compressively sensed bioelectrical signals, e.g. ECG and EMG, with a 5× energy-delay product improvement and a 2× area reduction compared to a conventional non-stochastic design.