Senior R&D Engineer at Baidu, focusing on Computer Vision, Multimodal Large Language Models, and Autonomous Driving research.
# **Sen Yang's Personal Website**

## About Me
- **Computer Vision Researcher**
- Research Interests:
  - Computer Vision
  - Multimodal Large Language Models
  - Autonomous Driving

## Education
- **Ph.D.**: Southeast University (2019.5-2023.3)
- Master: Southeast University (2017.9-2019.1)
- Bachelor: Jilin University (2013.9-2017.7)

## Work Experience
- **Baidu VIS Senior R&D Engineer** (2023.7-Present)
- Tencent TPG Intern (2021.12-2022.8)
- Megvii Intern (2021.1-2021.10)

## Research Publications
- **Autonomous Driving**
  - TopoSD: Topology-Enhanced Lane Segment Perception
  - MGMapNet: Multi-Granularity Representation Learning
- **Multimodal Large Models**
  - Vision Remember: Alleviating Visual Forgetting in Efficient MLLM
- **Pose Estimation**
  - Detecting and grouping keypoints
  - Capturing the motion of every joint
  - Searching part-specific neural fabrics
  - SimCC: A Simple Coordinate Classification
  - TokenPose: Learning Keypoint Tokens
  - TransPose: Keypoint Localization via Transformer

## Technical Skills
- **Multimodal Large Models**
  - MLLM Architectures: LLaVA, Qwen2.5-VL, LISA
  - Training Techniques: SFT, Autoregressive Models, RL
  - Visual Token Compression, Large-scale Distributed Training
- **Autonomous Driving Perception**
  - BEV Visual Mapping, Temporal Modeling
  - Multimodal Fusion: Vision + Map Structured Data
  - Navigation Map Integration, Probabilistic Planning
- **Deep Learning Frameworks**
  - PyTorch, Python, C++
  - Transformer Models, GPU/Ascend NPU Development

## Contact Information
- Email: yangsenius@gmail.com
- Blog: senyang-ml.github.io
- Google Scholar Profile
I am a Senior R&D Engineer at Baidu, working on computer vision, multimodal large language models, and autonomous driving. I received my Ph.D. from Southeast University in 2023. My research centers on computer vision and deep learning, in particular 2D/3D human pose estimation, autonomous driving perception, and multimodal visual foundation models. I am passionate about developing solutions that bring cutting-edge research into practical applications.
My research interests include computer vision, multimodal large language models, and autonomous driving.
Senior R&D Engineer, Baidu VIS
2023.7 - Present
Responsible for research and applications in multimodal large models, visual perception, and decision-making algorithms. My work spans the full pipeline from algorithm design to product deployment, with a focus on turning research results into practical business value, and has produced significant progress in several core areas.
Intern, Tencent TPG
2021.12 - 2022.8
Worked on a 3D human reconstruction and motion generation project. Proposed an independent-token representation built on the parametric SMPL model, enabling high-precision 3D human reconstruction and per-joint motion capture and improving 3DPW metrics by 8%. The paper was published at ICLR 2023 (spotlight, top 25%).
Intern, Megvii
2021.1 - 2021.10
Participated in human pose estimation projects. Designed a Transformer-based pose estimation model built on token representations (ICCV 2021). Studied attention patterns in Transformers (Pattern Recognition). Proposed SimCC, a new coordinate classification paradigm that overcomes the precision bottleneck of conventional regression- and heatmap-based methods (ECCV 2022 Oral; adopted as a core solution by mainstream pose estimation frameworks).
arXiv preprint arXiv:2503.07168, 2025
ICLR 2025
Pattern Recognition
ICLR 2023 (spotlight, top 25%)
Pattern Recognition
ECCV 2022 (oral, top 5%)
yangsenius@gmail.com