Senior R&D Engineer at Baidu, focusing on Computer Vision, Multimodal Large Language Models, and Autonomous Driving research.
# **Sen Yang's Personal Website**
## About Me
- **Computer Vision Researcher**
- Research Interests:
  - Computer Vision
  - Multimodal Large Language Models
  - Autonomous Driving
## Education
- **Ph.D.**: Southeast University (2019.5-2023.3)
- Master: Southeast University (2017.9-2019.1)
- Bachelor: Jilin University (2013.9-2017.7)
## Work Experience
- **Baidu VIS Senior R&D Engineer** (2023.7-Present)
- Tencent TPG Intern (2021.12-2022.8)
- Megvii Intern (2021.1-2021.10)
## Research Publications
- **Autonomous Driving**
  - TopoSD: Topology-Enhanced Lane Segment Perception
  - MGMapNet: Multi-Granularity Representation Learning
- **Multimodal Large Models**
  - Vision Remember: Alleviating Visual Forgetting in Efficient MLLM
- **Pose Estimation**
  - Detecting and grouping keypoints
  - Capturing the motion of every joint
  - Searching part-specific neural fabrics
  - SimCC: A Simple Coordinate Classification
  - TokenPose: Learning Keypoint Tokens
  - TransPose: Keypoint Localization via Transformer
## Technical Skills
- **Multimodal Large Models**
  - MLLM Architectures: LLaVA, Qwen2.5-VL, LISA
  - Training Techniques: SFT, Autoregressive Models, RL
  - Visual Token Compression, Large-scale Distributed Training
- **Autonomous Driving Perception**
  - BEV Visual Mapping, Temporal Modeling
  - Multimodal Fusion: Vision + Map Structured Data
  - Navigation Map Integration, Probabilistic Planning
- **Deep Learning Frameworks**
  - PyTorch, Python, C++
  - Transformer Models, GPU/Ascend NPU Development
## Contact Information
- Email: yangsenius@gmail.com
- Blog: senyang-ml.github.io
- Google Scholar Profile
I am a research engineer at Baidu. I received my Ph.D. from Southeast University in 2023. My research interests include computer vision, multimodal large language models, and autonomous driving, with particular attention to 2D/3D human pose estimation, autonomous driving perception, and visual multimodal foundation models. I aim to carry research results through to practical applications.
Senior R&D Engineer, Baidu VIS (2023.7 - Present)
I work on research and applications in multimodal large models, computer vision perception, and decision-making algorithms. My work covers the full process from algorithm design to product deployment, with a focus on turning research results into practical business value.
Intern, Tencent TPG (2021.12 - 2022.8)
Worked on a 3D human reconstruction and motion generation project. Proposed an independent token representation method based on the parameterized SMPL model, achieving high-precision 3D human reconstruction and joint motion capture and improving 3DPW metrics by 8%. The paper was published at ICLR 2023 (spotlight, top 25%).
Intern, Megvii (2021.1 - 2021.10)
Worked on human pose estimation projects. Designed a Transformer-based pose estimation model using token representations (ICCV 2021) and studied attention patterns in Transformers (Pattern Recognition). Proposed SimCC, a coordinate classification paradigm that improves on the precision limits of traditional regression and heatmap methods (ECCV 2022 Oral; adopted as a core solution by mainstream pose estimation frameworks).
arXiv preprint arXiv:2503.07168, 2025
ICLR 2025
Pattern Recognition
ICLR 2023 (spotlight, top 25%)
Pattern Recognition
ECCV 2022 (oral, top 5%) (cited 200+ times)
ICCV 2021 (cited 400+ times)