Sen Yang

Computer Vision | Multimodal Large Language Models | Autonomous Driving

Senior R&D Engineer at Baidu, focusing on Computer Vision, Multimodal Large Language Models, and Autonomous Driving research.

Email: yangsenius@gmail.com

Google Scholar: Profile

Blog: senyang-ml.github.io

TL;DR (Overview)

# **Sen Yang's Personal Website**
## About Me
- **Computer Vision Researcher**
- Research Interests:
  - Computer Vision
  - Multimodal Large Language Models
  - Autonomous Driving
## Education
- **Ph.D.**: Southeast University (2019.5-2023.3)
- Master: Southeast University (2017.9-2019.1)
- Bachelor: Jilin University (2013.9-2017.7)
## Work Experience
- **Baidu VIS Senior R&D Engineer** (2023.7-Present)
- Tencent PCG Intern (2021.12-2022.8)
- Megvii Intern (2021.1-2021.10)
## Research Publications
- **Autonomous Driving**
  - TopoSD: Topology-Enhanced Lane Segment Perception
  - MGMapNet: Multi-Granularity Representation Learning
- **Multimodal Large Models**
  - Vision Remember: Alleviating Visual Forgetting in Efficient MLLM
- **Pose Estimation**
  - Detecting and grouping keypoints
  - Capturing the motion of every joint
  - Searching part-specific neural fabrics
  - SimCC: A Simple Coordinate Classification
  - TokenPose: Learning Keypoint Tokens
  - TransPose: Keypoint Localization via Transformer
## Technical Skills
- **Multimodal Large Models**
  - MLLM Architectures: LLaVA, Qwen2.5-VL, LISA
  - Training Techniques: SFT, Autoregressive Models, RL
  - Visual Token Compression, Large-scale Distributed Training
- **Autonomous Driving Perception**
  - BEV Visual Mapping, Temporal Modeling
  - Multimodal Fusion: Vision + Map Structured Data
  - Navigation Map Integration, Probabilistic Planning
- **Deep Learning Frameworks**
  - PyTorch, Python, C++
  - Transformer Models, GPU/Ascend NPU Development
## Contact Information
- Email: yangsenius@gmail.com
- Blog: senyang-ml.github.io
- Google Scholar Profile
                    
gantt
    dateFormat YYYY.MM
    title Sen Yang Personal Experience Timeline
    section Education Background
    Bachelor's Degree :edu1, 2013.09, 2017.07
    Master's Degree :edu2, 2017.09, 2019.01
    Ph.D. Degree :edu3, 2019.05, 2023.03
    section Work Experience
    Megvii Intern :work1, 2021.01, 2021.10
    Tencent PCG Intern :work2, 2021.12, 2022.08
    Baidu Senior R&D Engineer :work3, 2023.07, 2025.06

About Me

I am a research engineer at Baidu, primarily engaged in computer vision, multimodal large language models, and autonomous driving. I received my Ph.D. from Southeast University in 2023. My research focuses on computer vision and deep learning, with particular attention to 2D/3D human pose estimation, autonomous driving perception, and visual multimodal foundation models. I am passionate about developing innovative solutions that combine cutting-edge research with practical applications.

My research interests include:

  • Computer Vision
  • Deep Learning
  • Human Pose Estimation
  • Autonomous Driving Perception
  • Multimodal Foundation Models

Work and Internship Experience

Baidu VIS

Senior R&D Engineer

2023.7 - Present

Responsible for research and applications in multimodal large models, computer vision perception, and decision-making algorithms. My work covers the full process from algorithm design to product deployment, with a focus on turning research results into practical business value across several core areas.

Tencent PCG

Intern

2021.12 - 2022.8

Responsible for a 3D human reconstruction and motion generation project. Proposed an independent-token representation based on the parametric SMPL model, achieving high-precision 3D human reconstruction and joint motion capture and improving 3DPW metrics by 8%. The resulting paper was published at ICLR 2023 (spotlight, top 25%).

Megvii Technology

Intern

2021.1 - 2021.10

Participated in human pose estimation projects. Designed a Transformer-based pose estimation model using token representations (ICCV 2021). Studied attention patterns in Transformers (Pattern Recognition). Pioneered SimCC, a new coordinate classification paradigm that breaks through the precision bottleneck of traditional regression and heatmap methods (ECCV 2022 oral; adopted as a core solution by mainstream pose estimation frameworks).

Education Background

Bachelor
Jilin University, College of Communication Engineering
Automation
2013.09 - 2017.07

Master
Southeast University, School of Automation
Pattern Recognition and Intelligent Systems
2017.09 - 2019.01

Ph.D.
Southeast University, School of Automation
Pattern Recognition and Intelligent Systems
2019.03 - 2023.05

Research Publications

HisTrackMap: Global Vectorized High-Definition Map Construction via History Map Tracking

Jing Yang*, Sen Yang*, Xiao Tan, Hanli Wang.

arXiv preprint arXiv:2503.07168, 2025

TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Sen Yang, Minyue Jiang, Ziwei Fan, Xiaolu Xie, Xiao Tan, Yingying Li, Errui Ding, Liang Wang, Jingdong Wang.

2024 Preprint

MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction

Jing Yang*, Minyue Jiang*, Sen Yang*, Xiao Tan, Yingying Li, Errui Ding, Hanli Wang, Jingdong Wang.

ICLR 2025

Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention

Sen Yang, Ze Feng, Zhicheng Wang, Yanjie Li, Shoukui Zhang, Zhibin Quan, Shu-tao Xia, Wankou Yang.

Pattern Recognition

Capturing the motion of every joint: 3D human pose and mesh recovery with independent tokens

Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu.

ICLR 2023 (spotlight, top 25%)

Searching part-specific neural fabrics for human pose estimation

Sen Yang, Wankou Yang, Zhen Cui.

Pattern Recognition

SimCC: A Simple Coordinate Classification perspective for human pose estimation

Yanjie Li, Sen Yang, Peidong Liu, Shoukui Zhang, Yunxiao Wang, Zhicheng Wang, Wankou Yang, Shu-Tao Xia.

ECCV 2022 (oral, top 5%)

TokenPose: Learning Keypoint Tokens for Human Pose Estimation

Yanjie Li, Shoukui Zhang, Zhicheng Wang, Sen Yang, Wankou Yang, Shu-Tao Xia, Erjin Zhou.

ICCV 2021

TransPose: Keypoint Localization via Transformer

Sen Yang, Zhibin Quan, Mu Nie, Wankou Yang.

ICCV 2021

Professional Skills

  • Solid theoretical foundation and practical experience in computer vision, with research and development experience across multiple sub-fields; focused on deep learning, Transformer models, and human pose estimation during Ph.D. studies; gained strong research and engineering experience in deep learning and large models through work and internship projects.
  • Proficient in deep learning model design and optimization, integrating cross-domain innovative thinking to transform theoretical advantages into practical engineering applications, with strong code implementation capabilities.
  • Experienced in large enterprise project development and cross-departmental collaboration, emphasizing efficient communication and teamwork.
  • Proficient in mainstream development frameworks such as PyTorch, familiar with programming languages such as Python and C++, and skilled in Linux development.
  • Strong self-driven learning ability (Transfer & Meta Learning), passionate about cutting-edge technologies, constantly seeking new knowledge, quick to learn and adopt new tools, and focused on efficiency.

Contact Me

yangsenius@gmail.com