Zunnan Xu

Student at Tsinghua University

Hi, I am a student at Tsinghua University. Currently, my research centers on multi-modal learning and generative models, with the ultimate goal of building a unified perception–generation engine that can understand, predict, and interact with the physical world.

I am especially interested in how these models can be distilled into efficient engines that let systems perform robust visual reasoning and generative modeling in complex, unstructured environments. Lately I have focused on rethinking problems and offering simple, effective solutions. If you have a use case you would like to share, please feel free to contact me!

News

2025: One paper accepted to NeurIPS 2025.
2025: Two papers accepted to ICCV 2025.
2025: Received the Outstanding Reviewer Award at CVPR 2025.
2025: Three papers accepted to CVPR 2025.
2024: One paper accepted to EMNLP 2024.
2024: Won 3rd prize in the CVPR 2024 OVD challenge.
2024: Two papers accepted to an ICML 2024 workshop.
2024: One paper accepted to NeurIPS 2024.
2023: One paper accepted to ICCV 2023.

Selected Publications

(* denotes equal contribution)

2D Video & Image Generation
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu, Zhentao Yu, Zixiang Zhou, Jun Zhou, et al.
CVPR 2025
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Core contributor in Hunyuan Foundation Model Team
Technical Report, 2024
Audio-visual controlled video diffusion with masked selective state spaces modeling for natural talking head generation
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
ICCV 2025
Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation
Xiaoyu Jin*, Zunnan Xu*, Mingwen Ou, Wenming Yang
CVG@ICML 2024
Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training
Ruicheng Zhang*, Jun Zhou*, Zunnan Xu*, Zihao Liu, Jiehui Huang, Mingyang Zhang, Yu Sun, Xiu Li
AAAI 2026
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang
CVPR 2025
InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation
Yukang Lin, Yan Hong, Zunnan Xu, Xindi Li, Chao Xu, Chuanbiao Song, Ronghui Li, Haoxing Chen, Jun Lan, Huijia Zhu, et al.
ACMMM 2025
Vision-Language Models
Bridging vision and language encoders: Parameter-efficient tuning for referring image segmentation
Zunnan Xu*, Zhihong Chen*, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
ICCV 2023
Igniting VLMs toward the Embodied Space
Core contributor in X Square Robot Team
Technical Report, 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang*, Zunnan Xu*, Jun Zhou, Ting Liu, Yicheng Xiao, Mingwen Ou, Bowen Ji, Xiu Li, Kehong Yuan
NeurIPS 2025
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang*, Zunnan Xu*, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li
AAAI 2025
Enhancing Fine-grained Multi-modal Alignment via Adapters: A Parameter-Efficient Training Framework
Zunnan Xu, Jiaqi Huang, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li
WANT@ICML 2024
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu*, Zunnan Xu*, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin
EMNLP 2024
3D Motion & Assets Synthesis
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li
NeurIPS 2024
Chain of generation: Multi-modal gesture synthesis via cascaded conditional control
Zunnan Xu, Yachao Zhang, Sicheng Yang, Ronghui Li, Xiu Li
AAAI 2024
Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis
Zihao Liu*, Mingwen Ou*, Zunnan Xu*, Jiaqi Huang, Haonan Han, Ronghui Li, Xiu Li
ACMMM 2025
Freetalker: Controllable speech and text-driven gesture generation based on diffusion models
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu
ICASSP 2024
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao, Zunnan Xu, Zhongyuan Hu, Ronghui Li, Yachao Zhang, Xiu Li
CVPR 2025
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li
ICCV 2025
Consistent123: One Image to Highly Consistent 3D Asset Using Case-aware Diffusion Priors
Yukang Lin, Haonan Han, Chaoqun Gong, Zunnan Xu, Yachao Zhang, Xiu Li
ACMMM 2024

Honors & Awards

2025: Outstanding Scholarship, Tencent Rhino-Bird Research Elite Program
2025: Gold Award at the International Exhibition of Inventions Geneva
2024: Graduate National Scholarship
2024: Tencent Rhino-Bird Research Elite Program Student
2023: Outstanding Graduate (Bachelor's Level)
2022: Gold Award in the iGEM Competition
2021: Gold Award in the National College Student Algorithm Design and Programming Challenge
2020: Undergraduate National Scholarship

Academic Services

Reviewer for: