HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

1Tsinghua University,  2Hunyuan Tencent,  3Sun Yat-sen University,  4HKUST

We introduce HunyuanPortrait, a diffusion-based condition control method that employs implicit representations for highly controllable and lifelike portrait animation. Given a single portrait image as an appearance reference and video clips as driving templates, HunyuanPortrait animates the character in the reference image using the facial expressions and head poses of the driving videos. In our framework, we utilize pre-trained encoders to decouple portrait motion information from identity in videos. Specifically, an implicit representation is adopted to encode motion information, which is then employed as the control signal during the animation phase. Leveraging Stable Video Diffusion as the main building block, we carefully design adapter layers that inject the control signals into the denoising UNet through attention mechanisms, yielding rich spatial detail and strong temporal consistency. HunyuanPortrait also exhibits strong generalization, effectively disentangling appearance and motion across different image styles. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability.
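To illustrate the idea of injecting implicit motion signals into the denoising UNet via attention, the following is a minimal sketch of a cross-attention adapter layer. It is not the official implementation; the module structure, dimensions, and token counts are illustrative assumptions.

```python
# Minimal sketch (not the official HunyuanPortrait code) of a cross-attention
# adapter that lets UNet features attend to implicit motion tokens.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn


class MotionAdapter(nn.Module):
    """Cross-attention adapter: UNet features (queries) attend to motion tokens (keys/values)."""

    def __init__(self, feat_dim: int = 320, motion_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=feat_dim, num_heads=num_heads,
            kdim=motion_dim, vdim=motion_dim, batch_first=True,
        )
        # Zero-initialized output projection so the adapter starts as an identity
        # mapping and does not disturb the pretrained diffusion backbone.
        self.proj_out = nn.Linear(feat_dim, feat_dim)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, unet_feats: torch.Tensor, motion_tokens: torch.Tensor) -> torch.Tensor:
        # unet_feats:    (B, N, feat_dim)   flattened spatial features of one UNet block
        # motion_tokens: (B, M, motion_dim) implicit motion representation of a driving frame
        h = self.norm(unet_feats)
        attn_out, _ = self.attn(query=h, key=motion_tokens, value=motion_tokens)
        # Residual injection of the motion-conditioned features.
        return unet_feats + self.proj_out(attn_out)


if __name__ == "__main__":
    adapter = MotionAdapter()
    feats = torch.randn(2, 64 * 64, 320)   # e.g. a 64x64 latent feature map
    motion = torch.randn(2, 16, 512)       # e.g. 16 implicit motion tokens
    print(adapter(feats, motion).shape)    # torch.Size([2, 4096, 320])
```

In practice such an adapter would be inserted after the attention blocks of the frozen video diffusion UNet and trained while the backbone stays fixed, but the exact placement and training recipe here are assumptions for illustration.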


(For the best viewing experience, please ensure your sound is enabled. If you are not hearing any audio, we recommend using Google Chrome.)

Disentanglement of Appearance and Facial Movements

Portrait Singing

Portrait Acting

Portrait Making Faces

Comparison with other methods

Self Reenactment

Cross Reenactment