HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

1Tsinghua University,  2Hunyuan Tencent,  3Sun Yat-sen University,  4HKUST

We introduce HunyuanPortrait, a diffusion-based condition control method that employs implicit representations for highly controllable and lifelike portrait animation. Given a single portrait image as an appearance reference and video clips as driving templates, HunyuanPortrait animates the character in the reference image using the facial expressions and head poses of the driving videos. In our framework, we utilize pre-trained encoders to decouple portrait motion information from identity in videos. Specifically, an implicit representation is adopted to encode motion information, which is then employed as the control signal during the animation phase. Leveraging Stable Video Diffusion as the main building block, we carefully design adapter layers that inject the control signals into the denoising UNet through attention mechanisms, yielding rich spatial detail and strong temporal consistency. HunyuanPortrait also exhibits strong generalization, effectively disentangling appearance and motion across different image styles. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability.
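To illustrate the idea of injecting implicit motion signals into the denoising UNet via attention, the following is a minimal sketch of a cross-attention adapter layer. It is not the official implementation; the module structure, dimensions, and token counts are illustrative assumptions.

```python
# Minimal sketch (not the official HunyuanPortrait code) of a cross-attention
# adapter that lets UNet features attend to implicit motion tokens.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn


class MotionAdapter(nn.Module):
    """Cross-attention adapter: UNet features (queries) attend to motion tokens (keys/values)."""

    def __init__(self, feat_dim: int = 320, motion_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=feat_dim, num_heads=num_heads,
            kdim=motion_dim, vdim=motion_dim, batch_first=True,
        )
        # Zero-initialized output projection so the adapter starts as an identity
        # mapping and does not disturb the pretrained diffusion backbone.
        self.proj_out = nn.Linear(feat_dim, feat_dim)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, unet_feats: torch.Tensor, motion_tokens: torch.Tensor) -> torch.Tensor:
        # unet_feats:    (B, N, feat_dim)   flattened spatial features of one UNet block
        # motion_tokens: (B, M, motion_dim) implicit motion representation of a driving frame
        h = self.norm(unet_feats)
        attn_out, _ = self.attn(query=h, key=motion_tokens, value=motion_tokens)
        # Residual injection of the motion-conditioned features.
        return unet_feats + self.proj_out(attn_out)


if __name__ == "__main__":
    adapter = MotionAdapter()
    feats = torch.randn(2, 64 * 64, 320)   # e.g. a 64x64 latent feature map
    motion = torch.randn(2, 16, 512)       # e.g. 16 implicit motion tokens
    print(adapter(feats, motion).shape)    # torch.Size([2, 4096, 320])
```

In practice such an adapter would be inserted after the attention blocks of the frozen video diffusion UNet and trained while the backbone stays fixed, but the exact placement and training recipe here are assumptions for illustration.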


(For the best viewing experience, please ensure your sound is enabled. If you are not hearing any audio, we recommend using Google Chrome.)

Disentanglement of Appearance and Facial Movements

Portrait Singing

Portrait Acting

Portrait Making Faces

Comparison with other methods

Self Reenactment

Cross Reenactment