Hi, I am a student at Tsinghua University. My research centers on multi-modal learning and generative models, with the ultimate goal of building a unified perception–generation engine that can understand, predict, and interact with the physical world. I am especially interested in how these models can be distilled into efficient systems that perform robust visual reasoning and generative modeling in complex, unstructured environments. Lately, I have been focusing on rethinking problems and offering simple, effective solutions. If you have a use case you would like to share, please feel free to contact me!
Email: xuzn3(at)outlook.com
Links:
[Publication]
[GitHub]
Program Committee Member / Reviewer for: