tulvgengenr/UniF2ace

UniF²ace: A Unified Fine-grained Face Understanding and Generation Model


UniF²ace is the first unified multimodal model specifically designed for face understanding and generation, encompassing tasks such as visual question answering, face image captioning and text-to-face image generation.

[Figure: overview]

This repository contains code for the paper UniF²ace: A Unified Fine-grained Face Understanding and Generation Model.

🎉 Key Contributions

  • A unified face understanding and generation framework: We introduce UniF²ace, the first unified multimodal model for fine-grained face understanding and generation, establishing a solid baseline.

  • A novel Dual Discrete Diffusion (D3Diff) loss function and a hybrid MoE architecture: We introduce D3Diff, a novel loss function that theoretically unifies score-based diffusion and masked generative models, leading to a better approximation of the negative log-likelihood for high-fidelity generation and fine-grained attribute control. Additionally, we explore a hybrid Mixture-of-Experts (MoE) architecture at the token and sequence levels, which adaptively incorporates semantic and identity facial embeddings to counteract the attribute forgetting that occurs as representations evolve.

  • We construct UniF²aceD-1M, a dataset containing 1M VQAs, using an automated pipeline. Extensive experiments demonstrate that UniF²ace significantly outperforms or is on par with existing state-of-the-art models of similar scale on various benchmarks, while providing a more unified and efficient solution.
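
To make the token-level MoE idea in the contributions above more concrete, here is a minimal sketch of generic top-k expert routing for a single token. It is not the code from this repository: the names `route_token`, `gate_weights` and `make_scaling_expert` are hypothetical, the experts are toy scaling functions standing in for MLPs, and the semantic and identity facial embeddings described above are omitted entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, experts, top_k=2):
    """Send one token embedding to its top_k experts and mix their outputs.

    token        : list[float], one token embedding
    gate_weights : one weight vector per expert (hypothetical linear router)
    experts      : list of callables mapping an embedding to an embedding
    """
    # Router logits: dot product of the token with each expert's gate vector.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep the top_k experts with the highest logits.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    # Weighted sum of the chosen experts' outputs.
    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen, gates


def make_scaling_expert(scale):
    """Toy expert that scales every dimension (a stand-in for an MLP)."""
    return lambda vec: [scale * v for v in vec]


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 3
    experts = [make_scaling_expert(s) for s in (0.5, 1.0, 2.0)]
    gate_weights = [[random.uniform(-1, 1) for _ in range(dim)]
                    for _ in range(num_experts)]
    token = [0.1, -0.2, 0.3, 0.4]
    mixed, chosen, gates = route_token(token, gate_weights, experts)
    print("experts used:", chosen, "gate weights:", gates)
```

A sequence-level variant would presumably apply the same gating once per sequence (for example on a pooled embedding) rather than once per token; the paper is the authoritative description of how UniF²ace combines the two.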

[Figure: method]

🔥 News

2025.07.15 We have released the fine-grained face dataset UniF²aceD-1M with captions and VQAs!

[Figure: dataset]

Citation

@article{li2025unif2ace,
  title={Unif2ace: Fine-grained face understanding and generation with unified multimodal models},
  author={Li, Junzhe and Qiu, Xuerui and Xu, Linrui and Guo, Liya and Qu, Delin and Long, Tingting and Fan, Chun and Li, Ming},
  journal={arXiv preprint arXiv:2503.08120},
  year={2025}
}

License

All code within this repository is under Apache License 2.0.

About

(ICLR2026) UniF²ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
