We present DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively degraded images, DetailFlow enables the generation process to start from the global structure and incrementally refine details.
DetailFlow encodes tokens with an inherent semantic ordering, where each subsequent token contributes additional high-resolution information. On the ImageNet 256×256 benchmark, our method achieves 2.96 gFID with 128 tokens, outperforming VAR (3.3 FID) and FlexVAR (3.05 FID), both of which require 680 tokens in their AR models. Moreover, thanks to the significantly reduced token count and a parallel inference mechanism, our method achieves nearly 2× faster inference than VAR and FlexVAR.
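As a rough illustration of next-detail prediction at inference time, the sketch below decodes growing prefixes of the ordered 1D token sequence; the `decode` callable is a hypothetical stand-in, not the actual DetailFlow API.

```python
# Minimal sketch, assuming a decoder that maps a 1D token prefix to an image.
# `decode` is hypothetical; it stands in for the tokenizer's decoder.
import torch

def coarse_to_fine_preview(decode, tokens_1d: torch.Tensor, prefix_lens=(32, 64, 128)):
    """Decode growing prefixes of the ordered 1D token sequence.

    Early tokens carry global structure; each later token adds
    high-resolution detail, so previews sharpen as the prefix grows.
    """
    return [decode(tokens_1d[:, :n]) for n in prefix_lens]
```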
- 2025.06.24: The code and model weights of DetailFlow have been open-sourced.
- 2025.06.16: The code and model weights are finalized and currently undergoing legal review. We expect to release them soon.
- 2025.05.28: DetailFlow is released! See our paper.
| Tokenizer | Reso. | rFID | AR model | gFID |
|---|---|---|---|---|
| DetailFlow-16 | 256 | 1.13 | DetailFlow-16-GPT-L | 2.96 |
| DetailFlow-32 | 256 | 0.80 | DetailFlow-32-GPT-L | 2.75 |
| DetailFlow-64 | 256 | 0.52 | DetailFlow-64-GPT-L | 2.59 |
- Install `torch>=2.1.2`.
- Install the other pip packages via `bash init.sh`.
- Prepare the ImageNet dataset. Assume ImageNet is in `/path/to/imagenet`; the layout should look like this (a quick layout check is sketched after this list):

  ```
  /path/to/imagenet/:
      train/:
          n01440764:
              many_images.JPEG ...
          n01443537:
              many_images.JPEG ...
      val/:
          n01440764:
              ILSVRC2012_val_00000293.JPEG ...
          n01443537:
              ILSVRC2012_val_00000236.JPEG ...
  ```

- Modify the path-related environment variables in the `init.sh` file.
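If you want to verify the layout, a standalone check along these lines works (an illustrative helper, not part of this repo):

```python
# Count classes and JPEG images per split to sanity-check the layout above.
from pathlib import Path

root = Path("/path/to/imagenet")
for split in ("train", "val"):
    class_dirs = [d for d in (root / split).iterdir() if d.is_dir()]
    n_images = sum(len(list(d.glob("*.JPEG"))) for d in class_dirs)
    print(f"{split}: {len(class_dirs)} classes, {n_images} JPEG images")
```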
Training the tokenizer for 50 epochs can already yield good results and is worth a try.
```bash
# 128 tokens, group size 8
bash scripts/train_tokenizer.sh detailflow_128token --config configs/vq_8k_siglip_b_res_p02_pw15_enc.yaml --num-latent-tokens 128 --group-size 8 --epochs 250 --global-token-loss-weight 1

# 256 tokens, group size 8
bash scripts/train_tokenizer.sh detailflow_256token --config configs/vq_8k_siglip_b_res_p02_pw15_enc.yaml --num-latent-tokens 256 --group-size 8 --epochs 250 --global-token-loss-weight 1

# 512 tokens, group size 8
bash scripts/train_tokenizer.sh detailflow_512token --config configs/vq_8k_siglip_b_res_p02_pw15_enc.yaml --num-latent-tokens 512 --group-size 8 --epochs 250 --global-token-loss-weight 1
```
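One hedged reading of the naming scheme (our inference, not stated explicitly here): tokens are emitted in groups of `--group-size`, so `--num-latent-tokens / --group-size` gives the number of sequential AR steps, which matches the 16/32/64 suffixes of the released models.

```python
# Assumed relation between the flags above and the DetailFlow-N model names:
# N = num_latent_tokens / group_size, the number of sequential AR steps.
group_size = 8
for num_latent_tokens in (128, 256, 512):
    ar_steps = num_latent_tokens // group_size
    print(f"{num_latent_tokens} tokens / group size {group_size} -> DetailFlow-{ar_steps}")
```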
```bash
# DetailFlow-16 and DetailFlow-32 can better utilize GPU resources with a global batch size of 512.
bash scripts/train_c2i_token.sh /xxx/DetailFlow-16/checkpoints/128.pt demo_task_name --global-batch-size 512 --epochs 300
```
The trained model is stored in the directory `./logs/task/job_name`, which contains:

- `config.json` and `config.yaml`: configuration files for the model, required for loading it during subsequent inference (see the loading sketch below).
- `checkpoints`: the checkpoint files; only the 10 most recent checkpoints are retained.
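A minimal loading sketch under the layout above; the paths are placeholders and `build_model` is a hypothetical stand-in for the repo's actual model constructor:

```python
# Load the saved config and checkpoint for inference (sketch only).
import torch
import yaml

log_dir = "./logs/task/job_name"          # placeholder path
with open(f"{log_dir}/config.yaml") as f:
    cfg = yaml.safe_load(f)               # hyperparameters needed to rebuild the model

state = torch.load(f"{log_dir}/checkpoints/128.pt", map_location="cpu")
# model = build_model(cfg); model.load_state_dict(state)  # build_model is hypothetical
```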
- If you encounter the error `torch._dynamo.exc.CacheLimitExceeded: cache_size_limit reached`, try setting `args.compile = False` (see the sketch below), though this will slow down inference.
- Due to randomness, evaluation metrics may vary by up to 0.3; we recommend running with multiple seeds and reporting the average.
```bash
# Tokenizer
bash scripts/eval_tokenizer.sh /path_to_ckpt/xxx.pt

# AR model, 128 tokens
bash scripts/eval_c2i.sh /xxx/DetailFlow-16/checkpoints/128.pt /xxx/DetailFlow-16-GPT-L/checkpoints/128.pt 1.45

# AR model, 256 tokens
bash scripts/eval_c2i.sh /xxx/DetailFlow-32/checkpoints/256.pt /xxx/DetailFlow-32-GPT-L/checkpoints/256.pt 1.5

# AR model, 512 tokens
bash scripts/eval_c2i.sh /xxx/DetailFlow-64/checkpoints/512.pt /xxx/DetailFlow-64-GPT-L/checkpoints/512.pt 1.32
```
We thank the authors of SoftVQ-VAE, LlamaGen, and PAR for their great work.
If our work assists your research, feel free to give us a star ⭐ or cite us using:

```bibtex
@article{liu2025detailflow,
  title={DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction},
  author={Liu, Yiheng and Qu, Liao and Zhang, Huichao and Wang, Xu and Jiang, Yi and Gao, Yiming and Ye, Hu and Li, Xian and Wang, Shuai and Du, Daniel K and others},
  journal={arXiv preprint arXiv:2505.21473},
  year={2025}
}
```
We are hiring interns and full-time researchers in the ByteFlow Group at ByteDance, with a focus on multimodal understanding and generation (preferred locations: Beijing, Shenzhen, and Hangzhou). If you are interested, please contact [email protected].