<div align="center">

<h1> GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models </h1>

[Guangkai Xu](https://github.com/guangkaixu/),
[Yongtao Ge](https://yongtaoge.github.io/),
[Mingyu Liu](https://mingyulau.github.io/),
[Chengxiang Fan](https://leaf1170124460.github.io/),
[Kangyang Xie](https://github.com/felix-ky),
[Zhiyue Zhao](https://github.com/ZhiyueZhau),
[Hao Chen](https://stan-haochen.github.io/),
[Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/GenPercept) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in a single step! ✈️

</div>

<div align="center">
<img width="800" alt="image" src="figs/pipeline.jpg">
</div>


## 📢 News
- 2024.4.30: Release checkpoint weights of surface normal estimation and dichotomous image segmentation.
- 2024.4.7: Add a [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.4.6: Release inference code and the depth checkpoint weights of GenPercept in the [GitHub](https://github.com/aim-uofa/GenPercept) repo.
- 2024.3.15: Release the [arXiv v2 paper](https://arxiv.org/abs/2403.06090v2) with supplementary material.
- 2024.3.10: Release the [arXiv v1 paper](https://arxiv.org/abs/2403.06090v1).


## 🖥️ Dependencies

```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```

## 🚀 Inference
### Using Command-line Scripts
Download the pre-trained model package ```genpercept_ckpt_v1.zip``` from [BaiduNetDisk](https://pan.baidu.com/s/1n6FlqrOTZqHX-F6OhcvNyA?pwd=g2cm) (extract code: g2cm), [HuggingFace](https://huggingface.co/guangkaixu/GenPercept), or [Rec Cloud Disk (To be uploaded)](). Unzip the package and put the checkpoints under ```./weights/v1/```.

Then, place images in the ```./input/$TASK_TYPE``` directory and run the script below. Outputs are saved in ```./output/$TASK_TYPE```, where ```$TASK_TYPE``` is one of ```depth```, ```normal```, and ```dis```. For depth estimation, run:
```bash
sh scripts/inference_depth.sh
```
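Concretely, the expected folder layout for a depth run can be prepared as follows (a sketch; `my_photo.jpg` is a placeholder for your own image):

```shell
# Create the per-task input/output folders expected by the scripts.
mkdir -p input/depth output/depth

# Copy your image into the task folder ("my_photo.jpg" is a placeholder name).
if [ -f my_photo.jpg ]; then cp my_photo.jpg input/depth/; fi

# Run the depth script from the repo root; results appear under output/depth/.
if [ -f scripts/inference_depth.sh ]; then sh scripts/inference_depth.sh; fi
```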

For surface normal estimation and dichotomous image segmentation, run the following scripts:
```bash
bash scripts/inference_normal.sh
bash scripts/inference_dis.sh
```

Thanks to our one-step perception paradigm, inference is much faster than multi-step diffusion pipelines: around 0.4 s per image on an A800 GPU.


### Using torch.hub
GenPercept models can be loaded through torch.hub for quick integration into your Python projects. Here is how to use them for normal estimation, depth estimation, and segmentation:
#### Normal Estimation
```python
import torch
import cv2

# Load the normal predictor model from torch hub
normal_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Normal", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the normal map from the input image
with torch.inference_mode():
    normal = normal_predictor.infer_cv2(image)

# Save the output normal map to a file
cv2.imwrite("output_normal_map.png", normal)
```

#### Depth Estimation
```python
import torch
import cv2

# Load the depth predictor model from torch hub
depth_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Depth", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the depth map from the input image
with torch.inference_mode():
    depth = depth_predictor.infer_cv2(image)

# Save the output depth map to a file
cv2.imwrite("output_depth_map.png", depth)
```
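If `infer_cv2` returns a floating-point depth map rather than an 8-bit image (an assumption; check the actual return type of the hub model), min-max normalize it before writing it as a PNG. A minimal NumPy sketch:

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a float depth map to an 8-bit grayscale image."""
    d = depth.astype(np.float64)
    d_min, d_max = d.min(), d.max()
    if d_max - d_min < 1e-12:  # constant map: avoid division by zero
        return np.zeros(d.shape, dtype=np.uint8)
    return ((d - d_min) / (d_max - d_min) * 255.0).round().astype(np.uint8)

# Example with a synthetic 2x2 float depth map (not a real model output).
vis = depth_to_uint8(np.array([[0.5, 1.0], [1.5, 2.0]]))
```

The resulting `uint8` array can then be passed to `cv2.imwrite`, or to `cv2.applyColorMap` for a colored visualization.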

#### Segmentation
```python
import torch
import cv2

# Load the segmentation predictor model from torch hub
seg_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Segmentation", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the segmentation map from the input image
with torch.inference_mode():
    segmentation = seg_predictor.infer_cv2(image)

# Save the output segmentation map to a file
cv2.imwrite("output_segmentation_map.png", segmentation)
```
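The three snippets differ only in the hub entry point, so batch-processing a folder for any task can be sketched as below. The entry-point names come from the examples above; the path-handling helper and folder layout are illustrative assumptions:

```python
import os

# Map task names to the torch.hub entry points shown above.
TASK_ENTRYPOINTS = {
    "depth": "GenPercept_Depth",
    "normal": "GenPercept_Normal",
    "segmentation": "GenPercept_Segmentation",
}

def output_path(input_name: str, task: str, out_dir: str = "output") -> str:
    """Build an output filename like output/depth/cat_depth.png (illustrative layout)."""
    stem = os.path.splitext(os.path.basename(input_name))[0]
    return os.path.join(out_dir, task, f"{stem}_{task}.png")

def run_task(task: str, image_dir: str) -> None:
    """Load the requested predictor once, then process every image in a folder."""
    import torch, cv2  # heavyweight dependencies, imported lazily
    predictor = torch.hub.load("hugoycj/GenPercept-hub", TASK_ENTRYPOINTS[task], trust_repo=True)
    for name in sorted(os.listdir(image_dir)):
        image = cv2.imread(os.path.join(image_dir, name), cv2.IMREAD_COLOR)
        if image is None:  # skip files OpenCV cannot decode
            continue
        with torch.inference_mode():
            result = predictor.infer_cv2(image)
        dst = output_path(name, task)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        cv2.imwrite(dst, result)

# Example (requires torch, cv2, and network access to the hub repo):
# run_task("depth", "input/depth")
```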
## 🏅 Results in Paper

### Depth and Surface Normal

<div align="center">
<img width="800" alt="image" src="figs/demo_depth_normal.jpg">
</div>

### Dichotomous Image Segmentation

<div align="center">
<img width="400" alt="image" src="figs/demo_dis.jpg">
</div>

### Image Matting

<div align="center">
<img width="800" alt="image" src="figs/demo_matting.jpg">
</div>

### Human Pose Estimation

<div align="center">
<img width="800" alt="image" src="figs/demo_keypoint.jpg">
</div>

## 🎫 License

For non-commercial academic use, this project is licensed under [the 2-clause BSD License](https://opensource.org/license/bsd-2-clause).
For commercial use, please contact [Chunhua Shen](mailto:chhshen@gmail.com).

## 🎓 Citation
```bibtex
@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
```

## 📖 Related Work

- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. [arXiv](https://arxiv.org/abs/2312.02145), [GitHub](https://github.com/prs-eth/marigold).
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. [arXiv](https://arxiv.org/abs/2403.12013), [GitHub](https://github.com/fuxiao0719/GeoWizard).
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. [arXiv](https://arxiv.org/abs/2308.05733), [GitHub](https://github.com/aim-uofa/FrozenRecon).