Commit afc1578 (1 parent: d807530)

upload GenPercept v2

330 files changed: 30650 additions & 556 deletions
.gitignore

Lines changed: 14 additions & 18 deletions
```diff
@@ -1,30 +1,26 @@
 
 # ignore these folders
-checkpoint/
-data/
-output*
-temp/
-wandb/
-venv/
-cache/
-logs/
-datasets/
+/checkpoint
+/data
+/input*
+/output*
+/temp
+/wandb
+/venv
+/cache
+/datasets*
+/prs-eth
+/script/batch_train
+/script/train_debug
+
 **/.ipynb_checkpoints/
 .vscode/
 .idea
-build/
-dist/
-*__pycache__*
-.hypothesis/
-data*
-genpercept.egg-info
-weights
 
 # ignore these types
 *.pyc
 *.out
 *.log
 *.mexa64
 *.pdf
-*.tar
-
+*.tar
```

GenPercept_v1/.gitignore

Lines changed: 30 additions & 0 deletions
```diff
@@ -0,0 +1,30 @@
+
+# ignore these folders
+checkpoint/
+data/
+output*
+temp/
+wandb/
+venv/
+cache/
+logs/
+datasets/
+**/.ipynb_checkpoints/
+.vscode/
+.idea
+build/
+dist/
+*__pycache__*
+.hypothesis/
+data*
+genpercept.egg-info
+weights
+
+# ignore these types
+*.pyc
+*.out
+*.log
+*.mexa64
+*.pdf
+*.tar
+
```
File renamed without changes.

GenPercept_v1/README.md

Lines changed: 179 additions & 0 deletions
<div align="center">

<h1> GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models </h1>

[Guangkai Xu](https://github.com/guangkaixu/), &nbsp;
[Yongtao Ge](https://yongtaoge.github.io/), &nbsp;
[Mingyu Liu](https://mingyulau.github.io/), &nbsp;
[Chengxiang Fan](https://leaf1170124460.github.io/), &nbsp;
[Kangyang Xie](https://github.com/felix-ky), &nbsp;
[Zhiyue Zhao](https://github.com/ZhiyueZhau), &nbsp;
[Hao Chen](https://stan-haochen.github.io/), &nbsp;
[Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/GenPercept) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in just one step! ✈️

</div>

<div align="center">
<img width="800" alt="image" src="figs/pipeline.jpg">
</div>


## 📢 News
- 2024.4.30: Release checkpoint weights for surface normal estimation and dichotomous image segmentation.
- 2024.4.7: Add the [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.4.6: Release the inference code and depth checkpoint weights of GenPercept in the [GitHub](https://github.com/aim-uofa/GenPercept) repo.
- 2024.3.15: Release the [arXiv v2 paper](https://arxiv.org/abs/2403.06090v2), with supplementary material.
- 2024.3.10: Release the [arXiv v1 paper](https://arxiv.org/abs/2403.06090v1).


## 🖥️ Dependencies

```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```

## 🚀 Inference
### Using Command-line Scripts
Download the pre-trained models ```genpercept_ckpt_v1.zip``` from [BaiduNetDisk](https://pan.baidu.com/s/1n6FlqrOTZqHX-F6OhcvNyA?pwd=g2cm) (extract code: g2cm), [HuggingFace](https://huggingface.co/guangkaixu/GenPercept), or [Rec Cloud Disk (To be uploaded)](). Unzip the package and put the checkpoints under ```./weights/v1/```.

Then, place images in the ```./input/$TASK_TYPE``` directory and run the following script. The output will be saved in ```./output/$TASK_TYPE```. Here, ```$TASK_TYPE``` can be ```depth```, ```normal```, or ```dis```.
```bash
sh scripts/inference_depth.sh
```

For surface normal estimation and dichotomous image segmentation, run the following scripts:
```bash
bash scripts/inference_normal.sh
bash scripts/inference_dis.sh
```

Thanks to our one-step perception paradigm, inference is much faster than multi-step diffusion pipelines: around 0.4 s per image on an A800 GPU.
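The ```./input/$TASK_TYPE``` / ```./output/$TASK_TYPE``` convention above is easy to script when your images live elsewhere. Below is a minimal sketch; the helpers `stage_inputs` and `expected_output` are our own names, not part of the repo, and the predicted file's extension may differ from the input's depending on the task script.

```python
import shutil
from pathlib import Path

def stage_inputs(images, task, root="."):
    """Copy source images into <root>/input/<task>/, the layout the inference scripts expect."""
    in_dir = Path(root) / "input" / task
    in_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for img in images:
        dst = in_dir / Path(img).name
        shutil.copy(img, dst)
        staged.append(str(dst))
    return staged

def expected_output(image_name, task, root="."):
    """Where the matching prediction lands under <root>/output/<task>/ (same name assumed)."""
    return str(Path(root) / "output" / task / image_name)
```

After staging, running ```sh scripts/inference_depth.sh``` (or the normal/dis variants) should populate the matching ```./output/$TASK_TYPE``` folder.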

### Using torch.hub
GenPercept models can easily be used with torch.hub for quick integration into your Python projects. Here's how to use the models for normal estimation, depth estimation, and segmentation:
#### Normal Estimation
```python
import torch
import cv2

# Load the normal predictor model from torch hub
normal_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Normal", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the normal map from the input image
with torch.inference_mode():
    normal = normal_predictor.infer_cv2(image)

# Save the output normal map to a file
cv2.imwrite("output_normal_map.png", normal)
```

#### Depth Estimation
```python
import torch
import cv2

# Load the depth predictor model from torch hub
depth_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Depth", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the depth map from the input image
with torch.inference_mode():
    depth = depth_predictor.infer_cv2(image)

# Save the output depth map to a file
cv2.imwrite("output_depth_map.png", depth)
```

#### Segmentation
```python
import torch
import cv2

# Load the segmentation predictor model from torch hub
seg_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Segmentation", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the segmentation map from the input image
with torch.inference_mode():
    segmentation = seg_predictor.infer_cv2(image)

# Save the output segmentation map to a file
cv2.imwrite("output_segmentation_map.png", segmentation)
```
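All three hub entry points expose the same `infer_cv2` call, so it is straightforward to run any one of them over a whole folder of images. A sketch under that assumption follows; `pred_name` and `run_folder` are hypothetical helpers of ours, not part of the hub API.

```python
import glob
import os

def pred_name(path, suffix="_pred.png"):
    """Output filename for an input image path, e.g. imgs/cat.jpg -> cat_pred.png."""
    return os.path.splitext(os.path.basename(path))[0] + suffix

def run_folder(predictor, in_dir, out_dir, suffix="_pred.png"):
    """Run a GenPercept hub predictor over every .jpg in a folder, saving predictions."""
    import cv2    # heavy dependencies imported lazily so pred_name stays dependency-free
    import torch
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(in_dir, "*.jpg"))):
        image = cv2.imread(path, cv2.IMREAD_COLOR)
        if image is None:  # unreadable file; skip it
            continue
        with torch.inference_mode():
            result = predictor.infer_cv2(image)
        out_path = os.path.join(out_dir, pred_name(path, suffix))
        cv2.imwrite(out_path, result)
        written.append(out_path)
    return written
```

For example, `run_folder(depth_predictor, "input/depth", "output/depth")` mirrors the command-line scripts' folder layout.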

## 🏅 Results in Paper

### Depth and Surface Normal

<div align="center">
<img width="800" alt="image" src="figs/demo_depth_normal.jpg">
</div>

### Dichotomous Image Segmentation

<div align="center">
<img width="400" alt="image" src="figs/demo_dis.jpg">
</div>

### Image Matting

<div align="center">
<img width="800" alt="image" src="figs/demo_matting.jpg">
</div>

### Human Pose Estimation

<div align="center">
<img width="800" alt="image" src="figs/demo_keypoint.jpg">
</div>


## 🎫 License

For non-commercial academic use, this project is licensed under [the 2-clause BSD License](https://opensource.org/license/bsd-2-clause).
For commercial use, please contact [Chunhua Shen](mailto:chhshen@gmail.com).


## 🎓 Citation
```
@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
```

## 📖 Related Work

- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. [GitHub](https://github.com/prs-eth/marigold).
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. [arXiv](https://arxiv.org/abs/2403.12013), [GitHub](https://github.com/fuxiao0719/GeoWizard).
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. [arXiv](https://arxiv.org/abs/2308.05733), [GitHub](https://github.com/aim-uofa/FrozenRecon).
Binary figure files added (501 KB, 263 KB, 299 KB, 422 KB), including GenPercept_v1/figs/demo_dis.jpg.
