CLIP: Using CPU Backend despite configuring for CUDA #636

Open

hamstared opened this issue Mar 25, 2025 · 1 comment

@hamstared

Whenever I use a weight type other than F32, the program falls back to the CPU backend instead of continuing with CUDA, which slows down the text-encoding process a lot.

[DEBUG] stable-diffusion.cpp:165  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] model.cpp:908  - load C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors using safetensors format
[DEBUG] model.cpp:979  - init from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] stable-diffusion.cpp:244  - Version: SD3.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:321  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:324  - CLIP: Using CPU backend

My GPU doesn't have enough VRAM to force F32 with --type F32, so my only option is F16 whenever I use SD3.x.
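
For scale, here is a rough back-of-the-envelope of why F32 can't fit (the parameter counts are my own ballpark assumptions, not numbers from the project):

#include <cstdio>

int main() {
    // Ballpark parameter counts (assumptions, not official figures):
    // SD3-medium MMDiT ~2.0B, CLIP-L + CLIP-G ~0.8B, T5-XXL ~4.7B.
    const double params = 2.0e9 + 0.8e9 + 4.7e9;
    const double gib   = 1024.0 * 1024.0 * 1024.0;
    std::printf("f32 weights: %.1f GiB\n", params * 4.0 / gib);  // ~27.9 GiB
    std::printf("f16 weights: %.1f GiB\n", params * 2.0 / gib);  // ~14.0 GiB
    // A 4070 Ti has 12 GiB of VRAM, so full-F32 weights alone blow
    // past it long before activations are counted.
    return 0;
}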

How can I make it use the GPU for text encoding so that it's a lot faster?

@stduhpf
Contributor

stduhpf commented Mar 25, 2025

The text encoders always run on the CPU for SD3.x and Flux models. It's not a bug, just a quirk of the current implementation. I guess something goes wrong when trying to run T5 on the GPU.
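
For context, the loader effectively does something like this (a paraphrased sketch; the identifiers and the exact condition are from memory and may not match the source verbatim):

// Paraphrased sketch of the guard in stable-diffusion.cpp's init path;
// names and condition are illustrative, not verbatim.
ggml_backend_t clip_backend = backend;  // the backend chosen at startup (CUDA here)
bool clip_on_cpu = sd_version_is_sd3(version) || sd_version_is_flux(version);
if (clip_on_cpu) {
    LOG_INFO("CLIP: Using CPU backend");
    clip_backend = ggml_backend_cpu_init();  // pin the text encoders to CPU
}

If it works like that, the "set clip_on_cpu to true" line in your log is just that guard firing for SD3.x, independent of the weight type.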
