Whenever I use a weight type other than F32, the program falls back to the CPU backend instead of continuing with CUDA, which slows down the text encoding process a lot.
[DEBUG] stable-diffusion.cpp:165 - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:197 - loading model from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] model.cpp:908 - load C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors using safetensors format
[DEBUG] model.cpp:979 - init from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] stable-diffusion.cpp:244 - Version: SD3.x
[INFO ] stable-diffusion.cpp:277 - Weight type: f16
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:321 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:324 - CLIP: Using CPU backend
My GPU doesn't have enough VRAM to force F32 with --type F32, so F16 is my only option whenever I use SD3.x.
How can I make it use the GPU during text encoding so that it's much faster?
The text encoders are always run on the CPU for SD3.x and Flux models. It's not a bug, just a quirk of the current implementation. I guess something goes wrong when trying to run T5 on the GPU.
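For reference, the fallback you're seeing is roughly the following logic: a minimal sketch reconstructed from the log lines above, not verbatim source. The helper name and the `is_sd3_or_flux` flag are my own; only the ggml backend calls are the real API.

```cpp
#include "ggml-backend.h"

// Sketch of how the text-encoder backend appears to be chosen in
// stable-diffusion.cpp, based on the "set clip_on_cpu to true" and
// "CLIP: Using CPU backend" log lines. Names here are assumptions.
static ggml_backend_t pick_clip_backend(ggml_backend_t backend, bool is_sd3_or_flux) {
    bool clip_on_cpu = false;
    if (is_sd3_or_flux) {
        clip_on_cpu = true;  // hard-coded for SD3.x/Flux; log: "set clip_on_cpu to true"
    }
    if (clip_on_cpu && !ggml_backend_is_cpu(backend)) {
        // log: "CLIP: Using CPU backend"
        return ggml_backend_cpu_init();  // CLIP/T5 graphs run here, hence the slowdown
    }
    return backend;  // otherwise the text encoders share the main (CUDA) backend
}
```

So getting text encoding onto the GPU isn't a matter of weight type or flags; the CPU path is hard-coded for these models until whatever breaks with T5 on CUDA is fixed.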