Kohya SS FLUX LoRA training is way faster on Linux than Windows, any ideas to debug? Same settings, libraries and GPU #9259
Unanswered · FurkanGozukara asked this question in Q&A
Replies: 1 comment 1 reply
-
This should be a discussion, not an issue.
Describe the bug
I am using Kohya SS to train a FLUX LoRA.
On Linux, an RTX 3090 gets about 5.5 seconds/it at batch size 1 and 1024x1024 px resolution.
On Windows, an RTX 3090 Ti gets 7.7 seconds/it, even though that machine has the more powerful CPU (13900K).
This speed discrepancy between Windows and Linux is huge for some reason.
Upgrading Torch from 2.1 to 2.4 on Linux caused a huge speed-up and a VRAM usage reduction, but on Windows only VRAM usage dropped; speed stayed the same.
Any ideas for how to fix this? I am using SDPA cross attention.
I am sharing the venv pip freeze of both Windows and Linux.
Both have Python 3.10.11.
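For scale, the reported iteration times imply the Windows box is about 40% slower per step, despite the nominally faster GPU and CPU. A quick sanity check on the numbers from the post:

```python
# Reported iteration times from the post (seconds per iteration).
linux_s_per_it = 5.5    # RTX 3090, Linux
windows_s_per_it = 7.7  # RTX 3090 Ti, Windows

slowdown = windows_s_per_it / linux_s_per_it
print(f"Windows is {slowdown:.2f}x slower per iteration "
      f"(+{(slowdown - 1) * 100:.0f}%)")
# Windows is 1.40x slower per iteration (+40%)
```

A gap this large usually points at a software-stack difference (attention backend, compiler/Triton availability, driver overhead) rather than the hardware itself.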
Windows pip freeze
Ubuntu pip freeze
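Since both freezes are attached, one way to narrow this down is to diff them and look for packages or versions present on only one side (e.g. Triton, which is Linux-only and backs several fast attention paths). A minimal sketch, assuming the two dumps are saved as the hypothetical files `windows_freeze.txt` and `ubuntu_freeze.txt`:

```shell
# Diff the two environment dumps: any line printed by `comm -3` is a
# package==version pair present in one environment but not the other.
# File names below are placeholders for the attached pip freeze outputs.
sort windows_freeze.txt > win.sorted
sort ubuntu_freeze.txt > linux.sorted
comm -3 win.sorted linux.sorted
```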
Who can help?
Stable Diffusion 3 / FLUX : @yiyixuxu @sayakpaul @DN6 @asomoza