
Add imatrix support #633

Open · stduhpf wants to merge 15 commits into master
Conversation

@stduhpf (Contributor) commented Mar 23, 2025

Adds support for llama.cpp-style importance matrices (see https://github.com/ggml-org/llama.cpp/blob/master/examples/imatrix/README.md and ggml-org/llama.cpp#4861) to improve the quality of quantized models.

Models generated with an imatrix are backward compatible with previous releases.
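For context, this mirrors the llama.cpp approach: while generating, the collector accumulates the mean squared activation of every input channel of each weight matrix, and quantization then minimizes an activation-weighted rounding error instead of the plain one. Roughly (my paraphrase, not a formula taken from this PR's code):

$$ w_j = \frac{1}{N}\sum_{n=1}^{N} a_{j,n}^2, \qquad \hat{W} = \arg\min_{\tilde{W}} \sum_{i,j} w_j \left(W_{ij} - \tilde{W}_{ij}\right)^2 $$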

Usage:

To train an imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat
This will generate an image and train the imatrix while doing so (you can use -b to generate multiple images at once).

To keep training an existing imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat

You can load multiple imatrices at once; they will be merged in the output:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat --imat-in imatrix2.dat

To quantize with an imatrix:
sd.exe -M convert [same exact parameters as normal quantization] --imat-in imatrix.dat
(again, you can use multiple imatrices)

Examples

"simple" imatrix trained on a batch of 32 image generations (512x512) with the dreamshaper_8LCM (f16, 8 steps) model and empty prompts. (because of the model's bias, it was mostly calibrated on portraits of asian women):

"better" imatrix trained on 504 generations using diverse prompst and aspect ratios, using the same model.

iq3_xxs static * | iq3_xxs with simple imatrix | iq3_xxs with better imatrix | fp16
(image grid: one row each for no prompt, "a cute cat playing with yarn", and "a girl wearing a funny hat")

* "static" means that the importance matrix is not active (all ones), which is how quantization behaves on the master branch.

iq2_xs seems completely broken for this model even with an imatrix, but the effect is still noticeable. With iq4, the static quant is already pretty good, so the difference in quality isn't obvious (both using the "better" imatrix here).

iq2_xs static | iq2_xs imatrix | iq4_nl static | iq4_nl imatrix
(image row: "a girl wearing a funny hat")

Interesting observation: for the "girl wearing a funny hat" prompt, static quants put her in a city like the original fp16 model does, while the quants calibrated with the "better" imatrix put her in a forest. This is most likely due to a bias in the calibration dataset, which contained some samples of girls with forest backgrounds and none with city backgrounds.

You can find these models and the imatrices used here: https://huggingface.co/stduhpf/dreamshaper-8LCM-im-GGUF-sdcpp

You can find examples with other models in the discussion.

@Green-Sky (Contributor)

@stduhpf Thank you for working on this :)

Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

@stduhpf (Contributor, Author) commented Mar 23, 2025

@Green-Sky I have no idea. I'm not sure it would work right now, but I've only tested SD 1.5 so far, because it's so much faster.

@stduhpf (Contributor, Author) commented Mar 23, 2025

I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

@Green-Sky (Contributor)

I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

Maybe imatrix.hpp should just not be a header-only lib ^^
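Something along these lines would presumably do it (two-file sketch with illustrative names, not the PR's exact API):

```cpp
// imatrix.hpp — declarations only, safe to include from multiple translation units
#pragma once
#include <string>

bool imatrix_load(const std::string & path); // hypothetical signature, just for the sketch

// imatrix.cpp — the single definition, compiled into the library target itself,
// so the symbol exists for both the static build and -DSD_BUILD_SHARED_LIBS=ON
#include "imatrix.hpp"

bool imatrix_load(const std::string & path) {
    (void) path;
    return false; // stub body to keep the sketch self-contained
}
```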

@stduhpf (Contributor, Author) commented Mar 23, 2025

@Green-Sky I'm doing some tests with SD3. It seems to be doing something, but cooking an imatrix for larger, un-distilled models takes ages compared to something like the SD 1.5 LCM.

Now that I think about it, applying an imatrix to Flux (or any model with a standalone diffusion model) will be tricky. The imatrix uses the names the weights have at runtime, but when quantizing, the names are not prefixed like they are at runtime.
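The mismatch is just the "model.diffusion_model." prefix the runtime adds, so conceptually the quantizer has to look tensors up under their runtime spelling. A minimal sketch of that idea (hypothetical helper, not the code in this PR):

```cpp
#include <string>

// Hypothetical helper: map a quantize-time tensor name to the runtime name the
// imatrix entries were recorded under. For standalone diffusion models the only
// difference is the "model.diffusion_model." prefix.
static std::string imatrix_key_for(const std::string & tensor_name) {
    static const std::string prefix = "model.diffusion_model.";
    if (tensor_name.compare(0, prefix.size(), prefix) == 0) {
        return tensor_name;      // already the runtime spelling
    }
    return prefix + tensor_name; // add the prefix used at runtime
}
```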

@idostyle (Contributor)

Nice job stduhpf.

Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

Flux also seems to struggle with the lower bit i-quants: https://huggingface.co/Eviation/flux-imatrix

@stduhpf (Contributor, Author) commented Mar 24, 2025

Results from my SD3 2B experiments, using a basic imatrix trained on only a dozen generations.

K-quants

(image grid) sd3_medium_incl_clips_t5xxl — q3_K static vs. imatrix | q4_K static vs. imatrix | q5_K static vs. imatrix | q6_K static vs. imatrix

i-quants

(image grid) sd3_medium_incl_clips_t5xxl — iq3_xxs static vs. imatrix | iq3_s static vs. imatrix | iq4_xs static vs. imatrix | iq4_nl static vs. imatrix

Ground truth

(image) sd3_medium_incl_clips_t5xxl fp16

(all images generated with the same settings; only the quantization changes)

@stduhpf (Contributor, Author) commented Mar 28, 2025

OK, I found a satisfactory way to apply imatrix to Flux. (Also, it seems like training the imatrix with quantized models works just fine.)

(image grid) Flux.1 schnell q2_k: static | imatrix | imatrix trained on Flux dev
(image grid) Flux.1 dev q2_k: static | imatrix | imatrix trained on Flux schnell

(imatrix trained on 10 generations using static q4_k (schnell) or iq4_nl (dev) model)

@Green-Sky (Contributor)

Looks great.

Did you tune it on the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

@stduhpf (Contributor, Author) commented Mar 28, 2025

Did you tune it on the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

For the schnell one, I trained it with only 4 steps, at different resolutions. My PC is currently cooking a Flux dev imatrix using a varying step count (from 16 to 40). Maybe I'll try to make one with a fixed step count to compare against afterwards.

@stduhpf force-pushed the imatrix branch 2 times, most recently from 4ec74a9 to 24d8fd7, on March 29, 2025 18:03
@stduhpf marked this pull request as ready for review on March 29, 2025 19:04
@stduhpf (Contributor, Author) commented Mar 29, 2025

I feel like this is pretty much ready now.

@stduhpf changed the title from "Imatrix: first implementation attempt" to "Add imatrix support" on Mar 29, 2025
@Green-Sky (Contributor)

I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

You showed that using a higher quant to generate the imat works, but using the same quant would be interesting...

@stduhpf (Contributor, Author) commented Mar 31, 2025

I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

I think it would work. As long as the original quant is "good enough" to generate coherent images, the activations should already be representative of the ideal activations, and therefore the imatrix shouldn't be too different from the one trained on the full precision model, with the same kind of improvements.

@Green-Sky (Contributor)

Thanks, good to know. This all reminds me very much of PGO, where you usually stack runs to get the last 1-2% of performance. 😄

I am doing q5_k right now, and the image is very coherent indeed.

@@ -204,6 +210,8 @@ void print_usage(int argc, const char* argv[]) {
printf(" --upscale-repeats Run the ESRGAN upscaler this many times (default 1)\n");
printf(" --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K)\n");
printf(" If not specified, the default is the type of the weight file\n");
printf(" --imat-out [PATH] If set, compute the imatrix for this run and save it to the provided path");
printf(" --imat-in [PATH] Use imatrix for quantization.");
Contributor:

Both new options are missing a newline.
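Presumably the fix is just appending \n to both strings (spacing approximate):

```cpp
printf("  --imat-out [PATH]                  If set, compute the imatrix for this run and save it to the provided path\n");
printf("  --imat-in [PATH]                   Use imatrix for quantization.\n");
```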

imatrix.cpp Outdated
return false;
}

// Recreate the state as expected by save_imatrix(), and corerct for weighted sum.
Contributor:

corerct -> correct

Contributor (Author):

I don't know, I just copy-pasted that part of the code, maybe the typo is important

Contributor:

😁

@Green-Sky (Contributor) commented Mar 31, 2025

Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

$ result/bin/sd -M convert -m models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat
loading imatrix from 'flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat'
[INFO ] model.cpp:918  - load models/flux.1-lite-8B.safetensors using safetensors format
[INFO ] model.cpp:2003 - model tensors mem size: 5562.48MB
  |==================================================| 516/516 - 24.39it/s
[INFO ] model.cpp:2038 - load tensors done
[INFO ] model.cpp:2039 - trying to save tensors to models/flux.1-lite-8B-q5_k-igo.gguf
convert 'models/flux.1-lite-8B.safetensors'/'' to 'models/flux.1-lite-8B-q5_k-igo.gguf' success
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k.gguf
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k-igo.gguf

edit: strings shows that the imat contains data for the diffusion model:

...
model.diffusion_model.single_blocks.35.modulation.lin.weightH
model.diffusion_model.double_blocks.5.img_mlp.2.weightH
...

edit2: and the imats are different

b481d4c6e8903ac4a1e612a8e9b5dc8afc4b2bb31d1fea2a2a404e9bd565416a  flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat
ab385c84e8bd4002a1579350a7bdd01a96581900922cf192bc47012224038ebe  flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat

edit3: tried to do an optimized q4_k, same issue, so something is fundamentally broken with the Flux prune/distill/dedistill I am using.
https://huggingface.co/Freepik/flux.1-lite-8B
https://huggingface.co/Green-Sky/flux.1-lite-8B-GGUF/tree/main/base

@stduhpf (Contributor, Author) commented Mar 31, 2025

Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

$ result/bin/sd -M convert -m models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat
loading imatrix from 'flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat'
[INFO ] model.cpp:918  - load models/flux.1-lite-8B.safetensors using safetensors format
[INFO ] model.cpp:2003 - model tensors mem size: 5562.48MB
  |==================================================| 516/516 - 24.39it/s
[INFO ] model.cpp:2038 - load tensors done
[INFO ] model.cpp:2039 - trying to save tensors to models/flux.1-lite-8B-q5_k-igo.gguf
convert 'models/flux.1-lite-8B.safetensors'/'' to 'models/flux.1-lite-8B-q5_k-igo.gguf' success
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k.gguf
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k-igo.gguf

Try with result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat (you can also include vae/text encoders)

And then run it with -m models/flux.1-lite-8B-q5_k-igo.gguf instead of --diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf

@Green-Sky (Contributor) commented Mar 31, 2025

Another issue: when I use flash attention, it breaks the imat collection after a varying number of images. (using sd_turbo here)

[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
Details
$ result/bin/sd -m models/sd_turbo-f16-q8_0.gguf --cfg-scale 1 --steps 8 --schedule karras -p "a lovely cat" --imat-out sd_turbo.imat -b 32 -s -1 --diffusion-fa

 IMPORTANT: imatrix file sd_turbo.imat already exists, but wasn't found in the imatrix inputs.
sd_turbo.imat will get overwritten!
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'models/sd_turbo-f16-q8_0.gguf'
[INFO ] model.cpp:915  - load models/sd_turbo-f16-q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:244  - Version: SD 2.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q8_0
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q8_0
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             q8_0
[INFO ] stable-diffusion.cpp:328  - Using flash attention in the diffusion model
  |>                                                 | 3/1323 - 0.00it/s[INFO ] model.cpp:1915 - unknown tensor 'cond_stage_model.transformer.text_model.text_projection | q8_0 | 2 [1024, 1024, 1, 1, 1]' in model file
  |==================================================| 1323/1323 - 1000.00it/s
[INFO ] stable-diffusion.cpp:503  - total params memory size = 2006.07MB (VRAM 2006.07MB, RAM 0.00MB): clip 500.53MB(VRAM), unet 1411.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from 'models/sd_turbo-f16-q8_0.gguf' completed, taking 0.73s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:566  - running with Karras schedule
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 51 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/32 - seed 832414162
  |==================================================| 8/8 - 3.60it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.30s
[INFO ] stable-diffusion.cpp:1439 - generating image: 2/32 - seed 832414163
  |==================================================| 8/8 - 3.64it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.27s
[INFO ] stable-diffusion.cpp:1439 - generating image: 3/32 - seed 832414164
  |==================================================| 8/8 - 3.64it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.27s
[INFO ] stable-diffusion.cpp:1439 - generating image: 4/32 - seed 832414165
  |==================================================| 8/8 - 3.62it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.28s
[INFO ] stable-diffusion.cpp:1439 - generating image: 5/32 - seed 832414166
  |==================================================| 8/8 - 3.51it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.29s
[INFO ] stable-diffusion.cpp:1439 - generating image: 6/32 - seed 832414167
  |==================================================| 8/8 - 3.64it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.26s
[INFO ] stable-diffusion.cpp:1439 - generating image: 7/32 - seed 832414168
  |==================================================| 8/8 - 3.68it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.26s
[INFO ] stable-diffusion.cpp:1439 - generating image: 8/32 - seed 832414169
  |==================================================| 8/8 - 3.65it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.26s
[INFO ] stable-diffusion.cpp:1439 - generating image: 9/32 - seed 832414170
  |============>                                     | 2/8 - 3.65it/s[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight

update: it happened after a much longer run without flash attention too.

[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
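For what it's worth, the kind of guard that warning implies could look like this — just a sketch assuming squared activations are accumulated per input channel, not the actual imatrix.cpp code:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: skip a row of activations if any value is non-finite, so a single
// inf/NaN (seen here with flash attention, and later without it too) cannot
// poison the accumulated per-channel sums.
static bool accumulate_activation_row(const float * x, int n_channels,
                                      std::vector<float> & sum_sq,
                                      const char * tensor_name) {
    for (int i = 0; i < n_channels; ++i) {
        if (!std::isfinite(x[i])) {
            std::fprintf(stderr, "[WARN] non-finite activation in %s, skipping row\n", tensor_name);
            return false; // leave the statistics for this tensor untouched
        }
    }
    for (int i = 0; i < n_channels; ++i) {
        sum_sq[i] += x[i] * x[i];
    }
    return true;
}
```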

@Green-Sky (Contributor) commented Mar 31, 2025

Try with result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat (you can also include vae/text encoders)

And then run it with -m models/flux.1-lite-8B-q5_k-igo.gguf instead of --diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf

This seems to have worked. Not a fan of the tensor renaming though.

base q5_k | q5_k with q5_k imat | diff
(images)

(they were obviously identical before)

base q4_k | q4_k with q5_k imat | diff
(images)

It looks like the imat from q5_k made q4_k stray more. I don't have the full-size model image for this example (it's too expensive, ngl), but to me this looks worse. However, the importance-guided quant seems to have less dither noise, so it got better somewhere...


I was trying to measure the visual quality difference of the quants, and I saw and remembered that Flux specifically shows dither-like patterns when you go lower with the quants. So I tried to measure that with GIMP: I first applied a high-pass filter (at 0.5 std and 4 contrast) and then used the histogram plot.

base q5_k | q5_k with q5_k imat
(histogram screenshots)

Base is spread out a little more, so this should mean there is indeed more high-frequency noise, but this is just a single sample AND a highly experimental and somewhat subjective analysis 😅
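If it helps, the same measurement can be scripted instead of eyeballed in GIMP's histogram; a rough sketch (grayscale input and the box-blur radius are arbitrary choices of mine):

```cpp
#include <cstddef>
#include <vector>

// Sketch: high-pass a grayscale image with a box blur and report the variance
// of the residual. A larger variance means more high-frequency (dither-like) energy.
static double highpass_variance(const std::vector<float> & img, int w, int h, int radius = 2) {
    std::vector<float> blur(img.size(), 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            int   cnt = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    acc += img[(std::size_t) yy * w + xx];
                    ++cnt;
                }
            }
            blur[(std::size_t) y * w + x] = acc / cnt; // local low-pass estimate
        }
    }
    double var = 0.0;
    for (std::size_t i = 0; i < img.size(); ++i) {
        const double hp = img[i] - blur[i]; // high-pass residual
        var += hp * hp;
    }
    return var / img.size();
}
```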

@stduhpf (Contributor, Author) commented Mar 31, 2025

This seems to have worked. Not a fan of the tensor renaming though.

Yes, this is a bit annoying for Flux models. I thought of adding a way to extract the diffusion model (or other components like the VAE or text encoders) from the model file, but I feel like this is getting a bit out of scope for this PR. (Something like sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" --diffusion-model "models/flux.1-lite-8B-q5_k-igo.gguf" maybe, or like sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" -o "models/flux.1-lite-8B-q5_k-igo.gguf" -p "model.diffusion_model".)

@stduhpf (Contributor, Author) commented Mar 31, 2025

@Green-Sky I've had a similar problem with sd3 q4_k (with an fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using an imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

@Green-Sky (Contributor)

@Green-Sky I've had a similar problem with sd3 q4_k (with an fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using an imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

Interesting. Here is q2_k, which seems to have the same, but much more noticeable behavior.

q2_k | q2_k with q5_k imat
(images)

@Green-Sky (Contributor)

q3_k | q3_k with q5_k imat
(images)

The imat version might look more coherent, but it also looks more blurry. So overall, for qx_k quants, the imat seems to act at least a little like a blur.

I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

@Green-Sky (Contributor) commented Apr 1, 2025

I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

I actually can't, it crashes.

result/bin/sd --diffusion-model models/flux.1-lite-8B-q8_0.gguf --clip_l models/flux-extra/clip_l-q8_0.gguf --t5xxl models/flux-extra/t5xxl_q8_0.gguf --vae models/flux-extra/ae-f16.gguf -v --color -t 8 --cfg-scale 1 --sampling-method euler --steps 24 --guidance 3.3 -W 1024 -H 768 -p "a lovely cat" --imat-out imat.dat
Option:
    n_threads:         8
    mode:              txt2img
    model_path:
    wtype:             unspecified
    clip_l_path:       models/flux-extra/clip_l-q8_0.gguf
    clip_g_path:
    t5xxl_path:        models/flux-extra/t5xxl_q8_0.gguf
    diffusion_model_path:   models/flux.1-lite-8B-q8_0.gguf
    vae_path:          models/flux-extra/ae-f16.gguf
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       output.png
    init_img:
    mask_img:
    control_image:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:false
    strength(control): 0.90
    prompt:            a lovely cat
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         1.00
    slg_scale:         0.00
    guidance:          3.30
    eta:               0.00
    clip_skip:         -1
    width:             1024
    height:            768
    sample_method:     euler
    schedule:          default
    sample_steps:      24
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:188  - Using CPU backend
[INFO ] stable-diffusion.cpp:204  - loading clip_l from 'models/flux-extra/clip_l-q8_0.gguf'
[INFO ] model.cpp:915  - load models/flux-extra/clip_l-q8_0.gguf using gguf format
[DEBUG] model.cpp:932  - init from 'models/flux-extra/clip_l-q8_0.gguf'
[INFO ] stable-diffusion.cpp:218  - loading t5xxl from 'models/flux-extra/t5xxl_q8_0.gguf'
[INFO ] model.cpp:915  - load models/flux-extra/t5xxl_q8_0.gguf using gguf format
[DEBUG] model.cpp:932  - init from 'models/flux-extra/t5xxl_q8_0.gguf'
[INFO ] stable-diffusion.cpp:225  - loading diffusion model from 'models/flux.1-lite-8B-q8_0.gguf'
[INFO ] model.cpp:915  - load models/flux.1-lite-8B-q8_0.gguf using gguf format
[DEBUG] model.cpp:932  - init from 'models/flux.1-lite-8B-q8_0.gguf'
[INFO ] stable-diffusion.cpp:232  - loading vae from 'models/flux-extra/ae-f16.gguf'
[INFO ] model.cpp:915  - load models/flux-extra/ae-f16.gguf using gguf format
[DEBUG] model.cpp:932  - init from 'models/flux-extra/ae-f16.gguf'
[INFO ] stable-diffusion.cpp:244  - Version: Flux
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q8_0
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q8_0
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[INFO ] flux.hpp:889  - Flux blocks: 8 double, 38 single
[DEBUG] ggml_extend.hpp:1169 - clip params backend buffer size =  231.50 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1169 - t5 params backend buffer size =  4826.11 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1169 - flux params backend buffer size =  8457.01 MB(RAM) (516 tensors)
[DEBUG] ggml_extend.hpp:1169 - vae params backend buffer size =  94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:419  - loading weights
[DEBUG] model.cpp:1737 - loading tensors from models/flux-extra/clip_l-q8_0.gguf
  |========>                                         | 196/1176 - 0.00it/s[DEBUG] model.cpp:1737 - loading tensors from models/flux-extra/t5xxl_q8_0.gguf
  |=================>                                | 413/1176 - 0.00it/s[INFO ] model.cpp:1915 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q8_0 | 2 [4096, 32128, 1, 1, 1]' in model file
  |=================>                                | 416/1176 - 9.17it/s[DEBUG] model.cpp:1737 - loading tensors from models/flux.1-lite-8B-q8_0.gguf
  |=======================================>          | 932/1176 - 25.00it/s[DEBUG] model.cpp:1737 - loading tensors from models/flux-extra/ae-f16.gguf
  |=============================================>    | 1070/1176 - 200.00it/s[INFO ] stable-diffusion.cpp:503  - total params memory size = 13609.19MB (VRAM 0.00MB, RAM 13609.19MB): clip 5057.61MB(RAM), unet 8457.01MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:522  - loading model from '' completed, taking 10.34s
[INFO ] stable-diffusion.cpp:543  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:600  - finished loaded file
[DEBUG] stable-diffusion.cpp:1548 - txt2img 1024x768
[DEBUG] stable-diffusion.cpp:1241 - prompt after extract and remove lora: "a lovely cat"
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1055 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] clip.hpp:737  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1121 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:737  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1121 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1170 - computing condition graph completed, taking 9611 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 9615 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808  - Sample
[DEBUG] ggml_extend.hpp:1121 - flux compute buffer size: 1662.42 MB(RAM)
Segmentation fault (core dumped)

The reason seems to be a null buffer here:
https://github.com/stduhpf/stable-diffusion.cpp/blob/71eed146cd78ab771761888169cab7d82d90a5bb/imatrix.cpp#L54

update: it happens with any model type (q8_0, f16, q5_k tested)

stacktrace:

#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x6153370, ask=false, user_data=<optimized out>)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xc7ba58, get_graph=...,
    n_threads=n_threads@entry=8, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb160,
    output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b912 in Flux::FluxRunner::compute (this=this@entry=0xc7ba58, n_threads=n_threads@entry=8, x=<optimized out>, x@entry=0x7ffc9e66c0d0,
    timesteps=<optimized out>, timesteps@entry=0x7ffc9e96c790, context=<optimized out>, context@entry=0x7ffc9e4ebbc0, c_concat=<optimized out>, c_concat@entry=0x0,
    y=<optimized out>, guidance=<optimized out>, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/flux.hpp:970
#6  0x0000000000494901 in FluxModel::compute (this=this@entry=0xc7ba50, n_threads=8, x=0x7ffc9e66c0d0, timesteps=timesteps@entry=0x7ffc9e96c790, context=0x7ffc9e4ebbc0,
    c_concat=0x0, y=0x7ffc9dcea500, guidance=0x7ffc9e96c950, num_video_frames=-1, controls=std::vector of length 0, capacity 0,
    control_strength=control_strength@entry=0.899999976, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:178
#7  0x0000000000496b6d in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const (__closure=0x965cb0, input=0x7ffc9e5abf20, sigma=<optimized out>, step=1)
...

update2: same for sd_turbo

#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x32925d0, ask=false, user_data=<optimized out>)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xb58ad8, get_graph=...,
    n_threads=n_threads@entry=12, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb3b8,
    output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b4c4 in UNetModelRunner::compute (this=this@entry=0xb58ad8, n_threads=n_threads@entry=12, x=<optimized out>, x@entry=0x2716b30, timesteps=<optimized out>,
    timesteps@entry=0x2719290, context=<optimized out>, context@entry=0x27170e0, c_concat=<optimized out>, c_concat@entry=0x0, y=<optimized out>,
    num_video_frames=<optimized out>, controls=std::vector of length 0, capacity 0, control_strength=<optimized out>, control_strength@entry=0, output=0x7fffffffb3b8,
    output_ctx=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/unet.hpp:615
#6  0x00000000004964dc in UNetModel::compute (this=0xb58ad0, n_threads=12, x=0x2716b30, timesteps=0x2719290, context=0x27170e0, c_concat=0x0, y=0x0, guidance=0x0,
    num_video_frames=-1, controls=std::vector of length 0, capacity 0, control_strength=0, output=0x7fffffffb3b8, output_ctx=0x0,
    skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:78
#7  0x0000000000485dba in StableDiffusionGGML::is_using_v_parameterization_for_sd2 (this=this@entry=0x9655d0, work_ctx=work_ctx@entry=0xcb00a0, is_inpaint=<optimized out>)
    at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:621
#8  0x00000000004e6a9b in StableDiffusionGGML::load_from_file (this=this@entry=0x9655d0, model_path="models/sd_turbo-f16-q8_0.gguf", clip_l_path="", clip_g_path="",
    t5xxl_path="", diffusion_model_path="", vae_path=..., control_net_path=..., embeddings_path=..., id_embeddings_path=..., taesd_path=..., vae_tiling_=<optimized out>,
    wtype=<optimized out>, schedule=<optimized out>, clip_on_cpu=<optimized out>, control_net_cpu=<optimized out>, vae_on_cpu=<optimized out>,
    diffusion_flash_attn=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:527
#9  0x00000000004760ad in new_sd_ctx (model_path_c_str=<optimized out>, clip_l_path_c_str=0x7fffffffbb58 "", clip_g_path_c_str=0x7fffffffbb78 "",
    t5xxl_path_c_str=0x7fffffffbb98 "", diffusion_model_path_c_str=0x7fffffffbbb8 "", vae_path_c_str=0x7fffffffbbd8 "", taesd_path_c_str=0x7fffffffbbf8 "",
    control_net_path_c_str=0x7fffffffbc38 "", lora_model_dir_c_str=0x7fffffffbcc0 "", embed_dir_c_str=0x7fffffffbc58 "", id_embed_dir_c_str=0x7fffffffbc78 "",
    vae_decode_only=true, vae_tiling=false, free_params_immediately=true, n_threads=12, wtype=SD_TYPE_COUNT, rng_type=CUDA_RNG, s=KARRAS, keep_clip_on_cpu=false,
    keep_control_net_cpu=false, keep_vae_on_cpu=false, diffusion_flash_attn=false) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:1159
#10 0x0000000000420451 in main (argc=<optimized out>, argv=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/examples/cli/main.cpp:926
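For reference, the crash is ggml_backend_buffer_is_host() being called with a null buffer; a guard along these lines would avoid it (a sketch assuming a llama.cpp-style eval-callback signature, not necessarily the fix that was actually pushed):

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Sketch of a defensive collector callback: bail out on nodes whose activation
// tensor has no backing buffer yet, instead of calling
// ggml_backend_buffer_is_host() on a null pointer.
static bool collect_imatrix_guarded(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // let the scheduler keep this node observable
    }
    const struct ggml_tensor * src1 = t->src[1]; // activations of the mul_mat
    if (src1 == nullptr || src1->buffer == nullptr) {
        return true; // nothing readable here yet: skip instead of segfaulting
    }
    const bool is_host = ggml_backend_buffer_is_host(src1->buffer);
    (void) is_host; // a real collector copies the data off-device when !is_host
    return true;
}
```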

@stduhpf (Contributor, Author) commented Apr 1, 2025

@Green-Sky It should be fixed now.

@Green-Sky (Contributor) commented Apr 1, 2025

@stduhpf works, thanks. Running some at an incredible 541.82 s/it.

But it looks like CPU inference is broken for that model..., so the imat might be of questionable quality.
(probably not related to this PR)

update: I did a big oopsy and forgot to add --cfg-scale 1 --sampling-method euler

@Green-Sky (Contributor)

I ran the freshly generated f16 imatrix file through ggml-org/llama.cpp#12718:

$ result/bin/llama-imatrix --show-statistics --in-file ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat

Computing statistics for ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat (444 tensors)

 Layer	               Tensor	          μ(Importance Scores)	   Contribution
================================================================================
    -	                            wo	          83.69	              5.9814 %
    -	                             0	          82.23	              5.8774 %
    -	                          proj	          64.58	              4.6155 %
    -	                             2	          61.22	              4.3756 %
    -	                          proj	          61.04	              4.3627 %
    -	                          proj	          45.07	              3.2214 %
    -	                            wo	          35.81	              2.5595 %
    -	                       linear2	          32.47	              2.3209 %
    -	                          proj	          31.14	              2.2260 %
    -	                          proj	          30.86	              2.2058 %
    -	                      in_layer	          28.69	              2.0506 %
    -	                          proj	          28.63	              2.0465 %
    -	                        img_in	          21.10	              1.5079 %
    -	                          proj	          20.19	              1.4431 %
    -	                             2	          16.62	              1.1883 %
    -	                             0	          16.57	              1.1847 %
    -	                             0	          15.83	              1.1315 %
    -	                          proj	          14.21	              1.0154 %
    -	                             0	          14.13	              1.0097 %
    -	                      in_layer	          14.00	              1.0007 %
    -	                      in_layer	          14.00	              1.0006 %
    -	                          proj	          13.91	              0.9942 %
    -	                             0	          13.52	              0.9663 %
    -	                           qkv	          13.28	              0.9490 %
    -	                            wo	          11.81	              0.8442 %
    -	                           qkv	          11.78	              0.8419 %
    -	                           qkv	          11.66	              0.8331 %
    -	                           qkv	          10.23	              0.7314 %
    -	                       linear2	           9.72	              0.6950 %
    -	                       linear1	           9.15	              0.6540 %
    -	                          proj	           8.44	              0.6034 %
    -	                             2	           8.44	              0.6033 %
    -	                       linear1	           7.96	              0.5692 %
    -	                           qkv	           7.81	              0.5579 %
    -	                             0	           7.54	              0.5388 %
    -	                       linear2	           7.40	              0.5293 %
    -	                       linear2	           7.09	              0.5070 %
    -	                          proj	           7.08	              0.5064 %
    -	                           fc1	           6.90	              0.4929 %
    -	                           fc1	           6.43	              0.4595 %
    -	                           qkv	           5.80	              0.4142 %
    -	                             0	           5.72	              0.4089 %
    -	                          proj	           5.60	              0.4001 %
    -	                           qkv	           5.51	              0.3935 %
    -	                             0	           5.42	              0.3871 %
    -	                       linear2	           5.24	              0.3742 %
    -	                       linear1	           5.21	              0.3722 %
    -	                       linear1	           5.09	              0.3641 %
    -	                          proj	           5.09	              0.3639 %
    -	                           qkv	           4.79	              0.3427 %
    -	                           fc1	           4.68	              0.3344 %
    -	                           qkv	           4.67	              0.3335 %
    -	                           fc1	           4.62	              0.3305 %
    -	                       linear2	           4.62	              0.3302 %
    -	                       linear1	           4.47	              0.3196 %
    -	                       linear1	           4.46	              0.3187 %
    -	                           fc1	           4.46	              0.3186 %
    -	                       linear1	           4.44	              0.3172 %
    -	                           fc1	           4.39	              0.3140 %
    -	                           qkv	           4.37	              0.3124 %
    -	                       linear1	           4.35	              0.3110 %
    -	                       linear1	           4.31	              0.3082 %
    -	                       linear1	           4.29	              0.3067 %
    -	                       linear1	           4.29	              0.3063 %
    -	                           fc1	           4.17	              0.2981 %
    -	                       linear1	           4.16	              0.2974 %
    -	                       linear1	           4.15	              0.2964 %
    -	                       linear1	           4.12	              0.2942 %
    -	                        linear	           4.10	              0.2931 %
    -	                       linear1	           4.09	              0.2927 %
    -	                       linear1	           4.06	              0.2904 %
    -	                       linear2	           4.00	              0.2857 %
    -	                           fc1	           3.97	              0.2840 %
    -	                           qkv	           3.94	              0.2816 %
    -	                       linear2	           3.93	              0.2805 %
    -	                       linear2	           3.76	              0.2684 %
    -	                       linear1	           3.75	              0.2677 %
    -	                           fc1	           3.65	              0.2608 %
    -	                           qkv	           3.63	              0.2596 %
    -	                       linear1	           3.61	              0.2581 %
    -	                       linear1	           3.61	              0.2577 %
    -	                       linear2	           3.51	              0.2512 %
    -	                           fc1	           3.46	              0.2470 %
    -	                       linear2	           3.45	              0.2464 %
    -	                           fc1	           3.44	              0.2458 %
    -	                       linear1	           3.42	              0.2445 %
    -	                       linear1	           3.40	              0.2427 %
    -	                       linear2	           3.33	              0.2383 %
    -	                           fc1	           3.32	              0.2372 %
    -	                       linear1	           3.32	              0.2371 %
    -	                             2	           3.30	              0.2361 %
    -	                       linear1	           3.30	              0.2361 %
    -	                       linear1	           3.30	              0.2359 %
    -	                       linear1	           3.27	              0.2338 %
    -	                          proj	           3.26	              0.2333 %
    -	                       linear1	           3.24	              0.2318 %
    -	                       linear2	           3.19	              0.2280 %
    -	                       linear2	           3.15	              0.2251 %
    -	                       linear2	           3.14	              0.2244 %
    -	                       linear2	           3.13	              0.2238 %
    -	                       linear2	           3.12	              0.2229 %
    -	                       linear2	           3.11	              0.2222 %
    -	                       linear2	           3.10	              0.2214 %
    -	                       linear1	           3.01	              0.2154 %
    -	                       linear1	           2.99	              0.2140 %
    -	                       linear1	           2.99	              0.2137 %
    -	                             0	           2.98	              0.2132 %
    -	                       linear2	           2.96	              0.2118 %
    -	                           qkv	           2.92	              0.2089 %
    -	                       linear2	           2.89	              0.2062 %
    -	                       linear1	           2.88	              0.2062 %
    -	                       linear1	           2.85	              0.2037 %
    -	                       linear2	           2.84	              0.2030 %
    -	                            wo	           2.83	              0.2022 %
    -	                       linear1	           2.82	              0.2017 %
    -	                       linear2	           2.80	              0.2001 %
    -	                       linear1	           2.76	              0.1974 %
    -	                       linear1	           2.74	              0.1961 %
    -	                       linear1	           2.74	              0.1960 %
    -	                       linear2	           2.72	              0.1942 %
    -	                       linear2	           2.71	              0.1937 %
    -	                       linear1	           2.68	              0.1918 %
    -	                       linear1	           2.66	              0.1901 %
    -	                       linear2	           2.65	              0.1898 %
    -	                           qkv	           2.62	              0.1875 %
    -	                       linear2	           2.61	              0.1863 %
    -	                       linear2	           2.59	              0.1851 %
    -	                        v_proj	           2.57	              0.1836 %
    -	                        k_proj	           2.57	              0.1836 %
    -	                        q_proj	           2.57	              0.1836 %
    -	                       linear2	           2.56	              0.1829 %
    -	                        q_proj	           2.47	              0.1765 %
    -	                        v_proj	           2.47	              0.1765 %
    -	                        k_proj	           2.47	              0.1765 %
    -	                             0	           2.46	              0.1760 %
    -	                       linear1	           2.43	              0.1736 %
    -	                           qkv	           2.43	              0.1734 %
    -	                           qkv	           2.34	              0.1674 %
    -	                             0	           2.32	              0.1661 %
    -	                        v_proj	           2.32	              0.1658 %
    -	                        k_proj	           2.32	              0.1658 %
    -	                        q_proj	           2.32	              0.1658 %
    -	                       linear2	           2.30	              0.1644 %
    -	                            wo	           2.30	              0.1642 %
    -	                          proj	           2.26	              0.1613 %
    -	                       linear2	           2.21	              0.1580 %
    -	                             o	           2.12	              0.1513 %
    -	                        k_proj	           2.07	              0.1478 %
    -	                        v_proj	           2.07	              0.1478 %
    -	                        q_proj	           2.07	              0.1478 %
    -	                       linear2	           2.07	              0.1478 %
    -	                        v_proj	           2.07	              0.1477 %
    -	                        q_proj	           2.07	              0.1477 %
    -	                        k_proj	           2.07	              0.1477 %
    -	                       linear2	           2.05	              0.1468 %
    -	                        v_proj	           2.02	              0.1443 %
    -	                        k_proj	           2.02	              0.1443 %
    -	                        q_proj	           2.02	              0.1443 %
    -	                       linear2	           1.96	              0.1399 %
    -	                             o	           1.95	              0.1397 %
    -	                       linear2	           1.95	              0.1394 %
    -	                             o	           1.87	              0.1333 %
    -	                       linear2	           1.82	              0.1303 %
    -	                             2	           1.80	              0.1288 %
    -	                        v_proj	           1.79	              0.1279 %
    -	                        q_proj	           1.79	              0.1279 %
    -	                        k_proj	           1.79	              0.1279 %
    -	                       linear2	           1.77	              0.1266 %
    -	                       linear2	           1.74	              0.1247 %
    -	                            wo	           1.70	              0.1219 %
    -	                           fc2	           1.64	              0.1175 %
    -	                             2	           1.62	              0.1160 %
    -	                             2	           1.53	              0.1092 %
    -	                             2	           1.49	              0.1064 %
    -	                        q_proj	           1.47	              0.1051 %
    -	                        v_proj	           1.47	              0.1051 %
    -	                        k_proj	           1.47	              0.1051 %
    -	                             0	           1.46	              0.1043 %
    -	                        v_proj	           1.37	              0.0976 %
    -	                        q_proj	           1.37	              0.0976 %
    -	                        k_proj	           1.37	              0.0976 %
    -	                        q_proj	           1.28	              0.0916 %
    -	                        k_proj	           1.28	              0.0916 %
    -	                        v_proj	           1.28	              0.0916 %
    -	                             0	           1.23	              0.0881 %
    -	                             2	           1.19	              0.0850 %
    -	                             2	           1.16	              0.0832 %
    -	                             o	           1.03	              0.0735 %
    -	                            wo	           1.00	              0.0711 %
    -	                        k_proj	           0.97	              0.0691 %
    -	                        v_proj	           0.97	              0.0691 %
    -	                        q_proj	           0.97	              0.0691 %
    -	                        k_proj	           0.97	              0.0691 %
    -	                        v_proj	           0.97	              0.0691 %
    -	                        q_proj	           0.97	              0.0691 %
    -	                             0	           0.76	              0.0546 %
    -	                             o	           0.74	              0.0529 %
    -	                             o	           0.70	              0.0503 %
    -	                            wo	           0.70	              0.0500 %
    -	                             2	           0.65	              0.0464 %
    -	                             2	           0.65	              0.0463 %
    -	                          proj	           0.62	              0.0445 %
    -	                             2	           0.61	              0.0436 %
    -	                             0	           0.60	              0.0428 %
    -	                             o	           0.55	              0.0395 %
    -	                             2	           0.50	              0.0356 %
    -	                             o	           0.48	              0.0344 %
    -	                             o	           0.48	              0.0342 %
    -	                            wo	           0.45	              0.0324 %
    -	                             2	           0.41	              0.0293 %
    -	                            wo	           0.40	              0.0285 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                             1	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                           lin	           0.39	              0.0280 %
    -	                            wo	           0.39	              0.0276 %
    -	                            wo	           0.38	              0.0269 %
    -	                             o	           0.37	              0.0267 %
    -	                             o	           0.36	              0.0261 %
    -	                             o	           0.36	              0.0258 %
    -	                             0	           0.33	              0.0233 %
    -	                             o	           0.30	              0.0218 %
    -	                            wo	           0.30	              0.0217 %
    -	                     out_layer	           0.30	              0.0214 %
    -	                        txt_in	           0.30	              0.0212 %
    -	                             o	           0.25	              0.0182 %
    -	                            wo	           0.25	              0.0178 %
    -	                             2	           0.24	              0.0172 %
    -	                      out_proj	           0.24	              0.0169 %
    -	                             o	           0.21	              0.0153 %
    -	                             o	           0.21	              0.0150 %
    -	                             o	           0.21	              0.0148 %
    -	                             o	           0.20	              0.0145 %
    -	                             o	           0.20	              0.0140 %
    -	                     out_layer	           0.19	              0.0133 %
    -	                             o	           0.18	              0.0129 %
    -	                            wo	           0.18	              0.0128 %
    -	                            wo	           0.18	              0.0127 %
    -	                             o	           0.17	              0.0119 %
    -	                      out_proj	           0.16	              0.0113 %
    -	                             o	           0.13	              0.0089 %
    -	                             o	           0.12	              0.0086 %
    -	                            wo	           0.11	              0.0079 %
    -	                            wo	           0.10	              0.0074 %
    -	                            wo	           0.10	              0.0070 %
    -	                            wo	           0.08	              0.0058 %
    -	                            wo	           0.07	              0.0050 %
    -	                          wi_1	           0.07	              0.0049 %
    -	                          wi_0	           0.07	              0.0049 %
    -	                             o	           0.06	              0.0046 %
    -	                      out_proj	           0.06	              0.0044 %
    -	                      out_proj	           0.06	              0.0044 %
    -	                            wo	           0.06	              0.0042 %
    -	                     out_layer	           0.06	              0.0040 %
    -	                            wo	           0.05	              0.0037 %
    -	                             v	           0.05	              0.0035 %
    -	                             k	           0.05	              0.0035 %
    -	                             q	           0.05	              0.0035 %
    -	                            wo	           0.05	              0.0033 %
    -	                      out_proj	           0.05	              0.0032 %
    -	                      out_proj	           0.04	              0.0032 %
    -	                          wi_0	           0.04	              0.0030 %
    -	                          wi_1	           0.04	              0.0030 %
    -	                             q	           0.04	              0.0029 %
    -	                             v	           0.04	              0.0029 %
    -	                             k	           0.04	              0.0029 %
    -	                             k	           0.04	              0.0028 %
    -	                             v	           0.04	              0.0028 %
    -	                             q	           0.04	              0.0028 %
    -	                      out_proj	           0.04	              0.0026 %
    -	                          wi_1	           0.03	              0.0023 %
    -	                          wi_0	           0.03	              0.0023 %
    -	                           fc2	           0.03	              0.0022 %
    -	                          wi_1	           0.03	              0.0020 %
    -	                          wi_0	           0.03	              0.0020 %
    -	                             v	           0.03	              0.0020 %
    -	                             k	           0.03	              0.0020 %
    -	                             q	           0.03	              0.0020 %
    -	                          wi_1	           0.03	              0.0020 %
    -	                          wi_0	           0.03	              0.0020 %
    -	                             q	           0.03	              0.0019 %
    -	                             v	           0.03	              0.0019 %
    -	                             k	           0.03	              0.0019 %
    -	                             k	           0.03	              0.0018 %
    -	                             q	           0.03	              0.0018 %
    -	                             v	           0.03	              0.0018 %
    -	                          wi_0	           0.03	              0.0018 %
    -	                          wi_1	           0.03	              0.0018 %
    -	                             q	           0.02	              0.0018 %
    -	                             k	           0.02	              0.0018 %
    -	                             v	           0.02	              0.0018 %
    -	                             v	           0.02	              0.0017 %
    -	                             k	           0.02	              0.0017 %
    -	                             q	           0.02	              0.0017 %
    -	                           fc2	           0.02	              0.0016 %
    -	                      out_proj	           0.02	              0.0016 %
    -	                             q	           0.02	              0.0015 %
    -	                             k	           0.02	              0.0015 %
    -	                             v	           0.02	              0.0015 %
    -	                          wi_1	           0.02	              0.0014 %
    -	                          wi_0	           0.02	              0.0014 %
    -	                          wi_1	           0.02	              0.0012 %
    -	                          wi_0	           0.02	              0.0012 %
    -	                           fc2	           0.02	              0.0011 %
    -	                           fc2	           0.02	              0.0011 %
    -	                             k	           0.02	              0.0011 %
    -	                             q	           0.02	              0.0011 %
    -	                             v	           0.02	              0.0011 %
    -	                      out_proj	           0.02	              0.0011 %
    -	                          wi_0	           0.02	              0.0011 %
    -	                          wi_1	           0.02	              0.0011 %
    -	                          wi_0	           0.01	              0.0011 %
    -	                          wi_1	           0.01	              0.0011 %
    -	                          wi_0	           0.01	              0.0010 %
    -	                          wi_1	           0.01	              0.0010 %
    -	                          wi_0	           0.01	              0.0010 %
    -	                          wi_1	           0.01	              0.0010 %
    -	                           fc2	           0.01	              0.0010 %
    -	                             k	           0.01	              0.0010 %
    -	                             q	           0.01	              0.0010 %
    -	                             v	           0.01	              0.0010 %
    -	                             k	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                             k	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                          wi_0	           0.01	              0.0009 %
    -	                          wi_1	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                             k	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                             k	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                             k	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                          wi_0	           0.01	              0.0009 %
    -	                          wi_1	           0.01	              0.0009 %
    -	                             k	           0.01	              0.0009 %
    -	                             q	           0.01	              0.0009 %
    -	                             v	           0.01	              0.0009 %
    -	                          wi_0	           0.01	              0.0009 %
    -	                          wi_1	           0.01	              0.0009 %
    -	                           fc2	           0.01	              0.0008 %
    -	                          wi_0	           0.01	              0.0008 %
    -	                          wi_1	           0.01	              0.0008 %
    -	                      out_proj	           0.01	              0.0007 %
    -	                             v	           0.01	              0.0007 %
    -	                             q	           0.01	              0.0007 %
    -	                             k	           0.01	              0.0007 %
    -	                      out_proj	           0.01	              0.0007 %
    -	                             q	           0.01	              0.0007 %
    -	                             k	           0.01	              0.0007 %
    -	                             v	           0.01	              0.0007 %
    -	                           fc2	           0.01	              0.0007 %
    -	                          wi_0	           0.01	              0.0006 %
    -	                          wi_1	           0.01	              0.0006 %
    -	                             k	           0.01	              0.0006 %
    -	                             v	           0.01	              0.0006 %
    -	                             q	           0.01	              0.0006 %
    -	                             v	           0.01	              0.0006 %
    -	                             q	           0.01	              0.0006 %
    -	                             k	           0.01	              0.0006 %
    -	                          wi_0	           0.01	              0.0006 %
    -	                          wi_1	           0.01	              0.0006 %
    -	                             k	           0.01	              0.0006 %
    -	                             q	           0.01	              0.0006 %
    -	                             v	           0.01	              0.0006 %
    -	                           fc2	           0.01	              0.0006 %
    -	                          wi_0	           0.01	              0.0006 %
    -	                          wi_1	           0.01	              0.0006 %
    -	                             v	           0.01	              0.0006 %
    -	                             q	           0.01	              0.0006 %
    -	                             k	           0.01	              0.0006 %
    -	                          wi_0	           0.01	              0.0006 %
    -	                          wi_1	           0.01	              0.0006 %
    -	                      out_proj	           0.01	              0.0006 %
    -	                          wi_0	           0.01	              0.0005 %
    -	                          wi_1	           0.01	              0.0005 %
    -	                          wi_0	           0.01	              0.0005 %
    -	                          wi_1	           0.01	              0.0005 %
    -	                          wi_1	           0.01	              0.0005 %
    -	                          wi_0	           0.01	              0.0005 %
    -	                          wi_0	           0.01	              0.0005 %
    -	                          wi_1	           0.01	              0.0005 %
    -	                             q	           0.01	              0.0005 %
    -	                             v	           0.01	              0.0005 %
    -	                             k	           0.01	              0.0005 %
    -	                           fc2	           0.01	              0.0004 %
    -	                           fc2	           0.00	              0.0003 %
    -	                           fc2	           0.00	              0.0001 %

Somewhat unreadable: only the last component of each tensor name is shown, so most of the rows cannot be told apart.

@idostyle
Contributor

idostyle commented Apr 3, 2025

llama-imatrix --show-statistics assumes that layer naming follows "blk.%d", rather than the single_blocks.%d/double_blocks.%d naming used in flux and flux-lite. One would have to adjust process_tensor_name in that PR accordingly.
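
For illustration, a minimal C++ sketch of how the name splitting could accept both conventions. This is a hypothetical helper, not the actual process_tensor_name code from that PR (whose exact signature is not shown here); the function name split_block_name, the prefix list, and the example tensor name are assumptions made for the example.

// Hypothetical sketch only: split an imatrix entry name into a block part and a
// tensor part, accepting both the llama.cpp "blk.%d." convention and the flux
// "single_blocks.%d." / "double_blocks.%d." convention, so a statistics table
// could print full, distinguishable names instead of only the last component
// ("lin", "wo", "q", ...).
#include <cstdio>
#include <string>
#include <utility>

static std::pair<std::string, std::string> split_block_name(const std::string & name) {
    static const char * prefixes[] = { "blk.", "single_blocks.", "double_blocks." };
    for (const char * prefix : prefixes) {
        const size_t len = std::string(prefix).size();
        if (name.compare(0, len, prefix) == 0) {
            const size_t dot = name.find('.', len);   // dot after the block index
            if (dot != std::string::npos) {
                return { name.substr(0, dot),          // e.g. "double_blocks.7"
                         name.substr(dot + 1) };       // e.g. "img_attn.qkv.weight"
            }
        }
    }
    return { "", name };   // no recognized block prefix: keep the whole name
}

int main() {
    // Example flux-style tensor name (assumed for illustration).
    const auto p = split_block_name("double_blocks.7.img_attn.qkv.weight");
    std::printf("%s | %s\n", p.first.c_str(), p.second.c_str());
    return 0;
}

With something along these lines the table rows would at least carry their block index, which would make the per-tensor statistics readable for flux-style models.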

@EAddario

EAddario commented Apr 3, 2025

Haven't had much of an opportunity to play with T2I models yet, but if someone can point me to a sample model and an imatrix file, I'm happy to make the necessary changes.
