[GGUF] Refactor and decouple gguf checkpoint loading logic #34385
Conversation
@SunMarc @MekkCyber for quantization - but let me know if someone else is handling the GGUF integration and I'll ping them instead in future!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Feel free to ping @Isotr0py and me for anything related to gguf @Rocketknight1 !
@Rocketknight1 I'm diving into the GGUF integration code to fix some issues, so feel free to ping me as well for related issues and PRs
BTW, since the GGUF tests are very slow, I used a script to do quick tests for GGUF tensor renaming (which is modified in this PR):

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForCausalLM
from transformers.modeling_gguf_pytorch_utils import load_gguf_checkpoint

model_id = "duyntnet/Qwen1.5-MoE-A2.7B-Chat-imatrix-GGUF"
gguf_model_id = "Qwen1.5-MoE-A2.7B-Chat-IQ1_M.gguf"

config = AutoConfig.from_pretrained(model_id, gguf_file=gguf_model_id)
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)
expected_keys = set(model.state_dict().keys())

gguf_path = hf_hub_download(model_id, filename=gguf_model_id)
exact_keys = set(load_gguf_checkpoint(gguf_path, True, model)["tensors"].keys())

assert (
    expected_keys == exact_keys
), f"Following parameters are not initialized from GGUF file: {expected_keys - exact_keys}"
```
LGTM ! Thanks for this nice cleanup. It will make maintenance and adding support for new models way easier. Did you check whether the tests are passing locally? We can also trigger the slow tests if you prefer.
Looks awesome ! More readable than it used to be ! Thanks for the clean up 🔥
```python
if named_children := hf_model.named_children():
    for name, child in named_children:
        sub_map = get_gguf_hf_weights_map(child, model_type, num_layers, qual_name=f"{qual_name}{name}.")
        # Ignore the keys that are already in the main map to avoid overwriting
        sub_map = {k: v for k, v in sub_map.items() if k not in gguf_to_hf_name_map}
        gguf_to_hf_name_map.update(sub_map)
```
Do we need the whole recursion depth or only one layer of depth ?
I think depth=1 might not be enough, because some vision models based on `AutoModelForVision2Seq` might use `AutoModelForCausalLM` as the language backbone. If they used `BloomForCausalLM` as the backbone, this would require depth=2 (although we won't face this case currently).
Since there is no significant slowdown in model loading between full depth and depth=1, I think we can keep the full recursion depth for code simplicity :)
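As an illustration of the nesting (all class and module names below are made up for the example, not real transformers classes): a Vision2Seq-style wrapper holds a CausalLM child, which in turn holds the actual decoder, so the weights that need mapping live two or more levels down.

```python
import torch.nn as nn

class TinyDecoder(nn.Module):          # stands in for e.g. BloomModel
    def __init__(self):
        super().__init__()
        self.word_embeddings = nn.Embedding(8, 4)

class TinyCausalLM(nn.Module):         # stands in for e.g. BloomForCausalLM
    def __init__(self):
        super().__init__()
        self.transformer = TinyDecoder()
        self.lm_head = nn.Linear(4, 8)

class TinyVision2Seq(nn.Module):       # stands in for a Vision2Seq wrapper
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(4, 4)
        self.language_model = TinyCausalLM()

model = TinyVision2Seq()
# Depth 1 only reaches the wrappers, not the decoder weights.
print([name for name, _ in model.named_children()])
# ['vision_tower', 'language_model']
# The weights that actually need a GGUF mapping sit at depth >= 2.
print(list(model.state_dict().keys()))
# [..., 'language_model.transformer.word_embeddings.weight', 'language_model.lm_head.weight', ...]
```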
BTW, about the model loading performance: the root bottleneck is the `GGUFReader` itself, because its implementation uses numpy memmap for metadata extraction incorrectly, which causes a terrible slowdown. 😅
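If someone wants to reproduce the measurement, a minimal timing check of just the metadata parsing could look like this (the file path is a placeholder for any locally downloaded GGUF file):

```python
import time
from gguf import GGUFReader

gguf_path = "Qwen1.5-MoE-A2.7B-Chat-IQ1_M.gguf"  # placeholder: any local .gguf file

start = time.perf_counter()
reader = GGUFReader(gguf_path)  # parses header + metadata, which is the slow part
elapsed = time.perf_counter() - start
print(f"metadata parse took {elapsed:.2f}s "
      f"({len(reader.fields)} metadata fields, {len(reader.tensors)} tensors)")
```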
Ah okay thanks for the clarifications @Isotr0py 🔥
For the `GGUFReader`, if you think there is a problem in the way it's handled, I think it would be nice to open an issue in the `gguf` repo 😊
Yea, I have opened a PR in `llama.cpp` to improve the performance of `GGUFReader`: ggml-org/llama.cpp#10159 😄
I have run the slow tests locally for all architectures we support (except falcon-40b), and most of them have passed. Only … (Or we should re-organize the GGUF tests, because they're too messy 😅)
Yes, the …
LGTM, sorry for the delay, Christmas and New Year vacation hit me! 🤗
where N signifies the block number of a layer, and BB signifies the attention/mlp layer components. See "Standardized tensor names" in https://github.com/ggerganov/ggml/blob/master/docs/gguf.md for details.
Suggested change:

```diff
-https://github.com/ggerganov/ggml/blob/master/docs/gguf.md for details.
+https://github.com/ggerganov/ggml/blob/master/docs/gguf.md for details. We rely on the `get_tensor_name_map` to get the correct mapping!
```
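For context, the standardized GGUF tensor names look like the left-hand side below; the pairs are hand-picked illustrative examples for a llama-style model, not an exhaustive mapping:

```python
# Illustrative GGUF -> transformers name pairs for a llama-style model.
GGUF_TO_HF_EXAMPLES = {
    "token_embd.weight": "model.embed_tokens.weight",
    "blk.0.attn_norm.weight": "model.layers.0.input_layernorm.weight",
    "blk.0.attn_q.weight": "model.layers.0.self_attn.q_proj.weight",
    "blk.0.ffn_up.weight": "model.layers.0.mlp.up_proj.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}
```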
[GGUF] Refactor and decouple gguf checkpoint loading logic (huggingface#34385)

* draft load_gguf refactor
* update
* remove llama mapping
* remove qwen2 mapping
* remove unused function
* deprecate stablelm mapping
* deprecate phi3 mapping
* deprecate t5 mapping
* deprecate bloom mapping
* fix bloom
* deprecate starcoder2 mapping
* deprecate gpt2 mapping
* deprecate mistral mapping
* deprecate nemotron mapping
* deprecate mamba mapping
* deprecate mamba mapping
* code format
* code format
* fix mamba
* fix qwen2moe
* remove qwen2moe mapping
* clean up
* remove falcon 7b map
* remove all ggml tensors mapping
* add comments
* update messages
* fix tensors in parsed parameters
* add gguf check

Signed-off-by: Isotr0py <[email protected]>
What does this PR do?
Since more and more GGUF architectures have been supported, the GGUF loading logic (especially the `load_gguf_checkpoint` function) has become complicated, coupling tensor renaming with the tensor "reshape"/"permute" reversal. Therefore, I think there is a need to refactor the GGUF checkpoint loading logic for better maintainability.
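For context, the "permute" reversal mentioned above refers to undoing the Q/K row interleaving that llama.cpp's converter applies to llama-style checkpoints; a rough sketch of that inverse (the helper name is hypothetical, not the PR's code) looks like:

```python
import torch

def reverse_llama_permute(weights: torch.Tensor, n_head: int) -> torch.Tensor:
    # Undo: w.reshape(n_head, 2, dim, *rest).swapaxes(1, 2).reshape(w.shape),
    # where dim = w.shape[0] // n_head // 2, as applied by llama.cpp's converter.
    dim = weights.shape[0] // n_head // 2
    w = weights.reshape(n_head, dim, 2, *weights.shape[1:])
    return w.swapaxes(1, 2).reshape(weights.shape)
```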
This PR is going to refactor and clean up the GGUF model loading logic with the following methods:

- Introduce the `get_gguf_hf_weights_map` function that has been used in vLLM to infer `gguf_weights_map` (or `GGUF_TENSOR_MAPPING` as we currently use in `transformers`) from the model config with the `gguf` package automatically (see the sketch after this list).
- Since `GGUF_TENSOR_MAPPING` can be inferred with the above function, we can try to remove most of the existing hardcoded mappings, and in most cases we no longer need to ask contributors to add new mappings to `GGUF_TENSOR_MAPPING`.
- Separate the GGUF tensor renaming and "reshape"/"permute" reversal logic.

Since vLLM and transformers have different model weight loading logic, it will take some time for me to figure out a better method to finish this work, so I marked this PR as a draft.
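For readers unfamiliar with the `gguf` package, here is a rough sketch of the idea behind inferring the mapping instead of hardcoding it. The helper name `infer_gguf_to_hf_map` and its signature are hypothetical (not the PR's actual code); `gguf.MODEL_ARCH_NAMES` and `gguf.get_tensor_name_map` are, to my knowledge, part of the gguf-py API:

```python
import gguf

def infer_gguf_to_hf_map(hf_state_dict_keys, model_type: str, num_layers: int) -> dict:
    # Find the gguf MODEL_ARCH whose registered name matches the HF model_type
    # (assumes the architecture names line up, e.g. "llama", "qwen2", "bloom").
    arch = None
    for candidate, name in gguf.MODEL_ARCH_NAMES.items():
        if name == model_type:
            arch = candidate
            break
    if arch is None:
        raise NotImplementedError(f"No gguf architecture registered for model_type={model_type!r}")

    # gguf ships the standardized tensor-name mapping for each architecture.
    name_map = gguf.get_tensor_name_map(arch, num_layers)

    gguf_to_hf = {}
    for hf_name in hf_state_dict_keys:
        # get_name() resolves an HF-style name to its standardized GGUF name,
        # re-appending the ".weight"/".bias" suffix it matched on.
        gguf_name = name_map.get_name(hf_name, try_suffixes=(".weight", ".bias"))
        if gguf_name is not None:
            gguf_to_hf[gguf_name] = hf_name
    return gguf_to_hf
```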
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.