Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for uint8_t as data type for GatherBlockQuantized #24239

Merged
merged 8 commits into from
Apr 4, 2025

Conversation

sushraja-msft
Copy link
Contributor

@sushraja-msft sushraja-msft commented Mar 28, 2025

Description

This change adds support for GatherBlockQuantized to use uin8_t as data's type with the same semantics as MatMulNBits. Zero_Points and Gather Axis other than 0 are not yet supported, in order to keep the change scoped.

Motivation and Context

With the newer llama models like Phi4 trained with shared embeddings, the weights of the lm_head matrix and the embeddings table are exactly the same. These embeddings are huge, unquantized embeddings are 1.2GB in Phi4 mini instruct, at int4 quantization the weights are still 300MB. We can go a step further and have these two ops the lm_head matmulnbits and GatherBlockQuantized share the same weights, that would save 300MB on the model size.

The two things that hinder that are the shape expectations for GatherBlockQuantized and the data type supported for data in GatherBlockQuantized. The shape can be solved via a simple reshape op, but the data type needs code changes and that is what this change does.

Here is Phi4 modified with shared weights between lm_head and matmulnbits, this model is just 2.1GB on disk.
image

@sushraja-msft sushraja-msft requested a review from guschmue March 28, 2025 19:30
@sushraja-msft sushraja-msft marked this pull request as ready for review March 28, 2025 19:30
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

sushraja-msft and others added 2 commits March 31, 2025 10:58
…d.cc

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…d.cc

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
guschmue
guschmue previously approved these changes Mar 31, 2025
@guschmue
Copy link
Contributor

in theory we'd need to rev to opdef but I belief there is no harm in this case:
it is a contrib op and if a model hits an older version of onnxruntime with uint8 there is no registration for that type so it will still gracefully fail.

Copy link
Contributor

@liqunfu liqunfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall there be test?

@sushraja-msft
Copy link
Contributor Author

shall there be test?

Yes there shall, can you tell me where the existing tests are and how to run them locally so I can add to it ? I am new to making ORT CPU changes.

@sushraja-msft
Copy link
Contributor Author

Are the tests here

https://github.com/microsoft/onnxruntime/blob/24620e70d9f14956a0dc84bb8a332dcd64c95a94/onnxruntime/test/contrib_ops/gather_block_quantized_op_test.cc#L125C3-L125C27 ?

that tests seems to indicate that it I would fail because I added uint8_t support, yet I am passing the CI.

Copy link
Contributor

@yihonglyu yihonglyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe some .md files under the docs should be updated as well.

@sushraja-msft
Copy link
Contributor Author

I believe some .md files under the docs should be updated as well.

Could you point me to it please, I see a documentation string in contrib_defs.cc. That has been updated, is there some other documentation.

@sushraja-msft
Copy link
Contributor Author

I believe some .md files under the docs should be updated as well.

Could you point me to it please, I see a documentation string in contrib_defs.cc. That has been updated, is there some other documentation.

Done updated

@sushraja-msft sushraja-msft requested a review from liqunfu April 4, 2025 16:54
liqunfu
liqunfu previously approved these changes Apr 4, 2025
@sushraja-msft sushraja-msft merged commit a4976e3 into main Apr 4, 2025
85 of 89 checks passed
@sushraja-msft sushraja-msft deleted the user/sushraja/gather_dequantize branch April 4, 2025 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants