Introduce QuantizedType #1352
Can we leave open the possibility of having unknown values for scale and zero point for quantized training, which requires training programs to calculate scale and zero point values on the fly?
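(Not part of the PR; just to make the question concrete.) A minimal sketch, assuming a simple min/max-based scheme and hypothetical helper names, of what computing scale and zero point on the fly might look like inside a training program:

```python
import numpy as np

def dynamic_qparams(x, storage_min=-128, storage_max=127):
    """Illustrative only: derive scale/zero_point from the observed range of x
    (asymmetric affine quantization), as a training step might do per tensor."""
    lo, hi = float(np.min(x)), float(np.max(x))
    lo, hi = min(lo, 0.0), max(hi, 0.0)          # keep 0.0 exactly representable
    scale = (hi - lo) / (storage_max - storage_min)
    if scale == 0.0:                             # degenerate all-equal tensor
        return 1.0, 0
    zero_point = int(round(storage_min - lo / scale))
    zero_point = max(storage_min, min(storage_max, zero_point))
    return scale, zero_point

# Example: quantize activations with parameters computed on the fly.
x = np.random.randn(4, 8).astype(np.float32)
scale, zero_point = dynamic_qparams(x)
q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
```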
I think that's an interesting idea, and #1149 highlights this as an open issue as well. Update: we opened #1407 to track this work.
However, given that we don't have a precedent for this functionality (quant.uniform doesn't support this yet, and therefore the StableHLO dialect doesn't support it either), I think it would need to be introduced in a dedicated RFC once we define the baseline StableHLO spec.
This is similar to how we forked StableHLO from MHLO and specced baseline HLO semantics, and are now starting to make changes along the lines of #1143, #1308, #1342, etc.
What are the semantics of `storage_min` and `storage_max`? Are operators which produce quantized values supposed to clamp them to `[storage_min, storage_max]`? Are StableHLO consumers supposed to validate that quantized tensors that are inputs to the model don't have values outside of `[storage_min, storage_max]`? The only case where `[storage_min, storage_max] != [min_value(storage_type), max_value(storage_type)]` is filters in signed 8-bit quantization, where they are constrained to `[-127, 127]` instead of `[-128, 127]`. However, as the filters are static, it is possible to verify this constraint by directly inspecting the weights.
The semantics are to make sure that the quantized values stay within this range. These semantics are enforced by the individual ops.
Yes.
The spec in its current form uses the individual ops to apply the semantics of `[storage_min, storage_max]`, which is to clamp the quantized values. There is no constraint on the input data, and IMO consumers can still be compliant with the spec even if they do not perform this validation.
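A minimal sketch of the clamping behavior described above, assuming a simple requantization step and hypothetical helper names (this is not the spec's reference implementation):

```python
import numpy as np

def requantize_and_clamp(acc_fp, scale, zero_point, storage_min, storage_max):
    """Illustrative only: after computing a result in real arithmetic, an op maps
    it back to the storage domain and clamps to [storage_min, storage_max], which
    may be narrower than the storage type's natural range."""
    q = np.round(acc_fp / scale) + zero_point
    return np.clip(q, storage_min, storage_max).astype(np.int8)

# e.g. signed 8-bit values constrained to [-127, 127] instead of [-128, 127]:
result = requantize_and_clamp(np.array([-3.2, 0.0, 3.1]), scale=0.025,
                              zero_point=0, storage_min=-127, storage_max=127)
```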
For some cases, we also use the [-7, 7] range with eight-bit integer storage to mimic four-bit quantization, or other bit-width schemes with wider integer storage.
For common eight-bit quantization cases, as you said, the typical ranges are [-128, 127] and [-127, 127]. However, IMO, for various reasons (e.g., special quantization treatments, research purposes) it might be better not to limit the allowed ranges to just one or two in the language specification.
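For illustration, a minimal sketch, assuming hypothetical names, of how eight-bit storage with `storage_min = -7` and `storage_max = 7` can mimic symmetric four-bit quantization:

```python
import numpy as np

def fake_int4_quantize(x, scale, zero_point=0):
    """Illustrative only: store values in int8, but clamp to [-7, 7] so the
    quantized tensor behaves like a symmetric 4-bit quantization scheme."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -7, 7).astype(np.int8)   # storage_min=-7, storage_max=7

q = fake_int4_quantize(np.linspace(-1.0, 1.0, 9), scale=0.125)
```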
This is a good point! Let's discuss more in #1406. In the meantime, we removed the spec for quantized AddOp to sidestep having an opinion on this in this specific PR.