
Introduce QuantizedType #1352

Merged · 16 commits · Apr 14, 2023
69 changes: 64 additions & 5 deletions docs/spec.md
@@ -81,7 +81,7 @@ completely numeric to simplify generation of StableHLO programs.

```ebnf
Type ::= ValueType | NonValueType
-ValueType ::= TensorType | TokenType | TupleType
+ValueType ::= TensorType | QuantizedTensorType | TokenType | TupleType
NonValueType ::= ElementType | FunctionType | StringType
```

@@ -116,6 +116,69 @@ types, for example, to include layouts
([#629](https://github.com/openxla/stablehlo/issues/629)) and sparsity
([#1078](https://github.com/openxla/stablehlo/issues/1078)).

```ebnf
QuantizedTensorType ::= 'tensor' '<' TensorShape QuantizedElementType '>'
QuantizedElementType ::= '!quant.uniform' '<'
QuantizationStorageType
['<' QuantizationStorageMin ':' QuantizationStorageMax '>']
':' QuantizationExpressedType
[':' QuantizationDimension]
',' QuantizationParameters '>'
QuantizationStorageType ::= IntegerType
QuantizationStorageMin ::= IntegerConstant
QuantizationStorageMax ::= IntegerConstant
QuantizationExpressedType ::= FloatType
QuantizationDimension ::= IntegerConstant
QuantizationParameters ::= QuantizationParameter
| '{' QuantizationParameter {',' QuantizationParameter} '}'
QuantizationParameter ::= QuantizationScale ':' QuantizationZeroPoint
QuantizationScale ::= FloatConstant
QuantizationZeroPoint ::= IntegerConstant
```

> **@abattery** commented on Apr 13, 2023:
>
> Can we open the possibility to have unknown values for scale and zero point for quantized training, which requires the training programs to calculate scale and zero point values on the fly?

> **@burmako** (Contributor) replied on Apr 13, 2023:
>
> I think that's an interesting idea, and #1149 is highlighting this as an open issue as well. upd: We opened #1407 to track this work.
>
> However, given that we don't have a precedent for this functionality (`quant.uniform` doesn't support it yet, and therefore the StableHLO dialect doesn't support it either), I think this functionality would need to be introduced in a dedicated RFC once we define the baseline StableHLO spec.
>
> This is similar to how we forked StableHLO from MHLO and specced baseline HLO semantics, and are now starting to make changes to it along the lines of #1143, #1308, #1342, etc.

**Quantized element types** represent integer values of a **storage type** in
the range from `storage_min` to `storage_max` (inclusive) that correspond to
floating-point values of an **expressed type**. For a given integer value `i`,
the corresponding floating-point value `f` can be computed as
`f = (i - zero_point) * scale`, where `scale` and `zero_point` are called
**quantization parameters**. The `storage_min` and `storage_max` are optional
in the grammar, but have default values of `min_value(storage_type)` and
`max_value(storage_type)` respectively. Quantized element types have the
following constraints:

> (Reviewer comment:) What are the semantics of `storage_min` and `storage_max`? Are operators which produce quantized values supposed to clamp them to `[storage_min, storage_max]`? Are StableHLO consumers supposed to validate that quantized tensors that are inputs to the model don't have values outside of `[storage_min, storage_max]`? The only case where `[storage_min, storage_max] != [min_value(storage_type), max_value(storage_type)]` is filters in signed 8-bit quantization, where they are constrained to `[-127, 127]` instead of `[-128, 127]`. However, as the filters are static, it is possible to verify this constraint by directly inspecting the weights.

> (Member Author replied:)
>
> > What are the semantics of `storage_min` and `storage_max`?
>
> The semantics are to make sure that quantized values stay within this range. These semantics are enforced by the individual ops.
>
> > Are operators which produce quantized values supposed to clamp them to `[storage_min, storage_max]`?
>
> Yes.
>
> > Are StableHLO consumers supposed to validate that quantized tensors that are inputs to the model don't have values outside of `[storage_min, storage_max]`?
>
> The spec in its current form uses the individual ops to apply the semantics of `[storage_min, storage_max]`, which is to clamp the quantized values. There is no constraint on the input data, and IMO consumers can still be compliant with the spec even if they do not perform this validation.

> **@abattery** commented on Apr 13, 2023:
>
> For some cases, we also use the `[-7, 7]` range with eight-bit integer storage to mimic four-bit quantization, or other bit-scheme quantization with wider integer storage.
>
> For popular eight-bit quantization cases, as you said, the typical ranges `[-128, 127]` and `[-127, 127]` are common. However, IMO, for various reasons (e.g. special quantization treatments, research purposes) it might be better not to limit the allowed ranges to just one or two in the language specification.

> (Contributor replied:)
>
> This is a good point! Let's discuss more in #1406. In the meanwhile, we removed the spec for quantized AddOp to sidestep having an opinion on this in this specific PR.

* (C1) `num_bits(storage_type) < num_bits(expressed_type)`.
* (C2) `type(storage_min) = storage_type`.
* (C3) `type(storage_max) = storage_type`.
* (C4) `min_value(storage_type) <= storage_min < storage_max <= max_value(storage_type)`.
* (C5) For all `i`, `type(scales[i]) = expressed_type`.
* (C6) For all `i`, `scales[i] > 0`.
* (C7) For all `i`, `is_finite(scales[i])`.
* (C8) For all `i`, `storage_min <= zero_points[i] <= storage_max`.
* (C9) For all `i`, `type(zero_points[i]) = storage_type`.
* (C10) `size(scales) = size(zero_points)`.
* (C11) If `quantization_dimension` is empty, then `size(scales) = 1`.
* (C12) If `quantization_dimension` is not empty, then
`0 <= quantization_dimension`.
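The dequantization formula and the default storage bounds can be illustrated with a small sketch. This is not part of the spec or any StableHLO API; the function name and the clamping-before-dequantization behavior (per the discussion above, ops clamp produced values to the storage range) are assumptions for illustration, using `i8` defaults:

```python
# Hypothetical sketch (not part of the spec): dequantizing a single
# storage value per f = (i - zero_point) * scale, with storage_min and
# storage_max defaulting to the storage type's full range (i8 here).

def dequantize(i: int, scale: float, zero_point: int,
               storage_min: int = -128, storage_max: int = 127) -> float:
    # Ops that produce quantized values clamp them to the storage range.
    i = max(storage_min, min(storage_max, i))
    return (i - zero_point) * scale

# Example: with !quant.uniform<i8:f32, 0.5:-1>, storage value 5
# corresponds to the expressed value (5 - (-1)) * 0.5 = 3.0.
print(dequantize(5, scale=0.5, zero_point=-1))  # 3.0
```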

**Quantized tensor types** represent tensors with quantized elements. These
tensors are exactly the same as regular tensors, except that their elements
have quantized element types, instead of regular element types.

In quantized tensors, quantization can be **per-tensor**, meaning there is one
`scale` and `zero_point` for the entire tensor, or **per-axis**, meaning there
are multiple `scales` and `zero_points`, one pair per slice of a particular
dimension `quantized_dimension`. More formally, in a tensor `t` with per-axis
quantization, there are `dim(t, quantized_dimension)` slices of the
`quantized_dimension`: `t[:, ..., 0, ..., :], t[:, ..., 1, ..., :]`, etc.
All elements in the `i`th slice use `scales[i]` and `zero_points[i]` as their
quantization parameters. Quantized tensor types have the following constraints:

* For per-tensor quantization:
* No additional constraints.
* For per-axis quantization:
* (C12) `quantization_dimension < size(shape)`.
* (C13) `size(scales) = shape[quantization_dimension]`.
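Per-axis quantization can be sketched as follows. This is an illustrative helper, not spec text: it dequantizes a 2-D tensor quantized along dimension 0, so row `i` uses `scales[i]` and `zero_points[i]`, and the type annotation in the comment is a hypothetical example:

```python
# Hypothetical sketch (not part of the spec): per-axis dequantization of a
# 2-D tensor quantized along dimension 0, so row i uses scales[i] and
# zero_points[i]. Constraint (C13) requires len(scales) == shape[0] here.

def dequantize_per_axis(t, scales, zero_points):
    assert len(scales) == len(zero_points) == len(t)  # C10, C13
    return [[(i - zp) * s for i in row]
            for row, s, zp in zip(t, scales, zero_points)]

# Illustrative type: tensor<2x2x!quant.uniform<i8:f32:0, {0.25:-30, 0.5:-20}>>
print(dequantize_per_axis([[50, 70], [10, 30]],
                          scales=[0.25, 0.5],
                          zero_points=[-30, -20]))
# [[20.0, 25.0], [15.0, 25.0]]
```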

```ebnf
TokenType ::= 'token'
```
@@ -173,10 +236,6 @@ values of type `tensor<T>`).
and an **imaginary part** of the same **element type**. Supported complex
types are `complex<f32>` (both parts are of type `f32`) and `complex<f64>`
(both parts are of type `f64`).
-* In the future, we are also planning to introduce **quantized types** that
-  represent integer values obtained via uniform quantization of floating-point
-  values using given scales and zero points
-  ([#588](https://github.com/openxla/stablehlo/issues/588)).

```ebnf
FunctionType ::= '(' [ValueType {',' ValueType}] ')' '->' '(' [ValueType {',' ValueType}] ')'