## Quick Look-Up for Implementations in SNPS Caffe
We support implementations from different frameworks; the layer parameter `quantize_method` selects among them when their results are not bit-exact. You can also refer to [FEATURES.md](https://github.com/foss-for-synopsys-dwc-arc-processors/synopsys-caffe/blob/development/FEATURES.md#custom-quantization-related) for other quantization-related parameters.
We denote the TFLite/ONNXruntime/Caffe2 implementations by `t`/`o`/`c`. The `~` entries indicate that the Caffe implementation computes in floating representation, as in:
```cpp
// A Dequantize-Op-Quantize procedure, taking ReLU as example:
// dequantize the integer input to float, apply the floating-point ReLU,
// then re-quantize the result.
```
1. Our model zoo doesn't cover all quantized operators across the frameworks. An entry is left empty if the `(framework, operator)` combination has not been seen yet.
* Quantized `bias_layer` only occurs in ONNX (we do not support `FC+Bias` fusion yet).
2. Only `Quantize` and `Dequantize` operators are mapped to `Power_layer`.
3. Some quantized operators produce bit-exact results across frameworks; for such entries we adapt the implementation from another framework.
4. `MaxPool` and `ArgMax` appear in the table, but they behave identically for quantized and floating-point numbers.
5. `Convolution` covers a number of variations; please see the following section.