
Commit 53b5d6f

metascroy authored and pytorchbot committed
Update backends-coreml.md (#12100)
Adds more context on quantization to address #12059. (cherry picked from commit 488dfed)
1 parent 46b3298 commit 53b5d6f

1 file changed: +33 −11 lines changed

docs/source/backends-coreml.md

````diff
@@ -112,18 +112,17 @@ mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFA
 sample_inputs = (torch.randn(1, 3, 224, 224), )
 
 # Step 1: Define a LinearQuantizerConfig and create an instance of a CoreMLQuantizer
-quantization_config = ct.optimize.torch.quantization.LinearQuantizerConfig.from_dict(
-    {
-        "global_config": {
-            "quantization_scheme": ct.optimize.torch.quantization.QuantizationScheme.symmetric,
-            "milestones": [0, 0, 10, 10],
-            "activation_dtype": torch.quint8,
-            "weight_dtype": torch.qint8,
-            "weight_per_channel": True,
-        }
-    }
+# Note that linear here does not mean only linear layers are quantized, but that linear (aka affine) quantization
+# is being performed
+static_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
+    global_config=ct.optimize.torch.quantization.ModuleLinearQuantizerConfig(
+        quantization_scheme="symmetric",
+        activation_dtype=torch.quint8,
+        weight_dtype=torch.qint8,
+        weight_per_channel=True,
+    )
 )
-quantizer = CoreMLQuantizer(quantization_config)
+quantizer = CoreMLQuantizer(static_8bit_config)
 
 # Step 2: Export the model for training
 training_gm = torch.export.export_for_training(mobilenet_v2, sample_inputs).module()
````
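The comment added in this hunk notes that "linear" refers to linear (affine) quantization, not to quantizing only linear layers. As a rough illustration of the symmetric flavor selected above, here is a pure-Python sketch (hypothetical helper names, not part of coremltools) where the zero-point is fixed at 0 and only a scale is learned from the data:

```python
def quantize_symmetric_8bit(values):
    """Map floats to signed 8-bit codes using a single per-tensor scale.

    Symmetric linear quantization: code = round(value / scale), with the
    zero-point implicitly 0, so 0.0 always maps exactly to code 0.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.75]
codes, scale = quantize_symmetric_8bit(weights)
recovered = dequantize(codes, scale)
```

Per-channel quantization (the `weight_per_channel=True` setting above) applies this same idea with a separate scale for each output channel rather than one scale for the whole tensor.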
````diff
@@ -153,6 +152,24 @@ et_program = to_edge_transform_and_lower(
 ).to_executorch()
 ```
 
+The above does static quantization (activations and weights are quantized). Quantizing activations requires calibrating the model on representative data. You can also do weight-only quantization, which does not require calibration data, by specifying the activation_dtype to be torch.float32:
+
+```
+weight_only_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
+    global_config=ct.optimize.torch.quantization.ModuleLinearQuantizerConfig(
+        quantization_scheme="symmetric",
+        activation_dtype=torch.float32,
+        weight_dtype=torch.qint8,
+        weight_per_channel=True,
+    )
+)
+quantizer = CoreMLQuantizer(weight_only_8bit_config)
+prepared_model = prepare_pt2e(training_gm, quantizer)
+quantized_model = convert_pt2e(prepared_model)
+```
+
+Note that static quantization requires exporting the model for iOS17 or later.
+
 See [PyTorch 2 Export Post Training Quantization](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html) for more information.
 
 ----
````
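The distinction the hunk above draws (static quantization needs calibration data, weight-only does not) comes down to activations: their ranges are only known once representative inputs flow through the model. A minimal sketch (pure Python, hypothetical names, not the actual coremltools observer) of how a min/max observer turns calibration batches into quint8 scale and zero-point parameters:

```python
class MinMaxObserver:
    """Track the running min/max of activations seen during calibration."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, activations):
        self.lo = min(self.lo, min(activations))
        self.hi = max(self.hi, max(activations))

    def quint8_params(self):
        """Affine (asymmetric) parameters for the unsigned 8-bit range [0, 255]."""
        scale = (self.hi - self.lo) / 255.0
        zero_point = round(-self.lo / scale)
        return scale, zero_point


obs = MinMaxObserver()
# Hypothetical representative calibration batches; in practice these are
# real model inputs run through prepared_model before convert_pt2e.
for batch in ([0.0, 1.5, 3.0], [-0.5, 2.0]):
    obs.observe(batch)
scale, zero_point = obs.quint8_params()
```

Weights need no such pass because their values are fixed at export time, which is why weight-only quantization can skip calibration entirely.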
````diff
@@ -204,3 +221,8 @@ This happens because the model is in FP16, but CoreML interprets some of the arg
 raise RuntimeError("BlobWriter not loaded")
 
 If you're using Python 3.13, try reducing your python version to Python 3.12. coremltools does not support Python 3.13, see this [issue](https://github.com/apple/coremltools/issues/2487).
+
+### At runtime
+1. [ETCoreMLModelCompiler.mm:55] [Core ML] Failed to compile model, error = Error Domain=com.apple.mlassetio Code=1 "Failed to parse the model specification. Error: Unable to parse ML Program: at unknown location: Unknown opset 'CoreML7'." UserInfo={NSLocalizedDescription=Failed to par$
+
+This means the model requires the CoreML opset 'CoreML7', which requires running the model on iOS17/macOS14 or later.
````
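Since the note in the diff ties static quantization to an iOS17+ target, a configuration sketch may help. Assuming the `CoreMLBackend`/`CoreMLPartitioner` entry points of the ExecuTorch CoreML backend (verify the exact names against the current backend documentation), the minimum deployment target can be raised via compile specs at lowering time:

```python
# Sketch: raise the minimum deployment target so iOS17-only features
# (e.g. the CoreML7 opset needed for static quantization) are available.
# Assumes ExecuTorch CoreML backend APIs; names may differ by version.
import coremltools as ct
from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

compile_specs = CoreMLBackend.generate_compile_specs(
    minimum_deployment_target=ct.target.iOS17
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)
# Pass `partitioner` to to_edge_transform_and_lower(...) when lowering the model.
```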
