Conversation
This makes it possible to load transformers quantized weights
```python
config["quantization"] = quantization
config["quantization_config"] = quantization
```
```python
if (quantization := config.get("quantization", None)) is not None:
```
I believe the new code above should be here, right?
It's a bit confusing 😅. quantization is first read in L140 (which was already there). If it exists, it's an MLX-style quantization definition. If it doesn't, we check whether quantization_config exists, convert it to the quantization format, and store it in the config object. Perhaps we could remove L140 and just check whether config has the quantization attribute in line 217.
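The precedence described above can be sketched as follows. This is a hedged illustration, not the repo's real code: the function name `resolve_quantization` and the exact `bits`/`group_size` keys are assumptions.

```python
def resolve_quantization(config: dict):
    """Return an MLX-style quantization dict, converting from a
    transformers-style quantization_config when necessary."""
    # An explicit MLX-style "quantization" entry wins when present.
    quantization = config.get("quantization", None)
    if quantization is not None:
        return quantization

    # Otherwise translate transformers' quantization_config and store it
    # back on the config so downstream code finds "quantization" there.
    hf_cfg = config.get("quantization_config", None)
    if hf_cfg is not None:
        quantization = {
            "bits": hf_cfg.get("bits", 4),               # assumed key
            "group_size": hf_cfg.get("group_size", 64),  # assumed key
        }
        config["quantization"] = quantization
    return quantization
```

With this shape, all the conversion logic sits in one function, matching the refactor suggested below.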
Let's refactor then 😎
I think this could start at line 241, so all the quantization logic is clear in one place.
Deleted the distant quantization line from L140 and simplified slightly.
This is another mismatch I found while working on #689. I was puzzled that I had to use QuantizedSwitchLinear explicitly in order to load the weights, whereas this was not necessary in mlx_lm. The quantization step performed immediately after sanitization is driven by the existence of quantization_config, and it adapts the weights accordingly so they have .scales and .biases. Extracting this as a separate PR for discussion; maybe I missed some side effects.
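To illustrate the .scales/.biases point: quantized linear layers in MLX-style checkpoints store extra per-group arrays alongside the packed weight, so their presence is a cheap signal that a checkpoint is quantized. The helper below is hypothetical (not code from this PR):

```python
def looks_quantized(weights: dict) -> bool:
    """Detect a quantized checkpoint: quantized linear layers carry
    per-group .scales and .biases arrays next to .weight, while
    unquantized layers only have .weight (and possibly .bias)."""
    return any(k.endswith((".scales", ".biases")) for k in weights)
```

For example, `looks_quantized({"w1.weight": 0, "w1.scales": 0, "w1.biases": 0})` returns True, while `looks_quantized({"w1.weight": 0})` returns False, which is why the loader must swap in quantized layer classes before attempting to load such weights.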