Feature: MoE.Routing control (Bypass or Override)#2235
Merged
Qubitium merged 63 commits into ModelCloud:main on Jan 6, 2026
Conversation
…when forward_hook_last is True
…n forward_to_all_experts
…ve part of experts, remove optimization for shared experts as it could be not optimal when StopForward raised
…ed. update _masked_hook_wrapper to check hooks_paused
…, except of: dbrx_converted ernie4_5_moe gpt_oss phi3
…rward-whole-dataset-to-each-expert-main-2
…rward-whole-dataset-to-each-expert-main-2
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium reviewed on Jan 5, 2026
Qubitium reviewed on Jan 5, 2026
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
…rward-whole-dataset-to-each-expert-main-2
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium approved these changes on Jan 6, 2026
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium approved these changes on Jan 6, 2026
ZX-ModelCloud added a commit that referenced this pull request on Jan 8, 2026
This reverts commit fc0de11.
Qubitium added a commit that referenced this pull request on Jan 8, 2026
* Revert "Feature: MoE.Routing control (Bypass or Override) (#2235)" This reverts commit fc0de11. * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * format Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * ModelTest.load_tokenizer() mark as classmethod Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * add awq test folder * add awq test folder * add awq test folder * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * add awq test folder * fix finalizer threaeds are not waited on before model.save * avoid dead thread ref memory leak pile up * cleanup * revert ModelTest.MOE_CONFIG Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * keep prev pr memory optimizations * keep prev pr memory optimizations * keep prev pr auto_forward_data_parallel optimizations * keep prev pr gc_mode and wait_for_submodule_finalizers * cleanup --------- Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium <qubitium@modelcloud.ai>
@Qubitium please review.
It looks like the LLM in Antigravity had trouble understanding what needed to change from the git diff and kept trying to write the feature from scratch, so I had to babysit it and work around limits every now and then; also, not many tests were added.
The model definitions with gate/up/down and w3/w1/w2 expert weights were updated, but several are out of scheme and I have left them untouched, e.g.:
"#": ("gate_proj:0", "upe_proj:0", "down_proj:1"),- upe_proj maybe typo, idk), and it has custom forwardI have made test run on Qwen3-30B-A3B - the logs shows all samples are passed to each expert weight, and the LLM answers normally after quantization.
Also added a flag wait_for_layer_completion to minimize VRAM usage.
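As for the routing control itself, "Bypass" during calibration can be read as ignoring the learned top-k gate and sending every token to every expert, which matches the "all samples are passed to each expert weight" observation above. A minimal sketch of that idea, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


def bypass_moe_forward(experts: nn.ModuleList, hidden: torch.Tensor) -> torch.Tensor:
    """Route every token to every expert and average the outputs.

    The router is ignored entirely, so each expert receives the whole calibration
    batch during quantization. Assumes hidden is (tokens, hidden_dim) and each
    expert maps hidden_dim -> hidden_dim; the uniform mean stands in for whatever
    mixing rule the real code uses.
    """
    outputs = [expert(hidden) for expert in experts]  # every expert sees all tokens
    return torch.stack(outputs, dim=0).mean(dim=0)    # uniform mix instead of top-k gating
```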