Feature: MoE.Routing control (Bypass or Override) #2235

Merged
Qubitium merged 63 commits into ModelCloud:main from avtc:feature/experiment-forward-whole-dataset-to-each-expert-main-2
Jan 6, 2026

@avtc (Contributor) commented Dec 4, 2025

@Qubitium please review.
It looks like the LLM in Antigravity had trouble working out what needed to change from the git diff and tried to write the feature from scratch, so I had to babysit it and kept running into usage limits. Not many tests were added either.

The model definitions with gate/up/down or w3/w1/w2 expert weights were updated. Several do not fit that scheme, so I have left them untouched:

  • dbrx_converted
  • ernie4_5_moe (it has "#": ("gate_proj:0", "upe_proj:0", "down_proj:1"), where upe_proj may be a typo, and it has a custom forward)
  • gpt_oss
  • phi3

I did a test run on Qwen3-30B-A3B: the logs show that all samples are passed to each expert's weights, and the LLM answers normally after quantization.

Also added a flag wait_for_layer_completion to minimize VRAM usage.
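
To illustrate what "all samples are passed to each expert" means for calibration, here is a toy sketch in plain Python. It is not GPTQModel's implementation: `route_top_k`, `calibration_counts`, and the random router scores are hypothetical names and stand-ins chosen for this example. It only counts how many calibration tokens each expert would receive with normal top-k routing versus with routing bypassed.

```python
import random

def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def calibration_counts(num_tokens, num_experts, k, bypass, seed=0):
    """Count how many calibration tokens each expert receives.

    bypass=False: normal top-k routing, each token reaches k experts.
    bypass=True:  routing is ignored, every token reaches every expert.
    """
    rng = random.Random(seed)
    counts = [0] * num_experts
    for _ in range(num_tokens):
        # Hypothetical router scores; a real router derives them from the hidden state.
        scores = [rng.random() for _ in range(num_experts)]
        chosen = range(num_experts) if bypass else route_top_k(scores, k)
        for expert in chosen:
            counts[expert] += 1
    return counts

routed = calibration_counts(num_tokens=256, num_experts=8, k=2, bypass=False)
bypassed = calibration_counts(num_tokens=256, num_experts=8, k=2, bypass=True)
print("routed:  ", routed)    # entries sum to 256 * 2
print("bypassed:", bypassed)  # every entry is 256
```

With routing active, each expert sees only the tokens routed to it, so rarely selected experts are calibrated on few samples. With bypass, every expert sees the whole calibration set, at the cost of more compute and memory per layer.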

avtc added 30 commits December 1, 2025 12:26
…ve part of experts, remove optimization for shared experts as it could be not optimal when StopForward raised
…ed. update _masked_hook_wrapper to check hooks_paused
…, except of:

dbrx_converted
ernie4_5_moe
gpt_oss
phi3

@Qubitium (Collaborator) commented Dec 31, 2025

I have run the tests mentioned above, using lm_eval with API requests to vLLM through sampling_proxy to override the sampling params and temperature. I did not set max_length in these tests; the default is 2048. Each run takes about 10 minutes. Each model runs on its own 4x3090 GPUs, and the runs execute in parallel.
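
The thread does not show how sampling_proxy works internally, so the following is only a hypothetical sketch of the general idea: rewrite the JSON body of a completions request so that fixed sampling parameters replace whatever the client sent. The function name `apply_sampling_overrides` and the example payload are assumptions; the parameter values are the ones quoted in the results header of this comment.

```python
import json

# Sampling parameters quoted in the results header of this comment.
OVERRIDES = {
    "temperature": 0.0,
    "min_p": 0.0,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
}

def apply_sampling_overrides(body: bytes, overrides: dict) -> bytes:
    """Return the JSON request body with sampling fields replaced by overrides."""
    payload = json.loads(body)
    payload.update(overrides)  # overrides win over client-supplied values
    return json.dumps(payload).encode("utf-8")

# Hypothetical incoming request, as an evaluation client might send it.
incoming = json.dumps(
    {"model": "example-model", "prompt": "2+2=", "temperature": 1.0}
).encode("utf-8")

outgoing = json.loads(apply_sampling_overrides(incoming, OVERRIDES))
print(outgoing["temperature"], outgoing["top_k"], outgoing["prompt"])
```

Forcing the parameters in one place keeps both models under identical sampling settings regardless of what the evaluation client requests.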

# not full
lm_eval --model local-completions --tasks arc_challenge,mmlu_stem,gsm8k_platinum_cot --model_args model=/home/ubuntu/models/GPTQModel/Qwen3-Coder-30B-A3B-Instruct-gptqmodel-w4g32-tp8-dump0.01-bs1-s403-d175149d-not-full-exp,base_url=http://127.0.0.1:8011/completions,num_concurrent=16,max_retries=1,tokenized_requests=False

# full
lm_eval --model local-completions --tasks arc_challenge,mmlu_stem,gsm8k_platinum_cot --model_args model=/home/ubuntu/models/GPTQModel/Qwen3-Coder-30B-A3B-Instruct-gptqmodel-w4g32-tp8-dump0.01-bs1-s403-d175149d-full-exp,base_url=http://127.0.0.1:8021/completions,num_concurrent=16,max_retries=1,tokenized_requests=False

# results, temperature is 0.0, sampling params set min_p=0, top_p=0.8, top_k=20, repetition_penalty=1.05
# not-full (model A)
local-completions (model=/home/ubuntu/models/GPTQModel/Qwen3-Coder-30B-A3B-Instruct-gptqmodel-w4g32-tp8-dump0.01-bs1-s403-d175149d-not-full-exp,base_url=http://127.0.0.1:8011/completions,num_concurrent=16,max_retries=1,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|             Tasks             |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge                  |      1|none            |     0|acc        |↑  |0.5401|±  |0.0146|
|                               |       |none            |     0|acc_norm   |↑  |0.5580|±  |0.0145|
|gsm8k_platinum_cot             |      3|flexible-extract|     8|exact_match|↑  |0.9363|±  |0.0070|
|                               |       |strict-match    |     8|exact_match|↑  |0.9256|±  |0.0076|
|stem                           |      2|none            |      |acc        |↑  |0.7926|±  |0.0070|
| - abstract_algebra            |      1|none            |     0|acc        |↑  |0.6600|±  |0.0476|
| - anatomy                     |      1|none            |     0|acc        |↑  |0.7259|±  |0.0385|
| - astronomy                   |      1|none            |     0|acc        |↑  |0.8947|±  |0.0250|
| - college_biology             |      1|none            |     0|acc        |↑  |0.9028|±  |0.0248|
| - college_chemistry           |      1|none            |     0|acc        |↑  |0.6200|±  |0.0488|
| - college_computer_science    |      1|none            |     0|acc        |↑  |0.7700|±  |0.0423|
| - college_mathematics         |      1|none            |     0|acc        |↑  |0.6700|±  |0.0473|
| - college_physics             |      1|none            |     0|acc        |↑  |0.7157|±  |0.0449|
| - computer_security           |      1|none            |     0|acc        |↑  |0.8300|±  |0.0378|
| - conceptual_physics          |      1|none            |     0|acc        |↑  |0.9021|±  |0.0194|
| - electrical_engineering      |      1|none            |     0|acc        |↑  |0.8069|±  |0.0329|
| - elementary_mathematics      |      1|none            |     0|acc        |↑  |0.8466|±  |0.0186|
| - high_school_biology         |      1|none            |     0|acc        |↑  |0.9226|±  |0.0152|
| - high_school_chemistry       |      1|none            |     0|acc        |↑  |0.7537|±  |0.0303|
| - high_school_computer_science|      1|none            |     0|acc        |↑  |0.8700|±  |0.0338|
| - high_school_mathematics     |      1|none            |     0|acc        |↑  |0.6407|±  |0.0293|
| - high_school_physics         |      1|none            |     0|acc        |↑  |0.7219|±  |0.0366|
| - high_school_statistics      |      1|none            |     0|acc        |↑  |0.7917|±  |0.0277|
| - machine_learning            |      1|none            |     0|acc        |↑  |0.7054|±  |0.0433|

|Groups|Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------|------:|------|------|------|---|-----:|---|-----:|
|stem  |      2|none  |      |acc   |↑  |0.7926|±  | 0.007|

# not-full temp 0.8 (model A)
local-completions (model=/home/ubuntu/models/GPTQModel/Qwen3-Coder-30B-A3B-Instruct-gptqmodel-w4g32-tp8-dump0.01-bs1-s403-d175149d-not-full-exp,base_url=http://127.0.0.1:8011/completions,num_concurrent=32,max_retries=1,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|             Tasks             |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge                  |      1|none            |     0|acc        |↑  |0.5350|±  |0.0146|
|                               |       |none            |     0|acc_norm   |↑  |0.5572|±  |0.0145|
|gsm8k_platinum_cot             |      3|flexible-extract|     8|exact_match|↑  |0.9413|±  |0.0068|
|                               |       |strict-match    |     8|exact_match|↑  |0.9355|±  |0.0071|
|stem                           |      2|none            |      |acc        |↑  |0.7919|±  |0.0070|
| - abstract_algebra            |      1|none            |     0|acc        |↑  |0.6600|±  |0.0476|
| - anatomy                     |      1|none            |     0|acc        |↑  |0.7259|±  |0.0385|
| - astronomy                   |      1|none            |     0|acc        |↑  |0.8947|±  |0.0250|
| - college_biology             |      1|none            |     0|acc        |↑  |0.8958|±  |0.0255|
| - college_chemistry           |      1|none            |     0|acc        |↑  |0.6200|±  |0.0488|
| - college_computer_science    |      1|none            |     0|acc        |↑  |0.7600|±  |0.0429|
| - college_mathematics         |      1|none            |     0|acc        |↑  |0.6600|±  |0.0476|
| - college_physics             |      1|none            |     0|acc        |↑  |0.7059|±  |0.0453|
| - computer_security           |      1|none            |     0|acc        |↑  |0.8300|±  |0.0378|
| - conceptual_physics          |      1|none            |     0|acc        |↑  |0.9021|±  |0.0194|
| - electrical_engineering      |      1|none            |     0|acc        |↑  |0.8000|±  |0.0333|
| - elementary_mathematics      |      1|none            |     0|acc        |↑  |0.8492|±  |0.0184|
| - high_school_biology         |      1|none            |     0|acc        |↑  |0.9258|±  |0.0149|
| - high_school_chemistry       |      1|none            |     0|acc        |↑  |0.7586|±  |0.0301|
| - high_school_computer_science|      1|none            |     0|acc        |↑  |0.8800|±  |0.0327|
| - high_school_mathematics     |      1|none            |     0|acc        |↑  |0.6259|±  |0.0295|
| - high_school_physics         |      1|none            |     0|acc        |↑  |0.7219|±  |0.0366|
| - high_school_statistics      |      1|none            |     0|acc        |↑  |0.8009|±  |0.0272|
| - machine_learning            |      1|none            |     0|acc        |↑  |0.7143|±  |0.0429|

|Groups|Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------|------:|------|------|------|---|-----:|---|-----:|
|stem  |      2|none  |      |acc   |↑  |0.7919|±  | 0.007|

# full (model B)
local-completions (model=/home/ubuntu/models/GPTQModel/Qwen3-Coder-30B-A3B-Instruct-gptqmodel-w4g32-tp8-dump0.01-bs1-s403-d175149d-full-exp,base_url=http://127.0.0.1:8021/completions,num_concurrent=16,max_retries=1,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|             Tasks             |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge                  |      1|none            |     0|acc        |↑  |0.5503|±  |0.0145|
|                               |       |none            |     0|acc_norm   |↑  |0.5623|±  |0.0145|
|gsm8k_platinum_cot             |      3|flexible-extract|     8|exact_match|↑  |0.9421|±  |0.0067|
|                               |       |strict-match    |     8|exact_match|↑  |0.9313|±  |0.0073|
|stem                           |      2|none            |      |acc        |↑  |0.7919|±  |0.0070|
| - abstract_algebra            |      1|none            |     0|acc        |↑  |0.6600|±  |0.0476|
| - anatomy                     |      1|none            |     0|acc        |↑  |0.7111|±  |0.0392|
| - astronomy                   |      1|none            |     0|acc        |↑  |0.9145|±  |0.0228|
| - college_biology             |      1|none            |     0|acc        |↑  |0.8958|±  |0.0255|
| - college_chemistry           |      1|none            |     0|acc        |↑  |0.5900|±  |0.0494|
| - college_computer_science    |      1|none            |     0|acc        |↑  |0.7400|±  |0.0441|
| - college_mathematics         |      1|none            |     0|acc        |↑  |0.6000|±  |0.0492|
| - college_physics             |      1|none            |     0|acc        |↑  |0.7157|±  |0.0449|
| - computer_security           |      1|none            |     0|acc        |↑  |0.8200|±  |0.0386|
| - conceptual_physics          |      1|none            |     0|acc        |↑  |0.8979|±  |0.0198|
| - electrical_engineering      |      1|none            |     0|acc        |↑  |0.8276|±  |0.0315|
| - elementary_mathematics      |      1|none            |     0|acc        |↑  |0.8519|±  |0.0183|
| - high_school_biology         |      1|none            |     0|acc        |↑  |0.9290|±  |0.0146|
| - high_school_chemistry       |      1|none            |     0|acc        |↑  |0.7685|±  |0.0297|
| - high_school_computer_science|      1|none            |     0|acc        |↑  |0.8600|±  |0.0349|
| - high_school_mathematics     |      1|none            |     0|acc        |↑  |0.6519|±  |0.0290|
| - high_school_physics         |      1|none            |     0|acc        |↑  |0.7616|±  |0.0348|
| - high_school_statistics      |      1|none            |     0|acc        |↑  |0.7870|±  |0.0279|
| - machine_learning            |      1|none            |     0|acc        |↑  |0.6696|±  |0.0446|

|Groups|Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------|------:|------|------|------|---|-----:|---|-----:|
|stem  |      2|none  |      |acc   |↑  |0.7919|±  | 0.007|

The full (model B) scores look great! It actually wins 2 out of 3 in this round of tests. The PR merge is scheduled for the first week of January 2026 as part of the 5.7.0 release.
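
The "2 out of 3" tally can be reproduced from the tables above with a few lines of Python. This is a rough sketch: the comment does not say which metric per task was compared, so the choice of `acc` for arc_challenge and stem and `flexible-extract` for gsm8k_platinum_cot is an assumption, as is treating the two runs as independent when combining standard errors.

```python
import math

# (value, stderr) pairs copied from the tables above.
# A = not-full at temperature 0.0, B = full.
SCORES = {
    "arc_challenge acc": ((0.5401, 0.0146), (0.5503, 0.0145)),
    "gsm8k_platinum_cot flexible-extract": ((0.9363, 0.0070), (0.9421, 0.0067)),
    "stem acc": ((0.7926, 0.0070), (0.7919, 0.0070)),
}

def compare(a, b):
    """Return (B - A, combined stderr), treating the two runs as independent."""
    (a_val, a_err), (b_val, b_err) = a, b
    return b_val - a_val, math.sqrt(a_err ** 2 + b_err ** 2)

b_wins = 0
for task, (a, b) in SCORES.items():
    delta, err = compare(a, b)
    b_wins += delta > 0
    print(f"{task}: B - A = {delta:+.4f}, combined stderr = {err:.4f}")
print(f"B ahead on {b_wins} of {len(SCORES)} tasks")
```

Under these assumptions B is ahead on arc_challenge and gsm8k_platinum_cot and marginally behind on stem, and every gap is smaller than the combined standard error, so the ranking is suggestive rather than conclusive.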

@Qubitium Qubitium requested a review from ZX-ModelCloud January 4, 2026 06:29
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium Qubitium changed the title from "Feature: Disable MoE Routing: All experts receive all activations" to "Feature: MoEConfig.Routing control (Bypass + Override)" Jan 6, 2026
@Qubitium Qubitium changed the title from "Feature: MoEConfig.Routing control (Bypass + Override)" to "Feature: MoEConfig.Routing control (Bypass or Override)" Jan 6, 2026
@Qubitium Qubitium changed the title from "Feature: MoEConfig.Routing control (Bypass or Override)" to "Feature: MoE.Routing control (Bypass or Override)" Jan 6, 2026
@Qubitium Qubitium self-requested a review January 6, 2026 08:43
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium Qubitium merged commit fc0de11 into ModelCloud:main Jan 6, 2026
1 check passed
ZX-ModelCloud added a commit that referenced this pull request Jan 8, 2026
Qubitium added a commit that referenced this pull request Jan 8, 2026
* Revert "Feature: MoE.Routing control (Bypass or Override) (#2235)"

This reverts commit fc0de11.

* fix revert error

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix revert error

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* ModelTest.load_tokenizer() mark as classmethod

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* add awq test folder

* add awq test folder

* add awq test folder

* fix revert error

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* add awq test folder

* fix finalizer threads are not waited on before model.save

* avoid dead thread ref memory leak pile up

* cleanup

* revert ModelTest.MOE_CONFIG

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* keep prev pr memory optimizations

* keep prev pr memory optimizations

* keep prev pr auto_forward_data_parallel optimizations

* keep prev pr gc_mode and wait_for_submodule_finalizers

* cleanup

---------

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium <qubitium@modelcloud.ai>