Feature: MoE.Routing control (Bypass or Override)#2235
Merged
Qubitium merged 63 commits into ModelCloud:main on Jan 6, 2026
Conversation
…when forward_hook_last is True
…n forward_to_all_experts
…ve part of experts, remove optimization for shared experts as it could be not optimal when StopForward raised
…ed. update _masked_hook_wrapper to check hooks_paused
…, except of: dbrx_converted ernie4_5_moe gpt_oss phi3
…rward-whole-dataset-to-each-expert-main-2
…rward-whole-dataset-to-each-expert-main-2
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium reviewed on Jan 5, 2026
Qubitium reviewed on Jan 5, 2026
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
…rward-whole-dataset-to-each-expert-main-2
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium approved these changes on Jan 6, 2026
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Qubitium approved these changes on Jan 6, 2026
ZX-ModelCloud added a commit that referenced this pull request on Jan 8, 2026
This reverts commit fc0de11.
Qubitium added a commit that referenced this pull request on Jan 8, 2026
* Revert "Feature: MoE.Routing control (Bypass or Override) (#2235)" This reverts commit fc0de11. * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * format Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * ModelTest.load_tokenizer() mark as classmethod Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * add awq test folder * add awq test folder * add awq test folder * fix revert error Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * add awq test folder * fix finalizer threaeds are not waited on before model.save * avoid dead thread ref memory leak pile up * cleanup * revert ModelTest.MOE_CONFIG Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> * keep prev pr memory optimizations * keep prev pr memory optimizations * keep prev pr auto_forward_data_parallel optimizations * keep prev pr gc_mode and wait_for_submodule_finalizers * cleanup --------- Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium <qubitium@modelcloud.ai>
@Qubitium please review.
It looks like the LLM in Antigravity had trouble understanding what needed to change from the git diff and kept trying to write the feature from scratch, so I had to babysit it and work around limits every now and then; also, not many tests were added.
The model definitions with gate/up/down and w3/w1/w2 expert weights were updated, but several are out of scheme and I have left them untouched, e.g.:
"#": ("gate_proj:0", "upe_proj:0", "down_proj:1"),- upe_proj maybe typo, idk), and it has custom forwardI have made test run on Qwen3-30B-A3B - the logs shows all samples are passed to each expert weight, and the LLM answers normally after quantization.
Also added a flag wait_for_layer_completion to minimize VRAM usage.
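As for the routing control itself, "Bypass" during calibration can be read as ignoring the learned top-k gate and sending every token to every expert, which matches the "all samples are passed to each expert weight" observation above. A minimal sketch of that idea, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


def bypass_moe_forward(experts: nn.ModuleList, hidden: torch.Tensor) -> torch.Tensor:
    """Route every token to every expert and average the outputs.

    The router is ignored entirely, so each expert receives the whole calibration
    batch during quantization. Assumes hidden is (tokens, hidden_dim) and each
    expert maps hidden_dim -> hidden_dim; the uniform mean stands in for whatever
    mixing rule the real code uses.
    """
    outputs = [expert(hidden) for expert in experts]  # every expert sees all tokens
    return torch.stack(outputs, dim=0).mean(dim=0)    # uniform mix instead of top-k gating
```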