Support Qwen3-Next on ATOM Framework#171

Merged
valarLip merged 23 commits into main from qwen_next
Feb 14, 2026

Conversation

@PerryZhang01 (Contributor) commented Jan 29, 2026

📋 Motivation

This PR introduces support for the Qwen3-Next model (available at: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking/tree/main) within the ATOM framework.

The Qwen3-Next architecture introduces a novel Gated Delta Network (GDN) module, which requires specialized integration to enable efficient inference on ATOM.

🔧 Technical Implementation

To implement Qwen3-Next support in ATOM, we followed vLLM's approach and integrated three core components:

1. Model Architecture (models/qwen3_next.py)

Defines the fundamental structure of the Qwen3-Next model

Implements the model layers and GDN blocks specific to this architecture

2. Metadata Management (metadata/gdn_attn.py)

Adds GDN metadata definitions and construction methods for the prefill and decode phases

Handles the specialized attention mechanisms required by the Gated Delta Network

3. Kernel Operations (ops/)

fla_ops: Implements Flash Linear Attention (FLA) operations optimized for GDN

mamba_ops: Provides Mamba-style state space model operations for efficient sequence processing
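As a rough illustration of component 2, here is a minimal sketch of what a per-batch GDN metadata container could look like. The class and field names below (`GDNMetadataSketch`, `is_prefill`, `seq_lens`, `state_indices`, `chunk_size`) are hypothetical, chosen only to illustrate the prefill/decode split; the actual dataclass is `GDNAttentionMetadata` in `atom/utils/forward_context.py` and its fields may differ:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GDNMetadataSketch:
    """Hypothetical sketch of per-batch GDN attention metadata.

    Field names are illustrative only; the real dataclass is
    GDNAttentionMetadata in atom/utils/forward_context.py.
    """
    is_prefill: bool          # prefill vs decode phase
    seq_lens: List[int]       # per-request sequence lengths
    state_indices: List[int]  # slots of recurrent (SSM) state per request
    chunk_size: int = 64      # chunking granularity for the delta rule


def build_decode_metadata(seq_lens, state_indices):
    # Decode phase: one new token per request, recurrent state is reused.
    return GDNMetadataSketch(is_prefill=False,
                             seq_lens=list(seq_lens),
                             state_indices=list(state_indices))


md = build_decode_metadata([17, 5], [0, 1])
print(md.is_prefill, md.seq_lens)  # False [17, 5]
```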

✅ Supported Features

Eager Mode Execution: Full support for standard inference

CUDA Graph Optimization: Enables graph capture for reduced kernel launch overhead and improved performance

🧪 Test Plan & Validation Strategy

Server:

```shell
export ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION=0
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=0

# rm -rf ~/.cache/atom/
python3 -m atom.entrypoints.openai_server \
        --model /mnt/raid0/models/Qwen3-Next-80B-A3B-Thinking \
        --gpu-memory-utilization 0.8 \
        -tp 8 \
        --level 0 \
        --server-port 8888
```

Client:

```shell
lm_eval --model local-completions \
        --model_args model=/mnt/raid0/models/Qwen3-Next-80B-A3B-Thinking,base_url=http://localhost:8888/v1/completions,num_concurrent=64,max_retries=3,tokenized_requests=False \
        --tasks gsm8k
```
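Independent of lm_eval, the OpenAI-compatible endpoint can be smoke-tested with a raw /v1/completions request. A minimal stdlib-only sketch, assuming the server above is listening on localhost:8888 (the prompt text is arbitrary):

```python
import json
import urllib.request

# Build an OpenAI-compatible /v1/completions payload for the server above.
payload = {
    "model": "/mnt/raid0/models/Qwen3-Next-80B-A3B-Thinking",
    "prompt": "Q: What is 12 * 7? A:",
    "max_tokens": 32,
    "temperature": 0.0,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8888/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```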

Test Result

[Screenshot: gsm8k evaluation results]

🔮 Future Enhancement Roadmap

  • Support MTP mode in Qwen3-Next
  • More fused ops for performance optimization

@PerryZhang01 changed the title from "Qwen next" to "Support Qwen3-Next on ATOM Framework" Jan 29, 2026
@PerryZhang01 PerryZhang01 marked this pull request as draft January 29, 2026 12:23
@ganyi1996ppo (Contributor) commented:
Removing enforce-eager enables CUDA graph, and with CUDA graph enabled we get the following gsm8k score:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8711|±  |0.0092|
|     |       |strict-match    |     5|exact_match|↑  |0.8476|±  |0.0099|

@github-actions bot left a comment:

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

ruff

⚠️ [ruff] <E731> reported by reviewdog 🐶
Do not assign a lambda expression, use a def

grid = lambda META: (triton.cdiv(dim, META["BLOCK_SIZE"]), nheads)
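For reference, the E731 finding is typically resolved by promoting the lambda to a named function. A hedged sketch of the fix: `dim`, `nheads`, and the grid shape mirror the flagged line, while `cdiv` stands in for `triton.cdiv` so the snippet runs without Triton:

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, equivalent to triton.cdiv(a, b)."""
    return -(-a // b)


def make_grid(dim: int, nheads: int):
    # E731-compliant replacement for:
    #   grid = lambda META: (triton.cdiv(dim, META["BLOCK_SIZE"]), nheads)
    def grid(META):
        return (cdiv(dim, META["BLOCK_SIZE"]), nheads)
    return grid


grid = make_grid(dim=1024, nheads=8)
print(grid({"BLOCK_SIZE": 256}))  # (4, 8)
```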


⚠️ [ruff] <F401> reported by reviewdog 🐶
typing.Any imported but unused

from typing import Any, Dict, Iterable, Optional, Set, Tuple, Union


⚠️ [ruff] <F401> reported by reviewdog 🐶
typing.Dict imported but unused

from typing import Any, Dict, Iterable, Optional, Set, Tuple, Union


⚠️ [ruff] <F401> reported by reviewdog 🐶
typing.Iterable imported but unused

from typing import Any, Dict, Iterable, Optional, Set, Tuple, Union


⚠️ [ruff] <F401> reported by reviewdog 🐶
typing.Set imported but unused

from typing import Any, Dict, Iterable, Optional, Set, Tuple, Union


⚠️ [ruff] <F811> reported by reviewdog 🐶
Redefinition of unused Optional from line 1: Optional redefined here

from typing import Optional


⚠️ [ruff] <F401> reported by reviewdog 🐶
transformers.Qwen3Config imported but unused

from transformers import Qwen3Config


⚠️ [ruff] <F401> reported by reviewdog 🐶
transformers.PretrainedConfig imported but unused

from transformers import PretrainedConfig


⚠️ [ruff] <F401> reported by reviewdog 🐶
atom.model_ops.layernorm.RMSNorm imported but unused

from atom.model_ops.layernorm import RMSNorm, RMSNormGated, GemmaRMSNorm


⚠️ [ruff] <F401> reported by reviewdog 🐶
atom.utils.forward_context.ForwardContext imported but unused

from atom.utils.forward_context import ForwardContext, get_forward_context


⚠️ [ruff] <F811> reported by reviewdog 🐶
Redefinition of unused get_tp_group from line 9: get_tp_group redefined here

get_tp_group,


⚠️ [ruff] <F401> reported by reviewdog 🐶
aiter.dist.parallel_state.get_tp_group imported but unused

get_tp_group,


⚠️ [ruff] <F821> reported by reviewdog 🐶
Undefined name MambaStateDtypeCalculator

return MambaStateDtypeCalculator.gated_delta_net_state_dtype(


⚠️ [ruff] <F821> reported by reviewdog 🐶
Undefined name MambaStateShapeCalculator

return MambaStateShapeCalculator.gated_delta_net_state_shape(


⚠️ [ruff] <F841> reported by reviewdog 🐶
Local variable kv_cache_data is assigned to but never used

kv_cache_data = forward_context.kv_cache_data


⚠️ [ruff] <F821> reported by reviewdog 🐶
Undefined name self

self._forward_core(


⚠️ [ruff] <F811> reported by reviewdog 🐶
Redefinition of unused extract_layer_index from line 217: extract_layer_index redefined here

def extract_layer_index(layer_name: str, num_attn_module: int = 1) -> int:
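Most of the F401/F811 findings above reduce to pruning and deduplicating the import blocks. For example, assuming only `Optional`, `Tuple`, and `Union` are actually used in that module (the exact survivors depend on the code), the flagged `typing` imports collapse to:

```python
# Before (F401: Any, Dict, Iterable, Set unused; F811: Optional imported twice):
#   from typing import Any, Dict, Iterable, Optional, Set, Tuple, Union
#   from typing import Optional
# After: keep a single import line with only the names the module uses.
from typing import Optional, Tuple, Union

MaybeShape = Optional[Union[Tuple[int, int], int]]  # example use of the survivors
```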

@ganyi1996ppo ganyi1996ppo marked this pull request as ready for review February 2, 2026 05:29
Copilot AI review requested due to automatic review settings February 2, 2026 05:29
Copilot AI left a comment:

Pull request overview

This PR adds support for the Qwen3-Next model architecture on the ATOM framework, specifically the Qwen3-Next-80B-A3B-Thinking variant. The implementation focuses on supporting the model's unique gated delta net (GDN) linear attention mechanism alongside traditional full attention layers.

Changes:

  • Added Qwen3-Next model configuration and architecture implementation
  • Implemented GDN (Gated Delta Net) attention backend with Mamba SSM operations
  • Extended model loader and runner to handle mixed attention architectures
  • Added supporting utilities for layer index extraction and state management

Reviewed changes

Copilot reviewed 35 out of 38 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| atom/utils/selector.py | Added `use_gdn` parameter to attention backend selection |
| atom/utils/forward_context.py | Added `GDNAttentionMetadata` dataclass for GDN attention state |
| atom/models/utils.py | Added `extract_layer_index` utility with multi-attention module support |
| atom/model_ops/mamba_ops/*.py | Implemented Mamba SSM operations (state passing, chunk scan, BMM, cumsum) |
| atom/model_ops/fla_ops/*.py | Added FLA (Flash Linear Attention) operations including chunk-based delta rule |
| atom/model_ops/layernorm.py | Added `RMSNormGated` and `GemmaRMSNorm` implementations with SiLU activation |
| atom/model_ops/base_attention.py | Added `LinearAttention` module for GDN attention |
| atom/model_ops/attentions/gdn_attn.py | Implemented GDN attention backend and metadata builder |
| atom/model_ops/attention_gdn.py | Core GDN attention implementation with convolution and recurrence |
| atom/model_ops/attention_mha.py | Fixed ASM layout detection for mixed KV cache dimensions |
| atom/model_loader/loader.py | Added mamba v2 sharded weight loader and "mtp" parameter filtering |
| atom/model_engine/model_runner.py | Extended runner with GDN support, KV cache allocation for mixed architectures |
| atom/model_config/qwen3_next.py | Added complete Qwen3Next configuration class |
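For context on the `atom/models/utils.py` entry, here is a plausible sketch of a layer-index helper matching the `extract_layer_index(layer_name, num_attn_module=1)` signature seen in the lint output. This is a guess at the behavior (parse the first numeric path component, then fold multiple attention modules per layer), not the actual implementation:

```python
def extract_layer_index(layer_name: str, num_attn_module: int = 1) -> int:
    """Return the decoder-layer index embedded in a dotted module name.

    Hypothetical sketch only: the real helper lives in atom/models/utils.py.
    With num_attn_module > 1, the raw index is divided by the number of
    attention modules per decoder layer.
    """
    for part in layer_name.split("."):
        if part.isdigit():
            return int(part) // num_attn_module
    raise ValueError(f"no layer index found in {layer_name!r}")


print(extract_layer_index("model.layers.12.self_attn"))  # 12
```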


@ChuanLi1101 (Collaborator) left a comment:

Leaving some comments for your reference.

@wuhuikx (Contributor) commented Feb 7, 2026

Can we add a recipe like https://github.com/ROCm/ATOM/blob/main/recipes/Qwen3-235b.md in another PR to help others run this workload?
For the first stage, we can focus on functionality and accuracy.

@PerryZhang01 PerryZhang01 force-pushed the qwen_next branch 3 times, most recently from 9ea93d9 to 81a8b1c Compare February 9, 2026 03:13
@zejunchen-zejun (Contributor) commented:
LGTM

@valarLip valarLip merged commit 04221b0 into main Feb 14, 2026
10 of 11 checks passed
@valarLip valarLip deleted the qwen_next branch February 14, 2026 02:18