Removing enforce-eager enables cudagraph, and with cudagraph enabled, we get the following gsm8k score:
Remaining comments which could not be posted as review comments, to avoid the GitHub rate limit:
ruff
- Do not assign a lambda expression, use a def
- `typing.Any` imported but unused (ATOM/atom/models/qwen3_next.py, line 1 in bcc6170)
- `typing.Dict` imported but unused (ATOM/atom/models/qwen3_next.py, line 1 in bcc6170)
- `typing.Iterable` imported but unused (ATOM/atom/models/qwen3_next.py, line 1 in bcc6170)
- `typing.Set` imported but unused (ATOM/atom/models/qwen3_next.py, line 1 in bcc6170)
- Redefinition of unused `Optional` from line 1 (ATOM/atom/models/qwen3_next.py, line 10 in bcc6170)
- `transformers.Qwen3Config` imported but unused (ATOM/atom/models/qwen3_next.py, line 11 in bcc6170)
- `transformers.PretrainedConfig` imported but unused (ATOM/atom/models/qwen3_next.py, line 12 in bcc6170)
- `atom.model_ops.layernorm.RMSNorm` imported but unused (ATOM/atom/models/qwen3_next.py, line 20 in bcc6170)
- `atom.utils.forward_context.ForwardContext` imported but unused (ATOM/atom/models/qwen3_next.py, line 38 in bcc6170)
- Redefinition of unused `get_tp_group` from line 9 (ATOM/atom/models/qwen3_next.py, line 54 in bcc6170)
- `aiter.dist.parallel_state.get_tp_group` imported but unused (ATOM/atom/models/qwen3_next.py, line 54 in bcc6170)
- Undefined name `MambaStateDtypeCalculator` (ATOM/atom/models/qwen3_next.py, line 420 in bcc6170)
- Undefined name `MambaStateShapeCalculator` (ATOM/atom/models/qwen3_next.py, line 425 in bcc6170)
- Local variable `kv_cache_data` is assigned to but never used (ATOM/atom/models/qwen3_next.py, line 714 in bcc6170)
- Undefined name `self` (ATOM/atom/models/qwen3_next.py, line 1226 in bcc6170)
- Redefinition of unused `extract_layer_index` from line 217 (line 258 in bcc6170)
Pull request overview
This PR adds support for the Qwen3-Next model architecture on the ATOM framework, specifically the Qwen3-Next-80B-A3B-Thinking variant. The implementation focuses on supporting the model's unique gated delta net (GDN) linear attention mechanism alongside traditional full attention layers.
Changes:
- Added Qwen3-Next model configuration and architecture implementation
- Implemented GDN (Gated Delta Net) attention backend with Mamba SSM operations
- Extended model loader and runner to handle mixed attention architectures
- Added supporting utilities for layer index extraction and state management
Reviewed changes
Copilot reviewed 35 out of 38 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| atom/utils/selector.py | Added use_gdn parameter to attention backend selection |
| atom/utils/forward_context.py | Added GDNAttentionMetadata dataclass for GDN attention state |
| atom/models/utils.py | Added extract_layer_index utility with multi-attention module support |
| atom/model_ops/mamba_ops/*.py | Implemented Mamba SSM operations (state passing, chunk scan, BMM, cumsum) |
| atom/model_ops/fla_ops/*.py | Added FLA (Flash Linear Attention) operations including chunk-based delta rule |
| atom/model_ops/layernorm.py | Added RMSNormGated and GemmaRMSNorm implementations with SiLU activation |
| atom/model_ops/base_attention.py | Added LinearAttention module for GDN attention |
| atom/model_ops/attentions/gdn_attn.py | Implemented GDN attention backend and metadata builder |
| atom/model_ops/attention_gdn.py | Core GDN attention implementation with convolution and recurrence |
| atom/model_ops/attention_mha.py | Fixed ASM layout detection for mixed KV cache dimensions |
| atom/model_loader/loader.py | Added mamba v2 sharded weight loader and "mtp" parameter filtering |
| atom/model_engine/model_runner.py | Extended runner with GDN support, KV cache allocation for mixed architectures |
| atom/model_config/qwen3_next.py | Added complete Qwen3Next configuration class |
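As an illustration of what the `extract_layer_index` utility listed above typically does (a sketch under assumptions, not ATOM's actual implementation), it recovers the numeric layer position from a dotted module path such as `model.layers.3.self_attn`, which hybrid models need in order to decide whether a given layer uses full attention or GDN linear attention:

```python
def extract_layer_index(prefix: str) -> int:
    """Return the first integer component of a dotted module path.

    Illustrative sketch only; ATOM's real utility also supports
    modules that host multiple attention sub-layers.
    """
    for part in prefix.split("."):
        if part.isdigit():
            return int(part)
    raise ValueError(f"no layer index found in {prefix!r}")

print(extract_layer_index("model.layers.3.self_attn"))  # → 3
```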
ChuanLi1101 left a comment:
Leaving some comments for your reference.
Can we add a recipe like "https://github.com/ROCm/ATOM/blob/main/recipes/Qwen3-235b.md" in another PR to help others run this workload?
LGTM
📋 Motivation
This PR introduces support for the Qwen3-Next model (available at: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking/tree/main) within the ATOM framework.
The Qwen3-Next architecture introduces a novel Gated Delta Network (GDN) module, which requires specialized integration to enable efficient inference on ATOM.
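As background on the mechanism (a simplified reference recurrence, not the fused chunked kernels this PR integrates), the gated delta rule underlying GDN maintains a fast-weight state S that is decayed by a per-token gate and corrected toward each new key/value pair; all shapes and the exact update form below are illustrative assumptions:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive per-token gated delta rule recurrence (reference sketch).

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) gates in [0, 1].
    S is a (d_k, d_v) fast-weight state. Each step decays S by alpha_t
    and applies a beta_t-scaled delta-rule correction that moves S's
    prediction for k_t toward v_t.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        pred = S.T @ k[t]                                   # (d_v,) current prediction
        S = alpha[t] * S + beta[t] * np.outer(k[t], v[t] - pred)
        out[t] = S.T @ q[t]                                 # read out with the query
    return out
```

The production kernels (fla_ops / mamba_ops) compute the same recurrence in parallel chunks rather than token by token, which is what makes it efficient on GPU.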
🔧 Technical Implementation
To implement Qwen3-Next support in ATOM, we followed vLLM's approach and integrated three core components:
1. Model Architecture (models/qwen3_next.py)
Defines the fundamental structure of the Qwen3-Next model
Implements the model layers and GDN blocks specific to this architecture
2. Metadata Management (metadata/gdn_attn.py)
Adds GDN metadata definitions and construction methods in prefill and decode phase
Handles the specialized attention mechanisms required by the Gated Delta Network
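For a sense of what such metadata carries across a mixed prefill/decode batch, here is an illustrative stand-in dataclass; every field name here is an assumption for illustration and does not reflect the actual `GDNAttentionMetadata` added in this PR:

```python
from dataclasses import dataclass, field

@dataclass
class GDNMetadataSketch:
    """Hypothetical GDN attention metadata (illustration only)."""
    num_prefill_tokens: int = 0
    num_decode_tokens: int = 0
    # per-sequence lengths for the prefill portion of the batch
    prefill_seq_lens: list = field(default_factory=list)
    # slot indices into the persistent recurrent-state cache
    state_indices: list = field(default_factory=list)

    @property
    def num_tokens(self) -> int:
        return self.num_prefill_tokens + self.num_decode_tokens

md = GDNMetadataSketch(num_prefill_tokens=128, num_decode_tokens=4,
                       prefill_seq_lens=[64, 64],
                       state_indices=[0, 1, 2, 3, 4, 5])
print(md.num_tokens)  # → 132
```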
3. Kernel Operations (ops/)
fla_ops: Implements Flash Linear Attention (FLA) operations optimized for GDN
mamba_ops: Provides Mamba-style state space model operations for efficient sequence processing
✅ Supported Features
Eager Mode Execution: Full support for standard inference
CUDA Graph Optimization: Enables graph capture for reduced kernel launch overhead and improved performance
🧪 Test Plan & Validation Strategy
Server:
Client:
Test Result
🔮 Future Enhancement Roadmap