[Feature]: Support inter_dim 384 in AITER fused-moe for amd/MiniMax-M2.1-MXFP4 with tp=4 (used in vLLM) #2191

@hongxiayang

Description

(1) Context: vLLM issue report vllm-project/vllm#35637

Sample error message:

RuntimeError: wrong! device_gemm with the specified compilation parameters does not support this GEMM problem
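The error text appears to come from a Composable Kernel (CK) style `device_gemm` support check, which rejects a GEMM problem when a compiled kernel instance cannot handle its shape. A common reason such checks fail is a dimension that is not a multiple of the instance's tile size. The sketch below illustrates only that idea: `KernelInstance`, `is_supported`, the padding rule, and the tile sizes are all hypothetical and are not taken from AITER or CK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelInstance:
    """Hypothetical fixed-tile GEMM instance (tile sizes are illustrative only)."""
    tile_m: int
    tile_n: int
    tile_k: int


def is_supported(inst: KernelInstance, m: int, n: int, k: int, pad_m: bool = True) -> bool:
    """Return True if every non-padded dimension is a multiple of its tile size.

    Assumption: M (tokens) is padded by the kernel, while N and K must divide evenly.
    """
    m_ok = pad_m or m % inst.tile_m == 0
    return m_ok and n % inst.tile_n == 0 and k % inst.tile_k == 0


# Stage-2 style problem: K = inter_dim = 384, N = model_dim = 3072.
inst = KernelInstance(tile_m=32, tile_n=256, tile_k=256)
print(is_supported(inst, m=2048, n=3072, k=384))  # False: 384 % 256 == 128
print(is_supported(inst, m=2048, n=3072, k=512))  # True

# Which hypothetical K tiles would divide 384 evenly?
candidates = [64, 128, 256]
print([t for t in candidates if 384 % t == 0])  # [64, 128]
```

Under this assumption, supporting inter_dim 384 would mean compiling (or tuning for) instances whose K or N tile divides 384, rather than changing the shape itself.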

(2) Tuning fused_moe for inter_dim 384 also failed.

Sample entry for tuning:

token,model_dim,inter_dim,expert,topk,act_type,dtype,q_dtype_a,q_dtype_w,q_type,use_g1u1,doweight_stage1
2048,3072,384,256,8,ActivationType.Silu,torch.bfloat16,torch.float4_e2m1fn_x2,torch.float4_e2m1fn_x2,QuantType.per_1x32,1,0
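To make the failing shape concrete, the snippet below parses the sample tuning entry with Python's standard `csv` module and derives the two per-expert GEMM problems of a two-stage fused MoE. It rests on several assumptions that are interpretations for illustration, not AITER's documented semantics: `use_g1u1=1` means the gate and up projections share one stage-1 weight (so stage 1 has N = 2 × inter_dim), `float4_e2m1fn_x2` packs two FP4 values per byte, `per_1x32` means one scale per 32 elements along K, and `inter_dim` is the per-rank slice under tp=4.

```python
import csv
import io

# Sample tuning entry copied from the issue.
ROW = """token,model_dim,inter_dim,expert,topk,act_type,dtype,q_dtype_a,q_dtype_w,q_type,use_g1u1,doweight_stage1
2048,3072,384,256,8,ActivationType.Silu,torch.bfloat16,torch.float4_e2m1fn_x2,torch.float4_e2m1fn_x2,QuantType.per_1x32,1,0
"""

SCALE_BLOCK = 32   # assumption: per_1x32 -> one scale per 32 elements along K
FP4_PER_BYTE = 2   # assumption: float4_e2m1fn_x2 packs two FP4 values per byte
TP = 4             # tensor-parallel size from the issue title


def stage_shapes(entry: dict) -> dict:
    """Derive (N, K) for each MoE stage, assuming the fused gate/up layout."""
    model_dim = int(entry["model_dim"])
    inter_dim = int(entry["inter_dim"])
    g1u1 = entry["use_g1u1"] == "1"
    stage1 = (2 * inter_dim if g1u1 else inter_dim, model_dim)  # hidden -> gate/up
    stage2 = (model_dim, inter_dim)                            # inter -> hidden
    return {"stage1": stage1, "stage2": stage2}


entry = next(csv.DictReader(io.StringIO(ROW)))
shapes = stage_shapes(entry)
inter_dim = int(entry["inter_dim"])

print(shapes)                     # {'stage1': (768, 3072), 'stage2': (3072, 384)}
print(inter_dim * TP)             # 1536: unsharded size if inter_dim is per-rank
print(inter_dim // SCALE_BLOCK)   # 12 scale blocks along stage-2 K
print(inter_dim // FP4_PER_BYTE)  # 192 packed bytes along stage-2 K
print(int(entry["token"]) * int(entry["topk"]))  # 16384 token-expert pairs routed
```

If these assumptions hold, the unusual dimension is K = 384 in stage 2 (and N = 768 in stage 1), which is where a kernel instance with a 256-wide tile would not divide evenly.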

(3) Feature request:
Supporting this shape in AITER would let vLLM keep using the AITER fused MoE kernels for this model and tp=4 configuration, preserving their performance benefits.

Thank you.

Operating System

No response

GPU

gfx950

ROCm Component

No response
