[Issue]: add_rmsnorm_quant_kernel precision #2236

@Jacob0226

Problem Description

This is a follow-up tracking issue for `add_rmsnorm_quant_kernel`. Two related PRs are listed below:

  1. Aiter: fix residual_out accuracy of hip rmsnorm fused add #2011
    This PR contains the latest changes to add_rmsnorm_quant_kernel, including updates related to kernel accuracy.
  2. SGLang: [AMD] Skip the flaky test for lora ci test. sgl-project/sglang#20175
    SGLang CI started failing after add_rmsnorm_quant_kernel was merged into AITER. Please refer to this PR for a detailed precision/accuracy analysis at both the kernel level and the model level.
    Our experiments show that, measured against the ground-truth LlamaRMSNorm from the transformers library, the AITER kernel add_rmsnorm_quant_kernel (SGLANG_USE_AITER=1) has a larger numerical difference than the vLLM kernels fused_add_rms_norm and rms_norm (SGLANG_USE_AITER=0).
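
For readers who want a feel for the kind of comparison described above, here is a minimal pure-Python sketch. It is not the attached test script and does not call AITER, vLLM, or transformers. `add_rmsnorm_ref` follows the LlamaRMSNorm formula (`weight * x * rsqrt(mean(x^2) + eps)`) applied to `x + residual`, computed in double precision. `add_rmsnorm_fp16_residual` is a hypothetical stand-in candidate that rounds `residual_out` and the output to fp16; it is only an illustration of how a lower-precision path shows up in the metric, not a description of what any real kernel does. All function names and the `eps` default are assumptions.

```python
import math
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def add_rmsnorm_ref(x, residual, weight, eps=1e-6):
    """Reference fused add + RMSNorm in double precision.

    Returns (out, residual_out), where residual_out = x + residual.
    """
    residual_out = [a + b for a, b in zip(x, residual)]
    variance = sum(v * v for v in residual_out) / len(residual_out)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    out = [w * v * inv_rms for w, v in zip(weight, residual_out)]
    return out, residual_out


def add_rmsnorm_fp16_residual(x, residual, weight, eps=1e-6):
    """Hypothetical candidate: rounds residual_out and out to fp16.

    Illustration only; not the behavior of any real kernel.
    """
    residual_out = [to_fp16(a + b) for a, b in zip(x, residual)]
    variance = sum(v * v for v in residual_out) / len(residual_out)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    out = [to_fp16(w * v * inv_rms) for w, v in zip(weight, residual_out)]
    return out, residual_out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    x = [0.1 * i for i in range(8)]
    residual = [0.05 * (i - 4) for i in range(8)]
    weight = [1.0] * 8
    ref_out, ref_res = add_rmsnorm_ref(x, residual, weight)
    cand_out, cand_res = add_rmsnorm_fp16_residual(x, residual, weight)
    print("out max abs diff:", max_abs_diff(cand_out, ref_out))
    print("residual_out max abs diff:", max_abs_diff(cand_res, ref_res))
```

Reporting the difference for both `out` and `residual_out` separately makes it easier to tell whether a discrepancy originates in the add step or in the normalization step.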

Operating System

22.04.5 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9655 96-Core Processor

GPU

AMD Instinct MI325X

ROCm Version

ROCm 7.0.0

ROCm Component

No response

Steps to Reproduce

  • Kernel level
    Docker image: rocm/sgl-dev:v0.5.9-rocm700-mi30x-20260308
    Run: python test_rmsnorm_3way_consistency.py
    Check rmsnorm_seed_sweep_4panel.png
    Script: test_rmsnorm_3way_consistency.py

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
