
Conversation

@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Nov 13, 2025

What this PR does / why we need it?

Adopts Inductor fusion and defines a quantization fusion pass.
Refer to vllm-project/vllm#23612.
Depends on vllm-project/vllm#28623.

Does this PR introduce any user-facing change?

Yes, this PR adds a new additional_config option.
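
The exact key added by this PR is not shown in the description; purely as a hypothetical illustration, enabling the fusion through additional_config might look like:

from vllm import LLM

# "enable_quantization_fusion" is a hypothetical key used only to
# illustrate the additional_config mechanism, not the actual name
# introduced by this PR.
llm = LLM(model="vllm-ascend/Qwen3-8B-W8A8",
          quantization="ascend",
          additional_config={"enable_quantization_fusion": True})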

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The president of the United States is Mr.",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100,
                                     temperature=0.6,
                                     top_k=40,
                                     top_p=0.95)
    # Create an LLM.
    llm = LLM(
        model="/root/.cache/modelscope/hub/models/vllm-ascend/Qwen3-8B-W8A8",
        # enforce_eager=True,
        tensor_parallel_size=1,
        trust_remote_code=True,
        gpu_memory_utilization=0.7,
        quantization="ascend",
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

Signed-off-by: Icey <[email protected]>
Signed-off-by: wxsIcey <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts; please resolve them before we can evaluate it.

Signed-off-by: wxsIcey <[email protected]>
@wxsIcey wxsIcey changed the title [wip] Adopt inductor fusion and define quantization fusion pass Adopt inductor fusion and define quantization fusion pass Nov 13, 2025
@wxsIcey wxsIcey marked this pull request as ready for review November 13, 2025 12:58
@wxsIcey
Collaborator Author

wxsIcey commented Nov 13, 2025

Currently, operator fusion is achieved through pattern matching with Inductor. Using aot-autograd could be future work, but we found that it causes accuracy issues. @whx-sjtu Would you be willing to review this?

Collaborator

@whx-sjtu whx-sjtu left a comment


Nice work. We finally managed to utilize Inductor's pattern_matcher to fuse our add_rms_norm_quant kernel into the FX graph. The whole idea looks good to me, with some questions about details in the review below.
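
For context, a minimal self-contained analogue of what Inductor's pattern matcher does (the ops below are CPU stand-ins, not the PR's npu kernels; assumes a recent PyTorch where PatternMatcherPass accepts pass_name):

import torch
from torch._inductor.pattern_matcher import (PatternMatcherPass, fwd_only,
                                             register_replacement)

pattern_match_pass = PatternMatcherPass(pass_name="example_fusion")

def search(x: torch.Tensor, y: torch.Tensor):
    # The unfused subgraph to find in the FX graph.
    return torch.relu(x + y)

def replace(x: torch.Tensor, y: torch.Tensor):
    # What each match is rewritten into (a stand-in "fused" op).
    return torch.clamp(x + y, min=0.0)

# Tracing the search function with example inputs builds the pattern;
# pattern_match_pass.apply(fx_graph) then rewrites every match.
example_inputs = [torch.empty(8, 8), torch.empty(8, 8)]
register_replacement(search, replace, example_inputs, fwd_only,
                     pattern_match_pass)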

return shape_list


class AscendAdaptor(CompilerInterface):
Collaborator


The name AscendAdaptor is too vague; I suggest a more specific one like AscendCompiler.

Pattern for AddRMSNormQuant fusion.
"""
output = torch.ops.npu.npu_add_rms_norm(rms_norm_input, residual,
                                        rms_norm_weight, 1e-6)
Collaborator


Instead of being fixed to 1e-6, eps should be defined as a static variable of AddRMSNormQuantPattern, with different eps values corresponding to different pattern objects. Some models may use a different eps, such as 1e-5.
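
A minimal sketch of this suggestion (class shape and names are illustrative; assumes torch_npu provides torch.ops.npu.npu_add_rms_norm as above):

import torch

class AddRMSNormQuantPattern:

    def __init__(self, eps: float):
        self.eps = eps  # e.g. 1e-6 or 1e-5, depending on the model config

    def pattern(self, rms_norm_input, residual, rms_norm_weight):
        # eps comes from the instance instead of a hard-coded literal.
        return torch.ops.npu.npu_add_rms_norm(rms_norm_input, residual,
                                              rms_norm_weight, self.eps)

# One pattern object is registered per eps value that models actually use.
patterns = [AddRMSNormQuantPattern(eps) for eps in (1e-6, 1e-5)]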


def __init__(self, vllm_config):
    super().__init__(vllm_config)
    self.patterns: PatternMatcherPass = PatternMatcherPass(
Collaborator


The name self.patterns is a bit confusing here. It should be named something like self.pattern_match_pass.

arg_dtypes, list) and len(arg_dtypes) > 0 else arg_dtypes
# We found that the kernel npu_add_rms_norm_quant accepts varying data
# formats for different dtypes; therefore, we only provide the solution
# for bfloat16 here.
return dtype in (torch.bfloat16, )
Collaborator


I don't quite understand this part. Does the data format also influence pattern matching? Maybe we can define patterns separately for bf16 and fp16 to support them both?
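
A hedged sketch of that idea (names are illustrative, not the PR's code): Inductor specializes a traced pattern on the dtype of its example inputs, so registering the pattern once per dtype would cover both bf16 and fp16:

import torch

def make_example_inputs(dtype: torch.dtype):
    # Shapes only drive the tracing; the dtype distinguishes the patterns.
    return [torch.empty(4, 64, dtype=dtype),  # rms_norm_input
            torch.empty(4, 64, dtype=dtype),  # residual
            torch.empty(64, dtype=dtype)]     # rms_norm_weight

for dtype in (torch.bfloat16, torch.float16):
    example_inputs = make_example_inputs(dtype)
    # register_replacement(pattern, replacement, example_inputs, fwd_only,
    #                      pattern_match_pass)  # as in the analogue above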

Collaborator

@whx-sjtu whx-sjtu left a comment


I have another question here. With the current proposal, can we reuse the ready-made fusion passes defined in vLLM, like the SequenceParallel fusion pass? I'm not very familiar with the current fusion pass stack in vLLM, so I'm confirming it here. Reusability is what we expect.

@whx-sjtu
Collaborator

This feature is very important for vllm-ascend. I also hope @jgong5 can take some time to review this PR. Thanks.

@wxsIcey
Collaborator Author

wxsIcey commented Nov 13, 2025

I have another question here. With the current proposal, can we reuse the ready-made fusion passes defined in vLLM, like the SequenceParallel fusion pass? I'm not very familiar with the current fusion pass stack in vLLM, so I'm confirming it here. Reusability is what we expect.

Thank you for your reply. The current PR aims to define our own compiler backend to implement custom fusion. Reusing vLLM's fusion passes is my next goal; I will submit an RFC once the solution is finalized.

@wxsIcey wxsIcey requested a review from jgong5 November 13, 2025 13:46
