
[webgpu] optimize SkipLayerNormalization operator #24164

Merged
merged 2 commits into from
Apr 8, 2025

Conversation

xhcao
Contributor

@xhcao xhcao commented Mar 25, 2025

Description / Motivation and Context

If batch_size and sequence_length are both 1, split hidden_size across workgroups to improve parallelism.
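For reference, the operator's semantics can be sketched in NumPy. This is a sketch based on the ONNX Runtime contrib-op description of SkipLayerNormalization (sum the input, skip, and optional bias, then layer-normalize the sum; the sum itself is the optional input_skip_bias_sum output), not the WebGPU shader code from this PR:

```python
import numpy as np

def skip_layer_norm(x, skip, gamma, beta, bias=None, eps=1e-12):
    """Reference semantics of SkipLayerNormalization (sketch)."""
    # input_skip_bias_sum: elementwise sum of input, skip, and bias.
    s = x + skip + (bias if bias is not None else 0.0)
    # LayerNorm over the last (hidden) dimension of the sum.
    mean = s.mean(axis=-1, keepdims=True)
    var = s.var(axis=-1, keepdims=True)
    out = (s - mean) / np.sqrt(var + eps) * gamma + beta
    return out, s

# Decode-stage shape from this PR: [batch=1, seq=1, hidden=3072].
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1, 3072)).astype(np.float32)
skip = rng.standard_normal((1, 1, 3072)).astype(np.float32)
gamma = np.ones(3072, dtype=np.float32)
beta = np.zeros(3072, dtype=np.float32)
out, ssum = skip_layer_norm(x, skip, gamma, beta)
```

With gamma=1 and beta=0, each row of `out` has approximately zero mean and unit variance over the hidden dimension.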
@xhcao xhcao force-pushed the skip-norm-layer branch from fc3af9b to 32dd4cd Compare March 25, 2025 08:50
@xhcao
Contributor Author

xhcao commented Mar 25, 2025

In phi3.5, the SkipLayerNormalization operator produces two outputs, output and input_skip_bias_sum, both with shape [batch_size, sequence_length, hidden_size]. During the decoding stage, batch_size and sequence_length are always 1, so the outputs' shapes are [1, 1, 3072] and only one workgroup is dispatched, which uses GPU resources poorly.
For this situation, the PR: 1. splits the hidden dimension across additional workgroups, which adds some total work but reduces the average workload per workgroup; 2. handles output and input_skip_bias_sum in different workgroups. With both changes, 12 workgroups in total are dispatched for the shape [1, 1, 3072].
Capturing the data with the Intel GPA tool shows the kernel time dropping from ~20us to ~10us.
[Screenshots: Intel GPA captures, SkipNL-Before and SkipNL-After]
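The dispatch arithmetic described above can be sketched as follows. This is a minimal sketch of the workgroup-count logic only, not the shader itself; the split factor of 6 partitions per output is an assumption inferred from the 12-workgroup total quoted above, not taken from the PR's actual tuning:

```python
def skip_layer_norm_dispatch(batch_size, sequence_length, hidden_size,
                             has_input_skip_bias_sum=True,
                             partitions_per_output=6):
    """Sketch of the workgroup-count logic described in the PR comment.

    partitions_per_output is hypothetical: with 2 outputs handled in
    separate workgroups and 6 partitions each, the [1, 1, 3072] decode
    case yields the 12 workgroups quoted in the comment.
    """
    if batch_size == 1 and sequence_length == 1:
        # Decode path: split hidden_size so more workgroups can run.
        num_outputs = 2 if has_input_skip_bias_sum else 1
        return num_outputs * partitions_per_output
    # General path: one workgroup per (batch, sequence) row.
    return batch_size * sequence_length

print(skip_layer_norm_dispatch(1, 1, 3072))    # decode: 12 workgroups
print(skip_layer_norm_dispatch(1, 128, 3072))  # prefill: 128 workgroups
```

The trade-off is that splitting the normalization's mean/variance reduction across partitions requires extra cross-workgroup coordination, but for a [1, 1, 3072] dispatch the improved occupancy outweighs that overhead.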
@jchen10 @hujiajie PTAL, thanks

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Mar 25, 2025
@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 7 pipeline(s).

@guschmue
Contributor

guschmue commented Apr 1, 2025

lgtm.
The CI pipelines changed - can you merge with main?

@xhcao
Contributor Author

xhcao commented Apr 1, 2025

> lgtm. CI pipelines changed - can you merge with main?

Updated

@guschmue
Contributor

guschmue commented Apr 8, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@guschmue guschmue merged commit 0acb048 into microsoft:main Apr 8, 2025
60 of 69 checks passed
Labels
ep:WebGPU ort-web webgpu provider
2 participants