Support Gemma3 with Clip fused attention #24187
Conversation
```python
qk_nodes = self.model.match_parent_path(
    matmul_qkv, ["Cast", "Cast", "Softmax", "Add", "Mul", "MatMul"], [0, 0, 0, 0, 0, 0]
)
# If attention mask is not used, we can still match the qk path.
```
Suggest changing the condition so that the layout reads more naturally.

Before:

```python
if qk_nodes is None:
    ...
else:
    add_mask = qk_nodes[1]
```

After:

```python
if qk_nodes is not None:
    add_mask = qk_nodes[1]
else:
    ...
```

Another possible change is to use `match_parent_paths`, as in the sketch below.
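For illustration, a minimal sketch of the `match_parent_paths` variant, assuming it returns `(path_index, matched_nodes, return_indices)` as in onnxruntime's `OnnxModel` helper; the pattern lists and `add_mask` handling here are illustrative, not the PR's final code:

```python
path_index, qk_nodes, _ = self.model.match_parent_paths(
    matmul_qkv,
    [
        # Tracing with an attention mask: the Add node is qk_nodes[1].
        (["Softmax", "Add", "Mul", "MatMul"], [0, 0, None, 0]),
        # Mask-free tracing (e.g. the Gemma3 vision tower): no Add node.
        (["Softmax", "Mul", "MatMul"], [0, None, 0]),
    ],
    output_name_to_node,
)
if qk_nodes is None:
    return  # neither pattern matched; skip fusion for this MatMul

add_mask = qk_nodes[1] if path_index == 0 else None
```

A `None` entry in the index list lets the matcher accept the parent on any input, which also covers the varying op.Add / op.MatMul input indices mentioned in the PR description.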
Done in #24280
The errors seem to be Docker image authorization issues.
Closing this in favor of a branch that does not use a fork, so CI can be triggered (fork runs lack authorization to pull the Docker images).
Description
Essentially, the vision model is traced differently (this time without an attention mask), and the input indices of op.Add and op.MatMul can differ between traces. In addition, fp16 and fp32 produce different tracing patterns (fp16 inserts op.Cast nodes), so both variants must be matched. Representative pattern variants are sketched below.
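A hedged sketch of the pattern variants involved (representative node lists, not necessarily the exact ones in the PR):

```python
# fp16 tracing wraps the Softmax in Cast nodes; fp32 tracing does not.
fp16_qk_path = (["Cast", "Cast", "Softmax", "Add", "Mul", "MatMul"], [0, 0, 0, 0, 0, 0])
fp32_qk_path = (["Softmax", "Add", "Mul", "MatMul"], [0, 0, 0, 0])

# When the vision model is traced without an attention mask, the Add vanishes.
fp16_qk_path_no_mask = (["Cast", "Cast", "Softmax", "Mul", "MatMul"], [0, 0, 0, 0, 0])
fp32_qk_path_no_mask = (["Softmax", "Mul", "MatMul"], [0, 0, 0])
```

Trying all four combinations (fp16/fp32 × masked/unmasked) with `match_parent_paths` lets a single fusion handle every tracing of the model.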
Motivation and Context
These changes are needed to optimize the Gemma3 multi-modal model: https://huggingface.co/google/gemma-3-4b-it