
Enable ViT torch.compile + CUDA Graph#33

Open
b-mu wants to merge 155 commits into mlperf-inf-mm-q3vl-v6.0 from reduce-vit-kernel-gaps

Conversation

b-mu commented on Jan 30, 2026

Purpose

After integrating high-performance kernels for ViT attention, we observed that kernel launch overhead remained significant. To improve performance, we add two features:

  • torch.compile(): fuses native kernels, e.g. layernorm and elementwise ops
  • CUDA graphs for the ViT: the image patch grid size varies across samples, so we support three modes (see the sketch after this list):
    • exact match: we capture a default set of frequently used grid sizes,
    • padding: we also capture graphs for a set of bucket sizes and pad the image patches to the nearest bucket,
    • eager: due to memory constraints we can only capture a limited set of grids, so the rest run in eager mode. This applies especially to very large grids, where launch overhead is less noticeable and the benefit of a graph would be negligible.
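
For illustration, the sketch below captures the dispatch policy described above (exact match, then bucket padding, then eager fallback). The function and argument names (`select_cudagraph`, `captured_graphs`, `bucket_sizes`, `max_grid_size`) are hypothetical and do not correspond to actual vLLM identifiers:

```python
# Hypothetical sketch of the grid-dispatch policy; not vLLM's actual code.
from typing import Optional


def select_cudagraph(grid_size: int,
                     captured_graphs: dict,
                     bucket_sizes: list[int],
                     max_grid_size: int) -> tuple[Optional[object], int]:
    """Return (graph, effective_grid_size); graph is None when falling back to eager."""
    # 1. Exact match: a graph was captured for this exact grid size.
    if grid_size in captured_graphs:
        return captured_graphs[grid_size], grid_size

    # 2. Padding: round up to the nearest captured bucket; the image patches
    #    are padded to that size before replay.
    if grid_size <= max_grid_size:
        for bucket in sorted(bucket_sizes):
            if bucket >= grid_size and bucket in captured_graphs:
                return captured_graphs[bucket], bucket

    # 3. Eager: very large or uncaptured grids run without a CUDA graph,
    #    where launch overhead is a smaller fraction of total runtime.
    return None, grid_size
```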

Test Plan

  • Tested end-to-end accuracy with the configuration below
  • Compilation Configs:
    --vllm.cli=--compilation-config='{
      "compile_mm_encoder": true,
      "cudagraph_mm_encoder": true,
      "encoder_cudagraph_verbose": true,
      "encoder_cudagraph_grid_configs": "custom",
      "encoder_cudagraph_max_grid_size": 218,
      "encoder_cudagraph_padded_mode": true,
      "encoder_cudagraph_bucket_sizes": [88, 106, 140, 176, 200],
      "encoder_cudagraph_one_by_one": true
      ...
    }' 
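
For reference, here is a minimal standalone sketch (not the vLLM implementation) of what capturing a compiled ViT block into a CUDA graph for one bucket size looks like, using only public PyTorch APIs. The `hidden` and `grid` sizes and the toy `block` module are illustrative:

```python
import torch

hidden, grid = 1280, 176  # illustrative hidden size and one padded bucket size

# Stand-in for a ViT encoder block; torch.compile fuses layernorm/elementwise ops.
block = torch.nn.Sequential(
    torch.nn.LayerNorm(hidden),
    torch.nn.Linear(hidden, hidden),
    torch.nn.GELU(),
).cuda().eval()
block = torch.compile(block)

static_in = torch.zeros(grid, hidden, device="cuda")

# Warm up on a side stream so compilation and lazy init are not captured.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        block(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture one graph per bucket size; inputs/outputs live in fixed buffers.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_out = block(static_in)

# Replay: copy the (possibly padded) patch embeddings into the static input
# buffer, replay the graph, and read the result from the static output buffer.
new_patches = torch.randn(grid, hidden, device="cuda")
static_in.copy_(new_patches)
graph.replay()
result = static_out.clone()
```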

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
