adapt for optimized ps_gluon_pa #117
Pull request overview
This PR adapts the code to use an optimized ps_gluon_pa implementation for better performance with small batch sizes. The changes fix a typo in variable naming and update the paged attention implementation to use the optimized version with recommended partition splits.
- Fixed typo: `attn_matadata` → `attn_metadata`
- Replaced manual partition calculation with the `get_recommended_splits()` function
- Updated to use `torch.ops.aiter.pa_decode_gluon` and changed the parameter from `one_shot` to `ps`
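
To illustrate why a recommended split count can help at small batch sizes, here is a minimal sketch contrasting a fixed-size partition count with a batch-aware heuristic. It is not the code from this PR: `manual_num_partitions`, `sketch_recommended_splits`, and every parameter and default below (including the compute-unit count) are hypothetical, and the real `get_recommended_splits()` may take different arguments and use different logic.

```python
import math


def manual_num_partitions(max_context_len: int, partition_size: int = 256) -> int:
    """Fixed-size partitioning: one partition per `partition_size` tokens.

    Hypothetical stand-in for a manual calculation; the result depends
    only on the context length, not on the batch size.
    """
    return math.ceil(max_context_len / partition_size)


def sketch_recommended_splits(
    batch_size: int,
    num_kv_heads: int,
    max_context_len: int,
    num_compute_units: int = 304,    # assumption: placeholder device size
    min_tokens_per_split: int = 256,  # assumption: smallest useful split
) -> int:
    """Hypothetical heuristic, NOT the library's implementation.

    With a small batch there are few (sequence, head) work items, so the
    context is split further to keep the device busy. With a large batch
    the device is already occupied and extra splits only add reduction
    overhead, so the count falls back towards 1.
    """
    work_items = max(1, batch_size * num_kv_heads)
    # Splits needed so that work_items * splits covers the compute units.
    splits_to_fill = math.ceil(num_compute_units / work_items)
    # Never split finer than min_tokens_per_split tokens per split.
    max_useful = max(1, max_context_len // min_tokens_per_split)
    return max(1, min(splits_to_fill, max_useful))


if __name__ == "__main__":
    for bs in (1, 4, 64):
        print(bs, sketch_recommended_splits(bs, num_kv_heads=8, max_context_len=8192))
```

In this sketch the split count shrinks as the batch grows, whereas the fixed-size calculation returns the same value for every batch size.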
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| atom/model_ops/attentions/aiter_attention.py | Fixed typo in variable name attn_matadata to attn_metadata and removed extra blank line |
| atom/model_ops/attention_mha.py | Refactored partition calculation to use get_recommended_splits(), updated kernel call to use torch ops namespace, and changed one_shot parameter to ps=True |
Force-pushed from 0245a3c to 17ac2d9
Motivation
Adapt for the optimized ps_gluon_pa, for better performance at small batch sizes.