Enhance Performance by Adding index_t Template Parameter #1610
Pull Request Overview
This PR enhances performance by introducing a new index_t template parameter for average pooling kernels, allowing more efficient indexing.
- Added index_t as a template parameter in AvgPool2dKernelFunctor and AvgPool2dChannelsLastKernelFunctor
- Updated associated launching functions and constructor parameters to use index_t instead of int64_t
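
The pattern summarized above can be sketched in plain C++, without SYCL or ATen. Everything here is hypothetical and for illustration only: `AvgPool1dSketch`, `fits_in_32bit`, and `avg_pool1d_sketch` are not names from the PR, the pooling is reduced to non-overlapping 1D windows, and the "fits in 32 bits" rule is a simplification of what `at::native::canUse32BitIndexMath` checks. The sketch assumes `kernel > 0`.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a kernel functor: averages non-overlapping
// 1D windows and performs all index arithmetic in index_t.
template <typename scalar_t, typename index_t>
struct AvgPool1dSketch {
  const scalar_t* in;
  scalar_t* out;
  index_t out_size;
  index_t kernel;

  void operator()() const {
    for (index_t o = 0; o < out_size; ++o) {
      scalar_t sum = 0;
      const index_t start = o * kernel;  // index math in the chosen width
      for (index_t k = 0; k < kernel; ++k) {
        sum += in[start + k];
      }
      out[o] = sum / static_cast<scalar_t>(kernel);
    }
  }
};

// Assumed, simplified rule: 32-bit indexing is safe when the element
// count fits in a signed 32-bit integer.
inline bool fits_in_32bit(std::int64_t numel) {
  return numel <= std::numeric_limits<std::int32_t>::max();
}

// Picks the index type once on the host, then instantiates the matching
// functor, so the inner loops never branch on the index width.
inline std::vector<float> avg_pool1d_sketch(const std::vector<float>& input,
                                            std::int64_t kernel) {
  const std::int64_t numel = static_cast<std::int64_t>(input.size());
  const std::int64_t out_size = numel / kernel;
  std::vector<float> output(static_cast<std::size_t>(out_size));
  if (fits_in_32bit(numel)) {
    AvgPool1dSketch<float, std::int32_t>{
        input.data(), output.data(),
        static_cast<std::int32_t>(out_size),
        static_cast<std::int32_t>(kernel)}();
  } else {
    AvgPool1dSketch<float, std::int64_t>{
        input.data(), output.data(), out_size, kernel}();
  }
  return output;
}
```

Calling `avg_pool1d_sketch({1.f, 3.f, 5.f, 7.f}, 2)` takes the 32-bit path and averages the pairs (1, 3) and (5, 7). In the PR itself, the choice is made with `AT_DISPATCH_INDEX_TYPES` rather than a hand-written `if`.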
Comments suppressed due to low confidence (2)
src/ATen/native/xpu/sycl/AveragePool2dKernels.cpp:25
- [nitpick] Consider adding a space after the comma before 'typename index_t' for improved readability and consistency in template parameter lists.
template <typename scalar_t, typename accscalar_t,typename index_t>
src/ATen/native/xpu/sycl/AveragePool2dKernels.cpp:667
- [nitpick] The dispatch macro identifier 'avg_pool2d_backward_xpu' appears inconsistent with the forward kernel functionality; consider renaming it to better reflect the context, such as 'avg_pool2d_xpu'.
AT_DISPATCH_INDEX_TYPES(at::native::canUse32BitIndexMath(output, INT_MAX) ? ScalarType::Int : ScalarType::Long, "avg_pool2d_backward_xpu", [&] {
@chunhuanMeng, pls. share the detailed performance improvement numbers here. Meanwhile, the PR description is meaningless. Pls. elaborate on
Update PR description.
Performance Comparison
Select int64_t or int data types at the start of GPU kernel computation, using templates to pass the chosen type. This optimization improves performance for smaller shapes.
Performance Improvement Reasons: