modified reduce_max reduce_min reduce_prod for higher_performance and fix a bug in reduce_op.cuh #32974
Conversation
Thanks for your contribution!
LGTM

1 similar comment

LGTM
Xreki left a comment
LGTM
}
}

// module function designed for global function
I feel the templates could be simplified further; some parameters do not need to be passed as template parameters, e.g. ReduceType. The if-branch on ReduceType executes only once and is not inside a loop, so passing it as a runtime argument will not noticeably affect performance. Cutting down the number of template instantiations should also shorten compile time.
As for TransformOp: it seems LaunchReduceKernel and LaunchKernel do not need TransformOp as a template parameter? ReduceKernelFunction does look like it needs it. Also, the names LaunchReduceKernel and LaunchKernel lack distinctiveness and do not accurately convey what each function does.
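The suggestion above can be sketched on the host side. This is a minimal, hypothetical illustration (the enum values and function names are placeholders, not Paddle's actual API): the branch on ReduceType runs once per launch rather than per element, so moving it from a template parameter to a runtime argument costs essentially nothing while reducing template instantiations.

```cpp
#include <cassert>

// Illustrative enum mirroring the kinds of reduce paths discussed in the
// review; the names are assumptions, not the exact Paddle definitions.
enum class ReduceType { kReduceLastDim, kReduceHigherDim, kReduceAny };

// Before (sketch): template <ReduceType RT> void LaunchKernel(...);
// After: ReduceType arrives as an ordinary argument and the dispatch
// happens once, in plain control flow, before the kernel launch.
int LaunchKernel(ReduceType reduce_type) {
  switch (reduce_type) {
    case ReduceType::kReduceLastDim:
      return 1;  // would configure and launch the last-dim kernel here
    case ReduceType::kReduceHigherDim:
      return 2;  // would configure and launch the higher-dim kernel here
    default:
      return 3;  // generic fallback path
  }
}
```

Only the kernels that genuinely need compile-time specialization (here, ReduceKernelFunction) would keep their template parameters.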
    : reduce_dims_origin(origin_reduce_dims), x_dim(x_dim) {}

// get the parameters of reduceKernel
void Run() {
Comment on L170: it is suggested that the input parameters all be changed to const std::vector & type.
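A minimal sketch of the point above, with hypothetical names: passing a vector by const reference avoids copying it on every call while still guaranteeing the callee cannot modify it.

```cpp
#include <cassert>
#include <vector>

// Before (sketch, copies the whole vector on every call):
//   void SetReduceDims(std::vector<int> reduce_dims);
//
// After: const reference, no copy, read-only access. SumDims is an
// illustrative stand-in, not a function from the PR.
int SumDims(const std::vector<int>& reduce_dims) {
  int total = 0;
  for (int d : reduce_dims) total += d;  // read-only iteration
  return total;
}
```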
ReduceKernelFunction<
    Ty, Ty, ReduceOp, detail::IdentityFunctor<Ty>, 128, kRank, kReduceRank,
    ReduceType::kReduceHigherDim><<<grid, block, 0, stream>>>(
Comment on L597 - L599: could this be written directly as CUB_REDUCE_TYPE_CASE(ReduceType::kReduceLastDim)? And if ReduceType is no longer a template parameter, this switch-case would not be needed at all.
framework::TensorCopy(x, y->place(), y);
y->Resize(out_dims);
return;
}
Could L684 - L689 be moved to before L674 or L677?
}
};

template <typename T, template <typename, typename> class ReduceOp>
The implementations above may need to be reused in other operators (e.g. the backward pass of broadcast), but ReduceCudaKernel is only used to implement the reduce_xxx operators, so L749 - L771 had better not be placed in this header file.
    int, ops::ProdFunctor>,
ops::ReduceKernel<paddle::platform::CUDADeviceContext,
    int64_t, ops::ProdFunctor>);
REGISTER_OP_CUDA_KERNEL(
Judging from the comment, this ifdef was originally added because the old reduce implementation used Eigen, and Eigen's support for double was problematic. Now that everything has been switched to the CUDA + CUB implementation, perhaps this ifdef can be removed.
PR types
Function optimization
PR changes
OPs
Describe
Modified reduce_min, reduce_max, reduce_prod, reduce_all, and reduce_any.
ctest results:
Test project /paddle_test/commit/Paddle/build
Start 709: test_max_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 7.51 sec
Test project /paddle_test/commit/Paddle/build
Start 719: test_min_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 6.81 sec
Test project /paddle_test/commit/Paddle/build
Start 826: test_prod_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 7.78 sec
Performance comparison, using max as an example:
The benchmark performance data are as follows: