modified reduce_max reduce_min reduce_prod for higher_performance and fix a bug in reduce_op.cuh #32974
Conversation
Thanks for your contribution!
LGTM

1 similar comment

LGTM
Xreki left a comment
LGTM
}
}

// module function designed for global function
I feel the templates could be simplified further; some parameters do not need to be passed as template parameters, e.g. ReduceType. The if-branch on ReduceType executes only once and is not inside a loop, so passing it as a runtime argument will not noticeably affect performance. Cutting down the number of template instantiations should also shorten compile time.
As for TransformOp: it seems LaunchReduceKernel and LaunchKernel do not need TransformOp as a template parameter? ReduceKernelFunction does look like it needs it. Also, the names LaunchReduceKernel and LaunchKernel lack distinctiveness and do not accurately convey what each function does.
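The suggestion above can be sketched on the host side. This is a minimal, hypothetical illustration (the enum values and function names are placeholders, not Paddle's actual API): the branch on ReduceType runs once per launch rather than per element, so moving it from a template parameter to a runtime argument costs essentially nothing while reducing template instantiations.

```cpp
#include <cassert>

// Illustrative enum mirroring the kinds of reduce paths discussed in the
// review; the names are assumptions, not the exact Paddle definitions.
enum class ReduceType { kReduceLastDim, kReduceHigherDim, kReduceAny };

// Before (sketch): template <ReduceType RT> void LaunchKernel(...);
// After: ReduceType arrives as an ordinary argument and the dispatch
// happens once, in plain control flow, before the kernel launch.
int LaunchKernel(ReduceType reduce_type) {
  switch (reduce_type) {
    case ReduceType::kReduceLastDim:
      return 1;  // would configure and launch the last-dim kernel here
    case ReduceType::kReduceHigherDim:
      return 2;  // would configure and launch the higher-dim kernel here
    default:
      return 3;  // generic fallback path
  }
}
```

Only the kernels that genuinely need compile-time specialization (here, ReduceKernelFunction) would keep their template parameters.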
    : reduce_dims_origin(origin_reduce_dims), x_dim(x_dim) {}

// get the parameters of reduceKernel
void Run() {
Comment on L170: it is suggested that the input parameters all be changed to const std::vector & type.
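A minimal sketch of the point above, with hypothetical names: passing a vector by const reference avoids copying it on every call while still guaranteeing the callee cannot modify it.

```cpp
#include <cassert>
#include <vector>

// Before (sketch, copies the whole vector on every call):
//   void SetReduceDims(std::vector<int> reduce_dims);
//
// After: const reference, no copy, read-only access. SumDims is an
// illustrative stand-in, not a function from the PR.
int SumDims(const std::vector<int>& reduce_dims) {
  int total = 0;
  for (int d : reduce_dims) total += d;  // read-only iteration
  return total;
}
```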
ReduceKernelFunction<
    Ty, Ty, ReduceOp, detail::IdentityFunctor<Ty>, 128, kRank, kReduceRank,
    ReduceType::kReduceHigherDim><<<grid, block, 0, stream>>>(
Comment on L597 - L599: could this be written directly as CUB_REDUCE_TYPE_CASE(ReduceType::kReduceLastDim)? And if ReduceType is no longer a template parameter, this switch-case would not be needed at all.
framework::TensorCopy(x, y->place(), y);
y->Resize(out_dims);
return;
}
Could L684 - L689 be moved to before L674 or L677?
}
};

template <typename T, template <typename, typename> class ReduceOp>
The implementations above may need to be reused in other operators (e.g. the backward pass of broadcast), but ReduceCudaKernel is only used to implement the reduce_xxx operators, so L749 - L771 had better not be placed in this header file.
    int, ops::ProdFunctor>,
ops::ReduceKernel<paddle::platform::CUDADeviceContext,
    int64_t, ops::ProdFunctor>);
REGISTER_OP_CUDA_KERNEL(
Judging from the comment, this ifdef was originally added because the old reduce implementation used Eigen, and Eigen's support for double was problematic. Now that everything has been switched to the CUDA + CUB implementation, perhaps this ifdef can be removed.
PR types
Function optimization
PR changes
OPs
Describe
Modified reduce_min, reduce_max, reduce_prod, reduce_all, and reduce_any.
ctest results:
Test project /paddle_test/commit/Paddle/build
Start 709: test_max_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 7.51 sec
Test project /paddle_test/commit/Paddle/build
Start 719: test_min_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 6.81 sec
Test project /paddle_test/commit/Paddle/build
Start 826: test_prod_op
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 7.78 sec
Performance comparison, using max as an example:
The benchmark performance data are as follows: