
[RVV] add rvv f32 kernels for vcos,vexp,vlog,vsigmoid,vsin,vtanh#9926

Open
ken-unger wants to merge 1 commit into google:master from ken-unger:unary-trig-rvv

@ken-unger (Contributor)

Add RVV kernels for f32-vcos, f32-vexp, f32-vlog, f32-vsigmoid, f32-vsin and f32-vtanh.

Most of this is a straightforward translation of the existing SIMD versions to RVV.
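For readers new to RVV, here is a minimal sketch of the strip-mining loop that RVV kernels are typically built around, assuming the RVV v1.0 C intrinsics (`<riscv_vector.h>`) and LMUL=8. The operation (y = x*x + 1) and the name `sketch_f32_vsqrp1` are hypothetical placeholders, not code from this PR; the `#else` branch is a scalar fallback so the sketch also compiles on non-RVV hosts.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical example kernel: y[i] = x[i] * x[i] + 1.0f.
void sketch_f32_vsqrp1(size_t n, const float* x, float* y) {
#if defined(__riscv_vector)
  while (n > 0) {
    // vsetvl returns how many elements (vl) this iteration may process;
    // on the last iteration it is simply smaller, so no separate tail loop.
    size_t vl = __riscv_vsetvl_e32m8(n);
    vfloat32m8_t vx = __riscv_vle32_v_f32m8(x, vl);
    vfloat32m8_t vy = __riscv_vfmul_vv_f32m8(vx, vx, vl);  // x * x
    vy = __riscv_vfadd_vf_f32m8(vy, 1.0f, vl);             // + scalar 1.0f
    __riscv_vse32_v_f32m8(y, vy, vl);
    x += vl;
    y += vl;
    n -= vl;
  }
#else
  // Scalar fallback with the same arithmetic for non-RVV builds.
  for (size_t i = 0; i < n; i++) {
    y[i] = x[i] * x[i] + 1.0f;
  }
#endif
}
```

Because `vsetvl` hands back a shorter `vl` on the final pass, the remainder is processed by the same loop body, where a fixed-width SIMD kernel usually needs a separate remainder path.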

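Tested on QEMU and BPI-F3.

Results on BPI-F3 running operator-unary-bench: the RVV kernels are roughly 10x faster than the previous scalar versions for cos, sin, log and tanh (9.4x to 11.1x), and 5.3x to 6.3x faster for exp and sigmoid.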

| Benchmark | Previous (scalar) | New (RVV) |
|---|---:|---:|
| `xnnpack_cosine_f32/N:3840/real_time` | 250815 ns | 22647 ns |
| `xnnpack_cosine_f32/N:32640/real_time` | 2133473 ns | 192981 ns |
| `xnnpack_exp_f32/N:3840/real_time` | 99876 ns | 18143 ns |
| `xnnpack_exp_f32/N:32640/real_time` | 844337 ns | 154458 ns |
| `xnnpack_log_f32/N:3840/real_time` | 214770 ns | 21046 ns |
| `xnnpack_log_f32/N:32640/real_time` | 1821376 ns | 178911 ns |
| `xnnpack_sigmoid_f32/N:3840/real_time` | 130601 ns | 20727 ns |
| `xnnpack_sigmoid_f32/N:32640/real_time` | 1130085 ns | 211814 ns |
| `xnnpack_sine_f32/N:3840/real_time` | 241881 ns | 22310 ns |
| `xnnpack_sine_f32/N:32640/real_time` | 2051720 ns | 190303 ns |
| `xnnpack_tanh_f32/N:3840/real_time` | 183325 ns | 19419 ns |
| `xnnpack_tanh_f32/N:32640/real_time` | 1563040 ns | 165337 ns |
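The speedup range can be checked with a small C sketch that divides each previous (scalar) time by the new (RVV) time. The struct and function names are hypothetical; the timings are transcribed from the benchmark results.

```c
#include <stddef.h>

// Hypothetical record: one benchmark row, times in nanoseconds.
typedef struct {
  const char* benchmark;
  double scalar_ns;  // previous scalar kernel
  double rvv_ns;     // new RVV kernel
} bench_timing;

static const bench_timing kTimings[] = {
    {"cosine_f32/N:3840", 250815, 22647},
    {"cosine_f32/N:32640", 2133473, 192981},
    {"exp_f32/N:3840", 99876, 18143},
    {"exp_f32/N:32640", 844337, 154458},
    {"log_f32/N:3840", 214770, 21046},
    {"log_f32/N:32640", 1821376, 178911},
    {"sigmoid_f32/N:3840", 130601, 20727},
    {"sigmoid_f32/N:32640", 1130085, 211814},
    {"sine_f32/N:3840", 241881, 22310},
    {"sine_f32/N:32640", 2051720, 190303},
    {"tanh_f32/N:3840", 183325, 19419},
    {"tanh_f32/N:32640", 1563040, 165337},
};

// Speedup = old time / new time (higher is better).
static double speedup(const bench_timing* t) {
  return t->scalar_ns / t->rvv_ns;
}

// Writes the smallest and largest speedup across all rows.
void speedup_range(double* min_out, double* max_out) {
  size_t n = sizeof(kTimings) / sizeof(kTimings[0]);
  double lo = speedup(&kTimings[0]);
  double hi = lo;
  for (size_t i = 1; i < n; i++) {
    double s = speedup(&kTimings[i]);
    if (s < lo) lo = s;
    if (s > hi) hi = s;
  }
  *min_out = lo;
  *max_out = hi;
}
```

With these numbers the smallest ratio is about 5.3x (sigmoid, N:32640) and the largest about 11.1x (cosine, N:3840).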

vp = __riscv_vfmul(vx, vp, vl);

// Evaluate the denominator polynomial q.
vfloat32m1_t vq = __riscv_vfadd(__riscv_vfmul(vx2, vbeta_4, vl), vbeta_2, vl);
@ken-unger (Contributor Author)

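Unfortunately RVV has no fused multiply-add whose addend is a scalar, which would have been useful here and in many other places in this PR: in `vfmadd.vf`/`vfmacc.vf` the scalar is one of the multiplicands, and the addend must be a vector. Broadcasting each constant into a vector wastes vector registers (at LMUL=8 the 32 vector registers form only four register groups). So I've kept this as `vfmul`/`vfadd` with scalar operands, which lets this kernel (and similar ones) use LMUL=8.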
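A minimal sketch of the two options, assuming the RVV v1.0 C intrinsics and LMUL=8. The function name `eval_q_sketch` is hypothetical; `beta_4` and `beta_2` echo the review snippet but are plain scalar parameters here. A scalar fallback keeps it compilable on non-RVV hosts.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical sketch: q[i] = x2[i] * beta_4 + beta_2 with scalar coefficients.
void eval_q_sketch(size_t n, const float* x2, float beta_4, float beta_2,
                   float* q) {
#if defined(__riscv_vector)
  while (n > 0) {
    size_t vl = __riscv_vsetvl_e32m8(n);
    vfloat32m8_t vx2 = __riscv_vle32_v_f32m8(x2, vl);
    // Variant A (the approach described above): multiply by a scalar, then
    // add a scalar. Two instructions, but no register group holds a constant.
    vfloat32m8_t vq = __riscv_vfadd_vf_f32m8(
        __riscv_vfmul_vf_f32m8(vx2, beta_4, vl), beta_2, vl);
    // Variant B (not used): vfmadd.vf computes vd = rs1 * vd + vs2, so the
    // addend beta_2 would first need its own m8 group:
    //   vfloat32m8_t vbeta_2 = __riscv_vfmv_v_f_f32m8(beta_2, vl);
    //   vq = __riscv_vfmadd_vf_f32m8(vx2, beta_4, vbeta_2, vl);
    __riscv_vse32_v_f32m8(q, vq, vl);
    x2 += vl;
    q += vl;
    n -= vl;
  }
#else
  // Scalar fallback with the same arithmetic.
  for (size_t i = 0; i < n; i++) {
    q[i] = x2[i] * beta_4 + beta_2;
  }
#endif
}
```

Variant B saves an instruction and rounds once instead of twice, but each broadcast coefficient occupies one of only four m8 register groups, which is the cost the comment is avoiding.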

#include "src/xnnpack/common.h"
#include "src/xnnpack/microparams.h"
#include "src/xnnpack/vunary.h"
#include "src/xnnpack/simd/f32-scalar.h" // xnn_f32_i32_t
@ken-unger (Contributor Author)

I considered moving `xnn_f32_i32_t` somewhere common, but wasn't sure of the right place, so I left it where it was.
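The definition of `xnn_f32_i32_t` is not shown in this thread; judging by its name, it reinterprets a `float` as an `int32_t` and back, which kernels commonly need in order to write float constants by their exact bit patterns or to manipulate exponent bits. A minimal sketch of such a type, assuming a plain C union; the names `f32_i32_bits`, `f32_from_bits` and `f32_to_bits` are hypothetical, and the real definition in `src/xnnpack/simd/f32-scalar.h` may differ.

```c
#include <stdint.h>

// Hypothetical stand-in for a float/int32 bit-cast helper type.
typedef union {
  float f;
  int32_t i;
} f32_i32_bits;

// Returns the float whose IEEE-754 binary32 encoding is `bits`.
static inline float f32_from_bits(int32_t bits) {
  f32_i32_bits u;
  u.i = bits;
  return u.f;  // reading the other member reinterprets the bytes in C
}

// Returns the IEEE-754 binary32 encoding of `value`.
static inline int32_t f32_to_bits(float value) {
  f32_i32_bits u;
  u.f = value;
  return u.i;
}
```

Union-based punning like this is valid C; in C++ the equivalent is `memcpy` or `std::bit_cast`.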

@ken-unger (Contributor Author)

@fbarchard and @dsharletg, please review when you are able. Thank you.
