Hi pdaqp authors,
The FPGA pull request cvxgrp/cvxpygen#94 directed me here... I was wondering whether you would be interested in a SIMD-optimized implementation of the algorithm.
Here's an early prototype written in C++20: https://github.com/antonysigma/pdaqp-solver-cpp . Comments are welcome.
Basically, it consumes the pdaqp.c and pdaqp.h files generated by the CVXPyGen tool, then repacks and re-aligns all matrices and vectors at compile time. Modern compilers are quite capable these days: once the data is memory-aligned, they can see through it and generate SIMD-accelerated code.
Here is one example for Ryzen/Intel CPUs, performing the fused dot(n, param) <= b test with FMA floating-point instructions from the x86 SIMD extensions, for the halfspace with HalfspaceID = 1 (please excuse my vocabulary):
<hyperplane::isInsideHalfspaceFn<1ul>(vector_math::Vector<(unsigned short)2, float> const&)>:
; return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83b0: c5 fb 10 07                   vmovsd      (%rdi), %xmm0
    83b4: c5 f0 57 c9                   vxorps      %xmm1, %xmm1, %xmm1
; sum += a.data[i] * b.data[i];
    83b8: c4 e2 79 b9 0d 5b 8d ff ff    vfmadd231ss -0x72a5(%rip), %xmm0, %xmm1  # xmm1 = (xmm0 * mem) + xmm1
                                                                                 # 0x111c <.rodata+0x18c>
    83c1: c5 fa 16 c0                   vmovshdup   %xmm0, %xmm0                 # xmm0 = xmm0[1,1,3,3]
    83c5: c4 e2 71 99 05 52 8d ff ff    vfmadd132ss -0x72ae(%rip), %xmm1, %xmm0  # xmm0 = (xmm0 * mem) + xmm1
                                                                                 # 0x1120 <.rodata+0x190>
    83ce: c5 fa 10 0d 32 8d ff ff       vmovss      -0x72ce(%rip), %xmm1         # 0x1108 <.rodata+0x178>
; return dot(normal, parameter) <= offset;
    83d6: c5 f8 2e c8                   vucomiss    %xmm0, %xmm1
    83da: 0f 93 c0                      setae       %al
; return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83dd: c3                            retq
    83de: cc                            int3
    83df: cc                            int3
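
For readers who prefer source over disassembly, the following is a rough sketch of the kind of C++ that could compile down to a listing like the one above. The names `isInsideHalfspaceFn`, `Vector`, `dot`, `normal`, `parameter`, and `offset` are taken from the symbols and source comments in the listing; the `Halfspace` struct, the `kTable` array, and its numeric values are hypothetical placeholders. The prototype appears to wrap a functor `isInsideHalfspace<hp_id>`, which this sketch collapses into a single function.

```cpp
#include <array>
#include <cstddef>

namespace vector_math {

template <unsigned short N, typename T>
struct Vector {
    std::array<T, N> data{};
};

// Plain dot product; N is a compile-time constant, so the loop can be
// fully unrolled into multiply-add instructions.
template <unsigned short N, typename T>
constexpr T dot(const Vector<N, T>& a, const Vector<N, T>& b) {
    T sum{};
    for (unsigned short i = 0; i < N; ++i) {
        sum += a.data[i] * b.data[i];
    }
    return sum;
}

}  // namespace vector_math

namespace hyperplane {

using Vec2 = vector_math::Vector<2, float>;

struct Halfspace {  // hypothetical layout
    Vec2 normal;
    float offset;
};

// Hypothetical coefficients; the real ones would come from pdaqp.c.
inline constexpr std::array<Halfspace, 2> kTable = {
    Halfspace{Vec2{{1.0f, 0.0f}}, 0.5f},
    Halfspace{Vec2{{0.5f, -1.0f}}, 2.0f},
};

// The halfspace index is a template argument, so the normal and offset are
// known at compile time and can be emitted as read-only constants.
template <std::size_t hp_id>
constexpr bool isInsideHalfspaceFn(const Vec2& parameter) {
    static_assert(hp_id < kTable.size(), "halfspace index out of range");
    constexpr Halfspace hs = kTable[hp_id];
    return vector_math::dot(hs.normal, parameter) <= hs.offset;
}

}  // namespace hyperplane

// Usage: the test is evaluated against halfspace 1 of the placeholder table.
static_assert(hyperplane::isInsideHalfspaceFn<1>(hyperplane::Vec2{{2.0f, -1.0f}}));
```

Making `hp_id` a template parameter rather than a runtime argument is the design choice that matters here: each instantiation reads its coefficients from fixed addresses, as the `%rip`-relative loads from `.rodata` in the listing do.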