Request for comments (RFC): algorithm rewrite to enable CPU vector-accelerated instructions

Hi pdaqp authors,

The FPGA pull request https://github.com/cvxgrp/cvxpygen/pull/94 directed me here... I was wondering whether you would be interested in a [SIMD-optimized](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) implementation of the algorithm?

Here's the early prototype written in C++20: https://github.com/antonysigma/pdaqp-solver-cpp . Comments are welcome.

In short, it consumes the `pdaqp.c` and `pdaqp.h` files generated by the `CVXPyGen` tool, then repacks and re-aligns all matrices and vectors at compile time. Modern compilers are quite capable these days: given memory-aligned data, they can generate SIMD-accelerated code on their own.
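
To make the compile-time repacking idea concrete, here is a minimal sketch and not the prototype's actual code. The table `kRaw`, its `[n0, n1, b]` layout, the sizes, and the names `PackedHalfspace` and `repack` are all assumptions made up for illustration; the real generated tables may be laid out differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace repack_sketch {

constexpr std::size_t kDim = 2;     // parameters per halfspace (assumed)
constexpr std::size_t kCount = 2;   // number of halfspaces (assumed)
constexpr std::size_t kPadded = 4;  // normal padded to one 128-bit lane

// Hypothetical flat table, one row per halfspace: [n0, n1, b].
constexpr float kRaw[kCount * (kDim + 1)] = {
    1.0f, 2.0f,  3.0f,
    1.0f, -1.0f, 0.5f,
};

// 16-byte alignment so each normal can be read with one aligned load.
struct alignas(16) PackedHalfspace {
    std::array<float, kPadded> normal{};  // unused lanes stay zero
    float offset{};
};

// Runs entirely inside the compiler when used in a constant expression.
constexpr std::array<PackedHalfspace, kCount> repack() {
    std::array<PackedHalfspace, kCount> out{};
    for (std::size_t i = 0; i < kCount; ++i) {
        for (std::size_t j = 0; j < kDim; ++j) {
            out[i].normal[j] = kRaw[i * (kDim + 1) + j];
        }
        out[i].offset = kRaw[i * (kDim + 1) + kDim];
    }
    return out;
}

// The repacked table is a compile-time constant.
inline constexpr auto kPacked = repack();

static_assert(alignof(PackedHalfspace) == 16);
static_assert(kPacked[1].offset == 0.5f);

}  // namespace repack_sketch
```

Because `kPacked` is a constant expression, the repacking costs nothing at run time, and the `static_assert` lines check the layout during compilation.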

Here is one example for Ryzen/Intel CPUs: the fused `dot(n, param) <= b` test, compiled to fused multiply-add (FMA) floating-point instructions operating on the SIMD registers, for the active set `HalfspaceID = 1` (please excuse my vocabulary):
```asm
hyperplane::isInsideHalfspaceFn<1ul>(vector_math::Vector<(unsigned short)2, float> const&)>:
;     return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83b0: c5 fb 10 07                   vmovsd  (%rdi), %xmm0
    83b4: c5 f0 57 c9                   vxorps  %xmm1, %xmm1, %xmm1
;         sum += a.data[i] * b.data[i];
    83b8: c4 e2 79 b9 0d 5b 8d ff ff    vfmadd231ss     -0x72a5(%rip), %xmm0, %xmm1 # xmm1 = (xmm0 * mem) + xmm1
                                                                        # 0x111c <.rodata+0x18c>
    83c1: c5 fa 16 c0                   vmovshdup       %xmm0, %xmm0    # xmm0 = xmm0[1,1,3,3]
    83c5: c4 e2 71 99 05 52 8d ff ff    vfmadd132ss     -0x72ae(%rip), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1
                                                                        # 0x1120 <.rodata+0x190>
    83ce: c5 fa 10 0d 32 8d ff ff       vmovss  -0x72ce(%rip), %xmm1    # 0x1108 <.rodata+0x178>
;         return dot(normal, parameter) <= offset;
    83d6: c5 f8 2e c8                   vucomiss        %xmm0, %xmm1
    83da: 0f 93 c0                      setae   %al
;     return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83dd: c3                            retq
    83de: cc                            int3
    83df: cc                            int3
```
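
For readers who prefer source over disassembly, the following is a rough sketch of the kind of C++ that could compile down to a listing like the one above. It is reconstructed only from the source comments embedded in the disassembly (`sum += a.data[i] * b.data[i];` and `return dot(normal, parameter) <= offset;`). The namespace `sketch`, the `Halfspace` struct, and the table values are hypothetical, not taken from the prototype.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Fixed-size vector; the size is a template parameter so loops can be unrolled.
template <unsigned short N, typename T>
struct Vector {
    T data[N];
};

// Dot product with a compile-time trip count.
template <unsigned short N, typename T>
constexpr T dot(const Vector<N, T>& a, const Vector<N, T>& b) {
    T sum{};
    for (unsigned short i = 0; i < N; ++i) {
        sum += a.data[i] * b.data[i];
    }
    return sum;
}

// Halfspace { x : dot(normal, x) <= offset }.
template <unsigned short N, typename T>
struct Halfspace {
    Vector<N, T> normal;
    T offset;
};

// Hypothetical constant table; real values would come from the generated solver.
inline constexpr Halfspace<2, float> kHalfspaces[] = {
    {{{1.0f, 2.0f}}, 3.0f},
    {{{1.0f, -1.0f}}, 0.5f},
};

// The halfspace index is a template argument, so the coefficients are
// known constants at each call site.
template <std::size_t hp_id>
constexpr bool isInsideHalfspace(const Vector<2, float>& p) {
    return dot(kHalfspaces[hp_id].normal, p) <= kHalfspaces[hp_id].offset;
}

// The check can also be evaluated at compile time.
static_assert(isInsideHalfspace<1>(Vector<2, float>{{0.0f, 0.0f}}));

}  // namespace sketch
```

Making the halfspace index a template argument is one way to let the compiler read the coefficients straight from read-only data, as the `%rip`-relative operands in the listing suggest. Whether a given compiler emits FMA instructions depends on the target flags in use.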
