Request for comments (RFC): algorithm rewrite to enable CPU vector-accelerated instructions #5

@antonysigma

Hi pdaqp authors,

The FPGA pull request cvxgrp/cvxpygen#94 directed me here. Would you be interested in a SIMD-optimized implementation of the algorithm?

Here's an early prototype written in C++20: https://github.com/antonysigma/pdaqp-solver-cpp . Comments are welcome.

In short, it consumes the pdaqp.c and pdaqp.h files generated by the CVXPYgen tool, then repacks and re-aligns all matrices and vectors at compile time. Modern compilers are quite capable these days: given memory-aligned data, they can generate SIMD-accelerated code on their own.
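To make the repacking idea concrete, here is a minimal sketch of how a flat generated table could be copied into padded, aligned records at compile time. It is not taken from the prototype: the table `kRawHalfspaces`, its `(n0, n1, b)` row layout, the `PackedHalfspace` type, and the helpers `repack` and `halfspaceAt` are all hypothetical names chosen for illustration, and the real pdaqp.h layout may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a table emitted by the code generator:
// each row holds a 2-D normal followed by its offset, (n0, n1, b).
inline constexpr std::size_t kDim = 2;
inline constexpr std::size_t kRows = 3;
inline constexpr float kRawHalfspaces[kRows * (kDim + 1)] = {
     1.0f,  0.0f, 0.5f,
     0.0f,  1.0f, 0.25f,
    -1.0f, -1.0f, 2.0f,
};

// Assumed target layout: the normal is zero-padded to four floats and the
// record is 16-byte aligned, so the compiler is free to use vector loads.
struct alignas(16) PackedHalfspace {
    std::array<float, 4> normal{};  // first kDim entries used, rest stay 0
    float offset = 0.0f;
};

// Runs entirely at compile time; no repacking cost is paid at run time.
constexpr std::array<PackedHalfspace, kRows> repack() {
    std::array<PackedHalfspace, kRows> out{};
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kDim; ++c) {
            out[r].normal[c] = kRawHalfspaces[r * (kDim + 1) + c];
        }
        out[r].offset = kRawHalfspaces[r * (kDim + 1) + kDim];
    }
    return out;
}

inline constexpr auto kPacked = repack();

// The repacked table is a compile-time constant, so it can be checked here.
static_assert(alignof(PackedHalfspace) == 16);
static_assert(kPacked[2].offset == 2.0f);

// Bounds-checked runtime accessor (hypothetical helper).
inline const PackedHalfspace& halfspaceAt(std::size_t row) {
    assert(row < kRows);
    return kPacked[row];
}
```

Because `kPacked` is `constexpr`, the repacked data should end up in read-only storage, and the original flat table need not be touched at run time.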

Here is one example for Ryzen/Intel CPUs. It evaluates dot(n, param) <= b with fused multiply-add floating-point instructions on the SIMD registers, for the active set HalfspaceID = 1 (please excuse my vocabulary):

hyperplane::isInsideHalfspaceFn<1ul>(vector_math::Vector<(unsigned short)2, float> const&)>:
;     return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83b0: c5 fb 10 07                   vmovsd  (%rdi), %xmm0
    83b4: c5 f0 57 c9                   vxorps  %xmm1, %xmm1, %xmm1
;         sum += a.data[i] * b.data[i];
    83b8: c4 e2 79 b9 0d 5b 8d ff ff    vfmadd231ss     -0x72a5(%rip), %xmm0, %xmm1 # xmm1 = (xmm0 * mem) + xmm1
                                                                        # 0x111c <.rodata+0x18c>
    83c1: c5 fa 16 c0                   vmovshdup       %xmm0, %xmm0    # xmm0 = xmm0[1,1,3,3]
    83c5: c4 e2 71 99 05 52 8d ff ff    vfmadd132ss     -0x72ae(%rip), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1
                                                                        # 0x1120 <.rodata+0x190>
    83ce: c5 fa 10 0d 32 8d ff ff       vmovss  -0x72ce(%rip), %xmm1    # 0x1108 <.rodata+0x178>
;         return dot(normal, parameter) <= offset;
    83d6: c5 f8 2e c8                   vucomiss        %xmm0, %xmm1
    83da: 0f 93 c0                      setae   %al
;     return hyperplane::isInsideHalfspace<hp_id>{}(p);
    83dd: c3                            retq
    83de: cc                            int3
    83df: cc                            int3
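For readers who prefer source over disassembly, the following is a rough reconstruction of what the source-level code might look like, based only on the identifiers and source comments visible in the listing (`vector_math::Vector`, `dot`, `hyperplane::isInsideHalfspace<hp_id>`, `isInsideHalfspaceFn`). The `Halfspace` struct, the `kHalfspaces` table, its numeric values, and the `example` function are assumptions made for illustration; the prototype's actual definitions may differ.

```cpp
#include <cassert>
#include <cstddef>

namespace vector_math {

template <unsigned short N, typename T>
struct Vector {
    T data[N];
};

// Plain loop; with a compile-time N the compiler can unroll it and
// may emit fused multiply-add instructions when the target supports them.
template <unsigned short N, typename T>
constexpr T dot(const Vector<N, T>& a, const Vector<N, T>& b) {
    T sum{};
    for (unsigned short i = 0; i < N; ++i) {
        sum += a.data[i] * b.data[i];
    }
    return sum;
}

}  // namespace vector_math

namespace hyperplane {

using Vec2 = vector_math::Vector<2, float>;

struct Halfspace {  // hypothetical record type
    Vec2 normal;
    float offset;
};

// Hypothetical table; the values are made up for this sketch.
inline constexpr Halfspace kHalfspaces[] = {
    {{{1.0f, 0.0f}}, 0.5f},
    {{{0.5f, 0.25f}}, 1.0f},
};
inline constexpr std::size_t kNumHalfspaces =
    sizeof(kHalfspaces) / sizeof(kHalfspaces[0]);

// The halfspace ID is a template parameter, so the normal and offset are
// compile-time constants inside each instantiation.
template <std::size_t hp_id>
struct isInsideHalfspace {
    static_assert(hp_id < kNumHalfspaces, "unknown halfspace ID");

    constexpr bool operator()(const Vec2& parameter) const {
        constexpr Halfspace hs = kHalfspaces[hp_id];
        return vector_math::dot(hs.normal, parameter) <= hs.offset;
    }
};

template <std::size_t hp_id>
bool isInsideHalfspaceFn(const Vec2& p) {
    return isInsideHalfspace<hp_id>{}(p);
}

// Small usage example.
inline void example() {
    const Vec2 inside{{1.0f, 1.0f}};   // 0.5*1 + 0.25*1 = 0.75 <= 1.0
    const Vec2 outside{{2.0f, 2.0f}};  // 0.5*2 + 0.25*2 = 1.5  >  1.0
    assert(isInsideHalfspaceFn<1>(inside));
    assert(!isInsideHalfspaceFn<1>(outside));
}

}  // namespace hyperplane
```

Making the ID a template parameter is what would let the compiler fold the normal and offset into `.rodata` operands, as the `%rip`-relative loads in the listing suggest.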
