simdpp::float32x4 foo(simdpp::float32x4 a, simdpp::float32x4 b)
{
    return simdpp::shuffle4x2<4,0,1,2>(a,b);
}
foo(simdpp::arch_sse4p1::float32<4u, void>, simdpp::arch_sse4p1::float32<4u, void>):
00000000000001c0 pushq %rbp
00000000000001c1 movq %rsp, %rbp
00000000000001c4 shufps $0x90, %xmm0, %xmm1
00000000000001c8 movaps %xmm1, %xmm0
00000000000001cb popq %rbp
00000000000001cc retq
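The desired result is { b[0], a[0], a[1], a[2] }, but this code generates { b[0], b[0], a[1], a[2] }. In the disassembly above, shufps $0x90 takes its two low lanes from %xmm1 (b) and its two high lanes from %xmm0 (a), so lane 1 is filled from b[0] instead of a[0]. A single shufps cannot produce the desired result, because it would need lane 0 from b and lane 1 from a.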
Commit: c27dfae (dev branch)
Compiler: Apple LLVM version 9.0.0 (clang-900.0.39.2)
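
To make the mismatch easy to check without SSE hardware, here is a minimal scalar sketch of the two behaviours. Nothing in it is libsimdpp code: `Vec4`, `shufps_model` and `shuffle4x2_reference` are hypothetical names introduced for illustration, and the reference semantics (selectors 0-3 pick from a, 4-7 pick from b) are an assumption inferred from the desired result stated above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 4 x float vector (not a libsimdpp type).
using Vec4 = std::array<float, 4>;

// Scalar model of the SSE instruction `shufps $imm, src, dst` (AT&T operand order).
// Lanes 0 and 1 are always selected from dst, lanes 2 and 3 always from src;
// each 2-bit field of imm picks one element within that operand.
Vec4 shufps_model(const Vec4& dst, const Vec4& src, std::uint8_t imm)
{
    return { dst[imm & 3],
             dst[(imm >> 2) & 3],
             src[(imm >> 4) & 3],
             src[(imm >> 6) & 3] };
}

// Assumed reference semantics of a two-vector shuffle:
// selector values 0-3 pick a[0..3], values 4-7 pick b[0..3].
Vec4 shuffle4x2_reference(const Vec4& a, const Vec4& b,
                          int s0, int s1, int s2, int s3)
{
    auto pick = [&](int s) { return s < 4 ? a[s] : b[s - 4]; };
    return { pick(s0), pick(s1), pick(s2), pick(s3) };
}
```

Calling `shufps_model(b, a, 0x90)` mirrors the generated instruction (dst is b in %xmm1, src is a in %xmm0) and gives { b[0], b[0], a[1], a[2] }, while `shuffle4x2_reference(a, b, 4, 0, 1, 2)` gives { b[0], a[0], a[1], a[2] }. The two differ only in lane 1.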