shuffle4x2() generates incorrect code on SSE4_1 #131

@peabody-korg

simdpp::float32x4 foo(simdpp::float32x4 a, simdpp::float32x4 b)
{
    return simdpp::shuffle4x2<4,0,1,2>(a, b);
}

foo(simdpp::arch_sse4p1::float32<4u, void>, simdpp::arch_sse4p1::float32<4u, void>):
00000000000001c0	pushq	%rbp
00000000000001c1	movq	%rsp, %rbp
00000000000001c4	shufps	$0x90, %xmm0, %xmm1
00000000000001c8	movaps	%xmm1, %xmm0
00000000000001cb	popq	%rbp
00000000000001cc	retq

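The desired result is { b[0], a[0], a[1], a[2] }, but this code generates { b[0], b[0], a[1], a[2] }. With the immediate 0x90, shufps takes lanes 0 and 0 from its destination register (xmm1, holding b) and lanes 1 and 2 from its source register (xmm0, holding a), so a[0] never reaches the output.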
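To make the mismatch easy to check without SSE hardware, here is a minimal scalar sketch. It assumes that selectors 0..3 pick lanes of `a` and selectors 4..7 pick lanes of `b`, which is consistent with the desired result stated above. The names `Vec4`, `expected_shuffle4x2`, and `shufps_model` are hypothetical helpers for illustration only and are not part of libsimdpp.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane float vector used only for this illustration.
using Vec4 = std::array<float, 4>;

// Scalar model of the selection the report expects from
// shuffle4x2<s0,s1,s2,s3>(a, b). Assumption: selectors 0..3 pick a[0..3],
// selectors 4..7 pick b[0..3].
template <unsigned s0, unsigned s1, unsigned s2, unsigned s3>
Vec4 expected_shuffle4x2(const Vec4& a, const Vec4& b)
{
    static_assert(s0 < 8 && s1 < 8 && s2 < 8 && s3 < 8, "selector out of range");
    auto pick = [&](unsigned s) { return s < 4 ? a[s] : b[s - 4]; };
    return { pick(s0), pick(s1), pick(s2), pick(s3) };
}

// Scalar model of the SHUFPS instruction: the two low result lanes are
// selected from the destination operand, the two high lanes from the
// source operand, two immediate bits per lane.
Vec4 shufps_model(const Vec4& dst, const Vec4& src, std::uint8_t imm)
{
    return { dst[imm & 3], dst[(imm >> 2) & 3],
             src[(imm >> 4) & 3], src[(imm >> 6) & 3] };
}
```

With a = {10, 11, 12, 13} and b = {20, 21, 22, 23}, `expected_shuffle4x2<4,0,1,2>(a, b)` gives {20, 10, 11, 12}, while `shufps_model(b, a, 0x90)`, which mirrors the operand order in the disassembly, gives {20, 20, 11, 12}. Lane 1 needs a[0], but a single shufps can only fill the two low lanes from one register, so this selector pattern likely needs more than one instruction.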

Observed at commit c27dfae (dev branch), built with Apple LLVM version 9.0.0 (clang-900.0.39.2).
