Bad Neon intrinsics code gen when using ld4/st4 on AArch64

Bug #1234146 reported by Matthew Gretton-Dann on 2013-10-02

Affects      Status      Importance   Assigned to        Milestone
Linaro GCC   Confirmed   Undecided    Michael Collison

Bug Description

The attached test case produces the following code for arm-none-eabi:

gcc -S -o- -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard /tmp/t.c:

test4:
        add r1, r0, r1
        vmov.i32 d24, #0 @ v8qi
        cmp r0, r1
        bxeq lr
.L7:
        vld4.8 {d20-d23}, [r0]
        vadd.i8 d25, d24, d20
        vmov d16, d25 @ v8qi
        vadd.i8 d25, d25, d21
        vmov d17, d25 @ v8qi
        vadd.i8 d25, d25, d22
        vadd.i8 d24, d25, d23
        vmov d18, d25 @ v8qi
        vmov d19, d24 @ v8qi
        vst4.32 {d16[0], d17[0], d18[0], d19[0]}, [r0]!
        cmp r1, r0
        bne .L7
        bx lr

(Not perfect, but the extraneous vmovs are understood and are being investigated elsewhere.)

For aarch64-none-elf this produces:
aarch64-none-elf-gcc -S -o- /tmp/t.c -O3

test4:
        add x1, x0, x1, uxtw
        cmp x0, x1
        sub sp, sp, #96
        beq .L1
        movi v0.2s, 0
        add x4, sp, 8
        add x3, sp, 16
        add x2, sp, 24
.L3:
        ld4 {v1.8b - v4.8b}, [x0]
        add x5, sp, 32
        st1 {v1.16b - v4.16b}, [x5]
        ld1 {v3.8b}, [x5]
        add x5, sp, 48
        add v3.8b, v0.8b, v3.8b
        ld1 {v2.8b}, [x5]
        add x5, sp, 64
        ld1 {v1.8b}, [x5]
        add v2.8b, v2.8b, v3.8b
        add x5, sp, 80
        add v1.8b, v1.8b, v2.8b
        ld1 {v0.8b}, [x5]
        add v0.8b, v0.8b, v1.8b
        st1 {v3.8b}, [sp]
        st1 {v2.8b}, [x4]
        st1 {v1.8b}, [x3]
        st1 {v0.8b}, [x2]
        // Start of user assembly
// 15030 "/work/builds/gcc-fsf-master/tools/lib/gcc/aarch64-none-elf/4.9.0/include/arm_neon.h" 1
        ld1 {v16.2s - v19.2s}, [sp]
        st4 {v16.s - v19.s}[0], [x0]

// 0 "" 2
        // End of user assembly
        add x0, x0, 16
        cmp x1, x0
        bne .L3
.L1:
        add sp, sp, 96
        ret
        .size test4, .-test4
        .ident "GCC: (GNU) 4.9.0 20130930 (experimental)"

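This code generation is seen with both Linaro GCC 4.8 and FSF trunk. The AArch64 code has significantly more loads and stores than the ARM code: each loop iteration executes twelve memory-access instructions where the ARM version needs two (vld4 and vst4). The ld4 result is spilled to the stack with st1 and reloaded one vector at a time with ld1, and the four sums are then stored to the stack again only to be reloaded by the inline assembly from arm_neon.h that issues the st4.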
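
The attached test case is not reproduced in this description, so the following is only a sketch of what the loop appears to compute, reconstructed from the two listings above. It is not the attached t.c. The name test4_model, the choice of unsigned 8-bit lanes, the little-endian byte order and the intrinsics variant under the #if are all assumptions; the scalar model exists so the behaviour can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of one reading of the listings (an assumption, not the
   attached t.c).  buf must have at least n + 16 readable bytes, because
   each step loads 32 bytes but advances by only 16.  Little-endian
   byte order is assumed for the 32-bit lane store. */
void test4_model(uint8_t *buf, uint32_t n)
{
    uint8_t acc[8] = {0};          /* running sum, one per 8-bit lane */

    assert(n % 16 == 0);           /* the loop exits only on p == end */

    for (uint8_t *p = buf; p != buf + n; p += 16) {
        uint8_t in[4][8], out[4][8];

        /* ld4 / vld4.8: de-interleave 32 bytes into four 8-lane vectors. */
        for (int lane = 0; lane < 8; lane++)
            for (int k = 0; k < 4; k++)
                in[k][lane] = p[4 * lane + k];

        /* Chain of four lane-wise 8-bit adds (wrapping modulo 256). */
        for (int k = 0; k < 4; k++)
            for (int lane = 0; lane < 8; lane++) {
                acc[lane] = (uint8_t)(acc[lane] + in[k][lane]);
                out[k][lane] = acc[lane];
            }

        /* st4 of 32-bit lane 0: the low four bytes of each vector are
           written back to back, 16 bytes in total. */
        for (int k = 0; k < 4; k++)
            memcpy(p + 4 * k, out[k], 4);
    }
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical intrinsics source with the same behaviour as the model. */
void test4(uint8_t *p, uint32_t n)
{
    uint8_t *end = p + n;
    uint8x8_t acc = vdup_n_u8(0);

    while (p != end) {
        uint8x8x4_t in = vld4_u8(p);   /* maps to ld4 / vld4.8 */
        uint32x2x4_t out;

        for (int k = 0; k < 4; k++) {
            acc = vadd_u8(acc, in.val[k]);
            out.val[k] = vreinterpret_u32_u8(acc);
        }
        vst4_lane_u32((uint32_t *)p, out, 0);   /* maps to st4 lane 0 */
        p += 16;
    }
}
#endif
```

All four input vectors and all four sums fit in registers for the whole iteration, which is what the ARM listing does. Compiling the intrinsics variant with the two command lines quoted above may help check whether a given compiler still routes the ld4 result through the stack.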