Failure to use ARMv6 / Cortex-M4 DSP MAC instructions
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linaro GCC | Fix Released | Undecided | Andrew Stubbs |
Linaro GCC Tracking | Fix Released | Undecided | Andrew Stubbs |
Bug Description
GCC does not produce optimal code for operations where the DSP multiply-accumulate (MAC) instructions could be used.
Consider this test code:
int footrunc (int x, int a, int b)
{
    return x + (short) a * (short) b;
}

int fooshort (int x, short *a, short *b)
{
    return x + *a * *b;
}

long long foolong (long long x, short *a, short *b)
{
    return x + *a * *b;
}
Compile as follows:
gcc -S test.c -O2 -mcpu=cortex-a8
With the current Linaro GCC 4.5, we get this output:
footrunc:
uxth r1, r1
uxth r2, r2
smlabb r0, r2, r1, r0
bx lr
fooshort:
ldrh r3, [r1, #0]
ldrh r2, [r2, #0]
smlabb r0, r2, r3, r0
bx lr
foolong:
ldrh r2, [r2, #0]
push {r4}
.save {r4}
ldrh r4, [r3, #0]
smulbb r4, r4, r2
adds r2, r0, r4
mov r0, r2
adc r3, r1, r4, asr #31
mov r1, r3
pop {r4}
bx lr
Upstream GCC 4.6 is a bit better:
footrunc:
uxth r1, r1
uxth r2, r2
smlabb r0, r1, r2, r0
bx lr
fooshort:
ldrh r1, [r1, #0]
ldrh r3, [r2, #0]
smlabb r0, r1, r3, r0
bx lr
foolong:
ldrh r2, [r2, #0]
ldrh r3, [r3, #0]
smulbb r3, r2, r3
adds r0, r0, r3
adc r1, r1, r3, asr #31
bx lr
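For reference, the foolong sequence above widens the 32-bit product by hand: adds handles the low word, and adc adds the carry plus the product's sign bits (r3, asr #31) to the high word. The following is only a rough C model of that sequence, not anything taken from GCC; the function names are made up for illustration, and the register values are passed in as plain integers. It suggests that the three instructions compute the same thing as a single 64-bit accumulate of the sign-extended 16x16 product.

```c
#include <stdint.h>

/* Sign-extend the bottom halfword of a 32-bit register value. */
static int32_t sext16(uint32_t reg)
{
    uint32_t low = reg & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Hypothetical model of the GCC 4.6 foolong sequence.
   x is the r1:r0 pair; the result is the new r1:r0 bit pattern. */
static uint64_t foolong_gcc46_model(uint64_t x, uint32_t a_reg, uint32_t b_reg)
{
    int32_t prod = sext16(a_reg) * sext16(b_reg);     /* smulbb r3, r2, r3 */
    uint32_t lo = (uint32_t)x;                        /* r0 */
    uint32_t hi = (uint32_t)(x >> 32);                /* r1 */
    uint32_t sign = (prod < 0) ? 0xFFFFFFFFu : 0u;    /* r3, asr #31 */
    uint32_t new_lo = lo + (uint32_t)prod;            /* adds r0, r0, r3 */
    uint32_t carry = (new_lo < lo) ? 1u : 0u;         /* carry flag from adds */
    uint32_t new_hi = hi + sign + carry;              /* adc r1, r1, r3, asr #31 */
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Reference: one 64-bit accumulate of the sign-extended product. */
static uint64_t foolong_reference(uint64_t x, uint32_t a_reg, uint32_t b_reg)
{
    int64_t prod = (int64_t)sext16(a_reg) * sext16(b_reg);
    return x + (uint64_t)prod;   /* unsigned add avoids signed overflow */
}
```

The model works on unsigned bit patterns throughout so that wraparound is well defined in C; on the target the same wraparound happens in the registers.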
But the ideal output *should* be this:
footrunc:
@ The uxth instructions GCC generates are redundant.
smlabb r0, r1, r2, r0
bx lr
fooshort:
@ GCC gets this right (register allocation differences should be harmless).
ldrh r1, [r1]
ldrh r2, [r2]
smlabb r0, r1, r2, r0
bx lr
foolong:
@ GCC does not use the long-accumulate version.
ldrh r2, [r2]
ldrh r3, [r3]
smlalbb r0, r1, r2, r3
bx lr
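To illustrate why the uxth instructions in footrunc are redundant: smlabb only reads the bottom halfword of each multiplicand and treats it as signed, so clearing the top halfword first cannot change the result. The sketch below is a simplified C model written for this report (the helper names are invented, and the Q flag that smlabb can set on accumulate overflow is ignored).

```c
#include <stdint.h>

/* Signed value of the bottom halfword of a 32-bit register. */
static int32_t bottom_half_signed(uint32_t reg)
{
    uint32_t low = reg & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Model of uxth: zero-extend the bottom halfword. */
static uint32_t uxth_model(uint32_t reg)
{
    return reg & 0xFFFFu;
}

/* Model of smlabb: acc + bottom(rn) * bottom(rm), with 32-bit wraparound.
   The Q flag is not modelled. */
static uint32_t smlabb_model(uint32_t rn, uint32_t rm, uint32_t acc)
{
    return acc + (uint32_t)(bottom_half_signed(rn) * bottom_half_signed(rm));
}

/* Returns 1 when applying uxth to both multiplicands makes no difference. */
static int uxth_is_redundant(uint32_t a, uint32_t b, uint32_t x)
{
    return smlabb_model(uxth_model(a), uxth_model(b), x)
        == smlabb_model(a, b, x);
}
```

Under this model uxth_is_redundant holds for any inputs, because both sides reduce to the same bottom-halfword values before the multiply.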
[CodeSourcery Tracker ID #8610]
Related branches
- Linaro Toolchain Developers: Pending requested
- Diff: 45 lines (+21/-4), 3 files modified
  ChangeLog.linaro (+8/-0)
  gcc/config/arm/arm.md (+5/-4)
  gcc/testsuite/gcc.target/arm/smlalbb.c (+8/-0)
tags: added: speed task
Changed in gcc-linaro-tracking:
assignee: nobody → Andrew Stubbs (ams-codesourcery)
tags: added: 46merge
Changed in gcc-linaro:
status: In Progress → Fix Released
Changed in gcc-linaro-tracking:
status: In Progress → Fix Released
Note that the above is true for Cortex-A8, but for some other cores the decision might go another way (e.g. Cortex-R4 should prefer MLA over SMLAxy). Care should be taken not to make the changes unconditional.