Unnecessary code is generated when using 8 bit and 16 bit variables
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
GNU Arm Embedded Toolchain |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
Using gcc 7.2.1 for ARM thumb with -O3, I see unnecessary instructions being emitted. I tried it on gcc 8.2 and see the same output.
Code is shown below and can be found here: https:/
void shortMult()
{
volatile short int a;
volatile short int b;
a *= b;
}
void charMult()
{
volatile char x;
volatile char y;
x *= y;
}
void shortIntMult(void)
{
volatile short int a;
volatile int b;
b *= a;
}
This generates:
shortMult():
sub sp, sp, #8
ldrh r1, [sp, #6]
ldrh r2, [sp, #4]
mul r3, r2, r1
lsl r3, r3, #16
asr r3, r3, #16
strh r3, [sp, #4] @ movhi
add sp, sp, #8
bx lr
charMult():
sub sp, sp, #8
ldrb r2, [sp, #7] @ zero_extendqisi2
ldrb r1, [sp, #6] @ zero_extendqisi2
mul r3, r2, r1
and r3, r3, #255
strb r3, [sp, #6]
add sp, sp, #8
bx lr
shortIntMult():
sub sp, sp, #8
ldrh r3, [sp, #2]
ldr r2, [sp, #4]
lsl r1, r3, #16
asr r1, r1, #16
mul r3, r2, r1
str r3, [sp, #4]
add sp, sp, #8
bx lr
Looking at shortMult():
The strh r3, [sp, #4] writes out 16 bits. There is no need to do the lsl and asr - they don't modify the bits we write out.
Looking at charMult():
Similar issue as above - strb r3, [sp, #6] writes out 8 bits. There is no need to do the and r3, r3, #255 - it doesn't modify the bits we write out.
Looking at shortIntMult():
a is loaded into r3 using ldrh and then sign extended using lsl r1, r3, #16 and asr r1, r1, #16
a can be loaded with sign extension using ldrsh
Thank you.
Changed in gcc-arm-embedded: | |
status: | New → Confirmed |
Confirmed. I've compiled for Armv7E-M where smullbb is used but otherwise the same happens. The middle end seemed to do something to do something funky already since it converted operand to unsigned short and then back to short int for the result. However changing a and b to unsigned short does not change the assembly (it does improve gimple).
Not sure what's happening but expand looks particularly dumb:
(insn 12 11 13 2 (set (reg:HI 117) list:REG_ EQUAL (mult:HI (subreg/s/v:HI (reg:SI 110 [ a.1_1 ]) 0)
(subreg/ s/v:HI (reg:SI 112 [ b.0_4 ]) 0))
(zero_ extend: SI (reg:HI 117))) "test.c":6 -1
(subreg/ s/v:HI (reg:SI 111 [ _2 ]) 0)) "test.c":6 -1
(const_ int -4 [0xffffffffffff fffc])) [1 a+0 S2 A32])
(subreg:HI (reg:SI 118) 0)) "test.c":6 -1
(expr_
(nil)))
(insn 13 12 14 2 (set (reg:SI 111 [ _2 ])
(nil))
(insn 14 13 15 2 (set (reg:HI 119)
(nil))
(insn 15 14 0 2 (set (mem/v/c:HI (plus:SI (reg/f:SI 105 virtual-stack-vars)
(reg:HI 119)) "test.c":6 -1
(nil))
So it thinks it can do a 16bit multiply (later the compiler realizes that smullbb produces a 32bit register so the multiplication dest becomes SImode), then extend it to SI and reduce it to HI in the next instruction. This is probably unrelated though as the zero_extend gets removed as it figures out smulbb produces a 32bit output. The real problem is that GCC does not see the subreg as redundant.