Duplicate instructions in both paths in conditional code
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linaro GCC | Triaged | Medium | Unassigned |
Bug Description
I've found some cases where GCC generates the same instructions down both paths of conditional execution. It's probably best to just show an example. This one is from make_node in SPEC2000 176.gcc. It's easier to see in ARM code, so I'll start there, with the code right after the literal pool for the switch statement:
0x0003ae64: CMP r0,#1
0x0003ae68: LDREQ r8,[pc,#1272] ; [0x3b368] = 0x1a1794
0x0003ae6c: LDR r7,[r3,r0,LSL #2]
0x0003ae70: LDRNE r3,[pc,#1264] ; [0x3b368] = 0x1a1794
0x0003ae74: MOVEQ r3,r8
0x0003ae78: ADD r7,r7,#3
0x0003ae7c: LSL r7,r7,#2
0x0003ae80: MOVEQ r1,r7
0x0003ae84: MOVNE r1,r7
0x0003ae88: LSR r6,r7,#2
0x0003ae8c: SUBEQ r5,r6,#1
0x0003ae90: SUBNE r5,r6,#1
0x0003ae94: LSLEQ r6,r6,#2
0x0003ae98: LSLNE r6,r6,#2
0x0003ae9c: MOV r4,r6
In the bottom part of the code, the EQ-predicated instructions are the same as the NE-predicated ones. The top is a bit more complex, but the net effect is still that r3 is loaded with the literal 0x1a1794 (from the pool entry at 0x3b368); the EQ path just takes an additional MOVEQ to get there. The two code paths don't produce identical results (the EQ path changes r8), but it would be more efficient to do a single conditional move instead of duplicating the entire block (assuming r8 is even live after this block; I didn't look too closely).
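The pattern may be easier to picture at source level. The C sketch below is not the make_node source: the function names, parameters and arithmetic are hypothetical, chosen only to mirror the shift/subtract sequence in the listing. It contrasts a branch whose two arms repeat the same work with a form where the common work is done once and only the differing value is selected conditionally.

```c
#include <stdint.h>

/* Hypothetical stand-in for the duplicated form: both arms recompute
   `words` and `last`; only `base` really depends on the condition. */
uint32_t last_index_duplicated(int code, uint32_t len,
                               uint32_t eq_base, uint32_t ne_base)
{
    uint32_t bytes = (len + 3) << 2;
    uint32_t base, words, last;

    if (code == 1) {
        base  = eq_base;
        words = bytes >> 2;   /* same work in both arms */
        last  = words - 1;
    } else {
        base  = ne_base;
        words = bytes >> 2;   /* same work in both arms */
        last  = words - 1;
    }
    return base + last;
}

/* Equivalent form: the common computation is hoisted out of the branch
   and the only conditional operation left is a select. */
uint32_t last_index_hoisted(int code, uint32_t len,
                            uint32_t eq_base, uint32_t ne_base)
{
    uint32_t bytes = (len + 3) << 2;
    uint32_t words = bytes >> 2;
    uint32_t last  = words - 1;
    uint32_t base  = (code == 1) ? eq_base : ne_base;

    return base + last;
}
```

Both functions return the same value for every input; the second is the shape I'd expect the generated code to take, with one copy of the shifts and subtract plus a single conditional move.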
The Thumb-2 code is similar, but instead of using conditional execution it branches out of line to handle the EQ case:
0x0002ef60: LDR r3,[r9,#0x1c]
0x0002ef64: LDR r7,[r3,r10,LSL #2]
0x0002ef68: ADDS r7,#3
0x0002ef6a: LSLS r7,r7,#2
0x0002ef6c: CMP r10,#1
0x0002ef70: BEQ.W 0x2f248 ; make_node + 952
0x0002ef74: LSRS r6,r7,#2
0x0002ef76: LDR r3,[pc,#428] ; [0x2f124] = 0x142574
0x0002ef78: SUBS r5,r6,#1
0x0002ef7a: MOV r1,r7
0x0002ef7c: LSLS r6,r6,#2
0x0002ef7e: MOV r4,r6
0x0002ef80: LDR r0,[r8,#0x10]
The target of the BEQ.W:
0x0002f248: LSRS r6,r7,#2
0x0002f24a: LDR r8,[pc,#164] ; [0x2f2f0] = 0x142574
0x0002f24e: SUBS r5,r6,#1
0x0002f250: MOV r1,r7
0x0002f252: LSLS r6,r6,#2
0x0002f254: MOV r3,r8
0x0002f256: MOV r4,r6
0x0002f258: B 0x2ef80 ; make_node + 240
This is basically the same code as the ARM case (even the registers are similar). Here it's even more apparent that the code starting at 0x2ef74 matches up with the code at 0x2f248.
This shows up in a number of other places. In emovi from the same test:
0x000a7af8: PUSH {r4-r6}
0x000a7afc: CMP r3,#0
0x000a7b00: MVNLT r2,#0
0x000a7b04: MOVGE r3,#0
0x000a7b08: STRHLT r2,[r1],#2
0x000a7b0c: MOV r6,#0x7fff
0x000a7b10: STRHGE r3,[r1],#2
0x000a7b14: MOV r3,r1
0x000a7b18: LDRH r2,[r0,#0xa]
The two conditional stores could be combined into one if the MVNLT/MOVGE wrote to the same destination register, and in this case r2 and r3 are both dead after the stores, so there are no side effects to worry about.
From 256.bzip2, in generateMTFValues (there's another instance in the same function: same code with different registers):
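As a rough source-level picture of the store case, here is a minimal C sketch. It is not the emovi source: the function names and signatures are hypothetical, and it only assumes that a signed halfword decides whether 0xffff or 0 is stored through a post-incremented pointer, as the MVNLT/MOVGE and STRH pair suggests.

```c
#include <stdint.h>

/* Hypothetical shape that maps onto two predicated stores
   (STRHLT / STRHGE), one per arm. */
uint16_t *store_sign_two_stores(int16_t v, uint16_t *q)
{
    if (v < 0)
        *q++ = 0xffff;
    else
        *q++ = 0;
    return q;
}

/* Equivalent shape: select the value into one temporary, then store
   unconditionally. This corresponds to MVNLT/MOVGE writing the same
   register, followed by a single STRH. */
uint16_t *store_sign_one_store(int16_t v, uint16_t *q)
{
    uint16_t fill = (v < 0) ? 0xffff : 0;
    *q++ = fill;
    return q;
}
```

Both functions store the same halfword and return the same advanced pointer, so the single-store form is safe whenever the two temporaries are dead afterwards.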
0x0000a588: CMP r4,#1
0x0000a58c: STRHEQ r4,[r0,#0]
0x0000a590: MOVNE r4,#0
0x0000a594: MOVEQ r12,r1
0x0000a598: STRHNE r4,[r0,#0]
This could be MOVNE r4,#0; STRH r4,[r0,#0], with the MOVEQ r12,r1 scheduled wherever it fits. It's not as big a gain as the others, but as a smaller case it may be easier to use to track down the problem in the first place.
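The suggested MOVNE/STRH sequence can be sketched in C as follows. This is not the generateMTFValues source: the function names are hypothetical, and the MOVEQ r12,r1 side effect on the EQ path is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical shape matching the listing: the compared value is stored
   as-is when it equals 1, otherwise it is zeroed and stored, giving one
   store per arm (STRHEQ / STRHNE). */
uint16_t store_flag_two_stores(uint16_t v, uint16_t *dst)
{
    if (v == 1) {
        *dst = v;
    } else {
        v = 0;
        *dst = v;
    }
    return *dst;
}

/* Equivalent shape: only the zeroing is conditional (MOVNE), and the
   store itself is unconditional (a single STRH). */
uint16_t store_flag_one_store(uint16_t v, uint16_t *dst)
{
    if (v != 1)
        v = 0;
    *dst = v;
    return *dst;
}
```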
I'm building with -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fno-common, plus either -marm or -mthumb.
Last tested with the Feb 2011 release.
tags: | added: size speed task |
Changed in gcc-linaro: | |
status: | New → Triaged |
importance: | Undecided → Medium |
I am out of the office until 17/04/2011.
Note: This is an automated response to your message "[Bug 759193] [NEW]
Duplicate instructions in both paths in conditional code" sent on
13/4/2011 0:10:45.
This is the only notification you will receive while this person is away.