Linaro GCC

Drop in STREAM performance with gcc-linaro 4.8 without -fschedule-insns(2) flags

Bug #1211330 reported by Viswanath Puttagunta on 2013-08-12

This bug affects 1 person

Affects      Status          Importance   Assigned to      Milestone
Linaro GCC   Fix Committed   Undecided    Maxim Kuvyrkov

Bug Description

Release observed on
- gcc-linaro 4.8-based release 13.06 (gcc-linaro-arm-linux-gnueabihf-4.8-2013.06_linux)
- The performance drop was first observed in the gcc-linaro 4.7-based 13.03 release.

Description of Issue:
- Observing STREAM benchmark degradation on TI’s Keystone 2 device (Cortex-A15 based).
- Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance. Trying to understand why.

More info regarding STREAM benchmark
- STREAM benchmark: http://www.streambench.org
- Source code: http://www.cs.virginia.edu/stream/FTP/

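Observations with gcc-linaro-arm-linux-gnueabihf-4.8-2013.06_linux:
Note that the third run below, built with -fno-schedule-insns -fno-schedule-insns2, has a much better Scale rate (about 3130 MB/s versus about 1400 MB/s in the first two runs).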

FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-math-optimizations -mfloat-abi=hard -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -mthumb
Function Rate (MB/s)
Copy: 3206.9840
Scale: 1402.7693
Add: 2526.0525
Triad: 2642.3057

FLAGS = $(DEFINES) -O3
Function Rate (MB/s)
Copy: 3216.0284
Scale: 1399.2090
Add: 2499.4611
Triad: 2616.1260

FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-math-optimizations -mfloat-abi=hard -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -mthumb -fno-schedule-insns -fno-schedule-insns2
Function Rate (MB/s)
Copy: 3230.9757
Scale: 3129.9206
Add: 2607.2765
Triad: 2557.7856


Bernhard Rosenkraenzer (berolinux) wrote on 2013-08-16:

Looking at the benchmark results posted, I think "Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance." should actually be -fno-schedule-insns and -fno-schedule-insns2?

Do the results change if you add -mcpu=cortex-a15 -mtune=cortex-a15? There are a couple of differences between the A15 and generic v7-a that might affect instruction scheduling.


Lalindra Jayatilleke (lalindra) wrote on 2013-11-08:

Yes, it was -fno-schedule-insns and -fno-schedule-insns2 that caused the improvement. Adding -mcpu=cortex-a15 -mtune=cortex-a15 did not improve the numbers. Are there any other updates on this bug?
Thanks.

Christophe Lyon (christophe-lyon) on 2014-08-22