Drop in STREAM performance with gcc-linaro 4.8 without -fschedule-insns(2) flags
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Linaro GCC | Fix Committed | Undecided | Maxim Kuvyrkov | |
Bug Description
Release observed on
- gcc-linaro 4.8 based release: 13.06: (gcc-linaro-
- Drop in performance first observed in gcc-linaro 4.7 based 13.03 release.
Description of Issue:
- Observing STREAM benchmark degradation on TI’s Keystone 2 device (Cortex-A15 based).
- Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance. Trying to understand why.
More info regarding STREAM benchmark
- STREAM benchmark: http://
- Source code: http://
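For readers unfamiliar with the benchmark, here is a minimal sketch of the two kernels that involve a scalar multiply, plus the way a rate in MB/s is derived. It follows the kernel definitions commonly found in stream.c, but the function names (`stream_scale`, `stream_triad`, `stream_rate_mbs`) are hypothetical, and the real benchmark uses static arrays, OpenMP pragmas, and repeated timed runs that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Scale kernel: b[j] = scalar * c[j]. One load and one store per element.
 * Function name is hypothetical; stream.c inlines this loop. */
void stream_scale(double *restrict b, const double *restrict c,
                  double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}

/* Triad kernel: a[j] = b[j] + scalar * c[j]. Two loads and one store. */
void stream_triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Rate in MB/s (1 MB = 1e6 bytes) for a kernel that touches
 * `arrays_touched` arrays of n doubles in `seconds`.
 * Assumption: 2 arrays for Copy and Scale, 3 for Add and Triad. */
double stream_rate_mbs(size_t n, int arrays_touched, double seconds)
{
    assert(seconds > 0.0);
    double bytes = (double)arrays_touched * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / seconds;
}
```

Because Scale is a single multiply between one load and one store, its throughput depends heavily on how the compiler vectorizes and orders the loop body, which is why it is a useful kernel to watch when toggling scheduling flags.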
Observations based on gcc-linaro: gcc-linaro-
Note the 3rd run below has much better ‘Scale’ numbers.
FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-
Function Rate (MB/s)
Copy: 3206.9840
Scale: 1402.7693
Add: 2526.0525
Triad: 2642.3057
FLAGS = $(DEFINES) -O3
Function Rate (MB/s)
Copy: 3216.0284
Scale: 1399.2090
Add: 2499.4611
Triad: 2616.1260
FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-
-mfloat-abi=hard -fprefetch-
Function Rate (MB/s)
Copy: 3230.9757
Scale: 3129.9206
Add: 2607.2765
Triad: 2557.7856
Changed in gcc-linaro:
assignee: nobody → Maxim Kuvyrkov (maxim-kuvyrkov)
Looking at the benchmark results posted, I think "Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance." should actually refer to -fno-schedule-insns and -fno-schedule-insns2?
Do the results change if you add -mcpu=cortex-a15 -mtune=cortex-a15? There are a couple of differences between A15 and generic v7-a that might affect instruction scheduling.