This commit:
commit 103a469dc7755fd9e8ccf362f3dd4c55dc761908
Author: Sajan Karumanchi <email address hidden>
Date: Wed Jan 18 18:29:04 2023 +0100

    x86: Cache computation for AMD architecture.

    All AMD architectures cache details will be computed based on
    __cpuid__ `0x8000_001D` and the reference to __cpuid__ `0x8000_0006` will be
    zeroed out for future architectures.

    Reviewed-by: Premachandra Mallappa <email address hidden>

changed cache size computation on the AMD architecture.
However, the new method, which relies on CPUID leaf 0x8000_001D, is not supported by all AMD CPUs. This CPU:
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Turion(tm) II Neo N40L Dual-Core Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save
bugs : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs null_seg amd_e400 spectre_v1 spectre_v2
bogomips : 2995.32
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
reports all zeros for its caches after this change (build from commit cea74a4a24c36202309e8254f1f938e2166488f3, which includes the commit mentioned above):
$ ./ld.so --list-diagnostics | grep -E 'level|threshold'
x86.cpu_features.non_temporal_threshold=0x4040
x86.cpu_features.rep_movsb_threshold=0x800
x86.cpu_features.rep_movsb_stop_threshold=0x0
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x0
x86.cpu_features.level1_icache_linesize=0x0
x86.cpu_features.level1_dcache_size=0x0
x86.cpu_features.level1_dcache_assoc=0x0
x86.cpu_features.level1_dcache_linesize=0x0
x86.cpu_features.level2_cache_size=0x0
x86.cpu_features.level2_cache_assoc=0x0
x86.cpu_features.level2_cache_linesize=0x0
x86.cpu_features.level3_cache_size=0x0
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x0
x86.cpu_features.level4_cache_size=0xffffffffffffffff
A build from the 2.36 branch (commit b7008a92f505632f32b313d1033d6d15c99a0b31) yields this instead:
$ ./ld.so --list-diagnostics | grep -E 'level|threshold'
x86.cpu_features.non_temporal_threshold=0xc0000
x86.cpu_features.rep_movsb_threshold=0x800
x86.cpu_features.rep_movsb_stop_threshold=0x100000
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x10000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x10000
x86.cpu_features.level1_dcache_assoc=0x2
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x10
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x0
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x0
x86.cpu_features.level4_cache_size=0xffffffffffffffff
So it's a regression.
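The comparison between the two builds can also be done mechanically. The following is a minimal Python sketch, not part of glibc: the helper names `parse_diagnostics` and `zeroed_fields` are made up for illustration, and the sample text is a subset of the two listings in this report. It reports fields that were non-zero in the old output but are zero in the new one.

```python
def parse_diagnostics(text):
    """Parse 'name=value' lines into a dict of ints, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, raw = line.strip().partition("=")
        if not sep:
            continue
        try:
            values[name] = int(raw, 0)  # base 0 accepts the 0x... form
        except ValueError:
            continue  # non-numeric diagnostics (quoted strings) are ignored
    return values


def zeroed_fields(old, new):
    """Return names that were non-zero in `old` but are zero in `new`."""
    return sorted(k for k, v in old.items() if v != 0 and new.get(k) == 0)


# Subset of the 2.36 branch output quoted in this report.
OLD = """\
x86.cpu_features.rep_movsb_stop_threshold=0x100000
x86.cpu_features.level1_dcache_size=0x10000
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level3_cache_size=0x0
"""

# Same fields from the build that includes the AMD cache change.
NEW = """\
x86.cpu_features.rep_movsb_stop_threshold=0x0
x86.cpu_features.level1_dcache_size=0x0
x86.cpu_features.level2_cache_size=0x0
x86.cpu_features.level3_cache_size=0x0
"""

regressed = zeroed_fields(parse_diagnostics(OLD), parse_diagnostics(NEW))
for name in regressed:
    print(name)
```

In practice the two strings would come from saving the `./ld.so --list-diagnostics` output of each build to a file. `level3_cache_size` is not reported because it was already zero before the change.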
The CPU is probably old enough that glibc does not use non-temporal stores on it, so a crash in glibc is unlikely. However, the lack of accurate cache sizes probably still causes performance regressions elsewhere (although admittedly no one is going to use CPUs that old for their performance).
Some hypervisors also fail to pass through these CPUID values even if they identify the CPU as an AMD model: https://bugzilla.redhat.com/show_bug.cgi?id=2196271
Addressing hypervisor compatibility might be the important part here.
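One possible shape for a fallback is sketched below in Python, purely as an illustration and not as glibc's actual fix. It assumes the documented AMD layouts: the topology-extensions bit is ECX bit 22 of leaf 0x80000001, the cache type is EAX[4:0] of leaf 0x8000001D, and the legacy leaves 0x80000005/0x80000006 pack the L1/L2 fields as decoded here. The function names and the register values in the usage lines are hypothetical; the values were chosen to match the 2.36 output above, not captured from the hardware.

```python
# AMD encoding of the 4-bit L2 associativity field in leaf 0x80000006.
# Only commonly documented encodings are listed; others decode to None.
L2_ASSOC = {0x0: 0, 0x1: 1, 0x2: 2, 0x4: 4, 0x6: 8, 0x8: 16,
            0xA: 32, 0xB: 48, 0xC: 64, 0xD: 96, 0xE: 128}

TOPOEXT_BIT = 1 << 22  # leaf 0x80000001, ECX: topology extensions


def decode_l1d(ecx):
    """Decode ECX of leaf 0x80000005 (L1 data cache)."""
    return {"size": (ecx >> 24) * 1024,      # size is reported in KB
            "assoc": (ecx >> 16) & 0xFF,
            "linesize": ecx & 0xFF}


def decode_l2(ecx):
    """Decode ECX of leaf 0x80000006 (L2 cache)."""
    return {"size": (ecx >> 16) * 1024,      # size is reported in KB
            "assoc": L2_ASSOC.get((ecx >> 12) & 0xF),
            "linesize": ecx & 0xFF}


def pick_cache_source(max_ext_leaf, ecx_80000001, eax_8000001d):
    """Choose which leaf to trust for cache details.

    Leaf 0x8000001D is used only if it is in range, the topology
    extensions bit is set, and subleaf 0 reports a non-null cache type.
    The last check also covers hypervisors that zero the leaf.
    """
    has_leaf = (max_ext_leaf >= 0x8000001D
                and bool(ecx_80000001 & TOPOEXT_BIT))
    if has_leaf and (eax_8000001d & 0x1F) != 0:
        return "0x8000001D"
    return "legacy"


# Hypothetical register values consistent with the 2.36 output above.
print(pick_cache_source(0x8000001B, 0, 0))   # old CPU without the leaf
print(decode_l1d(0x40020140))
print(decode_l2(0x04008140))
```

The point of the third check in `pick_cache_source` is that a guest can advertise the leaf yet return all zeros for it, which is the hypervisor case described above; treating a null cache type in subleaf 0 as "leaf unusable" handles both that case and old hardware with the same code path.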