Some more results:
= Raspberry Pi 3B 1GB =
 length   | before (MiB/s) | after (MiB/s)  | delta
----------|----------------|----------------|----------
    32768 |          48.24 |          46.19 |   -4.26%
    65536 |          85.99 |          79.96 |   -7.02%
   131072 |         154.00 |         139.68 |   -9.30%
   262144 |         178.72 |         164.12 |   -8.17%
   524288 |         163.56 |         156.55 |   -4.28%
  1048576 |         246.15 |         234.32 |   -4.81%
= Raspberry Pi 3A+ 512MB =
 length   | before (MiB/s) | after (MiB/s)  | delta
----------|----------------|----------------|----------
    32768 |          57.11 |          54.22 |   -5.06%
    65536 |         101.16 |          94.53 |   -6.56%
   131072 |         186.94 |         168.37 |   -9.94%
   262144 |         200.16 |         181.37 |   -9.39%
   524288 |         175.91 |         168.93 |   -3.97%
  1048576 |         261.19 |         250.62 |   -4.04%
= Raspberry Pi Zero 2 =
 length   | before (MiB/s) | after (MiB/s)  | delta
----------|----------------|----------------|----------
    32768 |          40.58 |          38.75 |   -4.51%
    65536 |          72.51 |          67.57 |   -6.81%
   131072 |         132.02 |         121.20 |   -8.20%
   262144 |         165.26 |         149.13 |   -9.76%
   524288 |         160.46 |         153.15 |   -4.55%
  1048576 |         241.92 |         230.87 |   -4.57%
Worth noting that the Pi 4 uses the 2711 SoC, while these boards (the 3B, 3A+, and Zero 2) all use the older 2837 SoC. In other words, while the new memcpy seems "okay" on the 2711, it shows a consistent regression on the 2837: roughly 4% to 10% slower at every length tested, with the largest drops at lengths 131072 and 262144.
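For anyone who wants to re-derive the delta column, here is a minimal sketch. It assumes the delta is the relative change `(after - before) / before`, which is my reading of the numbers rather than something the benchmark tool documents; `delta_percent` is a hypothetical helper name. Recomputing from the rounded MiB/s figures can differ from the reported value in the last digit, presumably because the reported deltas were taken from unrounded throughputs.

```python
def delta_percent(before, after):
    """Relative throughput change in percent (negative means slower)."""
    return (after - before) / before * 100.0


# (length, before MiB/s, after MiB/s, reported delta %) for the Pi 3B table
PI_3B = [
    (32768, 48.24, 46.19, -4.26),
    (65536, 85.99, 79.96, -7.02),
    (131072, 154.00, 139.68, -9.30),
    (262144, 178.72, 164.12, -8.17),
    (524288, 163.56, 156.55, -4.28),
    (1048576, 246.15, 234.32, -4.81),
]

if __name__ == "__main__":
    for length, before, after, reported in PI_3B:
        recomputed = delta_percent(before, after)
        print(f"{length:>8} | reported {reported:+.2f}% | recomputed {recomputed:+.2f}%")
```

The same helper works for the 3A+ and Zero 2 rows; only the input tuples change.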