I bumped up to the config you had (but with one PT device); a rough sketch of such an invocation follows the list:
- host phys bits machine type for larger mappings
- more CPUs: 1 -> 32
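For reference, a config along these lines matches what is described; this is only a sketch, not the exact command line used: the host-phys-bits cpu property, the q35/accel=kvm machine and the vfio-pci PCI address are illustrative placeholders.

  qemu-system-x86_64 \
      -machine q35,accel=kvm \
      -cpu host,host-phys-bits=true \
      -smp 32 \
      -m 64G \
      -device vfio-pci,host=0000:01:00.0

The -m size is what gets varied in T1/T2 below; dropping the vfio-pci line is the "no PT device" case.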
Adding/removing a PT device in the configs above doesn't change a lot.
As assumed, none of these increased the time tremendously.
Then I went to bump up the memory size.
T1: use 1.2 TB but no PT device
#1: 6 sec
#2: 21 sec
#3: 16 sec
Only #2 increases slightly, due to the additional memory that needs to be set up.
T2: use 1.2 TB with one PT device
#1: 253 sec
#2: 20 sec
#3: 18 sec
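(Side note: while the long phase #1 is running, the memory being populated is easy to watch from the host; e.g., assuming a single qemu-system-x86_64 process:

  pid=$(pidof qemu-system-x86_64)
  watch -n 5 "grep VmRSS /proc/$pid/status"

VmRSS grows steadily until it reaches roughly the configured guest memory size in the PT-device case.)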
The time-consuming part is a single process with kernel-side load.
The associated userspace is qemu, but the load is close to 100% in the kernel.
Samples: 62K of event 'cycles:ppp', Event count (approx.): 34521154809
Overhead  Shared Object  Symbol
  73.91%  [kernel]       [k] clear_page_erms
   9.53%  [kernel]       [k] clear_huge_page
   1.65%  [kernel]       [k] follow_trans_huge_pmd
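The above is ordinary perf output; something like the following, run while the guest memory is being set up, gives an equivalent view (system-wide here, attaching to the qemu pid with -p works as well; the 30 s duration is arbitrary):

  perf top
  # or record for later inspection:
  perf record -a -g -- sleep 30
  perf report --sort dso,symbol

Either way the profile is dominated by page clearing (clear_page_erms / clear_huge_page).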