Thanks for all the testing, Josh. I reverted the paravirtualized TLB flushing patches in the test kernel. Do you think we should spin a new kernel without them while we try to find the main cause of the problem?
I enabled the tracepoint available in arch/x86/hyperv/mmu.c for both the mainline and linux-azure kernels and got some interesting information. The mainline kernel never calls flush_tlb_others with TLB_FLUSH_ALL, while the 4.11 and 4.13 linux-azure kernels do so very often.
I'm attaching the trace files for both kernels. You can tell that TLB_FLUSH_ALL was passed to flush_tlb_others when `end` is equal to "ffffffffffffffff" (-1ULL).
Also, if I force hyperv_flush_tlb_others_ex() to do a native flush when `end` is equal to TLB_FLUSH_ALL, the problem does not occur. That is another option for a temporary fix.
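To make the idea concrete, here is a rough user-space sketch of the decision that workaround makes. It is not the actual patch and not kernel code: `choose_flush_path`, `enum flush_path`, and `TLB_FLUSH_ALL_SKETCH` are hypothetical names made up for illustration, and the constant just mirrors the all-ones value (-1ULL) seen in the traces.

```c
#include <stdint.h>

/* Illustrative stand-in for TLB_FLUSH_ALL: the all-ones value (-1ULL)
 * that shows up as "ffffffffffffffff" in the trace output. */
#define TLB_FLUSH_ALL_SKETCH UINT64_MAX

/* Hypothetical labels for the two ways a remote flush could be carried out. */
enum flush_path {
    FLUSH_NATIVE,    /* fall back to the non-paravirtualized flush */
    FLUSH_HYPERCALL  /* use the Hyper-V paravirtualized flush */
};

/* Hypothetical helper: pick the flush path for a request covering
 * [start, end). A full flush (end == TLB_FLUSH_ALL_SKETCH) is sent down
 * the native path; ranged flushes keep using the hypercall path. */
enum flush_path choose_flush_path(uint64_t start, uint64_t end)
{
    (void)start; /* only `end` matters for this decision */

    if (end == TLB_FLUSH_ALL_SKETCH)
        return FLUSH_NATIVE;

    return FLUSH_HYPERCALL;
}
```

In the real function the equivalent check would presumably sit before the hypercall is set up and branch to the existing native fallback; where exactly depends on the kernel version.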
I believe the mainline kernel carries the same bug as linux-azure, but the problematic path (end == TLB_FLUSH_ALL) is never executed there.