Do you maybe have pointers to those reports about irqbalance? I am not really sure what could be monitored to find more information. I went back and looked at all the bad page error messages, and one thing they all seem to have in common is that page->mapping is set and has bit 0 set. That points to the page having previously been used for an anonymous mapping.
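To make the bit 0 check concrete, here is a minimal userspace sketch of how a mapping value copied from a bad page message could be tested. It assumes the kernel convention that bit 0 of page->mapping is the anonymous flag (PAGE_MAPPING_ANON); the macro and helper names below are made up for illustration and are not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's PAGE_MAPPING_ANON flag, i.e. bit 0 of
 * page->mapping marks the page as belonging to an anonymous mapping. */
#define MAPPING_ANON_BIT ((uintptr_t)0x1)

/* Hypothetical helper: returns true if a mapping value taken from a
 * "bad page" report has the anonymous flag (bit 0) set. */
static bool mapping_looks_anon(uintptr_t mapping)
{
    return (mapping & MAPPING_ANON_BIT) != 0;
}

/* Hypothetical helper: clears the flag bit to recover the underlying
 * pointer value (for an anonymous page this would be the anon_vma). */
static uintptr_t mapping_pointer(uintptr_t mapping)
{
    return mapping & ~MAPPING_ANON_BIT;
}
```

Feeding each logged mapping value through mapping_looks_anon() would be one quick way to confirm that every report really does share this property.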
That may be heap used by libc for malloc, but I would imagine that if that were broken, there would be many more issues. So maybe this can be narrowed down to something that uses mmap with MAP_ANONYMOUS and somehow causes pages to go back into the pool before they are unmapped...