The table name appears to change; it isn't consistently the same table (rosterusers, rostergroups, muc_affiliations). Further, this database was stood up from a mysqldump of the master, so I would assume we could not have imported any corrupted pages by loading the slave this way? There are no events in the Linux logs (/var/log/messages, dmesg, etc.) that indicate a hardware problem. Further, this crash happens on multiple servers in this customer's cluster for this dataset, on PS 5.5.32 and now PS 5.5.41.
Note that this cluster at the customer site has 4 servers, two of which have never crashed; their only difference is that they have a buffer pool (BP) < 100GB. The instances that are crashing are at ~200GB. I suggested a smaller BP as a workaround, but that isn't an acceptable long-term solution for them.
The customer is going to try a 5.6 installation on a slave to see if it also crashes when applying replication events.
Here is the innochecksum output for the three tables.
$ innochecksum -v rostergroups.ibd
file rostergroups.ibd = 3649044480 bytes (222720 pages)...
checking pages in range 0 to 222719
page 11775 okay: 5.287% done
page 25279 okay: 11.351% done
page 37887 okay: 17.011% done
page 51711 okay: 23.218% done
page 64895 okay: 29.138% done
page 79423 okay: 35.661% done
page 93951 okay: 42.184% done
page 109119 okay: 48.994% done
page 123583 okay: 55.489% done
page 139135 okay: 62.471% done
page 155135 okay: 69.655% done
page 171967 okay: 77.213% done
page 190015 okay: 85.316% done
page 206975 okay: 92.931% done
page 221887 okay: 99.626% done
$ innochecksum -v rosterusers.ibd
file rosterusers.ibd = 6803161088 bytes (415232 pages)...
checking pages in range 0 to 415231
page 12607 okay: 3.036% done
page 26367 okay: 6.350% done
page 40959 okay: 9.864% done
page 56255 okay: 13.548% done
page 70847 okay: 17.062% done
page 84607 okay: 20.376% done
page 98815 okay: 23.798% done
page 112895 okay: 27.189% done
page 126079 okay: 30.364% done
page 139135 okay: 33.508% done
page 153087 okay: 36.868% done
page 167615 okay: 40.367% done
page 181887 okay: 43.804% done
page 195839 okay: 47.164% done
page 210687 okay: 50.740% done
page 225087 okay: 54.208% done
page 237503 okay: 57.198% done
page 252223 okay: 60.743% done
page 266815 okay: 64.257% done
page 281279 okay: 67.740% done
page 294143 okay: 70.838% done
page 309311 okay: 74.491% done
page 323711 okay: 77.959% done
page 337983 okay: 81.396% done
page 352703 okay: 84.941% done
page 366911 okay: 88.363% done
page 381631 okay: 91.908% done
page 396671 okay: 95.530% done
page 412799 okay: 99.414% done
$ innochecksum -v muc_affiliations.ibd
file muc_affiliations.ibd = 1149239296 bytes (70144 pages)...
checking pages in range 0 to 70143
page 4287 okay: 6.113% done
page 18175 okay: 25.912% done
page 31679 okay: 45.164% done
page 45567 okay: 64.964% done
page 56767 okay: 80.931% done
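As a quick sanity check on the numbers above, the page counts innochecksum reports can be recomputed from the file sizes. This is only a sketch: it assumes the default 16 KiB InnoDB page size, and pages_for_bytes is a hypothetical helper name, not part of any MySQL tool.

```shell
# Assumption: default 16 KiB InnoDB page size.
PAGE_SIZE=16384

# Hypothetical helper: print the page count implied by a tablespace size in bytes.
# Returns non-zero if the size is not a whole number of pages.
pages_for_bytes() {
    bytes=$1
    if [ $(( bytes % PAGE_SIZE )) -ne 0 ]; then
        echo "size $bytes is not a multiple of $PAGE_SIZE" >&2
        return 1
    fi
    echo $(( bytes / PAGE_SIZE ))
}

pages_for_bytes 3649044480   # rostergroups.ibd     -> 222720
pages_for_bytes 6803161088   # rosterusers.ibd      -> 415232
pages_for_bytes 1149239296   # muc_affiliations.ibd -> 70144
```

All three sizes divide evenly and match the reported page counts, so at least the files are not truncated mid-page.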