Server hangs in binary log group commit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | | |
5.1 | Won't Fix | Undecided | Unassigned |
5.5 | New | High | George Ormond Lorch III |
5.6 | New | Undecided | Unassigned |
5.7 | New | Undecided | Unassigned |
Bug Description
PS 5.5.28-29.4 introduced a fix for bug 1070856 that created another issue.
Binlog group commit can now hang in MYSQL_BIN_
while (hdr_offs < length)
{
  /*
    partial header only? save what we can get, process once
    we get the rest.
  */
  if (hdr_offs + LOG_EVENT_
  {
    carry= length - hdr_offs;
    length= hdr_offs;
  }
  else
  {
    /* we've got a full event-header, and it came in one piece */
    uchar *log_pos= (uchar *)cache->read_pos + hdr_offs + LOG_POS_OFFSET;
    /* fix end_log_pos */
    val= uint4korr(log_pos) + group;
    /* next event header at ... */
>>  log_pos= (uchar *)cache->read_pos + hdr_offs + EVENT_LEN_OFFSET;
>>  hdr_offs += uint4korr(log_pos);
  }
}
The lines marked with >> above compute the new log_pos, whose target contains 0x00000000, so the calculation hdr_offs += uint4korr(log_pos) does not advance hdr_offs, causing an infinite loop.
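To illustrate the failure mode, here is a minimal standalone sketch in C of the same kind of header walk, with a check that stops when an event reports a length too short to advance the offset. It is not the actual Percona Server code or the eventual fix. The names read_le32 and count_events are hypothetical, read_le32 stands in for uint4korr, and the offset values are assumptions based on the v4 binlog event header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the v4 binlog event header layout. */
#define EVENT_LEN_OFFSET      9
#define LOG_EVENT_HEADER_LEN 19

/* Hypothetical stand-in for uint4korr(): read a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
  Walk the event headers in buf[0..length).
  Returns the number of complete headers visited, or -1 if an event
  reports a length shorter than a header (which includes zero), in
  which case the offset cannot advance safely.
*/
static int count_events(const unsigned char *buf, size_t length)
{
  size_t hdr_offs= 0;
  int events= 0;

  assert(buf != NULL);
  while (hdr_offs < length)
  {
    uint32_t event_len;

    /* partial header only: stop, the caller would carry the rest over */
    if (length - hdr_offs < LOG_EVENT_HEADER_LEN)
      break;

    event_len= read_le32(buf + hdr_offs + EVENT_LEN_OFFSET);

    /* without this check, event_len == 0 leaves hdr_offs unchanged forever */
    if (event_len < LOG_EVENT_HEADER_LEN)
      return -1;

    hdr_offs+= event_len;
    events++;
  }
  return events;
}
```

Calling count_events on a buffer whose second event has a zeroed length field returns -1 instead of spinning, which is the behavior a defensive check at this point in the loop would give.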
By bisecting various releases and hand builds with specific commits, we identified this specific fix as the dividing line between builds where the customer could and could not reproduce the hang. The customer's reproduction scripts are large, contain private data, take 30 minutes to run, and could not be reduced to a specific series of events that causes the issue.
The customer's scripts make heavy use of transaction SAVEPOINT and ROLLBACK TO SAVEPOINT statements. For example, one of the query patterns executing around the time of the hang is:
BEGIN;
SAVEPOINT `IQe40KFaDEKZ3l
SAVEPOINT `5B2D12rdYk6lX_
SAVEPOINT `0dkjUeJqxkWFzJ
SAVEPOINT `9WOYybwgO0qQBy
ROLLBACK TO SAVEPOINT `9WOYybwgO0qQBy
RELEASE SAVEPOINT `9WOYybwgO0qQBy
ROLLBACK;
BEGIN;
SAVEPOINT `76G0bv_
SAVEPOINT `kGjeq1CdA0aKN_
SAVEPOINT `ea4aItMo-
SAVEPOINT `1tXgKAMp70iHY-
ROLLBACK TO SAVEPOINT `1tXgKAMp70iHY-
RELEASE SAVEPOINT `1tXgKAMp70iHY-
COMMIT;
No longer affects: percona-server
Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-3248