2014-01-15 11:19:06 |
Laurynas Biveinis |
description |
Fix for bug 1227581 is incomplete, comments 29-33:
Thread 22 (Thread 0x7ec67baeb700 (LWP 10836)):
+bt
#0 0x00000031e000b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000928147 in os_cond_wait (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:609
#3 0x00000000008557d2 in sync_array_wait_event (arr=0x12b5e90, index=2) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0arr.c:458
#4 0x0000000000856872 in mutex_spin_wait (mutex=0x2817c18, file_name=0xa6ccb8 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c", line=613) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0sync.c:653
#5 0x00000000008a43fa in pfs_mutex_enter_func (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/sync0sync.ic:222
#6 buf_flush_relocate_on_flush_list (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x00000000008ab020 in buf_LRU_free_block (bpage=0x7ee75cabb450, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034
#8 0x00000000008ac217 in buf_LRU_free_from_unzip_LRU_list (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:981
#9 buf_LRU_search_and_free_block (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1085
#10 0x00000000008ac8ff in buf_LRU_get_free_block (buf_pool=0x2817898) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1326
#11 0x000000000089b74f in buf_page_get_gen (space=1256, zip_size=4096, offset=73393, rw_latch=3, guess=0x0, mode=10, file=0xa6ad00 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/btr/btr0cur.c", line=3615, mtr=0x7ec67bae41f0) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0buf.c:2667
...
Thread 1 (Thread 0x7ec739c9a700 (LWP 42350)):
+bt
#0 0x00000031e000c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000006a5aa6 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/sql/signal_handler.cc:250
#2 <signal handler called>
#3 0x00000031df8328e5 in raise () from /lib64/libc.so.6
#4 0x00000031df8340c5 in abort () from /lib64/libc.so.6
#5 0x00000000008a6506 in buf_page_get_space (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/buf0buf.ic:749
#6 buf_flush_page_and_try_neighbors (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1634
#7 0x00000000008a69fe in buf_flush_flush_list_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1772
#8 buf_flush_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1854
#9 0x00000000008a6ca0 in buf_flush_list (min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:2071
#10 0x000000000084c3bd in srv_master_thread (arg=<value optimized out>) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/srv/srv0srv.c:3879
#11 0x00000031e0007851 in start_thread () from /lib64/libpthread.so.0
#12 0x00000031df8e894d in clone () from /lib64/libc.so.6
The issue is that the fix for bug 1227581 is not fully correct WRT locking, thus only narrowed the race window instead of closing it. buf_LRU_block_remove_hashed_page()/buf_LRU_free_block() will make a page BUF_BLOCK_REMOVE_HASH while holding an LRU list mutex, a page hash X latch, and the block mutex. buf_flush_page_and_try_neighbors() may hold a flush list mutex instead. So, even though it checks for BUF_BLOCK_REMOVE_HASH pages, there is nothing to protect the flushing candidate page from becoming BUF_BLOCK_REMOVE_HASH after that check. The proper fix is making buf_flush_page_and_try_neigbors() oblivious to BUF_BLOCK_REMOVE_HASH pages all the way until the mutex release before the buf_flush_try_neighbors() call. |
Fix for bug 1227581 is incomplete, comments 29-33:
Thread 22 (Thread 0x7ec67baeb700 (LWP 10836)):
+bt
#0 0x00000031e000b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000928147 in os_cond_wait (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:609
#3 0x00000000008557d2 in sync_array_wait_event (arr=0x12b5e90, index=2) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0arr.c:458
#4 0x0000000000856872 in mutex_spin_wait (mutex=0x2817c18, file_name=0xa6ccb8 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c", line=613) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0sync.c:653
#5 0x00000000008a43fa in pfs_mutex_enter_func (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/sync0sync.ic:222
#6 buf_flush_relocate_on_flush_list (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x00000000008ab020 in buf_LRU_free_block (bpage=0x7ee75cabb450, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034
#8 0x00000000008ac217 in buf_LRU_free_from_unzip_LRU_list (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:981
#9 buf_LRU_search_and_free_block (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1085
#10 0x00000000008ac8ff in buf_LRU_get_free_block (buf_pool=0x2817898) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1326
#11 0x000000000089b74f in buf_page_get_gen (space=1256, zip_size=4096, offset=73393, rw_latch=3, guess=0x0, mode=10, file=0xa6ad00 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/btr/btr0cur.c", line=3615, mtr=0x7ec67bae41f0) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0buf.c:2667
...
Thread 1 (Thread 0x7ec739c9a700 (LWP 42350)):
+bt
#0 0x00000031e000c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000006a5aa6 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/sql/signal_handler.cc:250
#2 <signal handler called>
#3 0x00000031df8328e5 in raise () from /lib64/libc.so.6
#4 0x00000031df8340c5 in abort () from /lib64/libc.so.6
#5 0x00000000008a6506 in buf_page_get_space (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/buf0buf.ic:749
#6 buf_flush_page_and_try_neighbors (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1634
#7 0x00000000008a69fe in buf_flush_flush_list_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1772
#8 buf_flush_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1854
#9 0x00000000008a6ca0 in buf_flush_list (min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:2071
#10 0x000000000084c3bd in srv_master_thread (arg=<value optimized out>) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/srv/srv0srv.c:3879
#11 0x00000031e0007851 in start_thread () from /lib64/libpthread.so.0
#12 0x00000031df8e894d in clone () from /lib64/libc.so.6
The issue is that the fix for bug 1227581 is not fully correct WRT locking, thus only narrowed the race window in buf_flush_page_and_try_neighbors()instead of closing it completely. buf_LRU_block_remove_hashed_page()/buf_LRU_free_block() will make a page BUF_BLOCK_REMOVE_HASH while holding an LRU list mutex, a page hash X latch, and the block mutex. buf_flush_page_and_try_neighbors() may hold a flush list mutex instead. So, even though it checks for BUF_BLOCK_REMOVE_HASH pages, there is nothing to protect the flushing candidate page from becoming BUF_BLOCK_REMOVE_HASH after that check. The proper fix is making buf_flush_page_and_try_neigbors() oblivious to BUF_BLOCK_REMOVE_HASH pages all the way until the mutex release before the buf_flush_try_neighbors() call. |
|