No action is taken when wsrep recv thread returns with a fatal error

Bug #428663 reported by Alex Yurchenko
Affects | Status | Importance | Assigned to | Milestone
MySQL patches by Codership | Fix Released | High | Seppo Jaakola |
Trunk | Fix Released | High | Seppo Jaakola |

Bug Description

This is generally a fatal condition: it indicates loss of connectivity to the cluster, and even with good connectivity it means the node can no longer certify and apply slave writesets.

However, mysqld currently stays fully operational: it accepts connections and transactions, which then fail with errors at commit time. This is confusing both for the user and for any automated connection balancer, because everything appears to work until COMMIT is issued.

Proposal: mysqld should shut down right away.

Seppo Jaakola (seppo-jaakola) wrote :

This was fixed by calling kill_mysql() in wsrep_replication_process() when wsrep->recv() returns with an error.
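The control flow of the fix can be sketched roughly as follows. This is an illustration in Python, not the actual patch: `Server`, `kill()` and `replication_process()` are hypothetical stand-ins for mysqld, kill_mysql() and wsrep_replication_process(), and the convention that a zero status means a clean exit is an assumption.

```python
class Server:
    """Hypothetical stand-in for mysqld; only records a shutdown request."""

    def __init__(self):
        self.shutdown_requested = False

    def kill(self):
        # Stand-in for kill_mysql(): ask the server to start shutting down.
        self.shutdown_requested = True


def replication_process(recv, server):
    """Stand-in for wsrep_replication_process().

    `recv` plays the role of wsrep->recv(): it blocks while replication
    is healthy and returns a status code once the receive loop ends.
    Assumption: 0 means a clean exit, anything else is an error.
    """
    status = recv()
    if status != 0:
        # Before the fix the thread just exited here, leaving the server
        # up and accepting transactions that could only fail at COMMIT.
        server.kill()
    return status
```

For example, `replication_process(lambda: 5, Server())` models a receive loop that ends with status 5 and therefore requests a shutdown.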

The fix was tested with a 3-node cluster, where one node was disconnected from vsbes by taking the NIC down (ifdown eth1). A constant sqlgen load ran against the cluster during the test.

The node ran in debug mode, and the following log messages show how the disconnect was detected and resulted in shutdown:

091004 12:31:42 [Note] DEBUG: mm_galera.c:mm_galera_recv():1265: worker: 0 with seqno: (-1 - 32514) type: GCS_ACT_COMMIT_CUT recvd

:recv_nointr(): Return 113 (No route to host) in header recv
091004 12:47:17 [ERROR] vs_remote_backend.cpp:handle_up():27: VSRBackend::handle_up(): Transport failed
091004 12:47:17 [ERROR] gcs_vs.cpp:conn_run():359: poll error: 'broken backend connection', thread exiting
091004 12:47:17 [Note] DEBUG: gcs_core.c:core_msg_recv():407: returning -107: Transport endpoint is not connected

091004 12:47:17 [Note] DEBUG: gcs.c:gcs_recv_thread():471: gcs_core_recv returned -107: Transport endpoint is not connected
091004 12:47:17 [Note] gcs.c:gcs_recv_thread():573: RECV thread exiting -107: Transport endpoint is not connected
091004 12:47:17 [ERROR] mm_galera.c:mm_galera_recv():1257: gcs_recv() returned 0 (Success)
091004 12:47:17 [ERROR] wsrep recv thread exiting with status: 5
091004 12:47:17 [ERROR] starting shutdown
091004 12:47:17 [Note] Got signal 15 to shutdown mysqld
091004 12:47:17 [Note] /home/galera/mysql-5.1.38-2894/mysql/libexec/mysqld: Normal shutdown

091004 12:47:17 [Note] Before Lock_thread_count
091004 12:47:17 [Note] After lock_thread_count
091004 12:47:17 [Warning] WSREP rollback thread wakes for signal
091004 12:47:17 [Note] Event Scheduler: Purging the queue. 0 events
091004 12:47:17 [Warning] WSREP rollback thread has empty abort queue
091004 12:47:17 [Note] WSREP: rollbacker thread exiting
091004 12:47:17 [Note] wsrep closing connection to cluster
091004 12:47:17 [Note] DEBUG: vs_remote_backend.cpp:leave():150: VSRBackend::leave(): (3,0,1)
091004 12:47:17 [Note] DEBUG: gcs_vs.cpp:gcs_vs_destroy():412: received: 32782, copied: 2020
091004 12:47:17 [Note] DEBUG: gcs_vs.cpp:gcs_vs_destroy():417: gcs_vs_close(): return 0
091004 12:47:17 [Note] DEBUG: gcs.c:gcs_close():655: recv_thread() joined.
091004 12:47:17 [ERROR] mm_galera.c:mm_galera_pre_commit():1680: gcs failed for: 176029, len: 1592, rcode: -4
091004 12:47:17 [ERROR] mm_galera.c:mm_galera_pre_commit():1680: gcs failed for: 176060, len: 1212, rcode: -4
091004 12:47:17 [ERROR] mm_galera.c:mm_galera_pre_commit():1680: gcs failed for: 176065, len: 448, rcode: -4
091004 12:47:17 [ERROR] WSREP connection failure
091004 12:47:17 [Note] gcs.c:gcs_close():682: Closing slave action queue.
091004 12:47:17 [ERROR] WSREP connection failure
091004 12:47:17 [ERROR] WSREP connection failure
091004 12:47:17 [Warning] MySQL is closing a connection that has an active InnoDB transaction. 1 row modifications will roll back.
091004 12:47:17 [ERROR] WSREP connection failure
091004 12:47:17 [Note] mm_galera.c:mm_galera_disconnect():405: Closed GCS connection
091004 12:47:17 [Warning] MySQL is closing a connection that has an active InnoDB transaction. 4 ro...
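To follow the failure path in a log like the one above, it can help to filter out only the [ERROR] entries. The snippet below is a minimal sketch that assumes each entry has the shape `YYMMDD HH:MM:SS [Level] message` seen in this excerpt; `parse_line()` and `errors()` are hypothetical helper names, and lines that do not match (such as the truncated recv_nointr() line) are skipped.

```python
import re

# Matches entries such as: 091004 12:47:17 [ERROR] starting shutdown
LOG_LINE = re.compile(r"^(\d{6}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    date, time, level, message = match.groups()
    return {"date": date, "time": time, "level": level, "message": message}


def errors(lines):
    """Return the parsed [ERROR] entries in their original order."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]
```

Called as `errors(open(path))` on a saved copy of the log (where `path` is whatever file the log was written to), this returns the error entries that lead up to and follow the shutdown.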
