JOINED node shutdown hangs

Bug #1277709 reported by Jay Janssen
This bug affects 1 person.

Affects | Status | Importance | Assigned to
MySQL patches by Codership (status tracked in 5.6):
  5.5 | New | Undecided | Unassigned
  5.6 | New | Undecided | Unassigned
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Incomplete | Undecided | Unassigned

Bug Description

Latest 5.6.15 on Solaris:

2014-02-07 14:07:42 24981 [Note] WSREP: Shifting JOINER -> JOINED (TO: 131754952)
2014-02-07 15:16:31 2a InnoDB: Buffer pool(s) load completed at 140207 15:16:31

At this point the node is JOINED, but still catching up (quite a long way behind). If I try a normal shutdown at this point, it hangs:

2014-02-07 20:49:13 24981 [Note] /mysql/bin/mysqld: Normal shutdown

2014-02-07 20:49:13 24981 [Note] WSREP: Stop replication
2014-02-07 20:49:13 24981 [Note] WSREP: Closing send monitor...
2014-02-07 20:49:13 24981 [Note] WSREP: Closed send monitor.
2014-02-07 20:49:13 24981 [Note] WSREP: gcomm: terminating thread
2014-02-07 20:49:13 24981 [Note] WSREP: gcomm: joining thread
2014-02-07 20:49:13 24981 [Note] WSREP: gcomm: closing backend
2014-02-07 20:49:14 24981 [Note] WSREP: view(view_id(NON_PRIM,0617008d-8fe1-11e3-b107-57639057ed37,8) memb {
        0617008d-8fe1-11e3-b107-57639057ed37,0
} joined {
} left {
} partitioned {
        d4bfdbcf-8f07-11e3-8990-0234c4241646,0
})
2014-02-07 20:49:14 24981 [Note] WSREP: view((empty))
2014-02-07 20:49:14 24981 [Note] WSREP: gcomm: closed
2014-02-07 20:49:14 24981 [Warning] WSREP: 0x1313500 down context(s) not set
2014-02-07 20:49:14 24981 [Warning] WSREP: Failed to send FC_CONT signal: -134 (Transport endpoint is not connected)
2014-02-07 20:49:14 24981 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-02-07 20:49:14 24981 [Note] WSREP: Flow-control interval: [16, 16]
2014-02-07 20:49:14 24981 [Note] WSREP: Received NON-PRIMARY.
2014-02-07 20:49:14 24981 [Note] WSREP: Shifting JOINED -> OPEN (TO: 135540936)
2014-02-07 20:49:14 24981 [Note] WSREP: Received self-leave message.
2014-02-07 20:49:14 24981 [Note] WSREP: Flow-control interval: [0, 0]
2014-02-07 20:49:14 24981 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-02-07 20:49:14 24981 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 135540936)
2014-02-07 20:49:14 24981 [Note] WSREP: RECV thread exiting 0: Error 0
2014-02-07 20:49:14 24981 [Note] WSREP: recv_thread() joined.
2014-02-07 20:49:14 24981 [Note] WSREP: Closing replication queue.
2014-02-07 20:49:14 24981 [Note] WSREP: Closing slave action queue.
2014-02-07 20:49:14 24981 [Warning] WSREP: Failed to report last committed 132763377, -81 (File descriptor in bad state)
2014-02-07 20:49:16 24981 [Warning] WSREP: Failed to report last committed 132763499, -81 (File descriptor in bad state)
2014-02-07 20:49:18 24981 [Warning] WSREP: Failed to report last committed 132763628, -81 (File descriptor in bad state)
2014-02-07 20:49:19 24981 [Warning] WSREP: Failed to report last committed 132763752, -81 (File descriptor in bad state)
2014-02-07 20:49:21 24981 [Warning] WSREP: Failed to report last committed 132763875, -81 (File descriptor in bad state)
2014-02-07 20:49:23 24981 [Warning] WSREP: Failed to report last committed 132764046, -81 (File descriptor in bad state)
2014-02-07 20:49:23 24981 [Warning] WSREP: Failed to report last committed 132764047, -81 (File descriptor in bad state)
2014-02-07 20:49:25 24981 [Warning] WSREP: Failed to report last committed 132764169, -81 (File descriptor in bad state)
2014-02-07 20:49:27 24981 [Warning] WSREP: Failed to report last committed 132764294, -81 (File descriptor in bad state)
2014-02-07 20:49:28 24981 [Warning] WSREP: Failed to report last committed 132764419, -81 (File descriptor in bad state)
2014-02-07 20:49:30 24981 [Warning] WSREP: Failed to report last committed 132764538, -81 (File descriptor in bad state)
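
The "Failed to report last committed" warnings keep arriving every second or two with a growing seqno, which suggests the applier is still committing writesets after replication has been stopped. The Python sketch below is an illustration, not part of the original report: it extracts those seqnos from the log above and estimates the apply rate. The backlog figure rests on an assumption: it treats the TO value logged at the JOINED -> OPEN transition (135540936) as the last writeset the node would have to apply before shutdown can finish. The helper names are hypothetical.

```python
import re
from datetime import datetime

# Warning lines copied verbatim from the shutdown log above.
LOG = """\
2014-02-07 20:49:14 24981 [Warning] WSREP: Failed to report last committed 132763377, -81 (File descriptor in bad state)
2014-02-07 20:49:16 24981 [Warning] WSREP: Failed to report last committed 132763499, -81 (File descriptor in bad state)
2014-02-07 20:49:18 24981 [Warning] WSREP: Failed to report last committed 132763628, -81 (File descriptor in bad state)
2014-02-07 20:49:19 24981 [Warning] WSREP: Failed to report last committed 132763752, -81 (File descriptor in bad state)
2014-02-07 20:49:21 24981 [Warning] WSREP: Failed to report last committed 132763875, -81 (File descriptor in bad state)
2014-02-07 20:49:23 24981 [Warning] WSREP: Failed to report last committed 132764046, -81 (File descriptor in bad state)
2014-02-07 20:49:23 24981 [Warning] WSREP: Failed to report last committed 132764047, -81 (File descriptor in bad state)
2014-02-07 20:49:25 24981 [Warning] WSREP: Failed to report last committed 132764169, -81 (File descriptor in bad state)
2014-02-07 20:49:27 24981 [Warning] WSREP: Failed to report last committed 132764294, -81 (File descriptor in bad state)
2014-02-07 20:49:28 24981 [Warning] WSREP: Failed to report last committed 132764419, -81 (File descriptor in bad state)
2014-02-07 20:49:30 24981 [Warning] WSREP: Failed to report last committed 132764538, -81 (File descriptor in bad state)
"""

# Captures the timestamp and the committed seqno from each warning line.
PATTERN = re.compile(
    r"^(\S+ \S+) \d+ \[Warning\] WSREP: Failed to report last committed (\d+),"
)


def committed_seqnos(log):
    """Return (timestamp, seqno) pairs for every 'last committed' warning."""
    points = []
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            points.append((ts, int(m.group(2))))
    return points


def apply_rate(points):
    """Average writesets committed per second between first and last warning."""
    (t0, s0), (t1, s1) = points[0], points[-1]
    return (s1 - s0) / (t1 - t0).total_seconds()


def remaining_backlog(points, target_seqno):
    """Writesets left between the latest reported commit and target_seqno.

    ASSUMPTION: target_seqno is the TO logged at JOINED -> OPEN.
    """
    return target_seqno - points[-1][1]


if __name__ == "__main__":
    pts = committed_seqnos(LOG)
    rate = apply_rate(pts)
    backlog = remaining_backlog(pts, 135540936)
    print(f"apply rate: {rate:.1f} writesets/s")
    print(f"backlog: {backlog} writesets, ~{backlog / rate / 3600:.1f} h to drain")
```

Using only the first and last warnings gives a coarse average of roughly 72 writesets per second. If the assumption about the target seqno holds, the remaining gap would take on the order of ten hours to drain, which would be consistent with the node being unable to shut down until it finishes catching up.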

Raghavendra D Prabhu (raghavendra-prabhu) wrote:

@Alex,

Isn't this a duplicate of https://bugs.launchpad.net/galera/+bug/1176852?

Raghavendra D Prabhu (raghavendra-prabhu) wrote:

Though in that case the hang happens during IST, whereas here the node is in the JOINED state.

Alex Yurchenko (ayurchen) wrote:

Well, sometimes it will be this, sometimes lp:1176852. In both cases the end effect is the same: the node can't be shut down until it completes the joining operation. But it happens in different code paths.

Krunal Bauskar (krunal-bauskar) wrote:

Can you try this on one of the PXC-supported operating systems?

Changed in percona-xtradb-cluster:
status: New → Incomplete
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1609
