backend must be restarted message causes node hang
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Galera | New | Undecided | Unassigned | |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Fix Committed | Medium | Unassigned | |
Bug Description
Scenario:
node1 is up and running as a 1-node cluster.
Node2 starts, but it has an older grastate. This seems to cause a 'prim' conflict and drops node1 into the Init state. Node2 fails, and node1 goes into this state:
130815 15:21:40 [Note] WSREP: declaring 620654b6-
130815 15:21:40 [Note] WSREP: declaring d7f678fa-
130815 15:21:40 [Warning] WSREP: 32e0c052-
130815 15:21:40 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
pc::Proto{
32e0c052-
,state_msgs=
32e0c052-
}}
620654b6-
d7f678fa-
}}
,current_
32e0c052-
620654b6-
d7f678fa-
} joined {
620654b6-
d7f678fa-
} left {
} partitioned {
}),pc_view=
32e0c052-
} joined {
} left {
} partitioned {
}),mtu=32636}
130815 15:21:40 [Note] WSREP: evs::msg{
} 116
130815 15:21:40 [ERROR] WSREP: exception caused by message: evs::msg{
}
state after handling message: evs::proto(
current_
32e0c052-
620654b6-
d7f678fa-
} joined {
} left {
} partitioned {
}),
input_map=
fifo_seq=105,
last_sent=0,
known={
32e0c052-
620654b6-
d7f678fa-
}
}130815 15:21:40 [ERROR] WSREP: exception from gcomm, backend must be restarted:
at gcomm/src/
130815 15:21:40 [Note] WSREP: Received self-leave message.
130815 15:21:40 [Note] WSREP: Flow-control interval: [0, 0]
130815 15:21:40 [Note] WSREP: Received SELF-LEAVE. Closing connection.
130815 15:21:40 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 0)
130815 15:21:40 [Note] WSREP: RECV thread exiting 0: Success
130815 15:21:40 [Note] WSREP: New cluster view: global state: 32e1936e-
130815 15:21:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130815 15:21:40 [Note] WSREP: applier thread exiting (code:0)
However, the daemon stays in the Init state. When I try to shut it down, it just hangs:
130815 15:22:31 [Note] /usr/sbin/mysqld: Normal shutdown
130815 15:22:31 [Note] WSREP: Stop replication
130815 15:22:31 [Note] WSREP: Closing send monitor...
130815 15:22:31 [Note] WSREP: Closed send monitor.
a) Adding the galera bug component since this is related to Galera.
b) Is it possible to restart the backend without restarting mysql?
There is a way: set wsrep_provider to the same value, or to none and back.
But currently it hangs, possibly due to https://bugs.launchpad.net/codership-mysql/+bug/1208493
I tried with the fix for that bug but I can still reproduce this.
c) Even if restarting mysql is considered, it looks like it still hangs.
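For reference, the provider toggle mentioned in b) comes down to two SET GLOBAL statements. The sketch below only builds those statements as strings so they can be pasted into a mysql client; it does not connect to a server. The helper name and the provider path /usr/lib/libgalera_smm.so are assumptions for illustration, so substitute the value your node actually has in wsrep_provider. As noted above, on an affected node this toggle may itself hang.

```python
from typing import List


def provider_reload_statements(provider_path: str) -> List[str]:
    """Build the SQL that unloads and then reloads the wsrep provider.

    provider_path is whatever wsrep_provider is normally set to on the node
    (hypothetical example: /usr/lib/libgalera_smm.so).
    """
    if "'" in provider_path:
        # Keep the generated SQL well-formed; a quote would break the literal.
        raise ValueError("provider path must not contain a single quote")
    return [
        "SET GLOBAL wsrep_provider='none';",
        "SET GLOBAL wsrep_provider='{}';".format(provider_path),
    ]


if __name__ == "__main__":
    # Assumed path, for illustration only.
    for statement in provider_reload_statements("/usr/lib/libgalera_smm.so"):
        print(statement)
```

Run the printed statements in order from a client session on the stuck node.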
If it is possible to reproduce this, is it possible to get a
backtrace (from a core)? You just need to send -11 to mysqld when it's
hung.
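Sending -11 means delivering SIGSEGV to the hung mysqld process, the same as running kill -11 with its PID. A minimal sketch of that step follows; the helper name is hypothetical and the PID is supplied by the caller. Whether a core file is actually written depends on the core size limit and the server's core-file settings, which this sketch does not check.

```python
import os
import signal
import sys


def request_core_dump(pid: int) -> None:
    """Send SIGSEGV (signal 11) to the given process, like `kill -11 <pid>`."""
    os.kill(pid, signal.SIGSEGV)


if __name__ == "__main__":
    # Usage: python send_segv.py <pid of the hung mysqld>
    if len(sys.argv) != 2:
        sys.exit("usage: send_segv.py <pid>")
    request_core_dump(int(sys.argv[1]))
```

The resulting core can then be opened in gdb to produce the backtrace.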