Galera

Lowest group communication layer (evs) fails to handle the situation properly when a large number of nodes suddenly start seeing each other

Bug #1271918 reported by Miguel Angel Nieto on 2014-01-23

Affects                  Status                  Importance  Assigned to  Milestone
Galera                   Status tracked in 3.x
  2.x                    Fix Committed           Undecided   Unassigned
  3.x                    Fix Committed           Undecided   Unassigned
Percona XtraDB Cluster   Status tracked in 5.6
  5.5                    Fix Released            Medium      Unassigned   5.5.39-25.11
  5.6                    Fix Released            Medium      Unassigned   5.6.19-25.6

(Percona XtraDB Cluster has moved to https://jira.percona.com/projects/PXC)

Bug Description

We have a 9-node cluster. Suddenly the nodes stopped seeing each other:

140122 9:57:38 [Note] WSREP: view(view_id(NON_PRIM,378576e2-82be-11e3-b36b-96b118ad9ea1,10428) memb {
        5c773ef3-82be-11e3-ab13-4ec5e0489f56,
} joined {
} left {
} partitioned {
        378576e2-82be-11e3-b36b-96b118ad9ea1,
        49ce39e7-82be-11e3-a6da-e3fdac1aff99,
        4d4cbe47-5379-11e3-9597-437084d45b0f,
        79ae5df7-82be-11e3-af7a-6fad1d747d02,
        9dbbcf2a-82be-11e3-9c79-367e9eb841fb,
        b7a4648a-82bd-11e3-9b24-33ceb08ce291,
        d682b20c-82bd-11e3-9955-477180b12d21,
        fa3ce7e9-82bd-11e3-92ee-969e5429ffde,
})

Later the problem was resolved, but the nodes could not reconnect:

140122 9:58:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140122 9:58:38 [Note] WSREP: Flow-control interval: [16, 16]
140122 9:58:38 [Note] WSREP: Received NON-PRIMARY.
140122 9:58:38 [Note] WSREP: New cluster view: global state: 840ae537-bb36-11e2-0800-55dad0151e6b:47649869, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140122 9:58:38 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:39 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:40 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:41 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:42 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative

Similar messages appear on all nodes.

Jervin R (revin) wrote on 2014-01-23:

Miguel, what is the Galera version? This looks similar, at least in behavior, to https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1269236

Teemu Ollakka (teemu-ollakka) wrote on 2014-01-24:

This is a bit different from lp:1269236. The message "... is not supposed to be representative" indicates that there were problems forming a new group after the nodes reconnected. In lp:1269236, the nodes ended up in non-primary state because one of them crashed while the cluster was fully partitioned.

Teemu Ollakka (teemu-ollakka) wrote on 2014-05-15:

Fix committed in https://github.com/codership/galera/issues/14

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1096