Galera

JOINER partition during SST causes cluster hang

Bug #1202241 reported by Jay Janssen on 2013-07-17

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Galera	New	Undecided	Unassigned
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Incomplete	High	Unassigned

Bug Description

This is PXC:
Version: '5.5.30' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), wsrep_23.7.4.r3843

JOINER: tamarind (XXX.XXX.XXX.206)
DONOR: tarragon (XXX.XXX.XXX.14)

Other cluster nodes:
tabasco (XXX.XXX.XXX.205), tandoori (XXX.XXX.XXX.207)

Timeline of events:

7:21AM - Cluster node started, enters JOINER state, DONOR starts Xtrabackup SST
7:44AM - JOINER gets partitioned from the cluster, cluster hangs (no write activity)
7:55AM - Xtrabackup takes FTWRL on DONOR
7:56AM - DONOR finishes Xtrabackup
7:57AM - Cluster write activity resumes, JOINER tries to become Synced, but gets error:
130613 7:57:28 [Warning] WSREP: Protocol violation. JOIN message sender 0 (tarragon) is not in state transfer (SYNCED). Message ignored.

Question is why did the cluster pause between 7:44 and 7:56? (logs attached)

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2013-07-25:

Jay, looks like logs from tamarind and tarragon are identical and belong to a donor node. So we are missing the most important log - from the joiner (which partitioned).

Anyway, the immediate reason for a stall is clear - everybody was waiting for the state exchange message from tamarind.

Revision history for this message

Jay Janssen (jay-janssen) wrote on 2013-07-31: Re: [Bug 1202241] JOINER partition during SST causes cluster hang

Alex,
The logs for tamarind and tarragon look correct to me:

jayj@~/Downloads/logs [517]$ head -n1 tamarind-7_44-7_57.out tarragon-7_44-7_57.out
==> tamarind-7_44-7_57.out <==
130613 7:21:40 [Note] WSREP: IST first seqno 8246011901 not found from cache, falling back to SST

==> tarragon-7_44-7_57.out <==
130613 7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''

Tamarind is the donor, tarragon is the joiner. I do agree it's confusing why this is in tamarind's log:

130613 7:44:25 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
130613 7:44:25 [Note] WSREP: STATE_EXCHANGE: sent state UUID: be13da01-d437-11e2-0800-f3569a587936
130613 7:44:25 [Note] WSREP: STATE EXCHANGE: sent state msg: be13da01-d437-11e2-0800-f3569a587936
130613 7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 0 (tarragon)
130613 7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 1 (tabasco)
130613 7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 3 (tandoori)
130613 7:54:55 [Note] WSREP: Provider paused at 82ff21da-0811-11e2-0800-3771efde9244:8363882737
130613 7:55:16 [Note] WSREP: Provider resumed.
130613 7:57:28 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 2 (tamarind)
130613 7:57:28 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 285,
        members = 3/4 (joined/total),
        act_id = 8363882737,
        last_appl. = 8363881969,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = 82ff21da-0811-11e2-0800-3771efde9244

How could tamarind be stuck waiting for a message from itself?

On Jul 25, 2013, at 1:01 PM, Alex Yurchenko <email address hidden> wrote:

> Jay, looks like logs from tamarind and tarragon are identical and belong
> to a donor node. So we are missing the most important log - from the
> joiner (which partitioned).
>
> Anyway, the immediate reason for a stall is clear - everybody was
> waiting for the state exchange message from tamarind.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Alex,
  The logs for tamarind and tarragon look correct to me:

jayj@~/Downloads/logs [517]$ head -n1 tamarind-7_44-7_57.out tarragon-7_44-7_57.out
==> tamarind-7_44-7_57.out <==
130613  7:21:40 [Note] WSREP: IST first seqno 8246011901 not found from cache, falling back to SST

==> tarragon-7_44-7_57.out <==
130613  7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''

Tamarind is the donor, tarragon is the joiner.  I do agree it's confusing why this is in tamarind's log:

130613  7:44:25 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
130613  7:44:25 [Note] WSREP: STATE_EXCHANGE: sent state UUID: be13da01-d437-11e2-0800-f3569a587936
130613  7:44:25 [Note] WSREP: STATE EXCHANGE: sent state msg: be13da01-d437-11e2-0800-f3569a587936
130613  7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 0 (tarragon)
130613  7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 1 (tabasco)
130613  7:44:25 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 3 (tandoori)
130613  7:54:55 [Note] WSREP: Provider paused at 82ff21da-0811-11e2-0800-3771efde9244:8363882737
130613  7:55:16 [Note] WSREP: Provider resumed.
130613  7:57:28 [Note] WSREP: STATE EXCHANGE: got state msg: be13da01-d437-11e2-0800-f3569a587936 from 2 (tamarind)
130613  7:57:28 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 285,
        members    = 3/4 (joined/total),
        act_id     = 8363882737,
        last_appl. = 8363881969,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 82ff21da-0811-11e2-0800-3771efde9244

How could tamarind be stuck waiting for a message from itself?

On Jul 25, 2013, at 1:01 PM, Alex Yurchenko <1202241@bugs.launchpad.net> wrote:

> Jay, looks like logs from tamarind and tarragon are identical and belong
> to a donor node. So we are missing the most important log - from the
> joiner (which partitioned).
> 
> Anyway, the immediate reason for a stall is clear - everybody was
> waiting for the state exchange message from tamarind.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2013-07-31:

Jay, yes contents of these files are slightly different, but you said that they were "sanitized" and I strongly believe they come from the same original log:

tarragon-7_44-7-57.out:
=================
130613 7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''
...
130613 7:44:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
...
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 0 (tarragon)
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 1 (tabasco)
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 2 (tandoori)

tamarind-7_44-7-57.out:
=================
130613 7:21:40 [Note] WSREP: IST first seqno 8246011901 not found from cache, falling back to SST
130613 7:21:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130613 7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''
...
130613 7:44:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
...
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 0 (tarragon)
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 1 (tabasco)
130613 7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 2 (tandoori)
130613 7:44:16 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 284,
        members = 3/3 (joined/total),
        act_id = 8363856198,
        last_appl. = 8363855423,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = 82ff21da-0811-11e2-0800-3771efde9244

- notice that "tarragon" speaks like donor (even though you claim that donor was tamarind, and tarragon was a joiner) and says pretty much the same stuff as "tamarind", AND "tamarind" never cares to get a state exchange messages from itself, but instead gets it from tarragon, tabasco and tandoori in a three-node component, leaving no space for "tamarind" there.

Jay, yes contents of these files are slightly different, but you said  that they were "sanitized" and I strongly believe they come from the same original log:

tarragon-7_44-7-57.out:
=================
130613  7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''
...
130613  7:44:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
...
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 0 (tarragon)
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 1 (tabasco)
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 2 (tandoori)

tamarind-7_44-7-57.out:
=================
130613  7:21:40 [Note] WSREP: IST first seqno 8246011901 not found from cache, falling back to SST
130613  7:21:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130613  7:21:40 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address 'XXX.XXX.XXX.206:4444/xtrabackup_sst' --auth 'sst_backup:lb5G58PAaeDhk' --socket '/var/lib/mysql/mysql.sock' --datadir '/mnt/ssd/mysql/' --defaults-file '/etc/my.cnf' --gtid '82ff21da-0811-11e2-0800-3771efde9244:8361077172''
...
130613  7:44:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
...
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 0 (tarragon)
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 1 (tabasco)
130613  7:44:16 [Note] WSREP: STATE EXCHANGE: got state msg: b92282d0-d437-11e2-0800-644a2bba7edc from 2 (tandoori)
130613  7:44:16 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 284,
        members    = 3/3 (joined/total),
        act_id     = 8363856198,
        last_appl. = 8363855423,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 82ff21da-0811-11e2-0800-3771efde9244

- notice that "tarragon" speaks like donor (even though you claim that donor was tamarind, and tarragon was a joiner) and says pretty much the same stuff as "tamarind",  AND "tamarind" never cares to get a state exchange messages from itself, but instead gets it from tarragon, tabasco and tandoori in a three-node component, leaving no space for "tamarind" there.

Revision history for this message

Jay Janssen (jay-janssen) wrote on 2013-07-31:

tamarind-7_44-7_57.out Edit (9.0 KiB, application/octet-stream; name="tamarind-7_44-7_57.out")

You were right. Sanitizing, I just masked out the full IP. I think this was just a dumb copy-paste error.

Anyway, I'm attaching the correct log for tamarind.

On Jul 31, 2013, at 9:58 AM, Alex Yurchenko <email address hidden> wrote:

> Jay, yes contents of these files are slightly different, but you said
> that they were "sanitized" and I strongly believe they come from the
> same original log:

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2013-07-31:

Now we're talking... and the talk is that the main Galera recv loop was absent from the logs from 7:44... to 7:57. Apparently it was stuck somewhere. I can tell that it was stuck at least 3 events before it had to send the state exchange message, however I don't know where exactly... In any case, this was caused by some extraordinary conditions on the box.

Just in case: what was the transaction rate there during SST and how much RAM was there?

Revision history for this message

Jay Janssen (jay-janssen) wrote on 2013-08-02:

On Jul 31, 2013, at 12:57 PM, Alex Yurchenko <email address hidden> wrote:

> Just in case: what was the transaction rate there during SST

I can't tell that for sure. They have a relatively high rate of transactions/writesets. Currently, it is 2-3k per second, I think it's safe to assume that area during this issue

> and how
> much RAM was there?

Tamarind has 48G of RAM -- all of these servers are reasonably provisioned.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Revision history for this message

Jay Janssen (jay-janssen) wrote on 2013-08-02:

Sanitized logs from all cluster nodes Edit (3.1 KiB, application/x-tar)

Revision history for this message

Krunal Bauskar (krunal-bauskar) wrote on 2015-11-17:

@jay

Can you check this out with latest 5.6 and update if issue still exist.

Changed in percona-xtradb-cluster:
status:	New → Incomplete
importance:	Undecided → High

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#10

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-970

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.