Changing wsrep_cluster_name dynamically does not work as expected

Bug #1620439 reported by Paul Namuag
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released
Importance: Low
Assigned to: Krunal Bauskar

Bug Description

According to the manual, wsrep_cluster_name can be set dynamically: https://www.percona.com/doc/percona-xtradb-cluster/5.6/wsrep-system-index.html#wsrep_cluster_name

Problem:

The cluster has 5 nodes:

192.190.140.21
192.190.140.22
192.190.140.23
192.190.140.24
192.190.140.25

Only two nodes (.22 and .23) were restarted, after their .cnf file was changed persistently from wsrep_cluster_name = "oldname" to wsrep_cluster_name = "somename". The other 3 nodes (.21, .24, and .25) stayed up, and their wsrep_cluster_name was changed from "oldname" to "somename" dynamically, i.e.

mysql> SET GLOBAL wsrep_cluster_name="somename";

However, the two restarted nodes failed to start. They now try to connect using the new wsrep_cluster_name, "somename", which I expected to work, but I got the following error:

Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [Warning] WSREP: handshake with 4f252c54 tcp://192.190.140.21:4567 failed: 'invalid group'
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [Note] WSREP: view((empty))
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Sep 6 09:47:19 sdsunrsp02 mysqld: #011 at gcomm/src/pc.cpp:connect():162
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1379: Failed to open channel 'somename' at 'gcomm://192.190.140.21,192.190.140.22,192.190.140.23,192.190.140.24,192.190.140.25': -110 (Connection timed out)
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] WSREP: gcs connect failed: Connection timed out
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] WSREP: wsrep::connect(gcomm://192.190.140.21,192.190.140.22,192.190.140.23,192.190.140.24,192.190.140.25) failed: 7
Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [ERROR] Aborting
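
The key line in the log above is the 'invalid group' handshake warning. As a rough sketch for spotting it in a larger log, the following Python snippet extracts the peer id and address from such lines. The function name and the regular expression are illustrative assumptions based only on the log format shown here, not an official Galera tool.

```python
import re

# Pattern derived from the warning line quoted in this report (an assumption:
# other Galera versions may word the message differently).
HANDSHAKE_RE = re.compile(
    r"handshake with (?P<peer>\S+) (?P<addr>tcp://\S+) failed: 'invalid group'"
)


def find_invalid_group_peers(log_text):
    """Return (peer_id, address) pairs for every 'invalid group' handshake failure."""
    return [(m.group("peer"), m.group("addr")) for m in HANDSHAKE_RE.finditer(log_text)]


sample = (
    "Sep 6 09:47:19 sdsunrsp02 mysqld: 2016-09-05 23:47:19 7497 [Warning] WSREP: "
    "handshake with 4f252c54 tcp://192.190.140.21:4567 failed: 'invalid group'"
)
print(find_invalid_group_peers(sample))
# prints [('4f252c54', 'tcp://192.190.140.21:4567')]
```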

The error above is from the .22 node; the .23 node logs the same error. Checking wsrep_cluster_name, the cluster size, the cluster address, and the incoming addresses shows the following:

 -- ip: 192.190.140.21 --
+-----------------------+-------------------------------------------------------------------------------+
| Variable_name | Value |
+-----------------------+-------------------------------------------------------------------------------+
| wsrep_cluster_address | gcomm://192.190.140.21,192.190.140.22,192.190.140.23,192.190.140.24,192.190.140.25 |
| wsrep_cluster_name | somename |
+-----------------------+-------------------------------------------------------------------------------+

 -- ip: 192.190.140.24 --
+-----------------------+-------------------------------------------------------------------------------+
| Variable_name | Value |
+-----------------------+-------------------------------------------------------------------------------+
| wsrep_cluster_address | gcomm://192.190.140.21,192.190.140.22,192.190.140.23,192.190.140.24,192.190.140.25 |
| wsrep_cluster_name | somename |
+-----------------------+-------------------------------------------------------------------------------+

 -- ip: 192.190.140.25 --
+-----------------------+-------------------------------------------------------------------------------+
| Variable_name | Value |
+-----------------------+-------------------------------------------------------------------------------+
| wsrep_cluster_address | gcomm://192.190.140.21,192.190.140.22,192.190.140.23,192.190.140.24,192.190.140.25 |
| wsrep_cluster_name | somename |
+-----------------------+-------------------------------------------------------------------------------+

 -- ip: 192.190.140.21 --
+------------------------------+----------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------+
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_incoming_addresses | 192.190.140.21:3306,192.190.140.25:3306,192.190.140.24:3306 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 7f64591c-68fe-11e6-bb3a-eee3b6d126ed |
| wsrep_cluster_status | Primary |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 3.14(r53b88eb) |
+------------------------------+----------------------------------------------------------+

 -- ip: 192.190.140.24 --
+------------------------------+----------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------+
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_incoming_addresses | 192.190.140.21:3306,192.190.140.25:3306,192.190.140.24:3306 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 7f64591c-68fe-11e6-bb3a-eee3b6d126ed |
| wsrep_cluster_status | Primary |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 3.14(r53b88eb) |
+------------------------------+----------------------------------------------------------+

 -- ip: 192.190.140.25 --
+------------------------------+----------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------+
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_incoming_addresses | 192.190.140.21:3306,192.190.140.25:3306,192.190.140.24:3306 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 7f64591c-68fe-11e6-bb3a-eee3b6d126ed |
| wsrep_cluster_status | Primary |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 3.14(r53b88eb) |
+------------------------------+----------------------------------------------------------+

The .22 and .23 nodes are excluded because they are down. As shown above, all 3 nodes that are up have wsrep_cluster_name = "somename".
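
To make the status output above easier to check, here is a small Python sketch that compares the expected node list with the value of wsrep_incoming_addresses and reports which nodes are absent. The helper name missing_nodes is hypothetical; the input values are copied from this report.

```python
def missing_nodes(expected_ips, incoming_addresses):
    """Return the expected node IPs that do not appear in wsrep_incoming_addresses."""
    # Each entry looks like "ip:port"; strip the port before comparing.
    present = {addr.rsplit(":", 1)[0] for addr in incoming_addresses.split(",") if addr}
    return [ip for ip in expected_ips if ip not in present]


expected = ["192.190.140.%d" % n for n in range(21, 26)]
incoming = "192.190.140.21:3306,192.190.140.25:3306,192.190.140.24:3306"
print(missing_nodes(expected, incoming))
# prints ['192.190.140.22', '192.190.140.23']
```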

The versions on all nodes are:

-- ip: 192.190.140.21 --
+-------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------------------------------------------------+
| innodb_version | 5.6.28-76.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.28-76.1-56-log |
| version_comment | Percona XtraDB Cluster (GPL), Release rel76.1, Revision f9b078d, WSREP version 25.14, wsrep_25.14 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+---------------------------------------------------------------------------------------------------+

 -- ip: 192.190.140.22 --
+-------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------------------------------------------------+
| innodb_version | 5.6.28-76.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.28-76.1-56-log |
| version_comment | Percona XtraDB Cluster (GPL), Release rel76.1, Revision f9b078d, WSREP version 25.14, wsrep_25.14 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+---------------------------------------------------------------------------------------------------+

 -- ip: 192.190.140.23 --
+-------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------------------------------------------------+
| innodb_version | 5.6.28-76.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.28-76.1-56-log |
| version_comment | Percona XtraDB Cluster (GPL), Release rel76.1, Revision f9b078d, WSREP version 25.14, wsrep_25.14 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+---------------------------------------------------------------------------------------------------+

 -- ip: 192.190.140.24 --
+-------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------------------------------------------------+
| innodb_version | 5.6.28-76.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.28-76.1-56-log |
| version_comment | Percona XtraDB Cluster (GPL), Release rel76.1, Revision f9b078d, WSREP version 25.14, wsrep_25.14 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+---------------------------------------------------------------------------------------------------+

 -- ip: 192.190.140.25 --
+-------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------------------------------------------------+
| innodb_version | 5.6.28-76.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.28-76.1-56-log |
| version_comment | Percona XtraDB Cluster (GPL), Release rel76.1, Revision f9b078d, WSREP version 25.14, wsrep_25.14 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+---------------------------------------------------------------------------------------------------+

Alternative Fix: The only workaround is not really an alternative and is far from ideal: we have to shut down all of the nodes, which means downtime. If this bug is fixed and wsrep_cluster_name can be set dynamically without further issues or errors, the downtime can be avoided.

Tags: i135099
Krunal Bauskar (krunal-bauskar) wrote :

1. Changing wsrep_cluster_name involves unloading and reloading the provider, which in turn involves re-creating the node's connection to the cluster channel.

(The existing logic only changes the cached variable value, which has no effect, and so we see the problem mentioned above.)

2. wsrep_cluster_name is used during the handshake, but otherwise it is just a cosmetic variable.

Given these inputs, we decided to make this variable read-only (it can be set only during startup).
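
The two points above can be illustrated with a toy model in Python. This is only a sketch of the described behaviour under simplifying assumptions, not Galera or PXC code: the Node class, its variable and channel attributes, and the handshake function are all hypothetical names.

```python
class Node:
    """Toy model: the group name used for handshakes is fixed when the
    provider connects; the SQL variable is only a cached copy."""

    def __init__(self, cluster_name):
        self.variable = cluster_name  # what SHOW VARIABLES would report
        self.channel = cluster_name   # what the handshake actually compares

    def set_global(self, new_name):
        # Pre-fix behaviour as described above: only the cached value changes.
        self.variable = new_name

    def restart(self, configured_name):
        # A restart reloads the provider, so both values are refreshed.
        self.variable = configured_name
        self.channel = configured_name


def handshake(joiner, member):
    """Return True when the member accepts the joiner's group name."""
    return joiner.channel == member.channel


running = Node("oldname")
running.set_global("somename")   # node kept up, name changed dynamically

joiner = Node("oldname")
joiner.restart("somename")       # node restarted with the new name in its .cnf

print(running.variable, running.channel)  # prints: somename oldname
print(handshake(joiner, running))         # prints: False
```

In this model the running node reports "somename" while still using "oldname" on the channel, which matches the symptom in the report: the variable looks updated, yet the restarted nodes are rejected with 'invalid group'.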

commit 2573508e4f032735e711782e0caab5a88afd9709
Author: Krunal Bauskar <email address hidden>
Date: Tue Nov 22 09:57:52 2016 +0530

    - PXC#731: make wsrep_cluster_name read-only

      Changing wsrep_cluster_name at dynamically actually involves
      unload and loading of provider (that re-create node cluster connection).
      This is heavy operation that user never expect to be done for
      a cosemtic variable like wsrep_cluster_name.

      Most of the user never tend to change it and set it during configuration time.
      It just act as a indetifier to uniquely identify nodes of given cluster
      if you have multiple pxc cluster (besides being used for handshake during
      initial node joining sequence).

      With that background in place we have decided to make this variable read-only.

Changed in percona-xtradb-cluster:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Krunal Bauskar (krunal-bauskar)
status: Confirmed → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.6.34-26.19
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-731
