Node starting (w/o bootstrap) that can't find primary just hangs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.5 | Invalid | Undecided | Unassigned |
5.6 | Fix Released | Undecided | Unassigned |
Bug Description
If I start a node normally and there is no cluster to find, the node waits in the NON-PRIMARY state until something happens. I've discovered that wsrep_provider_ does not modify this behavior.
I cannot log into the node to do anything (such as setting pc.bootstrap=true), nor can I gracefully shut down such a node. The only signal it seems to respect is kill -9:
```
[root@node3 mysql]# ps axf
...
18979 ? Ss 0:00 /bin/sh /usr/bin/
19382 ? Sl 0:01 \_ /usr/sbin/mysqld --basedir=/usr --datadir=
18981 ? Ss 0:00 /bin/bash -ue /usr/bin/
19874 ? S 0:00 \_ sleep 1
[root@node3 mysql]# killall mysqld
18979 ? Ss 0:00 /bin/sh /usr/bin/
19382 ? Sl 0:01 \_ /usr/sbin/mysqld --basedir=/usr --datadir=
18981 ? Ss 0:00 /bin/bash -ue /usr/bin/
19951 ? S 0:00 \_ sleep 1
[root@node3 mysql]# killall -9 mysqld
...
150121 15:13:28 mysqld_safe mysqld from pid file /var/lib/
```
This behavior seems crappy all the way around:
1. We should have a way to modify this behavior
2. If we're going to the trouble of starting mysqld, we should be able to log in and set it primary
3. We should have a way to gracefully terminate such a process
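Until such a mechanism exists, the manual sequence in the transcript (a plain `killall mysqld`, which sends SIGTERM, followed by `killall -9 mysqld`) can be wrapped in a small escalation helper. This is a hypothetical sketch, not part of PXC or Galera: the function name `stop_with_escalation` and the 30-second default grace period are assumptions. It only automates the fallback to SIGKILL; it does not make the hung node shut down gracefully, and it signals only the PID it is given, not a supervising mysqld_safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with PXC/Galera): ask a process to exit
# with SIGTERM, then fall back to SIGKILL if it is still alive after a grace
# period given in seconds. Returns 0 if SIGTERM sufficed, 1 if SIGKILL was used.
stop_with_escalation() {
  local pid=$1
  local grace=${2:-30}   # assumed default grace period; tune as needed

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  kill -TERM "$pid"
  local waited=0
  # mysqld is not a child of the calling shell, so poll with kill -0
  # instead of using the wait builtin.
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= grace )); then
      echo "PID $pid ignored SIGTERM for ${grace}s; sending SIGKILL" >&2
      kill -KILL "$pid"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For the node in the transcript above this would be called as `stop_with_escalation 19382 60`, using the mysqld PID from the `ps axf` output.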
@Jay,
Tested with the latest Galera 3.x HEAD:
```
/pxc56d/bin/mysqld --defaults-file=/pxc56/etc/my.cnf.local --basedir=/pxc56d --user=mysql --wsrep-cluster-address=gcomm://NOTEXIST --wsrep-cluster-name=NOCLUSTER --wsrep-debug --skip-grant-tables
2015-01-26 12:14:54 0 [Warning] WSREP: wsrep_sst_receive_address is set to '127.0.0.1:4001' which makes it impossible for another host to reach this one. Please set it to the address which this node can be connected at by other cluster members.
2015-01-26 12:14:54 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-01-26 12:14:54 55221 [Note] WSREP: Setting wsrep_ready to 0
2015-01-26 12:14:54 55221 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-01-26 12:14:54 55221 [Note] WSREP: wsrep_load(): loading provider library '/pxc56/lib/libgalera_smm.so'
2015-01-26 12:14:54 55221 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <email address hidden> loaded successfully.
2015-01-26 12:14:54 55221 [Note] WSREP: CRC-32C: using hardware acceleration.
2015-01-26 12:14:54 55221 [Note] WSREP: Found saved state: 5d314bdc-855b-11e4-9699-bbe796316f18:-1
2015-01-26 12:14:54 55221 [Note] WSREP: Passing config to GCS: base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.version = 1; evs.view_forget_timeout = PT24H; gcache.dir = /pxc56/datadir/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /pxc56/datadir//galera.cache; gcache.page_size = 128M; gcache.size = 500M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://127.0.0.1:4010; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = ...
2015-01-26 12:14:54 55221 [Note] WSREP: Service thread queue flushed.
2015-01-26 12:14:54 55221 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2015-01-26 12:14:54 55221 [Note] WSREP: wsrep_sst_grab()
2015-01-26 12:14:54 55221 [Note] WSREP: Start replication
2015-01-26 12:14:54 55221 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2015-01-26 12:14:54 55221 [Note] WSREP: protonet asio version 0
2015-01-26 12:14:54 55221 [Note] WSREP: Using CRC-32C for message checksums.
2015-01-26 12:14:54 55221 [Note] WSREP: backend: asio
2015-01-26 12:14:54 55221 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2015-01-26 12:14:54 55221 [Note] WSREP: restore pc from disk failed
2015-01-26 12:14:54 55221 [Note] WSREP: GMCast version 0
2015-01-26 12:14:55 55221 [Warning] WSREP: Failed to r...
```