Stein Debian MariaDB cluster failed to deploy on multi-node
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
kolla | Fix Released | Medium | Unassigned | | |
Bug Description
kolla-ansible deploy log:
RUNNING HANDLER [mariadb : wait for slave mariadb] *******
skipping: [10.101.16.18]
FAILED - RETRYING: wait for slave mariadb (10 retries left).
FAILED - RETRYING: wait for slave mariadb (9 retries left).
FAILED - RETRYING: wait for slave mariadb (10 retries left).
FAILED - RETRYING: wait for slave mariadb (9 retries left).
FAILED - RETRYING: wait for slave mariadb (8 retries left).
FAILED - RETRYING: wait for slave mariadb (7 retries left).
FAILED - RETRYING: wait for slave mariadb (8 retries left).
FAILED - RETRYING: wait for slave mariadb (7 retries left).
FAILED - RETRYING: wait for slave mariadb (6 retries left).
FAILED - RETRYING: wait for slave mariadb (6 retries left).
FAILED - RETRYING: wait for slave mariadb (5 retries left).
FAILED - RETRYING: wait for slave mariadb (5 retries left).
FAILED - RETRYING: wait for slave mariadb (4 retries left).
FAILED - RETRYING: wait for slave mariadb (4 retries left).
FAILED - RETRYING: wait for slave mariadb (3 retries left).
FAILED - RETRYING: wait for slave mariadb (3 retries left).
FAILED - RETRYING: wait for slave mariadb (2 retries left).
FAILED - RETRYING: wait for slave mariadb (1 retries left).
FAILED - RETRYING: wait for slave mariadb (2 retries left).
fatal: [10.101.16.24]: FAILED! => {"attempts": 10, "changed": false, "module_stderr": "Shared connection to 10.101.16.24 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/home/
FAILED - RETRYING: wait for slave mariadb (1 retries left).
fatal: [10.101.16.120]: FAILED! => {"attempts": 10, "changed": false, "module_stderr": "Shared connection to 10.101.16.120 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/home/
skipping: [10.101.
changed: [10.101.
changed: [10.101.
FAILED - RETRYING: wait for master mariadb shutdown (30 retries left).
ok: [10.101.
skipping: [10.101.
changed: [10.101.
changed: [10.101.
FAILED - RETRYING: Waiting for master mariadb (10 retries left).
FAILED - RETRYING: Waiting for master mariadb (9 retries left).
FAILED - RETRYING: Waiting for master mariadb (8 retries left).
FAILED - RETRYING: Waiting for master mariadb (7 retries left).
FAILED - RETRYING: Waiting for master mariadb (6 retries left).
FAILED - RETRYING: Waiting for master mariadb (5 retries left).
FAILED - RETRYING: Waiting for master mariadb (4 retries left).
FAILED - RETRYING: Waiting for master mariadb (3 retries left).
FAILED - RETRYING: Waiting for master mariadb (2 retries left).
FAILED - RETRYING: Waiting for master mariadb (1 retries left).
fatal: [10.101.16.18]: FAILED! => {"attempts": 10, "changed": false, "module_stderr": "Shared connection to 10.101.16.18 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/home/
to retry, use: --limit @/home/
10.101.16.120 : ok=49 changed=24 unreachable=0 failed=1
10.101.16.18 : ok=102 changed=54 unreachable=0 failed=1
10.101.16.24 : ok=83 changed=49 unreachable=0 failed=1
localhost : ok=5 changed=0 unreachable=0 failed=0
Command failed ansible-playbook -i stein_multinode -e @/etc/kolla/
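The recap above can be summarised programmatically when triaging a larger inventory. The following is only a minimal sketch: the function name `failed_hosts` is made up for illustration, and it assumes recap lines have the `host : ok=N changed=N unreachable=N failed=N` shape seen in this log.

```python
import re

# Matches recap lines such as "10.101.16.24 : ok=83 changed=49 unreachable=0 failed=1".
# The pattern is an assumption based on the recap format shown in this report.
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)")


def failed_hosts(recap_text):
    """Return hosts whose recap line reports failed > 0 or unreachable > 0."""
    bad = []
    for line in recap_text.splitlines():
        match = RECAP_RE.match(line.strip())
        if not match:
            continue  # not a recap line (task output, retries, etc.)
        counters = {
            key: int(value)
            for key, value in re.findall(r"(\w+)=(\d+)", match.group("counters"))
        }
        if counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0:
            bad.append(match.group("host"))
    return bad
```

Fed the four recap lines from this deploy, the function would report the three controller addresses and skip `localhost`.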
-------
mariadb.log:
Error in my_thread_
2020-03-17 16:33:02 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2020-03-17 16:33:02 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/
2020-03-17 16:33:02 0 [Note] WSREP: wsrep_load(): Galera 3.25(rddf9876) by Codership Oy <email address hidden> loaded successfully.
2020-03-17 16:33:02 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2020-03-17 16:33:02 0 [Note] WSREP: Found saved state: 00000000-
2020-03-17 16:33:02 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.101.16.24; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_
PT1S; evs.max_
gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_
2020-03-17 16:33:02 0 [Note] WSREP: GCache history reset: 00000000-
2020-03-17 16:33:02 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2020-03-17 16:33:02 0 [Note] WSREP: wsrep_sst_grab()
2020-03-17 16:33:02 0 [Note] WSREP: Start replication
2020-03-17 16:33:02 0 [Note] WSREP: Setting initial position to 00000000-
2020-03-17 16:33:02 0 [Note] WSREP: protonet asio version 0
2020-03-17 16:33:02 0 [Note] WSREP: Using CRC-32C for message checksums.
2020-03-17 16:33:02 0 [Note] WSREP: backend: asio
2020-03-17 16:33:02 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2020-03-17 16:33:02 0 [Warning] WSREP: access file(/var/
2020-03-17 16:33:02 0 [Note] WSREP: restore pc from disk failed
2020-03-17 16:33:02 0 [Note] WSREP: GMCast version 0
2020-03-17 16:33:02 0 [Note] WSREP: (eac4fba7, 'tcp://
2020-03-17 16:33:02 0 [Note] WSREP: (eac4fba7, 'tcp://
2020-03-17 16:33:02 0 [Note] WSREP: EVS version 0
2020-03-17 16:33:02 0 [Note] WSREP: gcomm: connecting to group 'openstack', peer '10.101.
2020-03-17 16:33:02 0 [Note] WSREP: (eac4fba7, 'tcp://
2020-03-17 16:33:02 0 [Note] WSREP: (eac4fba7, 'tcp://
2020-03-17 16:33:02 0 [Note] WSREP: (eac4fba7, 'tcp://
2020-03-17 16:33:03 0 [Note] WSREP: declaring 6f169a34 at tcp://10.
2020-03-17 16:33:03 0 [Note] WSREP: view(view_
6f169a34,0 eac4fba7,0
} joined {
} left { } partitioned {
})
2020-03-17 16:33:03 0 [Note] WSREP: save pc into disk
2020-03-17 16:33:03 0 [Note] WSREP: forgetting e5e3aa99 (tcp://
2020-03-17 16:33:03 0 [Note] WSREP: gcomm: connected
2020-03-17 16:33:03 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2020-03-17 16:33:03 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2020-03-17 16:33:03 0 [Note] WSREP: Opened channel 'openstack'
2020-03-17 16:33:03 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2020-03-17 16:33:03 0 [Note] WSREP: Waiting for SST to complete.
2020-03-17 16:33:03 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-03-17 16:33:03 0 [Note] WSREP: STATE EXCHANGE: sent state msg: eb14e9b1-
2020-03-17 16:33:03 0 [Note] WSREP: STATE EXCHANGE: got state msg: eb14e9b1-
2020-03-17 16:33:03 0 [Note] WSREP: STATE EXCHANGE: got state msg: eb14e9b1-
2020-03-17 16:33:03 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 30,
members = 1/2 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 0/9/3 (gcs/repl/appl),
group UUID = 6f191617-
2020-03-17 16:33:03 0 [Note] WSREP: Flow-control interval: [23, 23]
2020-03-17 16:33:03 0 [Note] WSREP: Trying to continue unpaused monitor
2020-03-17 16:33:03 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2020-03-17 16:33:03 2 [Note] WSREP: State transfer required:
Group state: 6f191617-
Local state: 00000000-
2020-03-17 16:33:03 2 [Note] WSREP: New cluster view: global state: 6f191617-
2020-03-17 16:33:03 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2020-03-17 16:33:03 0 [Note] WSREP: Running: 'wsrep_
r=/var/
WSREP_SST: [ERROR] mariabackup binary not found in $PATH (20200317 16:33:04.021)
2020-03-17 16:33:04 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_
so --wsrep_on=ON --log-error=
Read: '(null)'
2020-03-17 16:33:04 0 [ERROR] WSREP: Process completed with error: wsrep_sst_
srep_on=ON --log-error=
2020-03-17 16:33:04 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
2020-03-17 16:33:04 2 [ERROR] Aborting
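In a log this long, the first `[ERROR]` entry is usually the most useful one; here it is the `WSREP_SST` line reporting that the `mariabackup` binary is missing, and the later errors follow from it. A rough sketch for pulling those entries out of a saved log is shown below. The helper name `error_messages` is hypothetical, and the only assumption about the format is the literal `[ERROR]` tag visible above.

```python
import re

# Matches both server lines ("... 2 [ERROR] Aborting") and SST script lines
# ("WSREP_SST: [ERROR] ..."); everything after the tag is treated as the message.
ERROR_RE = re.compile(r"\[ERROR\]\s*(?P<message>.*)")


def error_messages(log_lines):
    """Return the text following each [ERROR] tag, in order of appearance."""
    messages = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            messages.append(match.group("message").strip())
    return messages
```

Taking the first element of the result gives a reasonable starting point for the root cause, although that heuristic is not guaranteed for every failure.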
affects: | kolla → kolla-ansible |
affects: | kolla-ansible → kolla |
Changed in kolla: | |
status: | New → Confirmed |
Changed in kolla: | |
milestone: | none → 8.0.3 |
status: | Confirmed → Fix Committed |
importance: | Undecided → Medium |
Changed in kolla: | |
status: | Fix Committed → Fix Released |
You could try this for a more resilient deploy: https://review.opendev.org/706078
However, this looks like an issue with the images. @Marcin for aarch64.
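To confirm whether an image is affected, one option is to check that the SST tooling is on `$PATH` inside the MariaDB container. The snippet below is only a sketch: it assumes a Python interpreter is available in the image, and `missing_binaries` is an illustrative helper, not part of kolla.

```python
import shutil


def missing_binaries(names, path=None):
    """Return the names that cannot be resolved on PATH (or on the given path)."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # "mariabackup" is the binary the SST script reported as missing in the log.
    missing = missing_binaries(["mariabackup"])
    if missing:
        print("not found in PATH:", ", ".join(missing))
    else:
        print("all required binaries found")
```

If `mariabackup` is reported as missing, the problem lies in the image build rather than in the kolla-ansible playbooks.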