etcd remains unhealthy after unit removal

Bug #1967569 reported by Berkay Tekin Öz
This bug affects 1 person
Affects: Etcd Charm
Status: Fix Released
Importance: Undecided
Assigned to: Berkay Tekin Öz

Bug Description

Removing any unit (leader or not) from etcd leaves the cluster stuck in an unhealthy state. The main cause seems to be that the etcd member list is not updated when a unit departs, so the removed units remain in the cluster as dangling, unreachable peers.
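A minimal sketch of the reconciliation this implies: compare etcd's member list with the addresses of the units that still exist, and remove any member that matches none of them. This is not necessarily how the released fix works. The helper names are hypothetical, the parser assumes the etcdctl API v2 `member list` line format (`<id>: name=... peerURLs=... clientURLs=... isLeader=...`), and the set of live unit addresses is assumed to be known (for example, from the peer relation).

```python
import re
from typing import Dict, List, Set
from urllib.parse import urlparse

# Assumed etcdctl (API v2) `member list` line format; the name below is illustrative:
# "c5f431bd0a6193f3: name=etcd2 peerURLs=https://10.20.194.223:2380 clientURLs=... isLeader=false"
MEMBER_LINE = re.compile(r"^(?P<id>[0-9a-f]+): .*?peerURLs=(?P<peers>\S+)")


def parse_member_list(output: str) -> Dict[str, Set[str]]:
    """Hypothetical helper: map each member ID to the hosts in its peer URLs."""
    members: Dict[str, Set[str]] = {}
    for line in output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            urls = match.group("peers").split(",")
            members[match.group("id")] = {urlparse(url).hostname for url in urls}
    return members


def stale_members(members: Dict[str, Set[str]], live_hosts: Set[str]) -> List[str]:
    """Hypothetical helper: IDs of members whose peer hosts belong to no remaining unit."""
    return sorted(mid for mid, hosts in members.items() if not hosts & live_hosts)


def removal_commands(member_ids: List[str]) -> List[List[str]]:
    """Build one `etcdctl member remove <id>` invocation per stale member."""
    return [["/snap/bin/etcd.etcdctl", "member", "remove", mid] for mid in member_ids]
```

Matching on peer hosts rather than member names avoids depending on how the charm names each etcd member; the commands would still need the same `ETCDCTL_*` environment the charm uses for `cluster-health`.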

Steps to reproduce:

1. Deploy easyrsa with `juju deploy cs:~containers/easyrsa-441`
2. Deploy etcd with `juju deploy cs:~containers/etcd-655`
3. Relate etcd and easyrsa with `juju add-relation etcd easyrsa`
4. Add 2 more etcd units with `juju add-unit -n 2 etcd`
5. Remove a unit from etcd with `juju remove-unit etcd/2`

Related log output from a remaining unit (etcd/1):

unit-etcd-1: 21:02:16 INFO unit.etcd/1.juju-log Invoking reactive handler: reactive/etcd.py:112:check_cluster_health
unit-etcd-1: 21:02:18 ERROR unit.etcd/1.juju-log ['/snap/bin/etcd.etcdctl', 'cluster-health']
unit-etcd-1: 21:02:18 ERROR unit.etcd/1.juju-log {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
unit-etcd-1: 21:02:18 ERROR unit.etcd/1.juju-log b'member 4092336adfba56b6 is healthy: got healthy result from https://10.20.194.211:2379\nfailed to check the health of member c5f431bd0a6193f3 on https://10.20.194.223:2379: Get https://10.20.194.223:2379/health: dial tcp 10.20.194.223:2379: connect: no route to host\nmember c5f431bd0a6193f3 is unreachable: [https://10.20.194.223:2379] are all unreachable\nmember d24dec4fbb4997cd is healthy: got healthy result from https://10.20.194.70:2379\ncluster is degraded\n'
unit-etcd-1: 21:02:18 ERROR unit.etcd/1.juju-log None
unit-etcd-1: 21:02:18 WARNING unit.etcd/1.juju-log Notice: Unit failed cluster-health check
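The log above is the output of `etcdctl cluster-health` run with the API v2 environment shown. A sketch of running the same check and extracting the unreachable member IDs (the ones a fix would need to remove) might look like the following. The helper names are hypothetical, the parsing assumes the exact line wording seen in this log, and the non-zero exit on a degraded cluster is inferred from the charm logging the call as an error.

```python
import os
import re
import subprocess
from typing import List

# Command and environment taken from the log above.
ETCDCTL = "/snap/bin/etcd.etcdctl"
ETCDCTL_ENV = {
    "ETCDCTL_API": "2",
    "ETCDCTL_CA_FILE": "/var/snap/etcd/common/ca.crt",
    "ETCDCTL_CERT_FILE": "/var/snap/etcd/common/server.crt",
    "ETCDCTL_KEY_FILE": "/var/snap/etcd/common/server.key",
}

# Matches lines such as "member c5f431bd0a6193f3 is unreachable: [...]".
UNREACHABLE = re.compile(r"^member (?P<id>[0-9a-f]+) is unreachable", re.MULTILINE)


def run_cluster_health() -> str:
    """Hypothetical helper: run `etcdctl cluster-health` and return its combined output."""
    result = subprocess.run(
        [ETCDCTL, "cluster-health"],
        env={**os.environ, **ETCDCTL_ENV},
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,  # assumed: a degraded cluster exits non-zero, but we still want the text
    )
    return result.stdout.decode()


def unreachable_members(output: str) -> List[str]:
    """Hypothetical helper: IDs of members reported as unreachable."""
    return UNREACHABLE.findall(output)


def cluster_is_degraded(output: str) -> bool:
    """True if etcdctl reported the cluster as degraded."""
    return "cluster is degraded" in output
```

Fed the logged output, `unreachable_members` returns `['c5f431bd0a6193f3']`, the member at 10.20.194.223, presumably the removed etcd/2.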

Berkay Tekin Öz (berkayoz) wrote :
Kevin W Monroe (kwmonroe) wrote :
Changed in charm-etcd:
status: New → Fix Committed
assignee: nobody → Berkay Tekin Öz (berkayoz)
milestone: none → 1.24+ck1
Adam Dyess (addyess)
tags: added: backport-needed
Adam Dyess (addyess)
tags: removed: backport-needed
Adam Dyess (addyess)
Changed in charm-etcd:
status: Fix Committed → Fix Released