etcd stuck waiting because of registration failure, others errored with 0 known peers

Bug #1962023 reported by Alexander Balderson
This bug affects 2 people
Affects      Status   Importance   Assigned to   Milestone
Etcd Charm   New      Undecided    Unassigned

Bug Description

On an OpenStack deployment with HA Vault, two etcd units are stuck in the "Errored with 0 known peers" status, and their logs report:

2022-02-22 05:48:20 WARNING unit.etcd/0.update-status logger.go:60 Error: open /var/snap/etcd/common/server.crt: no such file or directory
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 ['/snap/bin/etcd.etcdctl', 'cluster-health']
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 b''
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 None
2022-02-22 05:48:20 WARNING unit.etcd/0.juju-log server.go:327 Notice: Unit failed cluster-health check
2022-02-22 05:48:20 WARNING unit.etcd/0.update-status logger.go:60 open /var/snap/etcd/common/server.crt: no such file or directory
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 ['/snap/bin/etcd.etcdctl', 'member', 'list']
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 b''
2022-02-22 05:48:20 ERROR unit.etcd/0.juju-log server.go:327 None
2022-02-22 05:48:20 INFO unit.etcd/0.juju-log server.go:327 Invoking reactive handler: reactive/etcd.py:139:set_app_version
2022-02-22 05:48:20 INFO unit.etcd/0.juju-log server.go:327 Invoking reactive handler: reactive/etcd.py:153:prepare_tls_certificates
2022-02-22 05:48:21 INFO unit.etcd/0.juju-log server.go:327 Invoking reactive handler: reactive/etcd.py:264:set_db_ingress_address
2022-02-22 05:48:21 INFO unit.etcd/0.juju-log server.go:327 Invoking reactive handler: reactive/etcd.py:271:send_cluster_connection_details
2022-02-22 05:48:21 INFO unit.etcd/0.juju-log server.go:327 Invoking reactive handler: hooks/relations/tls-certificates/requires.py:79:joined:certificates
2022-02-22 05:48:21 INFO unit.etcd/0.juju-log server.go:327 status-set: active: Errored with 0 known peers
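
The first warning in this excerpt says that /var/snap/etcd/common/server.crt does not exist, while the etcdctl environment still points at it. A minimal sketch for checking which of the three TLS files from that environment are missing on a unit is shown below. The paths are copied from the log; the helper name missing_tls_files is hypothetical and is not part of the etcd charm.

```python
import os

# Paths copied from the ETCDCTL_* environment printed in the log above.
ETCDCTL_TLS_ENV = {
    "ETCDCTL_CA_FILE": "/var/snap/etcd/common/ca.crt",
    "ETCDCTL_CERT_FILE": "/var/snap/etcd/common/server.crt",
    "ETCDCTL_KEY_FILE": "/var/snap/etcd/common/server.key",
}


def missing_tls_files(env=ETCDCTL_TLS_ENV, exists=os.path.isfile):
    """Return the entries of env whose file is absent.

    Hypothetical diagnostic helper; `exists` is injectable so the check
    can be exercised without touching the real filesystem.
    """
    return {name: path for name, path in env.items() if not exists(path)}


if __name__ == "__main__":
    # Run on the affected unit, e.g. via `juju ssh etcd/0`.
    for name, path in missing_tls_files().items():
        print(f"{name}: {path} is missing")
```

An empty result would suggest the files were written after the log was captured; it does not by itself show that the certificates are valid.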

The third unit (etcd/2) is failing registration with a similar error:

2022-02-22 05:50:53 WARNING unit.etcd/2.update-status logger.go:60 client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint https://10.246.164.239:2379 exceeded header timeout
2022-02-22 05:50:53 WARNING unit.etcd/2.update-status logger.go:60
2022-02-22 05:50:53 ERROR unit.etcd/2.juju-log server.go:327 ['/snap/bin/etcd.etcdctl', '--endpoint', 'https://10.246.164.239:2379', 'member', 'list']
2022-02-22 05:50:53 ERROR unit.etcd/2.juju-log server.go:327 {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
2022-02-22 05:50:53 ERROR unit.etcd/2.juju-log server.go:327 b''
2022-02-22 05:50:53 ERROR unit.etcd/2.juju-log server.go:327 None
2022-02-22 05:50:53 INFO unit.etcd/2.juju-log server.go:327 etcdctl.register failed, will retry
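
The "exceeded header timeout" message means the client got no timely response from https://10.246.164.239:2379. As a rough first check, the sketch below tests only whether a TCP connection to that endpoint can be opened from the unit. The function name endpoint_reachable is hypothetical, the endpoint is assumed to carry an explicit port, and a successful connection says nothing about TLS or cluster health.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host:port succeeds.

    Hypothetical diagnostic helper: it does not speak TLS or the etcd API,
    it only checks that something is accepting connections on the port.
    """
    parts = urlsplit(endpoint)
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Endpoint copied from the etcd/2 log above.
    print(endpoint_reachable("https://10.246.164.239:2379"))
```

If this returns False from etcd/2, the peer is likely not listening on the client port at all, which would be consistent with the other units failing before etcd could use its certificates.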

A crashdump containing these logs is attached, and all test runs can be found at https://solutions.qa.canonical.com/bugs/bugs/bug/1962023
