Thanks for the report. I would say this affects both etcd and calico.
For etcd: Units are sending cluster connection details before etcd is ready[1]. The charm should delay sending connection details until etcd has successfully registered with the cluster, i.e. wait for the "etcd.registered" flag.
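A minimal sketch of the idea, assuming the charms.reactive framework layer-etcd is built on; the relation flag name ('db.connected') and the connection-string helpers are illustrative, only 'etcd.registered' comes from the report above:

    from charms.reactive import when

    @when('db.connected', 'etcd.registered')
    def send_connection_details(db):
        # Gated on 'etcd.registered', so peers never receive connection
        # details from a unit that has not yet joined the cluster.
        db.set_connection_string(get_connection_string())  # illustrative helper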
For calico: Units are letting a hung calicoctl process block the machine lock indefinitely. The charm should wrap calicoctl calls[2] in a timeout so that the cluster can eventually unstick itself when a similar hang occurs.
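A minimal sketch of such a wrapper, assuming a plain subprocess call; the calicoctl path and the 60-second timeout are assumptions, not values taken from the charm:

    import subprocess

    CALICOCTL_TIMEOUT = 60  # seconds; assumed value

    def calicoctl(*args):
        """Run calicoctl, failing the hook instead of hanging forever."""
        cmd = ['/opt/calicoctl/calicoctl'] + list(args)
        # timeout= makes subprocess kill the child and raise TimeoutExpired,
        # so a hung calicoctl cannot hold the Juju machine lock indefinitely.
        return subprocess.check_output(cmd, timeout=CALICOCTL_TIMEOUT)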
As a workaround, I suspect if you kill hung calicoctl processes repeatedly, Juju will eventually get through its backlog of hooks and allow the etcd units to progress.
[1]: https://github.com/charmed-kubernetes/layer-etcd/blob/ae98be0046953ced628f682eee266d0e875a62b0/reactive/etcd.py#L283-L287
[2]: https://github.com/charmed-kubernetes/layer-calico/blob/2287a08ea5c7940bbe9b07be179e1da15b51cba1/reactive/calico.py#L615-L624