Etcd units go into error state and require full restart due to inability to hit freezer CG
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Etcd Charm | New | Undecided | Unassigned | |
lxd | New | Undecided | Unassigned | |
snapd | New | Undecided | Unassigned | |
Bug Description
While doing burn-in testing of Wallaby deployed via Juju with vault and etcd, I'm seeing all three etcd units (all atop focal controller VMs) fail every day or two, with the following recorded in their unit logs:
```
2021-06-21 16:40:36 DEBUG juju.worker.
2021-06-21 16:40:38 DEBUG unit.etcd/
2021-06-21 16:40:39 WARNING unit.etcd/
2021-06-21 16:40:39 WARNING unit.etcd/
2021-06-21 16:40:39 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 WARNING unit.etcd/
2021-06-21 16:40:40 ERROR juju.worker.
2021-06-21 16:40:40 DEBUG juju.machinelock machinelock.go:186 machine lock released for etcd/1 uniter (run update-status hook)
```
The LXD container has to be stopped and started again for the unit to return to normal. After some time, however, the unit falls back into this broken state.
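For reference, the stop/start workaround can be scripted roughly as follows. This is only a minimal sketch, assuming the `lxc` client is available on the machine hosting the container; the container name `juju-abc123-0-lxd-1` and the helper names are hypothetical placeholders, not taken from this deployment. It defaults to a dry run that only prints the commands.

```python
import subprocess


def restart_commands(container, force=False):
    """Build the lxc commands that stop and then start a container."""
    stop = ["lxc", "stop", container]
    if force:
        # --force kills the container instead of waiting for a clean shutdown.
        stop.append("--force")
    return [stop, ["lxc", "start", container]]


def restart_container(container, force=False, dry_run=True):
    """Stop and start the container; with dry_run, only print the commands."""
    commands = restart_commands(container, force)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if lxc exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical container name; substitute the real one from `lxc list`.
    restart_container("juju-abc123-0-lxd-1")
```

Calling `restart_container(name, dry_run=False)` would actually run the commands. This only automates the workaround and does not address why the unit breaks again.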
Hello,
What versions of etcd, Juju, OpenStack, etc. were you running?
Thanks,
Heather Lemon