etcd shows update-status hook errors after host reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Etcd Charm | Confirmed | Undecided | Unassigned |
Etcd Snaps | New | Undecided | Unassigned |
Bug Description
I rebooted the host machine and now an etcd unit is stuck in an error state:
juju debug-log:
unit-etcd-1: 09:03:00 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-etcd-1: 09:03:01 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.
unit-etcd-1: 09:03:01 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start Traceback (most recent call last):
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start main()
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start bus.dispatch(
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start _invoke(
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start handler.invoke()
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start self._action(*args)
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start db.set_
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "lib/etcdctl.py", line 193, in version
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start out = check_output(
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/usr/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/usr/lib/
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start raise CalledProcessEr
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start subprocess.
unit-etcd-1: 09:03:02 ERROR juju.worker.
unit-etcd-1: 09:03:02 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-etcd-1: 09:03:29 INFO juju.worker.uniter awaiting error resolution for "start" hook
I have 3 etcd units deployed in total. Only one unit is in error state. Etcd units are deployed in lxd containers.
etcd charm revision: 594
For further info, I'm seeing this on a couple of etcd units as well.
Running the command manually produces this error:
root@juju-f98bb9-2-lxd-1:/sys/fs/cgroup/freezer# etcd.etcdctl version
cannot open cgroup hierarchy /sys/fs/cgroup/freezer: No such file or directory
Oddly, the cgroup exists and should be readable, but it may not be visible to the command because of snap confinement. My guess is that upstream snapd gained a new plug for cgroups, which would explain why the effect only showed up after a restart. The failure seems to come from the charm's attempt to run the etcdctl version command; etcd itself is running and functioning.
root@juju-f98bb9-2-lxd-1:/sys/fs/cgroup/freezer# find -ls
32 0 drwxrwxr-x 4 nobody root 0 Jul 20 20:25 .
33 0 -rw-rw-r-- 1 nobody root 0 Oct 1 21:03 ./cgroup.procs
38 0 -r--r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.self_freezing
120 0 drwxr-xr-x 2 root root 0 Jul 20 20:25 ./snap.etcd
121 0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/cgroup.procs
126 0 -r--r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/freezer.self_freezing
123 0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/tasks
127 0 -r--r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/freezer.parent_freezing
125 0 -rw-r--r-- 1 root root 0 Dec 20 00:00 ./snap.etcd/freezer.state
124 0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/notify_on_release
122 0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/cgroup.clone_children
35 0 -rw-rw-r-- 1 nobody root 0 Jul 20 20:22 ./tasks
39 0 -r--r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.parent_freezing
37 0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.state
88 0 drwxr-xr-x 2 root root 0 Jul 20 20:23 ./snap.lxd
89 0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/cgroup.procs
94 0 -r--r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/freezer.self_freezing
91 0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/tasks
95 0 -r--r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/freezer.parent_freezing
93 0 -rw-r--r-- 1 root root 0 Dec 20 00:00 ./snap.lxd/freezer.state
92 0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/notify_on_release
90 0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/cgroup.clone_children
36 0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./notify_on_release
34 0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./cgroup.clone_children
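Since the traceback shows the start hook dying on an unhandled CalledProcessError from check_output while etcd itself keeps running, one possible mitigation would be for the charm to treat a failed version query as "unknown" rather than as a hook failure. The following is only a rough sketch of that idea, not the charm's actual code: the function name safe_etcdctl_version, its parameters, and the default command are assumptions made for illustration.

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def safe_etcdctl_version(cmd=("etcd.etcdctl", "version"), timeout=10):
    """Return the raw output of the version command, or None if it fails.

    Hypothetical helper: the name, parameters and default command are
    assumptions for this sketch, not taken from the charm's lib/etcdctl.py.
    """
    try:
        out = subprocess.check_output(
            list(cmd), stderr=subprocess.STDOUT, timeout=timeout
        )
    except (subprocess.CalledProcessError,
            subprocess.TimeoutExpired,
            OSError) as exc:
        # CalledProcessError and TimeoutExpired carry the captured output;
        # OSError (for example a missing binary) does not.
        detail = getattr(exc, "output", None)
        if isinstance(detail, bytes):
            detail = detail.decode("utf-8", errors="replace").strip()
        log.warning("version query failed: %s (output: %s)", exc, detail)
        return None
    return out.decode("utf-8", errors="replace").strip()
```

A caller could then skip recording the version when the function returns None and retry on a later hook, so that a confinement problem affecting only the etcdctl command would not leave the unit in an error state. Whether hiding the failure this way is desirable is a design decision for the charm maintainers.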