Single lxd machine stuck on pending, Container started

Bug #1956981 reported by Bas de Bruijne
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Invalid
High
Unassigned

Bug Description

Single lxd machine stuck on pending:

------------------------------------------
Machine  State    DNS           Inst id              Series  AZ     Message
0        started  10.244.8.128  azurill              focal   zone1  Deployed
0/lxd/0  started  10.244.8.176  juju-7c9753-0-lxd-0  focal   zone1  Container started
0/lxd/1  started  10.246.65.92  juju-7c9753-0-lxd-1  focal   zone1  Container started
0/lxd/2  started  10.244.8.178  juju-7c9753-0-lxd-2  focal   zone1  Container started
0/lxd/3  pending                pending              focal          Creating container
0/lxd/4  started  10.246.65.79  juju-7c9753-0-lxd-4  focal   zone1  Container started
0/lxd/5  started  10.244.8.177  juju-7c9753-0-lxd-5  focal   zone1  Container started
0/lxd/6  started  10.246.65.89  juju-7c9753-0-lxd-6  focal   zone1  Container started
0/lxd/7  started  10.244.8.179  juju-7c9753-0-lxd-7  focal   zone1  Container started
0/lxd/8  started  10.244.8.167  juju-7c9753-0-lxd-8  focal   zone1  Container started
0/lxd/9  started  10.244.8.175  juju-7c9753-0-lxd-9  focal   zone1  Container started
------------------------------------------
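A status like the above can be scanned for stuck machines with a small awk filter over the tabular `juju status` output. A minimal sketch; `pending_machines` is our helper name, not a juju subcommand, and on a live controller you would pipe `juju status` into it directly:

```shell
# Print the machine id of every machine whose State column reads "pending".
# Usage on a live controller: juju status | pending_machines
pending_machines() {
  awk '$2 == "pending" { print $1 }'
}

# Self-contained demo on two rows taken from the status above:
status='0 started 10.244.8.128 azurill focal zone1 Deployed
0/lxd/3 pending pending focal Creating container'
printf '%s\n' "$status" | pending_machines
# prints: 0/lxd/3
```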

In the logs:
------------------------------------------
var/log/kern.log:Jan 8 01:10:42 azurill kernel: [ 716.131356] audit: type=1400 audit(1641604242.710:332): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-juju-7c9753-0-lxd-3_<var-snap-lxd-common-lxd>" profile="/snap/snapd/14295/usr/lib/snapd/snap-confine" pid=32816 comm="snap-confine" family="netlink" sock_type="raw" protocol=15 requested_mask="send receive" denied_mask="send receive"
var/log/kern.log:Jan 8 01:10:42 azurill kernel: [ 716.193780] audit: type=1400 audit(1641604242.774:333): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-juju-7c9753-0-lxd-3_<var-snap-lxd-common-lxd>" profile="snap-update-ns.lxd" name="/apparmor/.null" pid=32874 comm="6" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0
var/log/kern.log-Jan 8 01:10:46 azurill kernel: [ 720.362843] kauditd_printk_skb: 13 callbacks suppressed
------------------------------------------

Similar messages show up for the different containers on machine 0, but they are not quite the same.
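To compare the denials across containers, the DENIED audit lines can be summarized per container namespace. A sketch based on the log format above; `summarize_denials` is our helper name, and on a live host you would feed it `/var/log/kern.log`:

```shell
# Summarize AppArmor DENIED audit lines per container and operation,
# from kern.log-style input. Usage on a live host:
#   summarize_denials < /var/log/kern.log
summarize_denials() {
  sed -n 's/.*apparmor="DENIED" operation="\([^"]*\)" namespace="root\/\/lxd-\([^_]*\)_[^"]*".*/\2 \1/p' \
    | sort | uniq -c
}

# Self-contained demo on two lines modeled on the log excerpt above:
sample='audit: type=1400 apparmor="DENIED" operation="file_inherit" namespace="root//lxd-juju-7c9753-0-lxd-3_<var-snap-lxd-common-lxd>" profile="/snap/snapd/14295/usr/lib/snapd/snap-confine" pid=32816
audit: type=1400 apparmor="DENIED" operation="file_inherit" namespace="root//lxd-juju-7c9753-0-lxd-3_<var-snap-lxd-common-lxd>" profile="snap-update-ns.lxd" pid=32874'
printf '%s\n' "$sample" | summarize_denials
```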

Testrun:
https://solutions.qa.canonical.com/testruns/testRun/ab94b749-b99f-473b-8997-afa48c6815dd

Links to crashdumps:
https://oil-jenkins.canonical.com/artifacts/ab94b749-b99f-473b-8997-afa48c6815dd/index.html

Future occurrences of this bug can be found here:
https://solutions.qa.canonical.com/bugs/bugs/bug/1956981

description: updated
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

This issue seems to be very active on jammy as well

summary: - Single lxd machine stuck on pending, creating container
+ Single lxd machine stuck on pending, Container started
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

@basdbruijne, the solqa results in #2 do not show the same bug as this.

In the case of #2, the container starts, as does the juju agent on the container; however, it has errors.

$ grep pending juju-crashdump-openstack-2022-10-20-07.32.21/juju_status.txt
5/lxd/5 pending 10.246.167.51 juju-dd3b16-5-lxd-5 ubuntu:20.04 zone3 Container started

The machine agent shut down, starting with:
2022-10-20 03:46:29 DEBUG juju.worker.dependency engine.go:616 "unconverted-api-workers" manifold worker stopped: agent should be terminated

Revision history for this message
John A Meinel (jameinel) wrote :

Solutions QA is saying that they are also seeing failures to start containers on Jammy, and this is preventing them from doing their Openstack Jammy testing.

Changed in juju:
importance: Undecided → High
milestone: none → 2.9.38
status: New → Triaged
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

In testrun https://solutions.qa.canonical.com/testruns/testRun/fd79805c-8f0c-4965-af14-e01017439fe9 I looked around on the live env. Here, 2 machines are in this state:

```
Machine   State    Address         Inst id               Series  AZ     Message
0         started  10.246.167.190  solqa-lab1-server-07  jammy   zone1  Deployed
0/lxd/0   started  10.246.167.149  juju-6ba804-0-lxd-0   jammy   zone1  Container started
0/lxd/1   pending                  juju-6ba804-0-lxd-1   jammy   zone1  Container started
0/lxd/2   started  10.246.164.253  juju-6ba804-0-lxd-2   jammy   zone1  Container started
0/lxd/3   pending                  juju-6ba804-0-lxd-3   jammy   zone1  Container started
0/lxd/4   started  10.246.166.148  juju-6ba804-0-lxd-4   jammy   zone1  Container started
0/lxd/5   started  10.246.165.82   juju-6ba804-0-lxd-5   jammy   zone1  Container started
0/lxd/6   started  10.246.167.96   juju-6ba804-0-lxd-6   jammy   zone1  Container started
0/lxd/7   started  10.246.167.159  juju-6ba804-0-lxd-7   jammy   zone1  Container started
0/lxd/8   started  10.246.166.215  juju-6ba804-0-lxd-8   jammy   zone1  Container started
0/lxd/9   started  10.246.165.72   juju-6ba804-0-lxd-9   jammy   zone1  Container started
0/lxd/10  started  10.246.164.203  juju-6ba804-0-lxd-10  jammy   zone1  Container started
```

But logging on to the machines themselves shows no problem:
```
ubuntu@solqa-lab1-server-07:~$ sudo lxc list
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

+---------------------+---------+-----------------------+------+-----------+-----------+
|        NAME         |  STATE  |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-6ba804-0-lxd-0 | RUNNING | 10.246.173.8 (eth1)   |      | CONTAINER | 0         |
|                     |         | 10.246.172.111 (eth1) |      |           |           |
|                     |         | 10.246.169.47 (eth0)  |      |           |           |
|                     |         | 10.246.168.111 (eth0) |      |           |           |
|                     |         | 10.246.167.149 (eth2) |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-6ba804-0-lxd-1 | RUNNING | 10.246.176.28 (eth1)  |      | CONTAINER | 0         |
|                     |         | 10.246.172.62 (eth0)  |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-6ba804-0-lxd-2 | RUNNING | 10.246.172.251 (eth1) |      | CONTAINER | 0         |
|                     |         | 10.246.168.255 (eth0) |      |           |           |
|                     |         | 10.246.164.253 (eth2) |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-6ba804-0-lxd-3 | RUNNING | 10.246.169.48 (eth0)  |      | CONTAINER | 0         |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-6ba804-0-lxd-4 | ...
```

tags: added: cdo-qa
Changed in juju:
milestone: 2.9.38 → 2.9.39
Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :
John A Meinel (jameinel)
description: updated
Changed in juju:
milestone: 2.9.39 → 2.9.40
Changed in juju:
milestone: 2.9.40 → 2.9.41
Revision history for this message
Cristovao Cordeiro (cjdc) wrote (last edit ):

I confirm this happens too, consistently, when running the tutorial from https://juju.is/docs/sdk/build-and-deploy-minimal-machine-charm

$ juju version
2.9.38-ubuntu-amd64

Changed in juju:
milestone: 2.9.41 → 2.9.42
Changed in juju:
milestone: 2.9.42 → 2.9.43
Changed in juju:
milestone: 2.9.43 → 2.9.44
Changed in juju:
milestone: 2.9.44 → 2.9.45
Changed in juju:
milestone: 2.9.45 → 2.9.46
Revision history for this message
Ian Booth (wallyworld) wrote :

The next 2.9.46 candidate release will not include a fix for this bug and we don't plan on any more 2.9 releases. As such it is being removed from its 2.9 milestone.

If the bug is still important to you, let us know and we can consider it for inclusion on a 3.x milestone.

Changed in juju:
milestone: 2.9.46 → none
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Marking as invalid due to inactivity

Changed in juju:
status: Triaged → Invalid
Revision history for this message
Jeffrey Chang (modern911) wrote :

Checked the crashdump; there's no log available for the lxd instance at all, since it was blocked during boot.
We need to find a live environment and see what blocks the boot.

Revision history for this message
Jeffrey Chang (modern911) wrote :

bumped into https://solutions.qa.canonical.com/testruns/8ac03f3d-fae7-40e9-ae4d-2092566947eb

2        started  10.241.128.93   meowth               ubuntu@22.04  zone2  Deployed
2/lxd/0  started  10.241.128.127  juju-16f545-2-lxd-0  ubuntu@22.04  zone2  Container started
2/lxd/1  started  10.241.128.128  juju-16f545-2-lxd-1  ubuntu@22.04  zone2  Container started
2/lxd/2  pending                  juju-16f545-2-lxd-2  ubuntu@22.04  zone2  Container started
2/lxd/3  started  10.241.128.105  juju-16f545-2-lxd-3  ubuntu@22.04  zone2  Container started

root@meowth:~# lxc list
+---------------------+---------+-----------------------+------+-----------+-----------+
|        NAME         |  STATE  |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-16f545-2-lxd-0 | RUNNING | 192.168.108.61 (eth2) |      | CONTAINER | 0         |
|                     |         | 192.168.108.20 (eth2) |      |           |           |
|                     |         | 10.242.108.66 (eth1)  |      |           |           |
|                     |         | 10.242.108.20 (eth1)  |      |           |           |
|                     |         | 10.241.128.127 (eth0) |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-16f545-2-lxd-1 | RUNNING | 192.168.108.62 (eth2) |      | CONTAINER | 0         |
|                     |         | 10.242.108.67 (eth1)  |      |           |           |
|                     |         | 10.241.128.128 (eth0) |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-16f545-2-lxd-2 | RUNNING | 192.168.113.9 (eth1)  |      | CONTAINER | 0         |
|                     |         | 192.168.109.10 (eth0) |      |           |           |
|                     |         | 10.241.128.111 (eth2) |      |           |           |
+---------------------+---------+-----------------------+------+-----------+-----------+
| juju-16f545-2-lxd-3 | RUNNING | 192.168.109.9 (eth0)  |      | CONTAINER | 0         |
|                     |         | 192.168.108.40 (eth3) |      |           |           |
|                     |         | 192.168.108.34 (eth3) |      |           |           |
|                     |         | 10.242.108.9 (eth2)   |      |           |           |
|                     |         | 10.242.108.34 (eth2)  |      |           |           |
|                     |         | 10.241.128.105 (eth1) |      |           |           |

syslog from juju-16f545-2-lxd-2
Nov 26 19:39:12 juju-16f545-2-lxd-2 systemd[1]: Starting Update APT News...
Nov 26 19:39:12 juju-16f545-2-lxd-2 systemd[1]: Starting Update the local ESM caches...
Nov 26 19:39:14 juju-16f545-2-lxd-2 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.108.4.
Nov 26 19:39:17 juju-16f545-2-lxd-2 systemd-resolved[298]: Using degraded feature set TCP instead of UDP for DNS server 192.168.108.3.
Nov 26 19:39:27 juju-16f545-2-lxd-2 systemd-resolved[298]: Using d...


Revision history for this message
Jeffrey Chang (modern911) wrote (last edit ):

From one of the pending lxd units, it looks like the network is not healthy here.

Nov 28 23:19:13 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.110.4.
Nov 28 23:19:13 juju-c14165-2-lxd-0 systemd-journald[59]: Forwarding to syslog missed 444 messages.
Nov 28 23:19:17 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.110.2.
Nov 28 23:19:20 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.5.
Nov 28 23:19:21 juju-c14165-2-lxd-0 systemd[1]: systemd-timedated.service: Deactivated successfully.
Nov 28 23:19:23 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.2.
Nov 28 23:19:26 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.3.
Nov 28 23:19:29 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.4.
Nov 28 23:19:32 juju-c14165-2-lxd-0 systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.6.
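
Repeated "degraded feature set" warnings like these can be tallied per DNS server to see which resolvers are struggling. A sketch; `degraded_dns` is our helper name, and on the container you would feed it `/var/log/syslog`:

```shell
# Count systemd-resolved "degraded feature set" warnings per DNS server,
# most-affected server first. Usage: degraded_dns < /var/log/syslog
degraded_dns() {
  sed -n 's/.*degraded feature set .* for DNS server \(.*\)\.$/\1/p' \
    | sort | uniq -c | sort -rn
}

# Self-contained demo on lines modeled on the log above:
log='systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.5.
systemd-resolved[298]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.241.144.5.
systemd-resolved[298]: Using degraded feature set TCP instead of UDP for DNS server 192.168.110.4.'
printf '%s\n' "$log" | degraded_dns
```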

root@juju-c14165-2-lxd-0:~# ip route
default via 10.241.144.1 dev eth1 proto static
10.241.144.0/21 dev eth1 proto kernel scope link src 10.241.144.102
10.242.10.0/24 dev eth2 proto kernel scope link src 10.242.10.8
192.168.112.0/24 dev eth0 proto kernel scope link src 192.168.112.7

root@juju-c14165-2-lxd-0:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:12:49:f0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.112.7/24 brd 192.168.112.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe12:49f0/64 scope link
       valid_lft forever preferred_lft forever
21: eth1@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:b4:e6:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.241.144.102/21 brd 10.241.151.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:feb4:e615/64 scope link
       valid_lft forever preferred_lft forever
23: eth2@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:e3:3f:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.242.10.8/24 brd 10.242.10.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fee3:3f7a/64 scope link
       valid_lft forever preferred_lft forever

Revision history for this message
Jeffrey Chang (modern911) wrote (last edit ):

root@juju-690314-0-lxd-0:/var/log# ps -efw
UID PID PPID C STIME TTY TIME CMD
root 1095 1 0 03:33 ? 00:00:00 /usr/bin/python3 /usr/bin/cloud-init modules --mode=final
root 1097 1095 0 03:33 ? 00:00:00 /bin/sh -c tee -a /var/log/cloud-init-output.log
root 1098 1097 0 03:33 ? 00:00:00 tee -a /var/log/cloud-init-output.log
root 1099 1095 0 03:33 ? 00:00:00 /usr/bin/apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet update
_apt 1110 1099 0 03:33 ? 00:00:00 /usr/lib/apt/methods/http
_apt 1111 1099 0 03:33 ? 00:00:00 /usr/lib/apt/methods/http
_apt 1123 1099 0 03:33 ? 00:00:00 /usr/lib/apt/methods/gpgv
_apt 1278 1099 0 03:33 ? 00:00:01 /usr/lib/apt/methods/store
root 1493 1344 0 04:18 pts/1 00:00:00 ps -efw

Confirmed that apt-get update got stuck, per /var/log/cloud-init-output.log.
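
A hung apt process like the one above can be flagged automatically from `ps` elapsed-time output. A sketch; `long_apt` is our helper name, and the threshold is an assumed 300 seconds:

```shell
# Flag apt-get processes running longer than a threshold (in seconds),
# from `ps -eo pid,etimes,args` output. Usage on the stuck container:
#   ps -eo pid,etimes,args | long_apt 300
long_apt() {
  awk -v limit="$1" '/apt-get/ && $2 > limit { print $1 }'
}

# Self-contained demo on rows modeled on the ps output above:
procs='1099 2700 /usr/bin/apt-get --assume-yes --quiet update
1493 5 ps -eo pid,etimes,args'
printf '%s\n' "$procs" | long_apt 300
# prints: 1099
```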

tags: added: severity-high