[3.1.5] lxd machine doesn't get IP address

Bug #2038556 reported by Bas de Bruijne
This bug affects 1 person
Affects: Canonical Juju
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

In test run https://solutions.qa.canonical.com/testruns/ec5d60e6-ea74-439d-9444-cfcf729f9f01, which deploys Charmed Kubernetes on jammy with Juju 3.1.5, ceph-mon fails to install with the following status:

============
ceph-mon/0* blocked executing 0/lxd/0 10.246.166.234 Insufficient peer units to bootstrap cluster (require 3)
  filebeat/12 waiting executing 10.246.166.234 (install) Waiting for: elasticsearch, logstash or kafka.
  landscape-client/12 maintenance executing 10.246.166.234 (install) installing charm software
  logrotated/11 waiting allocating 10.246.166.234 agent initialising
  nrpe/18 blocked executing 10.246.166.234 (install) Nagios server not configured or related
  telegraf/12 maintenance executing 10.246.166.234 (install) installing charm software
ceph-mon/1 error idle 1/lxd/0 hook failed: "config-changed"
  filebeat/14 waiting allocating waiting for machine
  landscape-client/14 waiting allocating waiting for machine
  logrotated/13 waiting allocating waiting for machine
  nrpe/20 waiting allocating waiting for machine
  telegraf/14 waiting allocating waiting for machine
ceph-mon/2 blocked executing 2/lxd/0 10.246.166.49 (install) Insufficient peer units to bootstrap cluster (require 3)
  filebeat/15 waiting allocating 10.246.166.49 agent initialising
  landscape-client/15 maintenance executing 10.246.166.49 (install) installing charm software
  logrotated/14 waiting allocating 10.246.166.49 agent initialising
  nrpe/21 waiting allocating 10.246.166.49 agent initialising
  telegraf/15 waiting allocating 10.246.166.49 agent initialising
============

In the log, the hook fails on a network-get command:
============
subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'public']' returned non-zero exit status 1.
============
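
For context, the failing call is the Juju network-get hook tool being invoked from Python. A minimal sketch of that pattern (the helper name is illustrative, not the charm's actual code) looks like this:
============
import subprocess

def get_primary_address(binding: str) -> str:
    """Illustrative helper: ask Juju for the primary address on a binding.

    network-get is a Juju hook tool, so this only works inside a hook
    context. If the controller has no address recorded for the unit yet,
    the tool exits non-zero and check_output raises CalledProcessError,
    which is exactly the failure in the traceback above.
    """
    return subprocess.check_output(
        ["network-get", "--primary-address", binding],
        text=True,
    ).strip()
============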

Indeed, the juju status output does not show a primary-address for the ceph-mon/1 unit.
In the syslog, it looks like lxd does report an IP address:
============
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: + ip addr
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet 127.0.0.1/8 scope host lo
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet6 ::1/128 scope host
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: 19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: link/ether 00:16:3e:69:ab:27 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet 10.246.172.101/22 brd 10.246.175.255 scope global eth0
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
============

For some reason, this 10.246.172.101 address is not picked up by the Juju controller. I don't see any indication in the logs as to why.
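
One way to cross-check this, shown only as a sketch (it assumes the juju CLI is on PATH, the right model is selected, and the usual machines/containers/ip-addresses keys in the JSON status output), is to compare the controller's view with the ip addr output above:
============
import json
import subprocess

def controller_addresses() -> dict[str, list[str]]:
    """Collect the ip-addresses the controller reports per machine/container."""
    status = json.loads(
        subprocess.check_output(["juju", "status", "--format", "json"], text=True)
    )
    result: dict[str, list[str]] = {}
    for mid, machine in status.get("machines", {}).items():
        result[mid] = machine.get("ip-addresses", [])
        for cid, container in machine.get("containers", {}).items():
            result[cid] = container.get("ip-addresses", [])
    return result

# An empty list for "1/lxd/0" here, while the container itself shows
# 10.246.172.101 on eth0, would confirm the address never reached the controller.
============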

More configs and crashdumps can be found here: https://oil-jenkins.canonical.com/artifacts/ec5d60e6-ea74-439d-9444-cfcf729f9f01/index.html

Bas de Bruijne (basdbruijne) wrote :
tags: added: cdo-qa foundations-engine
Ian Booth (wallyworld) wrote :

Is this intermittent? Any feel for how often it occurs? From memory, there have been issues before due to races initialising services like netplan and lxd. I can't recall the exact details offhand.
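
If it is such a race (the address only becoming known to the controller after the hook has started), one conceivable mitigation on the charm side, shown purely as a sketch and not as anything ceph-mon does today, is to retry the hook tool instead of failing on the first attempt:
============
import subprocess
import time

def wait_for_primary_address(binding: str, attempts: int = 10, delay: int = 30) -> str:
    """Retry network-get in case the controller publishes the address late."""
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.check_output(
                ["network-get", "--primary-address", binding],
                text=True,
            ).strip()
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # give up and surface the original hook error
            time.sleep(delay)
============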

Bas de Bruijne (basdbruijne) wrote :

It is intermittent and has occurred only 3 times in the last week: https://solutions.qa.canonical.com/bugs/2038556. It looks a bit like LP: #1991552, but I haven't checked yet whether the source of the problem is similar. We are currently having some performance issues with our MAAS, which may be related. We are working on getting our MAAS running smoothly again, so we will see if this issue persists.
