[3.1.5] lxd machine doesn't get IP address

Bug #2038556 reported by Bas de Bruijne
This bug affects 1 person
Affects: Canonical Juju
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

In test run https://solutions.qa.canonical.com/testruns/ec5d60e6-ea74-439d-9444-cfcf729f9f01, which deploys Charmed Kubernetes on jammy with Juju 3.1.5, ceph-mon fails to install with the following status:

============
ceph-mon/0* blocked executing 0/lxd/0 10.246.166.234 Insufficient peer units to bootstrap cluster (require 3)
  filebeat/12 waiting executing 10.246.166.234 (install) Waiting for: elasticsearch, logstash or kafka.
  landscape-client/12 maintenance executing 10.246.166.234 (install) installing charm software
  logrotated/11 waiting allocating 10.246.166.234 agent initialising
  nrpe/18 blocked executing 10.246.166.234 (install) Nagios server not configured or related
  telegraf/12 maintenance executing 10.246.166.234 (install) installing charm software
ceph-mon/1 error idle 1/lxd/0 hook failed: "config-changed"
  filebeat/14 waiting allocating waiting for machine
  landscape-client/14 waiting allocating waiting for machine
  logrotated/13 waiting allocating waiting for machine
  nrpe/20 waiting allocating waiting for machine
  telegraf/14 waiting allocating waiting for machine
ceph-mon/2 blocked executing 2/lxd/0 10.246.166.49 (install) Insufficient peer units to bootstrap cluster (require 3)
  filebeat/15 waiting allocating 10.246.166.49 agent initialising
  landscape-client/15 maintenance executing 10.246.166.49 (install) installing charm software
  logrotated/14 waiting allocating 10.246.166.49 agent initialising
  nrpe/21 waiting allocating 10.246.166.49 agent initialising
  telegraf/15 waiting allocating 10.246.166.49 agent initialising
============

In the log, the hook fails on a network-get command:
============
subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'public']' returned non-zero exit status 1.
============
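
For context, the failing call is the Juju network-get hook tool being invoked from Python. A minimal sketch of that pattern (the helper name is illustrative, not the charm's actual code) looks like this:
============
import subprocess

def get_primary_address(binding: str) -> str:
    """Illustrative helper: ask Juju for the primary address on a binding.

    network-get is a Juju hook tool, so this only works inside a hook
    context. If the controller has no address recorded for the unit yet,
    the tool exits non-zero and check_output raises CalledProcessError,
    which is exactly the failure in the traceback above.
    """
    return subprocess.check_output(
        ["network-get", "--primary-address", binding],
        text=True,
    ).strip()
============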

Indeed, the juju status output does not show a primary-address for the ceph-mon/1 unit.
In the syslog, it looks like lxd does report an IP address:
============
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: + ip addr
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet 127.0.0.1/8 scope host lo
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet6 ::1/128 scope host
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: 19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: link/ether 00:16:3e:69:ab:27 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: inet 10.246.172.101/22 brd 10.246.175.255 scope global eth0
Oct 2 18:26:35 juju-a008bf-1-lxd-0 cloud-init[911]: valid_lft forever preferred_lft forever
============

For some reason, this 10.246.172.101 address is not picked up by the Juju controller. I don't see any indication in the logs as to why.
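
One way to cross-check this, shown only as a sketch (it assumes the juju CLI is on PATH, the right model is selected, and the usual machines/containers/ip-addresses keys in the JSON status output), is to compare the controller's view with the ip addr output above:
============
import json
import subprocess

def controller_addresses() -> dict[str, list[str]]:
    """Collect the ip-addresses the controller reports per machine/container."""
    status = json.loads(
        subprocess.check_output(["juju", "status", "--format", "json"], text=True)
    )
    result: dict[str, list[str]] = {}
    for mid, machine in status.get("machines", {}).items():
        result[mid] = machine.get("ip-addresses", [])
        for cid, container in machine.get("containers", {}).items():
            result[cid] = container.get("ip-addresses", [])
    return result

# An empty list for "1/lxd/0" here, while the container itself shows
# 10.246.172.101 on eth0, would confirm the address never reached the controller.
============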

More configs and crashdumps can be found here: https://oil-jenkins.canonical.com/artifacts/ec5d60e6-ea74-439d-9444-cfcf729f9f01/index.html

Bas de Bruijne (basdbruijne) wrote :
tags: added: cdo-qa foundations-engine
Ian Booth (wallyworld) wrote :

Is this intermittent? Any feel for how often it occurs? From memory, there have been issues before due to races initialising services like netplan and lxd. I can't recall the exact details offhand.
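
If it is such a race (the address only becoming known to the controller after the hook has started), one conceivable mitigation on the charm side, shown purely as a sketch and not as anything ceph-mon does today, is to retry the hook tool instead of failing on the first attempt:
============
import subprocess
import time

def wait_for_primary_address(binding: str, attempts: int = 10, delay: int = 30) -> str:
    """Retry network-get in case the controller publishes the address late."""
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.check_output(
                ["network-get", "--primary-address", binding],
                text=True,
            ).strip()
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # give up and surface the original hook error
            time.sleep(delay)
============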

Bas de Bruijne (basdbruijne) wrote :

It is intermittent and has occurred only 3 times in the last week: https://solutions.qa.canonical.com/bugs/2038556. It looks a bit like LP: #1991552, but I haven't checked yet whether the source of the problem is similar. We are currently having some performance issues with our MAAS, which may be related. We are working on getting our MAAS running smoothly again, so we will see if this issue persists.
