[2.9.37] Single unit cannot find binding for an endpoint, but endpoint has binding set in juju
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
On a deployment testing Juju 2.9.37 with Jammy/Yoga, a single barbican (principal) unit and its barbican-vault (subordinate) unit failed to get the network address for the internal space:
unit-barbican-2: 07:11:30 ERROR unit.barbican/
Traceback (most recent call last):
File "/var/lib/
response = subprocess.
File "/usr/lib/
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/
raise CalledProcessEr
subprocess.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lib/
bus.
File "/var/lib/
_invoke(
File "/var/lib/
handler.
File "/var/lib/
self.
File "/var/lib/
database.
File "/var/lib/
hostname = hookenv.
File "/var/lib/
return f(*args, **kwargs)
File "/var/lib/
raise NoNetworkBindin
charmhelpers.
However, the juju show-unit output for that application shows that the endpoint has an address set and that its binding is set:
- relation-id: 59
endpoint: shared-db
related-
application
related-units:
barbican-
in-scope: true
data:
barbican-
in-scope: true
data:
barbican-
in-scope: true
data:
The test run can be found at:
https:/
and the crashdump can be found at:
https:/
For each unit, the crashdump contains the output of juju show-status-log, juju show-machine, and juju show-unit, in which the bindings and addresses can be seen.
Changed in juju:
  milestone: none → 2.9-backlog
  importance: Undecided → Low
  status: New → Triaged
Changed in juju:
  milestone: 2.9-backlog → none
When network-get is run, Juju is not guaranteed to have collated all the address information for the host machine (link-layer devices, instance IPs, etc.). The API call may therefore return zero network info records, which results in the charmhelpers "NoNetworkBinding" error.
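The traceback in the report shows a CalledProcessError from network-get being re-raised as NoNetworkBinding. A minimal sketch of that translation follows, modelled loosely on the charmhelpers code path rather than copied from it. The function name primary_address, its cmd override (added only so the logic can be exercised without a Juju unit) and the exact marker string are assumptions for illustration, and NoNetworkBinding here is a local stand-in for the charmhelpers exception.

```python
import subprocess


class NoNetworkBinding(Exception):
    """Local stand-in for charmhelpers' NoNetworkBinding (hypothetical copy)."""


# Assumption: substring network-get prints when it has no network info yet.
NO_CONFIG_MARKER = "no network config found for binding"


def primary_address(binding, cmd=None):
    """Return the primary address for `binding` via network-get.

    A failed call whose output contains NO_CONFIG_MARKER is translated into
    NoNetworkBinding; any other failure is re-raised unchanged.
    """
    if cmd is None:
        cmd = ["network-get", "--primary-address", binding]
    try:
        # Merge stderr into stdout so the error text is available on failure.
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        if NO_CONFIG_MARKER in exc.output.decode("utf-8", "replace"):
            raise NoNetworkBinding(f"No network binding for {binding}") from exc
        raise
    return out.decode("utf-8").strip()
```

On a unit this would be called with an endpoint name such as primary_address("shared-db"); the point is that an empty result from Juju surfaces as an exception the charm can catch, not as a hook-ending crash it cannot control.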
The Juju agent on the host machine gathers the link-layer device info, and the controller polls the cloud for the addresses allocated to the host instance, so the network info does eventually become available.
The charm needs to be resilient to the fact that the address info may not be known immediately. It should catch that error and try again a short time later, setting its status to "Waiting" with a suitable message. If the address info is still not available after several attempts, it should set its status to "Blocked".
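A rough sketch of that retry behaviour, under stated assumptions: get_address_with_retry, its parameters and the fixed delay are hypothetical; fetch would wrap something like hookenv.network_get_primary_address, and set_status would wrap charmhelpers' hookenv.status_set. NoNetworkBinding is again a local stand-in for the charmhelpers exception.

```python
import time


class NoNetworkBinding(Exception):
    """Local stand-in for charmhelpers' NoNetworkBinding (hypothetical copy)."""


def get_address_with_retry(fetch, set_status, attempts=5, delay=2.0,
                           sleep=time.sleep):
    """Call fetch() until it returns an address or the attempts run out.

    Hypothetical helper: reports "waiting" between failed attempts and
    "blocked" once all attempts fail, returning None in that case so the
    caller can exit the hook cleanly instead of raising.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except NoNetworkBinding:
            if attempt == attempts:
                break
            set_status("waiting",
                       f"Waiting for network address ({attempt}/{attempts})")
            sleep(delay)
    set_status("blocked", "Network address not available after retries")
    return None
```

Sleeping inside a hook keeps that hook running, so attempts and delay should stay small; a charm that lands in "blocked" can still recover on a later hook once Juju has populated the address info.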
If this error happens again, could you retry the hook? It should run OK the next time, assuming the address info has been populated by then. That would confirm the above theory and indicate that the charm needs to be fixed.