Wallaby ceilometer.compute.discovery fails to get domain metadata

Bug #1930446 reported by Zakhar Kirpichenko
This bug affects 8 people
Affects                 Status        Importance   Assigned to
Ceilometer              In Progress   High         Christophe Useinovic
Ubuntu Cloud Archive    Fix Released  High         Unassigned
  Wallaby               Fix Released  High         Unassigned
  Xena                  Fix Released  High         Unassigned
  Yoga                  Fix Released  High         Unassigned
ceilometer (Ubuntu)     Fix Released  High         Unassigned
  Hirsute               Won't Fix     High         Unassigned
  Impish                Fix Released  High         Unassigned
  Jammy                 Fix Released  High         Unassigned

Bug Description

Hi!

The Ceilometer compute agent fails to get libvirt domain metadata on an Ubuntu 20.04 LTS host with the latest updates, kernel 5.4.0-65-generic, and OpenStack Wallaby Nova compute services installed from the official Wallaby repository for Ubuntu 20.04. All components were deployed manually.

The Ceilometer agent is configured with instance_discovery_method = libvirt_metadata. The agent is unable to fetch the domain metadata, and the following error messages appear in /var/log/ceilometer/ceilometer-agent-compute.log on agent start and on periodic polling attempts:

2021-06-01 16:01:18.297 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid baf06f57-ac5b-4661-928c-7adaeaea0311 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.298 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid 208c0d7a-41a3-4fa6-bf72-2f9594ac6b8d metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.300 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid d979a527-c1ba-4b29-8e30-322d4d2efcf7 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.301 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid a41f21b6-766d-4979-bbe1-84f421b0c3f2 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.302 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid fd5ffe32-c6d6-4898-9ba2-2af1ffebd502 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.302 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid aff042c9-c311-4944-bc42-09ccd5a90eb7 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.303 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid 9510bc46-e4e2-490c-9cbe-c9eb5e349b8d metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.304 1835684 ERROR ceilometer.compute.discovery [-] Fail to get domain uuid 4d2c2c9b-4eff-460a-a00b-19fdbe33f5d4 metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present
2021-06-01 16:01:18.305 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster cpu_l3_cache, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.305 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster disk.device.write.bytes, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.305 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster network.incoming.packets, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.305 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster disk.device.read.requests, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.306 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster network.outgoing.packets, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.306 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster network.outgoing.bytes, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.306 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster network.incoming.bytes, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.306 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster cpu, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.306 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster disk.device.write.requests, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.307 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster disk.device.read.bytes, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177
2021-06-01 16:01:18.307 1835684 DEBUG ceilometer.polling.manager [-] Skip pollster memory.usage, no resources found this cycle poll_and_notify /usr/lib/python3/dist-packages/ceilometer/polling/manager.py:177

All domains exist and their metadata is readily available via virsh or a simple Python script. The Nova compute service is fully functional. The Ceilometer agent is partially functional: it is able to export compute.node.cpu.* metrics, but nothing related to libvirt domains.

Installed Ceilometer-related packages:

ceilometer-agent-compute 2:16.0.0-0ubuntu1~cloud0
ceilometer-common 2:16.0.0-0ubuntu1~cloud0
python3-ceilometer 2:16.0.0-0ubuntu1~cloud0

Installed Nova-related packages:

nova-common 3:23.0.0-0ubuntu1~cloud0
nova-compute 3:23.0.0-0ubuntu1~cloud0
nova-compute-kvm 3:23.0.0-0ubuntu1~cloud0
nova-compute-libvirt 3:23.0.0-0ubuntu1~cloud0
python3-nova 3:23.0.0-0ubuntu1~cloud0
python3-novaclient 2:17.4.0-0ubuntu1~cloud0

Installed Libvirt-related packages:

libvirt-clients 6.0.0-0ubuntu8.9
libvirt-daemon 6.0.0-0ubuntu8.9
libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.9
libvirt-daemon-driver-storage-rbd 6.0.0-0ubuntu8.9
libvirt-daemon-system 6.0.0-0ubuntu8.9
libvirt-daemon-system-systemd 6.0.0-0ubuntu8.9
libvirt0:amd64 6.0.0-0ubuntu8.9
python3-libvirt 6.1.0-1

Installed Qemu-related packages:

libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.9
qemu-block-extra:amd64 1:4.2-3ubuntu6.16
qemu-kvm 1:4.2-3ubuntu6.16
qemu-system-common 1:4.2-3ubuntu6.16
qemu-system-data 1:4.2-3ubuntu6.16
qemu-system-gui:amd64 1:4.2-3ubuntu6.16
qemu-system-x86 1:4.2-3ubuntu6.16
qemu-utils 1:4.2-3ubuntu6.16

AppArmor is enabled with the default configuration; no AppArmor messages related to libvirt, qemu, nova-compute, ceilometer-agent, etc. are visible in the logs. I am also attaching the relevant Ceilometer agent and Nova configuration files.

Please let me know if further information is required.

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :
description: updated
Revision history for this message
Matthias Runge (mrunge) wrote :

Thank you for your report!

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

xml_string = domain.metadata(...) fails in /usr/lib/python3/dist-packages/ceilometer/compute/discovery.py at lines 141-143:

136 @libvirt_utils.retry_on_disconnect
137 def discover_libvirt_polling(self, manager, param=None):
138 instances = []
139 for domain in self.connection.listAllDomains():
140 try:
141 xml_string = domain.metadata(
142 libvirt.VIR_DOMAIN_METADATA_ELEMENT,
143 "http://openstack.org/xmlns/libvirt/nova/1.0")
144 except libvirt.libvirtError as e:
145 if libvirt_utils.is_disconnection_exception(e):
146 # Re-raise the exception so it's handled and retries
147 raise
148 LOG.error(
149 "Fail to get domain uuid %s metadata, libvirtError: %s",
150 domain.UUIDString(), e)
151 continue

Specifically, it looks like the namespace is wrong: http://openstack.org/xmlns/libvirt/nova/1.0 fails, while http://openstack.org/xmlns/libvirt/nova/1.1 works. Sample:

$ cat meta.py
#!/usr/bin/env python3
import sys
import libvirt
conn = None
try:
    conn = libvirt.open("qemu:///system")
except libvirt.libvirtError as e:
    print(repr(e), file=sys.stderr)
    exit(1)
domain = None
domName = 'instance-000002a3'
try:
    domain = conn.lookupByName(domName)
except libvirt.libvirtError as e:
    print(repr(e), file=sys.stderr)
    exit(1)
# this fails with "metadata not found":
# dommeta = domain.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT, "http://openstack.org/xmlns/libvirt/nova/1.0")
dommeta = domain.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT,"http://openstack.org/xmlns/libvirt/nova/1.1")
print('metadata \n' + dommeta)
conn.close()
exit(0)

$ ./meta.py
metadata
<instance>
  <package version="23.0.0"/>
  <name>BLABLA</name>
  <creationTime>2021-05-27 14:50:33</creationTime>
  <flavor name="M - Medium">
    <memory>8192</memory>
    <disk>0</disk>
    <swap>0</swap>
    <ephemeral>0</ephemeral>
    <vcpus>8</vcpus>
  </flavor>
  <owner>
    <user uuid="06c0a9d4ad7a4dc2adf4a4814e6e112f">BLA</user>
    <project uuid="0879b19a80c44e54bde13fcee6a6b0a4">BLA</project>
  </owner>
  <root type="image" uuid="4b01b36a-dd66-45b1-b9bf-c3b40779aeab"/>
  <ports>
    <port uuid="e5efc4b6-9267-4b46-afbb-c256464c62c6">
      <ip type="fixed" address="A.B.C.D" ipVersion="4"/>
    </port>
  </ports>
</instance>

After adjusting the XML namespace to http://openstack.org/xmlns/libvirt/nova/1.1 in /usr/lib/python3/dist-packages/ceilometer/compute/discovery.py, the Ceilometer agent works as well.

I'm not sure what's causing the version mismatch, but I hope this helps.
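As a side note, a backward-compatible lookup that tries the new namespace first and falls back to the old one would avoid hardcoding either version. Below is a minimal sketch of that idea, not actual Ceilometer code: the helper name get_nova_metadata and the fetch_metadata callable are illustrative, and with the real bindings fetch_metadata would simply wrap domain.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT, namespace).

```python
# Hypothetical helper: try the newer Nova metadata namespace first, then
# fall back to the legacy one, so instances whose domain XML was generated
# before Wallaby (which used .../nova/1.0) keep working after an upgrade.

NOVA_METADATA_NAMESPACES = (
    "http://openstack.org/xmlns/libvirt/nova/1.1",  # Wallaby and newer
    "http://openstack.org/xmlns/libvirt/nova/1.0",  # Victoria and older
)


def get_nova_metadata(domain, fetch_metadata):
    """Return the Nova metadata XML for ``domain``, or None if absent.

    ``fetch_metadata(domain, namespace)`` is expected to return the XML
    string, or raise when the element is missing, the same contract as
    ``domain.metadata(...)`` in the libvirt Python bindings.
    """
    for namespace in NOVA_METADATA_NAMESPACES:
        try:
            return fetch_metadata(domain, namespace)
        except Exception:
            # Element absent under this namespace; try the next one.
            continue
    return None
```

With real libvirt installed, fetch_metadata would be something like `lambda dom, ns: dom.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT, ns)`, with the except clause narrowed to libvirt.libvirtError.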

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

o/ I'm writing functional tests for openstack/charm-ceilometer-agent [0][1] that validate which metrics get published depending on what we put in polling.yaml.

I'm hitting this issue consistently, i.e. I can reproduce it reliably in our test gate [2]:

Fail to get domain uuid 9539cf3b-1448-43d8-aff6-93becd57392e metadata, libvirtError: metadata not found: Requested metadata element is not present: libvirt.libvirtError: metadata not found: Requested metadata element is not present

This is hitting us both on Ubuntu 20.04 and 21.04, but only starting with Wallaby (ceilometer 16.0.0-0ubuntu1~cloud0). All older releases from Rocky to Victoria (ceilometer 15.0.0-0ubuntu2~cloud0) are fine (validated in combination with Ubuntu 18.04, 20.04 and 20.10 where applicable).

[0] https://github.com/openstack-charmers/zaza-openstack-tests/pull/615
[1] https://review.opendev.org/c/openstack/charm-ceilometer-agent/+/803359
[2] https://bugs.launchpad.net/charm-ceilometer-agent/+bug/1938884

Changed in ceilometer:
status: New → Confirmed
Revision history for this message
李亚冲 (879228763-6) wrote :

Hi, I also faced this problem. Using the `virsh` command:

```
root@compute02:/var/log/kolla/ceilometer# docker exec nova_libvirt virsh metadata 00fe5107-62e7-4102-a37e-d6696d69f05a http://openstack.org/xmlns/libvirt/nova/1.0
error: metadata not found: Requested metadata element is not present

root@compute02:/var/log/kolla/ceilometer# docker exec nova_libvirt virsh metadata 00fe5107-62e7-4102-a37e-d6696d69f05a http://openstack.org/xmlns/libvirt/nova/1.1
<instance>
  <package version="23.0.3"/>
  <name>fedora_test</name>
  <creationTime>2021-09-15 08:29:50</creationTime>
  <flavor name="1">
    <memory>1024</memory>
    <disk>10</disk>
    <swap>0</swap>
    <ephemeral>0</ephemeral>
    <vcpus>2</vcpus>
  </flavor>
  <owner>
    <user uuid="2e84824d03644c7199c8b4310aef7f2b">admin</user>
    <project uuid="72eb4addd34a4ae4908ab3078cba79df">admin</project>
  </owner>
  <ports>
    <port uuid="2be22994-9776-462f-a864-8705417bf04e">
      <ip type="fixed" address="10.0.0.74" ipVersion="4"/>
    </port>
  </ports>
</instance>
```

The 1.0 namespace is not working.

Revision history for this message
Uriel Medina (uriel-oncloud) wrote :

I can confirm this bug.
Specifically, I was having issues getting some metrics.
I just changed 1.0 to 1.1.
I wonder if this causes problems with other metrics.

Revision history for this message
Christophe Useinovic (cuseinovic) wrote (last edit ):

Hey,
I can confirm this bug too. I found this earlier and forgot to push a fix (my bad :p).

As a workaround I added a ceilometer_extra_volumes entry mounting the entire folder of https://github.com/openstack/ceilometer/tree/stable/wallaby/ceilometer/compute with the namespace changed from nova/1.0 to nova/1.1.

Changed in ceilometer:
assignee: nobody → Christophe Useinovic (cuseinovic)
status: Confirmed → In Progress
Matthias Runge (mrunge)
Changed in ceilometer:
importance: Undecided → High
Revision history for this message
Christophe Useinovic (cuseinovic) wrote :

Tested with py38 + tox.

======
Totals
======
Ran: 956 tests in 115.6147 sec.
 - Passed: 955
 - Skipped: 1
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 426.1125 sec.

==============
Worker Balance
==============
 - Worker 0 (120 tests) => 0:00:42.673975
 - Worker 1 (124 tests) => 0:00:53.060556
 - Worker 2 (110 tests) => 0:01:55.608775
 - Worker 3 (114 tests) => 0:00:46.632188
 - Worker 4 (133 tests) => 0:00:54.534880
 - Worker 5 (120 tests) => 0:00:34.788326
 - Worker 6 (114 tests) => 0:00:43.485239
 - Worker 7 (121 tests) => 0:00:38.682404
py38 run-test: commands[1] | oslo-config-generator --config-file=etc/ceilometer/ceilometer-config-generator.conf
_____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
  py38: commands succeeded
  congratulations :)

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I see this is now fixed on `master` [1], thanks a lot! Could this please be cherry-picked to `stable/wallaby` and `stable/xena`? Thanks!

[1]: https://github.com/openstack/ceilometer/commit/337eeac6

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Changed in ceilometer (Ubuntu Hirsute):
status: New → Confirmed
status: Confirmed → Triaged
Changed in ceilometer (Ubuntu Impish):
status: New → Triaged
Changed in ceilometer (Ubuntu Jammy):
status: New → Triaged
Changed in ceilometer (Ubuntu Impish):
importance: Undecided → High
Changed in ceilometer (Ubuntu Hirsute):
importance: Undecided → High
Changed in ceilometer (Ubuntu Jammy):
importance: Undecided → High
Changed in cloud-archive:
status: New → Triaged
Revision history for this message
Pavlo Shchelokovskyy (pshchelo) wrote :

Note that the merged fix does not fix everything: it will still break for any instance that was booted on Victoria or earlier and has not been rebooted/migrated or otherwise had its XML regenerated since. (By the way, could that be why grenade is failing on the cherry-pick to Wallaby?)

The code should support both versions of the namespace, new and old, and not only in Wallaby as a transitional measure but more or less indefinitely: in the field we have seen some "golden" instances that survived consecutive OpenStack upgrades across more than 4 releases...

Revision history for this message
Pavlo Shchelokovskyy (pshchelo) wrote :

We can check for the libvirt error code with something like this and act accordingly:

try:
    xml_string = domain.metadata(...)
except libvirt.libvirtError as exc:
    if exc.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN_METADATA:
        ...

It may make sense to extract that logic to a separate function in the libvirt_utils.py module.
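Building on that suggestion, such a helper could look like the sketch below. This is illustrative only: is_metadata_missing is a hypothetical name, and the numeric constant is a placeholder standing in for libvirt.VIR_ERR_NO_DOMAIN_METADATA, which real code should import from the libvirt module rather than hardcode.

```python
# Placeholder for libvirt.VIR_ERR_NO_DOMAIN_METADATA; in real code, import
# the constant from the libvirt module instead of hardcoding a number.
VIR_ERR_NO_DOMAIN_METADATA = 80


def is_metadata_missing(exc):
    """Return True when ``exc`` carries libvirt's 'no domain metadata' code.

    libvirt.libvirtError exposes get_error_code(); any exception object
    with that method works here, which keeps the logic testable without
    a running libvirt daemon.
    """
    get_code = getattr(exc, "get_error_code", None)
    return get_code is not None and get_code() == VIR_ERR_NO_DOMAIN_METADATA
```

In discover_libvirt_polling, the except block would then log-and-continue only when is_metadata_missing(exc) is true (after the existing disconnection check), and re-raise or log more loudly for any other libvirt error.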

Revision history for this message
Corey Bryant (corey.bryant) wrote :

It's worth noting that grenade failed on the stable/xena review as well, but that test is non-voting.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

@Pavlo, were you planning to submit a patch for the point you made in comment #13?

Revision history for this message
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in ceilometer (Ubuntu Hirsute):
status: Triaged → Won't Fix
Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Fix for this is merged in Wallaby and Xena and will be in next releases.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 17.0.1

This issue was fixed in the openstack/ceilometer 17.0.1 release.

Revision history for this message
James Page (james-page) wrote :

Thanks @tobias-urdin for your work on this issue.

Ubuntu and the Ubuntu Cloud Archive will pick this up as part of the 16.0.x and 17.0.x stable releases.

Revision history for this message
James Page (james-page) wrote :

For reference, I've dropped ceilometer point releases into:

  ppa:james-page/wallaby
  ppa:james-page/xena

for the Wallaby and Xena Ubuntu Cloud Archive pockets. I've tested at Xena, and the point release does the trick with regard to resolving the namespace issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 18.0.0.0rc1

This issue was fixed in the openstack/ceilometer 18.0.0.0rc1 release candidate.

Changed in ceilometer (Ubuntu Jammy):
status: Triaged → Fix Released
Changed in ceilometer (Ubuntu Impish):
status: Triaged → Fix Released
Changed in ceilometer (Ubuntu):
status: Triaged → Fix Released