BDMNotFound race while building resources since 11/27

Bug #1521340 reported by Matt Riedemann
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Matt Riedemann

Bug Description

http://logs.openstack.org/94/242594/2/check/gate-tempest-dsvm-nova-v20-api/085f243/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-11-30_18_03_48_003

2015-11-30 18:03:48.003 ERROR nova.compute.manager [req-b3bc5257-4673-4850-9ba0-e1e136c131b3 tempest-DeleteServersTestJSON-651179004 tempest-DeleteServersTestJSON-684647951] [instance: 75986322-d59d-411c-a60c-776415cdd6d1] Failure prepping block device
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] Traceback (most recent call last):
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/opt/stack/new/nova/nova/compute/manager.py", line 2132, in _build_resources
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] block_device_mapping)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/opt/stack/new/nova/nova/compute/manager.py", line 1689, in _default_block_device_names
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] root_bdm.save()
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 206, in wrapper
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] ctxt, self, fn.__name__, args, kwargs)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 246, in object_action
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] objmethod=objmethod, args=args, kwargs=kwargs)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] retry=self.retry)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] timeout=timeout, retry=retry)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] retry=retry)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 455, in _send
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] raise result
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] BDMNotFound_Remote: No Block Device Mapping with id 48.
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] Traceback (most recent call last):
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/opt/stack/new/nova/nova/conductor/manager.py", line 85, in _object_dispatch
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] return getattr(target, method)(*args, **kwargs)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 222, in wrapper
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] return fn(self, *args, **kwargs)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] File "/opt/stack/new/nova/nova/objects/block_device.py", line 183, in save
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] raise exception.BDMNotFound(id=self.id)
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1] BDMNotFound: No Block Device Mapping with id 48.
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]
2015-11-30 18:03:48.003 4993 ERROR nova.compute.manager [instance: 75986322-d59d-411c-a60c-776415cdd6d1]

This started spiking on 11/27. I suspect https://review.openstack.org/#/c/194063/ introduced the race; it was a second attempt at a change that was previously reverted because it introduced a race.
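
To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how a delete that lands between loading a block device mapping and saving it could produce the error in the traceback above. It is not nova's code: `FakeBDMTable`, `simulate_race`, the local `BDMNotFound` class, and the `/dev/vda` device name are all hypothetical stand-ins, and the exact interleaving is an assumption, since the report only suspects a race.

```python
class BDMNotFound(Exception):
    """Local stand-in for the exception named in the traceback (hypothetical)."""


class FakeBDMTable:
    """In-memory stand-in for the block device mapping table (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def update(self, bdm_id, values):
        # Updating a row that no longer exists is reported as "not found",
        # which is the failure mode seen in the traceback.
        if bdm_id not in self.rows:
            raise BDMNotFound("No Block Device Mapping with id %d." % bdm_id)
        self.rows[bdm_id].update(values)

    def destroy(self, bdm_id):
        self.rows.pop(bdm_id, None)


def simulate_race():
    table = FakeBDMTable()
    table.rows[48] = {"device_name": None}

    # Build path: load the BDM and keep an in-memory copy.
    root_bdm = dict(table.rows[48], id=48)

    # Delete path (assumed interleaving): the row is removed in between.
    table.destroy(48)

    # Build path resumes and tries to persist a default device name.
    root_bdm["device_name"] = "/dev/vda"  # assumed value, for illustration only
    try:
        table.update(root_bdm["id"], {"device_name": root_bdm["device_name"]})
    except BDMNotFound as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(simulate_race())
```

The interleaving is forced sequentially here rather than with threads so that the result is deterministic. In the gate the ordering would depend on timing, which would be consistent with the failure being intermittent.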

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22BDMNotFound:%20No%20Block%20Device%20Mapping%20with%20id%5C%22%20AND%20message:%5C%22_default_block_device_names%5C%22%20AND%20tags:%5C%22screen-n-cpu.txt%5C%22%20AND%20NOT%20build_queue:%5C%22experimental%5C%22
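
The query in the link above is percent-encoded and hard to read. As a convenience, this small sketch decodes it with the standard library so the individual search clauses are visible. `LOGSTASH_URL`, `decode_query`, and `split_clauses` are names introduced here for illustration; the URL itself is copied from the link.

```python
from urllib.parse import unquote

# URL copied from the logstash link, split across lines for readability.
LOGSTASH_URL = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json?query="
    "message:%5C%22BDMNotFound:%20No%20Block%20Device%20Mapping%20with%20id%5C%22"
    "%20AND%20message:%5C%22_default_block_device_names%5C%22"
    "%20AND%20tags:%5C%22screen-n-cpu.txt%5C%22"
    "%20AND%20NOT%20build_queue:%5C%22experimental%5C%22"
)


def decode_query(url):
    """Return the percent-decoded text that follows 'query=' in the URL."""
    encoded = url.split("query=", 1)[1]
    return unquote(encoded)


def split_clauses(query):
    """Split the decoded query on its ' AND ' separators."""
    return query.split(" AND ")


if __name__ == "__main__":
    for clause in split_clauses(decode_query(LOGSTASH_URL)):
        print(clause)
```

Decoded, the query matches messages containing both the BDMNotFound text and `_default_block_device_names`, restricted to the `screen-n-cpu.txt` tag and excluding the `experimental` build queue.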

Tags: volumes
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/251543

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/251543
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5f243d9d868c4079b7a75696e3dfe1008731e7c3
Submitter: Jenkins
Branch: master

commit 5f243d9d868c4079b7a75696e3dfe1008731e7c3
Author: Matt Riedemann <email address hidden>
Date: Mon Nov 30 21:18:06 2015 +0000

    Revert "Detach volume after deleting instance with no host"

    This reverts commit ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f

    Looks like this introduced either the same race as before or
    a new race, but we're seeing BDMNotFound spiking in the gate
    since this was merged on 11/27, so revert and try again.

    Change-Id: Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2
    Closes-Bug: #1521340

Changed in nova:
status: In Progress → Fix Committed
Matt Riedemann (mriedem) wrote :

This might be something else: we reverted that change, yet we're still seeing this failure:

http://logs.openstack.org/43/195443/44/check/gate-tempest-dsvm-postgres-full/fb772cb/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-12-01_06_31_48_170

Matt Riedemann (mriedem) wrote :

Never mind comment 3. I think there was just some timezone lag; the hits in logstash drop off after we reverted that change.

Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b1

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: none → mitaka-2
milestone: mitaka-2 → mitaka-1