OpenStack Compute (nova)

Evacuate Fails 'Invalid state of instance files' using Ceph Ephemeral RBD

Bug #1340411 reported by hifieli on 2014-07-10

This bug affects 15 people

	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Medium	Feilong Wang	OpenStack Compute (nova) 2015.1.0 "kilo"
Icehouse	Fix Released	Undecided	Unassigned	OpenStack Compute (nova) 2014.1.4
Juno	Fix Released	Undecided	Unassigned	OpenStack Compute (nova) 2014.2.1

Bug Description

Greetings,

We can't seem to be able to evacuate instances from a failed compute node using shared storage. We are using Ceph Ephemeral RBD as the storage medium.

Steps to reproduce:

nova evacuate --on-shared-storage 6e2081ec-2723-43c7-a730-488bb863674c node-24
or
POST to http://ip-address:port/v2/tenant_id/servers/server_id/action with
{"evacuate":{"host":"node-24","onSharedStorage":1}}

Here is what shows up in the logs:

180>Jul 10 20:36:48 node-24 nova-nova.compute.manager AUDIT: Rebuilding instance
<179>Jul 10 20:36:48 node-24 nova-nova.compute.manager ERROR: Setting instance vm_state to ERROR
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5554, in _error_out_instance_on_exception
    yield
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2434, in rebuild_instance
    _("Invalid state of instance files on shared"
InvalidSharedStorage: Invalid state of instance files on shared storage
<179>Jul 10 20:36:49 node-24 nova-oslo.messaging.rpc.dispatcher ERROR: Exception during message handling: Invalid state of instance files on shared storage
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
    incoming.message))
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
    result = getattr(endpoint, method)(ctxt, **new_args)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 393, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/server.py", line 139, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 88, in wrapped
    payload)
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 71, in wrapped
    return f(self, context, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 274, in decorated_function
    pass
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 260, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 327, in decorated_function
    function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 303, in decorated_function
    e, sys.exc_info())
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 290, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2434, in rebuild_instance
    _("Invalid state of instance files on shared"
InvalidSharedStorage: Invalid state of instance files on shared storage

Tags:

Revision history for this message

Tyler Wilson (loth) wrote on 2014-07-10:

Was able to complete a workaround by

1. Edit nova.instances and replace all references of old node to destination node
2. reset-status of instance to active
3. Issue a hard-reboot to the instance

This will re-create the xml and console log on the destination node and boot the instance using the existing Ceph RBD

Revision history for this message

Dmitry Borodaenko (angdraug) wrote on 2014-07-11:

May be related to https://bugs.launchpad.net/nova/+bug/1250751

Sean Dague (sdague) on 2014-09-11

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Low

Feilong Wang (flwang) on 2014-09-15

Changed in nova:
assignee:	nobody → Fei Long Wang (flwang)

Revision history for this message

Feilong Wang (flwang) wrote on 2014-09-15:

hifieli and Tyler, I doubt it's a configuration issue, can you add the nova instance patch to CephFS and try again, you can follow below document. Cheers.

http://www.ibm.com/developerworks/cloud/library/cl-openstackceph/

Revision history for this message

Feilong Wang (flwang) wrote on 2014-09-15:

Meanwhile, I will investigate if we can improve the check to cover the case without CephFS.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-16: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121745

Changed in nova:
status:	Confirmed → In Progress

Revision history for this message

Dmitry Borodaenko (angdraug) wrote on 2014-09-16:

Looks like a duplicate of bug #1249319.

Dan Smith (danms) on 2014-09-25

Changed in nova:
importance:	Low → Medium

Matt Riedemann (mriedem) on 2014-10-09

tags:

added: juno-backport-potential

Ante Karamatić (ivoks) on 2014-10-15

tags:

added: cts

Revision history for this message

Nobuyoshi NIHONGI (nihongi) wrote on 2014-10-15:

I confirmed that the patch also fixes bug #1372472.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-10-29: Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/131613

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-10-29: Fix merged to nova (master)

Reviewed: https://review.openstack.org/121745
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91d3272b975572d9866b7d959547e438142dc4fb
Submitter: Jenkins
Branch: master

commit 91d3272b975572d9866b7d959547e438142dc4fb
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

Fix nova evacuate issues for RBD

For RBD scenario, there are some issues in Nova code
now against evacuate function:

    1. Based on current implementation, nova evacuate and
    nova rebuild are sharing some code. When user enables
    the on_shared_storage option for nova evacuate, nova
    will check if the instance path is accessible. For
    the RBD scenario, the volume(block) is shared between
    different hosts, though the path isn't shared at the
    filesystem level. This patch fixes this issue and adds
    test cases for that.

    2. Missing the 'recreate' parameter for rebuild method.
    Though the libvirt driver doesn't implement rebuild
    method(only Ironic driver implements it), but we really
    need to set 'recreate' in kwargs so it gets passed to
    _rebuild_default_impl so we don't call driver.destroy
    on evacuate for shared filesystem/block storage cases.
    It is fixed in this patch and test case is added as well.

Closes-Bug: 1249319
Closes-Bug: 1340411

Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1

Changed in nova:
status:	In Progress → Fix Committed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-10-29: Fix proposed to nova (stable/icehouse)

#10

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/131629

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-11-14: Fix merged to nova (stable/juno)

#11

Reviewed: https://review.openstack.org/131613
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7920cfdab2fb10e01544eeb713a1e3bc79bc4996
Submitter: Jenkins
Branch: stable/juno

commit 7920cfdab2fb10e01544eeb713a1e3bc79bc4996
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

Fix nova evacuate issues for RBD

For RBD scenario, there are some issues in Nova code
now against evacuate function:

Closes-Bug: 1249319
Closes-Bug: 1340411

Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1
(cherry picked from commit 91d3272b975572d9866b7d959547e438142dc4fb)

tags:

added: in-stable-juno

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-11-23: Fix merged to nova (stable/icehouse)

#12

Reviewed: https://review.openstack.org/131629
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3de3f1066fa47312b8c3075abf790631034d67a3
Submitter: Jenkins
Branch: stable/icehouse

commit 3de3f1066fa47312b8c3075abf790631034d67a3
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

Fix nova evacuate issues for RBD

For RBD scenario, there are some issues in Nova code
now against evacuate function:

Closes-Bug: 1249319
Closes-Bug: 1340411

    Conflicts:
            nova/tests/compute/test_compute_mgr.py
            nova/tests/virt/libvirt/test_libvirt.py
            nova/virt/libvirt/driver.py

    Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1
    (cherry picked from commit 91d3272b975572d9866b7d959547e438142dc4fb)
    (cherry picked from commit 7920cfdab2fb10e01544eeb713a1e3bc79bc4996)

tags:

added: in-stable-icehouse

Yaguang Tang (heut2008) on 2014-12-09

tags:

removed: juno-backport-potential

Thierry Carrez (ttx) on 2014-12-18

Changed in nova:
milestone:	none → kilo-1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in nova:
milestone:	kilo-1 → 2015.1.0

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1372472

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.