ssh driver aborts if a node that was requested to be powered down is already powered down

Bug #1397834 reported by James Polley
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
Medium
Michael Kerrin

Bug Description

Looking at drivers/modules/ssh.py, it seems as though the intended design is that we attempt to power-off a node, then check to see if it's powered off, and then return success/fail

I seem to be hitting a case where the initial check says that the VM is powered off; but a subsequent attempt to power off fails because it's already off. I'm not certain whether this is a race condition, or if the initial check was wrong; but either way, the result is the same: the system is in the desired state, but we throw an unhanded exception.

I think it would make more sense to catch this exception and check again to see if the system is now powered off. If it is, there's no need to let the error bubble up - the system is now in the desired state, so let's go. Of course, if the problem is that the initial check didn't catch the fact that it's powered off, presumably the second check wouldn't catch it either..

Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: 2014-11-25 15:43:17.898 5904 WARNING ironic.conductor.manager [-] Error in tear_down of node bd44c80e-f7b3-497f-8417-84fd5cd7e9a1: Failed to execute command via SSH: LC_ALL=C /usr/bin/virsh --connect qemu:///system destroy baremetal_1.
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: Traceback (most recent call last):
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 455, in fire_timers
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: timer()
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: cb(*args, **kw)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 212, in main
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: result = function(*args, **kwargs)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/manager.py", line 808, in _do_node_tear_down
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: node.target_provision_state = states.NOSTATE
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/oslo/utils/excutils.py", line 82, in __exit__
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: six.reraise(self.type_, self.value, self.tb)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/manager.py", line 800, in _do_node_tear_down
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: new_state = task.driver.deploy.tear_down(task)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 129, in wrapper
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: return f(*args, **kwargs)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/modules/pxe.py", line 364, in tear_down
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: manager_utils.node_power_action(task, states.POWER_OFF)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 129, in wrapper
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: return f(*args, **kwargs)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/utils.py", line 118, in node_power_action
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: "Error: %(error)s") % {'target': target_state, 'error': e}
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/oslo/utils/excutils.py", line 82, in __exit__
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: six.reraise(self.type_, self.value, self.tb)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/utils.py", line 111, in node_power_action
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: task.driver.power.set_power_state(task, new_state)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 129, in wrapper
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: return f(*args, **kwargs)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/modules/ssh.py", line 548, in set_power_state
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: state = _power_off(ssh_obj, driver_info)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/modules/ssh.py", line 464, in _power_off
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: _ssh_execute(ssh_obj, cmd_to_power_off)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/modules/ssh.py", line 270, in _ssh_execute
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: raise exception.SSHCommandFailed(cmd=cmd_to_exec)
Nov 25 15:43:17 undercloud-undercloud-gxpandxta2cg ironic-conductor: SSHCommandFailed: Failed to execute command via SSH: LC_ALL=C /usr/bin/virsh --connect qemu:///system destroy baremetal_1.

James Polley (tchaypo)
summary: - virsh list_all
+ ssh driver aborts if a node that was requested to be powered down is
+ already powered down
Dmitry Tantsur (divius)
Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/147832

Changed in ironic:
assignee: nobody → Michael Kerrin (michael-kerrin-w)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/147832
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=b19b7b7230792809a5b34b079ad4feaeb25e08b1
Submitter: Jenkins
Branch: master

commit b19b7b7230792809a5b34b079ad4feaeb25e08b1
Author: Michael Kerrin <email address hidden>
Date: Fri Jan 16 13:00:24 2015 +0000

    Fix ssh _get_power_status as it returned status for wrong node

    _get_power_status was checking if the node name was in
    the list of returned running node names. But it was
    matching baremetal_1 against baremetal_11 and returning an
    incorrect status.

    The list of running nodes contains node name quoted with a
    double quote. It can also contain other information. So we
    have to checking that we are matching against the correct
    node name.

    Closes-bug: #1397834
    Change-Id: I2a6d32a212a7a6bbc9c2c788d989386a38173c28

Changed in ironic:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in ironic:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ironic:
milestone: kilo-2 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.