OpenStack Heat

database dying can result in FAILED stacks with IN_PROGRESS resources

Bug #1561214 reported by Steve Baker on 2016-03-23

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack Heat	Fix Released	Medium	Thomas Herve	OpenStack Heat newton-1 "n1"

Bug Description

Steps to Reproduce:
1. Deploy the overcloud. MariaDB runs out of file descriptors, which causes the deployment to fail and leaves Heat in a bad state.

Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.

I'm suggesting a heat-manage command which acts on a single stack and traverses all of its nested stacks, setting anything that is IN_PROGRESS to FAILED and clearing hooks.
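The traversal suggested above can be sketched roughly as follows. This is only an in-memory illustration, not Heat's implementation: the Stack and Resource classes, the reset_in_progress function, and the use of a single status string (instead of Heat's real database models) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

IN_PROGRESS = "IN_PROGRESS"
FAILED = "FAILED"


@dataclass
class Resource:
    # Hypothetical stand-in for a Heat resource record.
    name: str
    status: str
    hooks: List[str] = field(default_factory=list)


@dataclass
class Stack:
    # Hypothetical stand-in for a Heat stack and its nested stacks.
    name: str
    status: str
    resources: List[Resource] = field(default_factory=list)
    nested: List["Stack"] = field(default_factory=list)


def reset_in_progress(stack: Stack) -> int:
    """Set IN_PROGRESS items to FAILED, clear hooks, recurse into nested stacks.

    Returns the number of resources whose status was changed.
    """
    changed = 0
    if stack.status == IN_PROGRESS:
        stack.status = FAILED
    for res in stack.resources:
        res.hooks.clear()  # hooks are cleared regardless of status
        if res.status == IN_PROGRESS:
            res.status = FAILED
            changed += 1
    for child in stack.nested:
        changed += reset_in_progress(child)
    return changed


# Reproduce the reported state: UPDATE_FAILED stacks holding IN_PROGRESS resources.
child = Stack("child", "UPDATE_FAILED",
              [Resource("server", IN_PROGRESS, ["pre-update"])])
parent = Stack("parent", "UPDATE_FAILED",
               [Resource("nested", IN_PROGRESS)], [child])
changed_count = reset_in_progress(parent)
```

Acting on one stack and recursing from there keeps the reset scoped to the deployment that failed, rather than touching every stack in the database.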

Steven Hardy (shardy) wrote on 2016-03-29:

Is there any less destructive way we can handle this, as all FAILED resources will be replaced, even if they are OK?

I'm thinking of something which uses logic similar to stack-check, so that it actually observes state rather than unconditionally replacing everything, though I guess there is possibly not enough state to do that safely.
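As a rough illustration of the difference between observing state and failing everything unconditionally, here is a minimal sketch. It is not how stack-check works internally: the reconcile function, the probe callback, and the flat name-to-status mapping are hypothetical, and the sketch assumes a probe can report True (healthy), False (unhealthy), or None (cannot tell).

```python
from typing import Callable, Dict, Optional

IN_PROGRESS = "IN_PROGRESS"
COMPLETE = "COMPLETE"
FAILED = "FAILED"

# Hypothetical probe: resource name -> True (healthy), False (unhealthy),
# or None when the real state cannot be determined.
Probe = Callable[[str], Optional[bool]]


def reconcile(statuses: Dict[str, str], probe: Probe) -> Dict[str, str]:
    """Return new statuses, resolving IN_PROGRESS entries from observed state."""
    result = dict(statuses)
    for name, status in statuses.items():
        if status != IN_PROGRESS:
            continue  # leave settled resources untouched
        observed = probe(name)
        # Only a positive observation avoids FAILED; an unknown result falls
        # back to FAILED, which matches the unconditional behaviour.
        result[name] = COMPLETE if observed is True else FAILED
    return result


observations = {"server": True, "volume": False}
new_statuses = reconcile(
    {"server": IN_PROGRESS, "volume": IN_PROGRESS,
     "port": IN_PROGRESS, "net": COMPLETE},
    observations.get,  # returns None for resources with no observation
)
```

With this approach only resources that cannot be confirmed healthy end up FAILED and are later replaced, which is the less destructive outcome asked about above.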

Thomas Herve (therve) on 2016-04-13

Changed in heat:
assignee:	nobody → Thomas Herve (therve)
milestone:	none → newton-1

OpenStack Infra (hudson-openstack) wrote on 2016-04-13: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/305306

Changed in heat:
status:	New → In Progress

Thomas Herve (therve) on 2016-04-15

Changed in heat:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2016-04-25: Fix merged to heat (master)

Reviewed: https://review.openstack.org/305306
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=7977f9f2f324f8916433763132c6ee67213e6ed1
Submitter: Jenkins
Branch: master

commit 7977f9f2f324f8916433763132c6ee67213e6ed1
Author: Thomas Herve <email address hidden>
Date: Wed Apr 13 14:38:59 2016 +0200

Add command to reset one stack status

    Adds a new heat-manage reset_stack_status to recover from specific
    crashes that leaves resources in progress. It removes resource hooks and
    stack locks as well.

Closes-Bug: #1561214
Change-Id: I70fa5857c959bc5f1424d562ff8b7740331b5328

Changed in heat:
status:	In Progress → Fix Released

Doug Hellmann (doug-hellmann) wrote on 2016-06-02: Fix included in openstack/heat 7.0.0.0b1

This issue was fixed in the openstack/heat 7.0.0.0b1 development milestone.
