Fuel for OpenStack

Deployment was stuck as one node was stuck on reboot

Bug #1438933 reported by Sergii Golovatiuk on 2015-03-31

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Committed	High	Łukasz Oleś	Fuel for OpenStack 6.1

Bug Description

During a large deployment, one node was stuck on reboot for about 20 minutes.

root@node-16:~# uptime -s
2015-03-31 16:58:09

though in astute.log I see

2015-03-31T18:27:32 debug: [535] 135c09a6-b082-40c9-9eaf-1da3d3af4e22: MC agent 'puppetd', method 'enable', results: {:sender=>"25", :statuscode=>0, :statusmsg=>"OK", :data=>{:output=>"Already enabled"}}
2015-03-31T18:27:33 debug: [535] Retry #1 to run mcollective agent on nodes: '16'

which means the reboot was issued somewhere around 16:25-26

We should add failure tolerance here, as we already do for provisioning.
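
One way to read this proposal, as a minimal sketch and not Astute's actual code: poll each node until it answers or a deadline passes, then report the stragglers as failed instead of blocking the whole run. The names `wait_for_nodes` and `is_online`, and the timeout values, are hypothetical.

```python
import time
from typing import Callable, Iterable, Set, Tuple


def wait_for_nodes(
    node_ids: Iterable[str],
    is_online: Callable[[str], bool],
    timeout: float = 600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Set[str], Set[str]]:
    """Return (online, failed) node ids after waiting at most `timeout` seconds.

    `is_online` is a hypothetical probe supplied by the caller; it stands in
    for whatever check the orchestrator uses to see that a node is back.
    """
    pending = set(node_ids)
    online: Set[str] = set()
    deadline = clock() + timeout
    while pending:
        # Move every node that answers the probe from pending to online.
        ready = {n for n in pending if is_online(n)}
        online |= ready
        pending -= ready
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    # Whatever is still pending at the deadline is treated as failed.
    return online, pending
```

The caller can then decide whether the failed set is acceptable (for example, only compute nodes) before continuing.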

Sergii Golovatiuk (sgolovatiuk) on 2015-03-31

Changed in fuel:
status:	New → Triaged
importance:	Undecided → High
assignee:	nobody → Łukasz Oleś (loles)
milestone:	none → 6.1

Łukasz Oleś (loles) wrote on 2015-03-31:

Deployment fails if pre_deployment_action fails on any node. It doesn't fail during pre_deploy action and during deploy. I will prepare a fix.

Łukasz Oleś (loles) on 2015-04-01

Changed in fuel:
status:	Triaged → Won't Fix
status:	Won't Fix → In Progress

Bogdan Dobrelya (bogdando) wrote on 2015-05-12:

@Lukasz, do you have an update, or a fix on review? Could you please link any WIP, if there is one?

summary:

- Deployment was stuck as one one was stuck on reboot
+ Deployment was stuck as one node was stuck on reboot

Bogdan Dobrelya (bogdando) wrote on 2015-05-12:

Please correct me if I'm wrong, but this issue should be fixed within the scope of https://blueprints.launchpad.net/fuel/+spec/200-nodes-support , so it is superseded and marked Won't Fix.

Changed in fuel:
status:	In Progress → Won't Fix

Łukasz Oleś (loles) wrote on 2015-05-12:

It should, but we missed the pre-deployment actions. I'm working on it.

Changed in fuel:
status:	Won't Fix → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-05-14: Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/183081

Vladimir Sharshov (vsharshov) on 2015-05-14

tags:

added: module-astute

OpenStack Infra (hudson-openstack) on 2015-05-16

Changed in fuel:
assignee:	Łukasz Oleś (loles) → Evgeniy L (rustyrobot)

Evgeniy L (rustyrobot) on 2015-05-16

Changed in fuel:
assignee:	Evgeniy L (rustyrobot) → Łukasz Oleś (loles)

OpenStack Infra (hudson-openstack) wrote on 2015-05-16: Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/183081
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=b09729c64b695b2e6fcc88c31843321759ec45d5
Submitter: Jenkins
Branch: master

commit b09729c64b695b2e6fcc88c31843321759ec45d5
Author: Łukasz Oleś <email address hidden>
Date: Wed May 13 03:19:16 2015 +0200

Remove nodes which failed to provision

    Currently, some nodes may fail during provisioning while provisioning
    as a whole still succeeds. These failed nodes cause pre-deployment
    actions to fail.
    This change removes failed nodes from deployment info and from all tasks.
    It is safe to do because currently we allow only compute nodes to fail.

Change-Id: I5c3b677ca49ad9d2fd93a6ca1f524edc91e0766d
Closes-bug: #1438933
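
The idea in the commit message can be illustrated roughly as follows. This is a sketch under stated assumptions, not the merged change (which lives in fuel-astute): the function name `remove_failed_nodes`, the `uid` and `uids` keys, and the choice to drop tasks left with no target nodes are all hypothetical.

```python
from typing import Any, Dict, List, Set


def remove_failed_nodes(
    deployment_info: List[Dict[str, Any]],
    tasks: List[Dict[str, Any]],
    failed_uids: Set[str],
) -> None:
    """Drop failed nodes from deployment info and from every task, in place."""
    # Keep only the nodes that were provisioned successfully.
    deployment_info[:] = [
        node for node in deployment_info if node["uid"] not in failed_uids
    ]
    # Strip failed nodes from each task's target list.
    for task in tasks:
        task["uids"] = [uid for uid in task["uids"] if uid not in failed_uids]
    # Assumption: a task with no remaining targets is dropped entirely.
    tasks[:] = [task for task in tasks if task["uids"]]
```

With node 16 marked as failed, a task targeting nodes 15 and 16 would run on node 15 only, and a task targeting only node 16 would be skipped.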

Changed in fuel:
status:	In Progress → Fix Committed

Related blueprints

200 nodes support
