Ironic

Conductor shutdown always triggers deregistration

Bug #1418474 reported by Mark Goddard on 2015-02-05

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Ironic	Fix Released	Medium	Mark Goddard	Ironic 2015.1.0 "kilo"

Bug Description

When a conductor process is shut down, it deregisters itself from the conductor database. In a multi-conductor configuration, this causes a hash ring rebalance, and other conductor processes take over ownership of any nodes previously assigned to the lost conductor. This process can involve a fair amount of overhead, with the PXE driver requiring the PXE state to be configured on the new conductor. Worse yet, if the conductor restarts, another ring rebalance will occur, reverting to the initial state via another takeover.
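
To illustrate the double rebalance described above, here is a minimal sketch of a consistent hash ring. It is a toy model, not Ironic's actual hash ring code: the class `ToyHashRing`, the helper `mapping`, the conductor names, the node names and the replica count are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to an integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """A toy consistent hash ring (hypothetical, not Ironic's implementation)."""

    def __init__(self, conductors, replicas=32):
        # Each conductor is placed on the ring at several positions.
        self._ring = sorted(
            (_hash("%s-%d" % (conductor, i)), conductor)
            for conductor in conductors
            for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    def owner(self, node_uuid):
        """Return the conductor at the first ring position after the node."""
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


def mapping(conductors, nodes):
    """Return a dict of node -> owning conductor for the given conductor set."""
    ring = ToyHashRing(conductors)
    return {node: ring.owner(node) for node in nodes}


nodes = ["node-%d" % i for i in range(100)]
before = mapping(["cond-a", "cond-b", "cond-c"], nodes)
during = mapping(["cond-a", "cond-b"], nodes)            # cond-c deregistered
after = mapping(["cond-a", "cond-b", "cond-c"], nodes)   # cond-c registered again

# Nodes that changed owner when cond-c went away (first take over).
moved = [node for node in nodes if before[node] != during[node]]
```

In this model only the nodes owned by `cond-c` move when it deregisters, and all of them move back when it returns, so a short restart costs two take overs for each of those nodes.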

If the shutdown period is known in advance to be short, e.g. for an upgrade, it would be advantageous for the conductor to avoid a ring rebalance. This could be done by signalling to the conductor via some mechanism that it should not deregister itself from the conductor database, but should instead allow the registration to time out. If the conductor is restarted before the registration times out, no ring rebalance will occur.

The proposed trigger is to send SIGHUP to the conductor process.

Mark Goddard (mgoddard) on 2015-02-05

Changed in ironic:
assignee:	nobody → Mark Goddard (mgoddard)
status:	New → In Progress

Dmitry Tantsur (divius) on 2015-02-05

Changed in ironic:
importance:	Undecided → Medium

Mark Goddard (mgoddard) wrote on 2015-02-06:

Devananda rightly pointed out that SIGHUP is not the right tool for the job.

The way I see it there are two main options:

1. A trigger that causes the process to shut down without deregistering itself.
2. A trigger that causes the process to avoid deregistering itself when it is shut down.

I favour the second approach, as it avoids giving a new purpose to an existing signal.

The mechanism for the trigger could be:

- A signal e.g. SIGUSR1/2.
- The existence of a file, possibly with some particular contents or name to ensure it is intended for that process.
- An API call.

The simplest option is the first, and I think it has some merit. Its main drawback is the scarcity of available signals, which might be needed for other purposes in future.
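
A rough sketch of how the signal option could combine with approach 2 above: the signal only flips a flag, and the flag is consulted when the process is later shut down. This is an illustration under assumptions, not the actual Ironic change. `ToyConductor`, its `registry` set and `install_sigusr1_handler` are hypothetical names, and SIGUSR1 is used only as an example signal.

```python
import signal


class ToyConductor:
    """Hypothetical stand-in for a conductor service (not Ironic's code)."""

    def __init__(self, hostname, registry):
        self.hostname = hostname
        # Hypothetical stand-in for the conductor database: a set of hostnames.
        self.registry = registry
        self.deregister_on_shutdown = True

    def start(self):
        """Register this conductor."""
        self.registry.add(self.hostname)

    def handle_sigusr1(self, signum, frame):
        """Signal handler: remember not to deregister; do not stop anything."""
        self.deregister_on_shutdown = False

    def stop(self):
        """Shut down, deregistering only if no signal asked us not to."""
        if self.deregister_on_shutdown:
            self.registry.discard(self.hostname)
        # Otherwise the registration is left in place to time out on its own.


def install_sigusr1_handler(conductor):
    """Attach the handler to SIGUSR1 (POSIX only; call from the main thread)."""
    signal.signal(signal.SIGUSR1, conductor.handle_sigusr1)
```

With something like this in place, an operator would send the signal (for example with `kill -USR1 <pid>`) shortly before stopping the service for an upgrade. A normal shutdown without the signal would still deregister as before.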

OpenStack Infra (hudson-openstack) wrote on 2015-02-13: Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/155785

OpenStack Infra (hudson-openstack) wrote on 2015-02-19: Fix merged to ironic (master)

Reviewed: https://review.openstack.org/155785
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=ddc8d312e10342faa2518415d00ed9cbf60b372d
Submitter: Jenkins
Branch: master

commit ddc8d312e10342faa2518415d00ed9cbf60b372d
Author: Mark Goddard <email address hidden>
Date: Thu Feb 5 02:07:42 2015 +0000

Avoid deregistering conductor following SIGUSR1

    Allow the conductor to avoid deregistering itself on shutdown, after
    receiving a SIGUSR1 signal. The registration will time out after a
    period defined by the conductor.heartbeat_timeout configuration setting
    (defaults to 60 seconds). If the conductor is restarted within this
    period, the unnecessary thrash caused by two ring rebalances will be
    avoided. This is useful in situations where the downtime is negligible,
    such as an upgrade.

    DocImpact
    Closes-bug: #1418474
    Change-Id: Ie40a7f878c2845dc9cb8fc8082df5d88adb28d0b
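
The timing argument in the commit message can be sketched as follows. This is a simplified model and not Ironic's code: the function `is_online`, the inclusive comparison at the boundary, and the example timestamps are assumptions. Only the 60 second default comes from the commit message.

```python
from datetime import datetime, timedelta

# Default stated in the commit message for conductor.heartbeat_timeout.
HEARTBEAT_TIMEOUT = 60  # seconds


def is_online(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True while the registration has not yet timed out.

    Treating the exact boundary as still online is an assumption of this sketch.
    """
    return now - last_heartbeat <= timedelta(seconds=timeout)


# Hypothetical timeline: the conductor stops without deregistering.
stopped_at = datetime(2015, 2, 5, 12, 0, 0)            # last heartbeat
quick_restart = stopped_at + timedelta(seconds=20)     # e.g. an upgrade
slow_restart = stopped_at + timedelta(seconds=90)      # a longer outage

# Back within the timeout: the registration never lapsed, so no rebalance.
restarted_in_time = is_online(stopped_at, quick_restart)
# Back after the timeout: the registration lapsed and a rebalance is expected.
restarted_too_late = not is_online(stopped_at, slow_restart)
```

In this model a restart 20 seconds after the last heartbeat keeps the conductor's place in the ring, while a restart after 90 seconds is treated the same as a conductor that was lost.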

Changed in ironic:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2015-03-19

Changed in ironic:
milestone:	none → kilo-3
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in ironic:
milestone:	kilo-3 → 2015.1.0
