OpenStack Compute (nova)

race conditions with server group scheduler policies

Bug #1423648 reported by Chris Friesen on 2015-02-19

This bug affects 2 people

Affects                     Status      Importance   Assigned to   Milestone
OpenStack Compute (nova)    Confirmed   Low          Unassigned

Bug Description

In git commit a79ecbe, Russell Bryant submitted a partial fix for a race condition when booting an instance as part of a server group with an "anti-affinity" scheduler policy.

That fix only solves part of the problem, however. There are a number of issues remaining:

1) It's possible to hit a similar race condition for server groups with the "affinity" policy. Suppose we create a new group and then create two instances simultaneously. The scheduler sees an empty group for each, assigns them to different compute nodes, and the policy is violated. We should add a check in _validate_instance_group_policy() to cover the "affinity" case.

2) It's possible to create two instances simultaneously and have them scheduled to conflicting hosts. Both detect the problem in _validate_instance_group_policy(), both are sent back for rescheduling, and both are assigned to conflicting hosts *again*, resulting in an error. To fix this, I propose that instead of checking against all other instances in the group, we check only against instances that were created before the current instance.
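
The proposal in item 2 can be illustrated with a small standalone sketch. This is not nova code: the Member class, its created_at field, and must_reschedule() are hypothetical names chosen for the illustration, and only the "anti-affinity" case is modelled. It assumes each instance has a creation order that can be compared, with the UUID used as a tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    uuid: str
    host: str
    created_at: int  # hypothetical creation order; not a nova field name


def must_reschedule(member, group, earlier_only):
    """Anti-affinity check: True if `member` shares a host with a peer."""
    peers = [m for m in group if m.uuid != member.uuid]
    if earlier_only:
        # Proposed rule: only peers created before this member count.
        # The uuid breaks ties so two members never defer to each other.
        key = (member.created_at, member.uuid)
        peers = [m for m in peers if (m.created_at, m.uuid) < key]
    return any(m.host == member.host for m in peers)


a = Member("a", "host1", created_at=1)
b = Member("b", "host1", created_at=2)
group = [a, b]

# Checking against everyone: both back off, and both may collide again.
print([must_reschedule(m, group, earlier_only=False) for m in group])
# Checking against earlier members only: just the later one backs off.
print([must_reschedule(m, group, earlier_only=True) for m in group])
```

With the earlier-only rule, exactly one of two conflicting instances keeps its host, so the pair cannot keep bouncing each other back to the scheduler.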

Davanum Srinivas (DIMS) (dims-v) on 2015-02-20

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Low

Pawel Koniszewski (pawel-koniszewski) on 2015-02-23

Changed in nova:
assignee:	nobody → Pawel Koniszewski (pawel-koniszewski)

Chris Friesen (cbf123) on 2015-02-23

Changed in nova:
assignee:	Pawel Koniszewski (pawel-koniszewski) → Chris Friesen (cbf123)

OpenStack Infra (hudson-openstack) wrote on 2015-03-09: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/162746

Changed in nova:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-03-16:

Fix proposed to branch: master
Review: https://review.openstack.org/164762

OpenStack Infra (hudson-openstack) wrote on 2015-03-31:

Fix proposed to branch: master
Review: https://review.openstack.org/169489

OpenStack Infra (hudson-openstack) wrote on 2015-04-21: Fix merged to nova (master)

Reviewed: https://review.openstack.org/162746
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=36a703516251c7268ebceb414ed71e4cab4794b0
Submitter: Jenkins
Branch: master

commit 36a703516251c7268ebceb414ed71e4cab4794b0
Author: Chris Friesen <email address hidden>
Date: Mon Mar 16 09:35:16 2015 -0600

    Validate server group affinity policy

    In git commit a79ecbe Russell Bryant submitted a partial fix for a race
    condition when booting an instance as part of a server group with an
    "anti-affinity" scheduler policy.

    It's possible to hit a similar race condition for server groups with
    the "affinity" policy. Suppose we create a new group and then create two
    instances simultaneously. The scheduler sees an empty group for each,
    assigns them to different compute nodes, and the policy is violated.

    To guard against this, we extend _validate_instance_group_policy()
    to cover the "affinity" case as well as "anti-affinity".

    Partial-Bug: #1423648
    Change-Id: Icf95390a128e2062293e1f5b7b78fe79747f5f27
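
A rough illustration of what a check covering both policies could look like is given below. It is a sketch only and does not reproduce the merged change: PolicyViolation and validate_group_policy() are hypothetical names, and the set of hosts used by other group members is assumed to be passed in by the caller.

```python
class PolicyViolation(Exception):
    """Hypothetical stand-in for the error that triggers a reschedule."""


def validate_group_policy(policy, my_host, peer_hosts):
    """Raise PolicyViolation if landing on `my_host` breaks `policy`.

    `peer_hosts` holds the hosts already used by other group members.
    """
    peer_hosts = set(peer_hosts)
    if policy == "anti-affinity" and my_host in peer_hosts:
        raise PolicyViolation(f"{my_host} already hosts a group member")
    # An empty group cannot be violated: the first member may land anywhere.
    if policy == "affinity" and peer_hosts and my_host not in peer_hosts:
        raise PolicyViolation(f"group is on {sorted(peer_hosts)}, not {my_host}")
```

In the scenario from the commit message, two instances booted at once into an empty "affinity" group would each pass the scheduler, but whichever one runs this check after the other has claimed a different host would raise and be rescheduled.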

OpenStack Infra (hudson-openstack) on 2015-04-21

Changed in nova:
assignee:	Chris Friesen (cbf123) → Jay Pipes (jaypipes)

Jay Pipes (jaypipes) on 2015-04-21

Changed in nova:
assignee:	Jay Pipes (jaypipes) → Chris Friesen (cbf123)

OpenStack Infra (hudson-openstack) wrote on 2015-08-12: Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/169489
Reason: This patch has been stalled for a long time, so I am abandoning it. Please feel free to restore it when the code is ready for review.

OpenStack Infra (hudson-openstack) wrote on 2015-08-12:

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/164762
Reason: This patch has been stalled for a long time, so I am abandoning it. Please feel free to restore it when the code is ready for review.

Davanum Srinivas (DIMS) (dims-v) on 2016-03-06

Changed in nova:
assignee:	Chris Friesen (cbf123) → nobody
status:	In Progress → Confirmed

Charlotte Han (hanrong) on 2016-06-18

Changed in nova:
assignee:	nobody → Charlotte Han (hanrong)

Miguel Alejandro Cantu (miguel-cantu) wrote on 2016-08-15:

Hi Charlotte,

Any updates on this change? I would be more than willing to help out with testing if need be.

-Alex

Charlotte Han (hanrong) on 2016-08-16

Changed in nova:
assignee:	Charlotte Han (hanrong) → nobody

Miguel Alejandro Cantu (miguel-cantu) wrote on 2016-08-16:

I'm not too familiar with the nova codebase, but I can learn ^.^.

I'll work off of the suggestions made here:
https://review.openstack.org/#/c/164762/9

If anyone could point me in the right direction to some more useful information, that would be great.

Changed in nova:
assignee:	nobody → Miguel Alejandro Cantu (miguel-cantu)

Maciej Szankin (mszankin) on 2016-08-17

Changed in nova:
status:	Confirmed → In Progress

Sean Dague (sdague) wrote on 2017-06-23:

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status:	In Progress → Confirmed
assignee:	Miguel Alejandro Cantu (miguel-cantu) → nobody
