In git commit a79ecbe Russell Bryant submitted a partial fix for a race condition when booting an instance as part of a server group with an "anti-affinity" scheduler policy.
That fix only solves part of the problem, however. There are a number of issues remaining:
1) It's possible to hit a similar race condition for server groups with the "affinity" policy. Suppose we create a new group and then create two instances simultaneously. The scheduler sees an empty group for each, assigns them to different compute nodes, and the policy is violated. We should add a check in _validate_instance_group_policy() to cover the "affinity" case.
2) It's possible to create two instances simultaneously, have them scheduled to conflicting hosts, and have both of them detect the problem in _validate_instance_group_policy(). Both are then sent back for rescheduling, and both get assigned to conflicting hosts *again*, resulting in an error. To fix this, I propose that instead of checking against all other instances in the group, we check only against instances that were created before the current instance.
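Both proposals can be sketched in Python as follows. This is a minimal illustration under an assumed, simplified data model: `Instance`, `RescheduledException` and `validate_group_policy()` are hypothetical names, not Nova's actual code or API (the real `_validate_instance_group_policy()` loads group membership from the database). The sketch adds the "affinity" check from item 1 and applies the "only earlier instances" rule from item 2, breaking creation-time ties by UUID so that the ordering is total.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record; not Nova's object model.
    uuid: str
    host: Optional[str]  # None if the instance has not been placed yet
    created_at: datetime


class RescheduledException(Exception):
    """Hypothetical signal that the instance must go back to the scheduler."""


def validate_group_policy(instance, members, policy):
    """Check the chosen host against group members created earlier.

    Members are ordered by (created_at, uuid). Only members that sort before
    this instance are considered, so of two racing instances the older one
    passes and only the newer one is rescheduled.
    """
    earlier_hosts = {
        m.host
        for m in members
        if m.host is not None
        and m.uuid != instance.uuid
        and (m.created_at, m.uuid) < (instance.created_at, instance.uuid)
    }
    if policy == "anti-affinity" and instance.host in earlier_hosts:
        raise RescheduledException(
            f"{instance.uuid}: host {instance.host} already used by the group")
    if policy == "affinity" and earlier_hosts and instance.host not in earlier_hosts:
        raise RescheduledException(
            f"{instance.uuid}: group already placed on {sorted(earlier_hosts)}")
```

With the full group as the comparison set, two racing instances on conflicting hosts would each reject the other; with the ordering rule, the older instance always validates, which avoids the repeated mutual rejection described in item 2.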
Fix proposed to branch: master
Review: https://review.openstack.org/162746