Removing an instance while refreshing security groups is racy

Bug #733139 reported by Soren Hansen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Soren Hansen

Bug Description

There's a chance of a race when refreshing security groups while destroying an instance.

refresh_security_group_rules works roughly like so:

for instance in instances:
    with iptables_semaphore:
        remove_filters_for_instance(instance)
        add_filters_for_instance(instance)
iptables.apply()

unfilter_instance (which is called when terminating an instance) works roughly like so:
if instance in self.instances:
    del self.instances[instance]
    remove_filters_for_instance(instance)
    iptables.apply()

Most of the time, it works fine. However, if something is holding the
iptables semaphore, refresh_security_group_rules will have taken the
value from instances and is sitting around waiting for the semaphore to
be released. In the mean time, unfilter_instances may have gone and
removed the filters for the instance. Once the semaphore is released,
refresh_security_group_rules attempts to removes the filters for the
instance. This is essentially a no-op since they're already gone. Then
it adds the rules back. Unfortunately, this doesn't just clutter the
iptables rules, but it actually prevents any traffic from reaching the
next instance that gets the same IP, because there will be an iptables
rule sitting around matching the IP and applying the filters for the now
dead instance, but since the instance is dead, it has no security
groups, so the only filter in place is the fallback which says to drop
everything.

There's another race (this same thing could happen in between unfilter_instance checking for the instance in instances and its removing it from self.instances), but it doesn't apply to our threading model (context switches between greenthreads can't happen in that situation). I'll fix that in the same go anyway, just in case we ever go natively threaded.

Related branches

Soren Hansen (soren)
Changed in nova:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Soren Hansen (soren)
status: Triaged → In Progress
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.