Race condition adding a security group rule when another is in-progress

Bug #1393925 reported by Brian Haley
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Security Advisory
Won't Fix
Undecided
Unassigned
neutron
Fix Released
High
Brian Haley
Icehouse
Fix Released
Undecided
Unassigned
Juno
Fix Released
Undecided
Unassigned

Bug Description

I've come across a race condition where I sometimes see a security group rule is never added to iptables, if the OVS agent is in the middle of applying another security group rule when the RPC arrives.

Here's an example scenario:

nova boot --flavor 1 --image $nova_image dev_server1
sleep 4
neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 1111 --port_range_max 1111 default
neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 1112 --port_range_max 1112 default

Wait for VM to complete booting, then check iptables:

$ sudo iptables-save | grep 111
-A neutron-openvswi-i741ff910-1 -p tcp -m tcp --dport 1111 -j RETURN

The second rule is missing, and will only get added if you either add another rule, or restart the agent.

My config is just devstack, running with the latest openstack bits as of today. OVS agent w/vxlan and DVR enabled, nothing fancy.

I've been able to track this down to the following code (i'll attach the complete log as a file due to line wraps):

OVS agent receives RPC to setup port
    Port info is gathered for devices and filters for security groups are created
        Iptables "apply" is called
        New security group rule is added, triggering RPC message
        RPC received, and agent seems to add device to list that needs refresh

            Security group rule updated on remote: [u'5f0f5036-d14c-4b57-a855-ed39deaea256'] security_groups_rule_updated
            Security group rule updated [u'5f0f5036-d14c-4b57-a855-ed39deaea256']
            Adding [u'741ff910-12ba-4c1e-9dc9-38f7cbde0dc4'] devices to the list of devices for which firewall needs to be refreshed _security_group_updated

        Iptables "apply" is finished

rpc_loop() in OVS agent does not notice there is more work to do on next loop, so rule never gets added

At this point I'm thinking it could be that self.devices_to_refilter is modified in both _security_group_updated() and setup_port_filters() without any lock/semaphore, but the log doesn't explicity implicate it (perhaps we trust the timestamps too much?).

I will continue to investigate, but if someone has an "aha!" moment after reading this far please add a note.

A colleague here has also been able to duplicate this on his own devstack install, so it wasn't my fat-fingering that caused it.

Revision history for this message
Brian Haley (brian-haley) wrote :
Revision history for this message
Brian Haley (brian-haley) wrote :

Just a follow-up as to what happens when you add another SG rule, the orphaned one does get added:

$ sudo iptables-save | grep 111
-A neutron-openvswi-i741ff910-1 -p tcp -m tcp --dport 1111 -j RETURN

$ neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 1113 --port_range_max 1113 default

$ sudo iptables-save | grep 111
-A neutron-openvswi-i741ff910-1 -p tcp -m tcp --dport 1111 -j RETURN
-A neutron-openvswi-i741ff910-1 -p tcp -m tcp --dport 1113 -j RETURN
-A neutron-openvswi-i741ff910-1 -p tcp -m tcp --dport 1112 -j RETURN

Revision history for this message
Stephen Ma (stephen-ma) wrote :

What happened here is:

Two SG rules are added one after another. The rule for tcp port 1111 is added first. The rule for tcp port 1112 is added second. The OVS agent process the rules in setup_port_filters (called from OVSagent’s rpc_loop).
OVS agent's setup_port_filter has an optimization: When a new device is added, the rules are processed in at the start of the setup_port_filters routine by calling prepare_devices_filter. This works as expected if by the time prepare_devices_filter is processing the device for the first rule, the second SG rule has already been commited to the database. Both rules will be found in the iptables-save output.

However, if the second SG rule hasn’t been committed into the DB yet by the time prepare_devices_filter is processing the first rule on the new device, the second SG rule will not be implemented in the iptables. The security-group updated notification would have added the device to devices_to_refilter. But the setup_port_filters is running firewall refresh only on those updated devices that are not new devices. The result is that only the 1st rule is found in iptables for the new device.

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
tags: added: sg-fw
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135667

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
milestone: none → kilo-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/135667
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ee0e41d09d6ffc3cb7bf8406c8e8b009dfb92e17
Submitter: Jenkins
Branch: master

commit ee0e41d09d6ffc3cb7bf8406c8e8b009dfb92e17
Author: Brian Haley <email address hidden>
Date: Wed Nov 19 12:10:57 2014 -0500

    Fix a race condition adding a security group rule

    setup_port_filters() needs to grab self.devices_to_refilter
    before it calls prepare_devices_filter(), else it could skip
    processing a device if an RPC arrives while it's processing
    new devices. That device will now be handled the next time
    it's called.

    Bug introduced in commit 3046c4ae22b1

    Change-Id: Ib2f460cc095bbea8f9c767dcb9b4d4b66f1a7811
    Closes-Bug: 1393925

Changed in neutron:
status: In Progress → Fix Committed
Stephen Ma (stephen-ma)
tags: added: juno-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/142194

Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/142194
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b29760462ed2c4465e1e46f617b0d36f5dacb66f
Submitter: Jenkins
Branch: stable/juno

commit b29760462ed2c4465e1e46f617b0d36f5dacb66f
Author: Brian Haley <email address hidden>
Date: Wed Nov 19 12:10:57 2014 -0500

    Fix a race condition adding a security group rule

    setup_port_filters() needs to grab self.devices_to_refilter
    before it calls prepare_devices_filter(), else it could skip
    processing a device if an RPC arrives while it's processing
    new devices. That device will now be handled the next time
    it's called.

    Bug introduced in commit 3046c4ae22b1

    (Cherry-picked from ee0e41d09d6ffc3cb7bf8406c8e8b009dfb92e17)
    Change-Id: Ib2f460cc095bbea8f9c767dcb9b4d4b66f1a7811
    Closes-Bug: 1393925

tags: added: in-stable-juno
Changed in ossa:
status: New → Incomplete
information type: Public → Public Security
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I've marked the OSSA tasks as incomplete, pending exploit-ability details of this bug.
According to our latest taxonomy (https://wiki.openstack.org/wiki/Vulnerability_Management#Incident_report_taxonomy) I propose a Class D.

Thought on potential attack vector ?

Revision history for this message
Thierry Carrez (ttx) wrote :

+1 for class D -- this is (was) an annoying bug with security consequences, but can't really see how it could be triggered by an attacker.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Confirmed class D, security-related bug which is not an exploitable vulnerability.

information type: Public Security → Public
Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/149833

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/149833
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4076e7cae8a8883de64634ed2fa30fbe21e211e6
Submitter: Jenkins
Branch: stable/icehouse

commit 4076e7cae8a8883de64634ed2fa30fbe21e211e6
Author: Brian Haley <email address hidden>
Date: Wed Nov 19 12:10:57 2014 -0500

    Fix a race condition adding a security group rule

    setup_port_filters() needs to grab self.devices_to_refilter
    before it calls prepare_devices_filter(), else it could skip
    processing a device if an RPC arrives while it's processing
    new devices. That device will now be handled the next time
    it's called.

    Bug introduced in commit 3046c4ae22b1

    (Cherry-picked from ee0e41d09d6ffc3cb7bf8406c8e8b009dfb92e17)
    Change-Id: Ib2f460cc095bbea8f9c767dcb9b4d4b66f1a7811
    Closes-Bug: 1393925
    (cherry picked from commit b29760462ed2c4465e1e46f617b0d36f5dacb66f)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-1 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.