2015-05-08 20:21:35 |
Brian Haley |
bug |
|
|
added bug |
2015-05-08 20:21:35 |
Brian Haley |
attachment added |
|
Script to add 1000 security group rules https://bugs.launchpad.net/bugs/1453264/+attachment/4393817/+files/big-sec-rules.sh |
|
2015-05-08 20:21:56 |
Brian Haley |
neutron: assignee |
|
Brian Haley (brian-haley) |
|
2015-05-19 23:45:02 |
OpenStack Infra |
neutron: status |
New |
In Progress |
|
2015-05-19 23:45:02 |
OpenStack Infra |
neutron: assignee |
Brian Haley (brian-haley) |
Kevin Benton (kevinbenton) |
|
2015-05-22 18:17:16 |
OpenStack Infra |
neutron: assignee |
Kevin Benton (kevinbenton) |
Brian Haley (brian-haley) |
|
2015-05-26 07:59:14 |
OpenStack Infra |
neutron: assignee |
Brian Haley (brian-haley) |
Kevin Benton (kevinbenton) |
|
2015-05-28 22:58:28 |
OpenStack Infra |
neutron: status |
In Progress |
Fix Committed |
|
2015-06-24 20:18:35 |
Thierry Carrez |
neutron: status |
Fix Committed |
Fix Released |
|
2015-06-24 20:18:35 |
Thierry Carrez |
neutron: milestone |
|
liberty-1 |
|
2015-06-30 02:35:07 |
OpenStack Infra |
tags |
|
in-feature-pecan |
|
2015-06-30 02:35:09 |
OpenStack Infra |
bug watch added |
|
http://bugs.python.org/issue21239 |
|
2015-09-19 20:06:10 |
OpenStack Infra |
tags |
in-feature-pecan |
in-feature-pecan in-stable-kilo |
|
2015-10-11 18:31:50 |
Chuck Short |
nominated for series |
|
neutron/kilo |
|
2015-10-11 18:31:50 |
Chuck Short |
bug task added |
|
neutron/kilo |
|
2015-10-11 18:32:00 |
Chuck Short |
neutron/kilo: status |
New |
Fix Committed |
|
2015-10-11 18:32:04 |
Chuck Short |
neutron/kilo: milestone |
|
2015.1.2 |
|
2015-10-13 19:25:41 |
Chuck Short |
neutron/kilo: status |
Fix Committed |
Fix Released |
|
2015-10-15 12:18:32 |
Thierry Carrez |
neutron: milestone |
liberty-1 |
7.0.0 |
|
2015-11-12 15:00:13 |
Evan Stoner |
bug |
|
|
added subscriber Evan Stoner |
2016-08-29 22:11:15 |
Billy Olsen |
description |
We have customers that typically add a few hundred security group rules or more. We also typically run 30+ VMs per compute node. When about 10+ VMs with a large SG set all get scheduled to the same node, the L2 agent (OVS) can spend many minutes in the iptables_manager.apply() code, so much so that by the time all the rules are updated, the VM has already tried DHCP and failed, leaving it in an unusable state.
While there have been some patches that tried to address this in Juno and Kilo, they've either not helped as much as necessary, or broken SGs completely due to re-ordering of the iptables rules.
I've been able to show some pretty bad scaling with just a handful of VMs running in devstack based on today's code (May 8th, 2015) from upstream OpenStack.
Here's what I tested:
1. I created a security group with 1000 TCP port rules (you could alternatively have a smaller number of rules and more VMs, but it's quicker this way)
2. I booted VMs, specifying both the default and "large" SGs, and timed from the moment Neutron "learned" about the port until it completed its work
3. Things went downhill pretty quickly :(
And here's some data:
VMs 1-3 - not timed, less than 20 seconds
4th VM - 0:36
5th VM - 0:53
6th VM - 1:11
7th VM - 1:25
8th VM - 1:48
9th VM - 2:14
While it's busy adding the rules, the OVS agent is consuming pretty close to 100% of a CPU for most of this time (from top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25767 stack 20 0 157936 76572 4416 R 89.2 0.5 50:14.28 python
And this is with only ~10K rules at this point! Once we cross roughly 20K rules, VM boot failures start to happen.
I'm filing this bug since we need to take a closer look at this in Liberty and fix it; it's been this way since Havana and needs some TLC.
I've attached a simple script I've used to recreate this, and will start taking a look at options here. |
[Impact]
We have customers that typically add a few hundred security group rules or more. We also typically run 30+ VMs per compute node. When about 10+ VMs with a large SG set all get scheduled to the same node, the L2 agent (OVS) can spend many minutes in the iptables_manager.apply() code, so much so that by the time all the rules are updated, the VM has already tried DHCP and failed, leaving it in an unusable state.
While there have been some patches that tried to address this in Juno and Kilo, they've either not helped as much as necessary, or broken SGs completely due to re-ordering of the iptables rules.
I've been able to show some pretty bad scaling with just a handful of VMs running in devstack based on today's code (May 8th, 2015) from upstream OpenStack.
[Test Case]
Here's what I tested:
1. I created a security group with 1000 TCP port rules (you could alternatively have a smaller number of rules and more VMs, but it's quicker this way)
2. I booted VMs, specifying both the default and "large" SGs, and timed from the moment Neutron "learned" about the port until it completed its work
3. Things went downhill pretty quickly :(
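Step 1 can be reproduced with a loop of CLI calls. The sketch below is NOT the contents of the attached big-sec-rules.sh (which is not reproduced in this log); it assumes the 2015-era python-neutronclient `neutron` CLI, a security group named "large", and a hypothetical starting port of 1000. It only builds the command lines, so they can be reviewed before running them.

```python
import subprocess

SG_NAME = "large"   # assumption: name of the "large" SG from the steps above
NUM_RULES = 1000    # one single-port TCP rule per port, as in step 1
FIRST_PORT = 1000   # hypothetical starting port; any 1000 distinct ports work


def rule_commands(sg_name, num_rules, first_port):
    """Build one `neutron security-group-rule-create` call per TCP port."""
    cmds = []
    for port in range(first_port, first_port + num_rules):
        cmds.append([
            "neutron", "security-group-rule-create",
            "--direction", "ingress",
            "--protocol", "tcp",
            "--port-range-min", str(port),
            "--port-range-max", str(port),
            sg_name,
        ])
    return cmds


def run_all(cmds):
    """Execute the commands; requires OpenStack credentials in the environment."""
    subprocess.run(["neutron", "security-group-create", SG_NAME], check=True)
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the first few commands instead of executing them.
    for cmd in rule_commands(SG_NAME, NUM_RULES, FIRST_PORT)[:3]:
        print(" ".join(cmd))
```

Calling `run_all(rule_commands(SG_NAME, NUM_RULES, FIRST_PORT))` against a devstack cloud would create the group and its rules; the dry-run default avoids touching a real deployment by accident.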
And here's some data:
VMs 1-3 - not timed, less than 20 seconds
4th VM - 0:36
5th VM - 0:53
6th VM - 1:11
7th VM - 1:25
8th VM - 1:48
9th VM - 2:14
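Converting the timings above to seconds makes the trend easier to see. This small sketch uses only the numbers reported here; the reading of the result (each VM adding roughly the same ~1000 rules, so a flat per-VM time would mean linear cost) is an interpretation, not a measurement.

```python
# Timings reported above as (VM number, "m:ss"); VMs 1-3 were not timed.
TIMINGS = [(4, "0:36"), (5, "0:53"), (6, "1:11"), (7, "1:25"), (8, "1:48"), (9, "2:14")]


def to_seconds(mmss):
    """Convert an "m:ss" string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def per_vm_growth(timings):
    """Return (vm, seconds, delta vs. previous timed VM) tuples."""
    rows = []
    prev = None
    for vm, mmss in timings:
        secs = to_seconds(mmss)
        delta = None if prev is None else secs - prev
        rows.append((vm, secs, delta))
        prev = secs
    return rows


if __name__ == "__main__":
    for vm, secs, delta in per_vm_growth(TIMINGS):
        suffix = "" if delta is None else f"  (+{delta}s)"
        print(f"VM {vm}: {secs}s{suffix}")
```

The times come out to 36, 53, 71, 85, 108 and 134 seconds, with increments of 17, 18, 14, 23 and 26 seconds. The time per VM keeps rising instead of staying flat, which suggests that each apply() pass scales with the total number of rules already on the node.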
While it's busy adding the rules, the OVS agent is consuming pretty close to 100% of a CPU for most of this time (from top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25767 stack 20 0 157936 76572 4416 R 89.2 0.5 50:14.28 python
And this is with only ~10K rules at this point! Once we cross roughly 20K rules, VM boot failures start to happen.
I'm filing this bug since we need to take a closer look at this in Liberty and fix it; it's been this way since Havana and needs some TLC.
I've attached a simple script I've used to recreate this, and will start taking a look at options here.
[Regression Potential]
Minimal, since this fix has been running in upstream stable branches for several releases now (Kilo, Liberty, Mitaka). |
|
2016-08-29 22:11:52 |
Billy Olsen |
bug task added |
|
neutron (Ubuntu) |
|
2016-08-29 22:13:26 |
Billy Olsen |
attachment added |
|
trusty patch based on -proposed https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1453264/+attachment/4730270/+files/lp1453264.debdiff |
|
2016-08-29 22:14:39 |
Billy Olsen |
bug |
|
|
added subscriber Ubuntu Sponsors Team |
2016-08-29 22:16:58 |
Billy Olsen |
bug task added |
|
cloud-archive |
|
2016-08-29 22:17:10 |
Billy Olsen |
nominated for series |
|
cloud-archive/icehouse |
|
2016-08-30 16:44:16 |
Mathew Hodson |
tags |
in-feature-pecan in-stable-kilo |
in-feature-pecan in-stable-kilo trusty |
|
2016-08-30 16:48:16 |
Mathew Hodson |
bug watch removed |
http://bugs.python.org/issue21239 |
|
|
2016-08-30 16:48:47 |
Mathew Hodson |
neutron (Ubuntu): importance |
Undecided |
Medium |
|
2016-08-31 01:51:28 |
Yoshi Kadokawa |
bug |
|
|
added subscriber Yoshi Kadokawa |
2016-08-31 12:11:14 |
Corey Bryant |
bug task added |
|
cloud-archive/icehouse |
|
2016-08-31 12:45:20 |
Launchpad Janitor |
branch linked |
|
lp:~ubuntu-server-dev/neutron/icehouse |
|
2016-08-31 12:54:36 |
Corey Bryant |
cloud-archive/icehouse: status |
New |
Fix Committed |
|
2016-08-31 12:55:53 |
Corey Bryant |
nominated for series |
|
Ubuntu Trusty |
|
2016-08-31 12:55:53 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Trusty) |
|
2016-08-31 12:56:18 |
Corey Bryant |
neutron (Ubuntu): status |
New |
Invalid |
|
2016-08-31 12:56:25 |
Corey Bryant |
neutron (Ubuntu Trusty): status |
New |
Fix Committed |
|
2016-08-31 12:56:31 |
Corey Bryant |
cloud-archive: status |
New |
Invalid |
|
2016-08-31 12:56:41 |
Corey Bryant |
summary |
iptables_manager can run very slowly when a large number of security group rules are present |
[SRU] iptables_manager can run very slowly when a large number of security group rules are present |
|
2016-08-31 20:18:33 |
Mathew Hodson |
neutron (Ubuntu Trusty): importance |
Undecided |
Medium |
|
2016-08-31 20:22:50 |
Mathew Hodson |
neutron (Ubuntu): status |
Invalid |
Fix Released |
|
2016-08-31 20:23:21 |
Mathew Hodson |
cloud-archive: status |
Invalid |
Fix Released |
|
2016-08-31 20:25:07 |
Mathew Hodson |
neutron (Ubuntu Trusty): status |
Fix Committed |
In Progress |
|
2016-09-06 12:41:44 |
Martin Pitt |
neutron (Ubuntu Trusty): status |
In Progress |
Fix Committed |
|
2016-09-06 12:41:48 |
Martin Pitt |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2016-09-06 12:41:52 |
Martin Pitt |
bug |
|
|
added subscriber SRU Verification |
2016-09-06 12:42:00 |
Martin Pitt |
tags |
in-feature-pecan in-stable-kilo trusty |
in-feature-pecan in-stable-kilo trusty verification-needed |
|
2016-09-06 12:42:15 |
Martin Pitt |
removed subscriber Ubuntu Sponsors Team |
|
|
|
2016-09-07 16:42:43 |
Billy Olsen |
tags |
in-feature-pecan in-stable-kilo trusty verification-needed |
in-feature-pecan in-stable-kilo trusty verification-done |
|
2016-09-07 16:43:02 |
Billy Olsen |
tags |
in-feature-pecan in-stable-kilo trusty verification-done |
in-feature-pecan in-stable-kilo verification-done |
|
2016-09-14 11:57:45 |
Martin Pitt |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2016-09-14 12:02:01 |
Launchpad Janitor |
neutron (Ubuntu Trusty): status |
Fix Committed |
Fix Released |
|