Description
===========
When launching many instances (100 to 200) in parallel on a physical small-medium scale cluster (20-30 compute nodes), Nova sees timeouts from Neutron and fails to clean the ports. This leads to Nova instances having multiple ports instead of one.
A similar issue [1] was reported long time ago but seems like the fix [2] from Nova was never added.
[1] https://bugs.launchpad.net/neutron/+bug/1160442
[2] https://bugs.launchpad.net/neutron/+bug/1160442/comments/28
Steps to reproduce
==================
Launch 200+ VMs in parallel.
Expected result
===============
All instances should have port each
Actual result
=============
Randomly some instances have multiple ports allocated.
Environment
===========
3 Controllers, 20 computes
20 computes are not co-located, means there is some latency between the controllers and computes as well as computes and computes.
KVM hypervisor
Nova versions:
openstack-nova-common-2015.1.2-18.2.el7ost.noarch
openstack-nova-console-2015.1.2-18.2.el7ost.noarch
openstack-nova-conductor-2015.1.2-18.2.el7ost.noarch
openstack-nova-compute-2015.1.2-18.2.el7ost.noarch
openstack-nova-novncproxy-2015.1.2-18.2.el7ost.noarch
python-nova-2015.1.2-18.2.el7ost.noarch
openstack-nova-api-2015.1.2-18.2.el7ost.noarch
openstack-nova-cert-2015.1.2-18.2.el7ost.noarch
openstack-nova-scheduler-2015.1.2-18.2.el7ost.noarch
python-novaclient-2.23.0-2.el7ost.noarch
Networking:
OpenStack Neutron with PLUMgrid plugin (similar results seen on non-PLUMgrid install).
Logs & Configs
==============
nova.conf is attached
nova-compute logs
-----------------
2016-07-18 07:41:24.972 36869 ERROR nova.compute.manager [-] Instance failed network setup after 1 attempt(s)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager Traceback (most recent call last):
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1808, in _allocate_network_async
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager dhcp_options=dhcp_options)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 577, in allocate_for_instance
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager self._delete_ports(neutron, instance, created_port_ids)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager six.reraise(self.type_, self.value, self.tb)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 569, in allocate_for_instance
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager security_group_ids, available_macs, dhcp_opts)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 304, in _create_port
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager port_id = port_client.create_port(port_req_body)['port']['id']
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager ret = self.function(instance, *args, **kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 544, in create_port
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager return self.post(self.ports_path, body=body)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 298, in post
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager headers=headers, params=params)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 200, in do_request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager content_type=self.content_type())
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 306, in do_request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager return self.request(url, method, **kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 294, in request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager resp = super(SessionClient, self).request(*args, **kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 95, in request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager return self.session.request(url, method, **kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 318, in inner
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager return func(*args, **kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 382, in request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager resp = send(**kwargs)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 423, in _send_request
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager raise exceptions.RequestTimeout(msg)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager RequestTimeout: Request to http://11.1.2.50:9696/v2.0/ports.json timed out (HTTP 408)
2016-07-18 07:41:24.972 36869 TRACE nova.compute.manager
2016-07-18 07:41:24.977 36869 ERROR nova.compute.manager [req-76499ba7-5324-4b32-8ec7-b5ceb82e612e 8f2cda8f705849e7a81b6b885ca831f5 7fddfc9832a74ac5844211ef810b5093 - - -] [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] Instance failed to spawn
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] Traceback (most recent call last):
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2486, in _build_resources
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] yield resources
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2358, in _build_and_run_instance
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] block_device_info=block_device_info)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2505, in spawn
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] write_to_disk=True)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4353, in _get_guest_xml
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] network_info_str = str(network_info)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/model.py", line 480, in __str__
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self._sync_wrapper(fn, *args, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/model.py", line 463, in _sync_wrapper
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] self.wait()
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/model.py", line 495, in wait
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] self[:] = self._gt.wait()
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self._exit_event.wait()
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return hubs.get_hub().switch()
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self.greenlet.switch()
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] result = function(*args, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1808, in _allocate_network_async
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] dhcp_options=dhcp_options)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 577, in allocate_for_instance
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] self._delete_ports(neutron, instance, created_port_ids)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] six.reraise(self.type_, self.value, self.tb)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 569, in allocate_for_instance
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] security_group_ids, available_macs, dhcp_opts)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 304, in _create_port
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] port_id = port_client.create_port(port_req_body)['port']['id']
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] ret = self.function(instance, *args, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 544, in create_port
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self.post(self.ports_path, body=body)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 298, in post
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] headers=headers, params=params)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 200, in do_request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] content_type=self.content_type())
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 306, in do_request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self.request(url, method, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 294, in request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] resp = super(SessionClient, self).request(*args, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 95, in request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return self.session.request(url, method, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 318, in inner
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] return func(*args, **kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 382, in request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] resp = send(**kwargs)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 423, in _send_request
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] raise exceptions.RequestTimeout(msg)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6] RequestTimeout: Request to http://11.1.2.50:9696/v2.0/ports.json timed out (HTTP 408)
2016-07-18 07:41:24.977 36869 TRACE nova.compute.manager [instance: 866caeb1-0f39-43e3-b08e-bec33c7253e6]
2016-07-18 07:41:25.469 36869 ERROR nova.network.neutronv2.api [req-76499ba7-5324-4b32-8ec7-b5ceb82e612e 8f2cda8f705849e7a81b6b885ca831f5 7fddfc9832a74ac5844211ef810b5093 - - -] Unable to clear device ID for port 'None'
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api Traceback (most recent call last):
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 366, in _unbind_ports
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api port_client.update_port(port_id, port_req_body)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api ret = self.function(instance, *args, **kwargs)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 549, in update_port
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api return self.put(self.port_path % (port), body=body)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 302, in put
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api headers=headers, params=params)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 270, in retry_request
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api headers=headers, params=params)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 211, in do_request
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api self._handle_fault_response(status_code, replybody)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 185, in _handle_fault_response
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api exception_handler_v20(status_code, des_error_body)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 83, in exception_handler_v20
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api message=message)
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api NeutronClientException: 404 Not Found
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api
2016-07-18 07:41:25.469 36869 TRACE nova.network.neutronv2.api The resource could not be found.
There is retry logic in nova compute manager (https:/ /github. com/openstack/ nova/blob/ master/ nova/compute/ manager. py#L1357)
and situation is possible
in which neutron created port but nova did not know about that (timeout for example).
I see two possible solutions:
- rollback port allocation on the server side (neutron) if the server can't respond to client i.e. client is disconnected,
- while retrying nova can first get the port for instance and use it if one exists.
Number of retries is according to CONF.network_ allocate_ retries param,
so as a workaround, you can set it to zero and recreate failed instances.