Unable to create instances due to QueuePool limit

Bug #1144108 reported by Jacek Nykis
46
This bug affects 9 people
Affects Status Importance Assigned to Milestone
nova (Ubuntu)
Confirmed
Low
Unassigned

Bug Description

We had problems creating instances. Attempts were causing the following error on the compute node:

2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.amqp [-] Exception during message handling
2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.common [-] Returning exception QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 to caller
2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.common [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 276, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch\n return getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 1120, in get_instance_nw_info\n instance_uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 523, in virtual_interface_get_by_instance\n return IMPL.virtual_interface_get_by_instance(context, instance_id)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 138, in wrapper\n db.instance_get_by_uuid(context, instance_uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 570, in instance_get_by_uuid\n return IMPL.instance_get_by_uuid(context, uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1485, in instance_get_by_uuid\n filter_by(uuid=uuid).\\\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2156, in first\n ret = list(self[0:1])\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2023, in __getitem__\n return list(res)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2227, in __iter__\n return self._execute_and_instances(context)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances\n close_with_result=True)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session\n **kw)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 730, in connection\n close_with_result=close_with_result)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 736, in _connection_for_bind\n return engine.contextual_connect(**kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2490, in contextual_connect\n self.pool.connect(),\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 224, in connect\n return _ConnectionFairy(self).checkout()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 387, in __init__\n rec = self._connection_record = pool._do_get()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 729, in _do_get\n (self.size(), self.overflow(), self._timeout))\n', 'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30\n']

And this on the controller:
2013-03-01 11:52:41 21023 DEBUG amqplib [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2013-03-01 11:52:41 21023 DEBUG amqplib [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2013-03-01 11:52:41 21023 DEBUG amqplib [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2013-03-01 11:52:41 21023 DEBUG amqplib [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Closed channel #1 _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Closed channel #1 _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Closed channel #1 _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2013-03-01 11:52:41 21023 DEBUG amqplib [-] Channel open _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2013-03-01 11:52:41 21023 DEBUG amqplib [-] using channel_id: 1 __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.amqp [-] Exception during message handling
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 276, in _process_data
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return func(self, context, *args, **kwargs)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 1120, in get_instance_nw_info
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp instance_uuid)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 523, in virtual_interface_get_by_instance
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return IMPL.virtual_interface_get_by_instance(context, instance_id)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return f(*args, **kwargs)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 138, in wrapper
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp db.instance_get_by_uuid(context, instance_uuid)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 570, in instance_get_by_uuid
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return IMPL.instance_get_by_uuid(context, uuid)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return f(*args, **kwargs)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1485, in instance_get_by_uuid
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp filter_by(uuid=uuid).\
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2156, in first
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp ret = list(self[0:1])
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2023, in __getitem__
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return list(res)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return self._execute_and_instances(context)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp close_with_result=True)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp **kw)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 730, in connection
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp close_with_result=close_with_result)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 736, in _connection_for_bind
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return engine.contextual_connect(**kwargs)
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2490, in contextual_connect
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp self.pool.connect(),
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 224, in connect
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 387, in __init__
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 729, in _do_get
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp (self.size(), self.overflow(), self._timeout))
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
2013-03-01 11:52:41 21023 TRACE nova.openstack.common.rpc.amqp
2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.common [-] Returning exception QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 to caller
2013-03-01 11:52:41 21023 ERROR nova.openstack.common.rpc.common [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 276, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch\n return getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 1120, in get_instance_nw_info\n instance_uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 523, in virtual_interface_get_by_instance\n return IMPL.virtual_interface_get_by_instance(context, instance_id)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 138, in wrapper\n db.instance_get_by_uuid(context, instance_uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 570, in instance_get_by_uuid\n return IMPL.instance_get_by_uuid(context, uuid)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 126, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1485, in instance_get_by_uuid\n filter_by(uuid=uuid).\\\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2156, in first\n ret = list(self[0:1])\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2023, in __getitem__\n return list(res)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2227, in __iter__\n return self._execute_and_instances(context)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances\n close_with_result=True)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session\n **kw)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 730, in connection\n close_with_result=close_with_result)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 736, in _connection_for_bind\n return engine.contextual_connect(**kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2490, in contextual_connect\n self.pool.connect(),\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 224, in connect\n return _ConnectionFairy(self).checkout()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 387, in __init__\n rec = self._connection_record = pool._do_get()\n', ' File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 729, in _do_get\n (self.size(), self.overflow(), self._timeout))\n', 'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30\n']

Load on the system was very low and mySQL did not look busy at the time. Attempts kept failing and only way we found to restore service was to run "service nova-network restart". After that everything came back to normal.

We are running:
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04

nova-network:
  Installed: 2013.1~g1-0ubuntu1~cloud0
  Candidate: 2013.1.g3-0ubuntu1~cloud0
  Version table:
     2013.1.g3-0ubuntu1~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/grizzly/main amd64 Packages
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-proposed/grizzly/main amd64 Packages
 *** 2013.1~g1-0ubuntu1~cloud0 0
        100 /var/lib/dpkg/status
     2012.1.3+stable-20120827-4d2a4afe-0ubuntu1.2 0
        500 http://archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     2012.1.3+stable-20120827-4d2a4afe-0ubuntu1.0.ISPATCHED.12.04.1 0
        500 http://archive.admin.canonical.com/ubuntu/ precise-cat/main amd64 Packages
     2012.1-0ubuntu2 0
        500 http://archive.ubuntu.com/ubuntu/ precise/main amd64 Packages
Codename: precise

Tags: canonistack
Revision history for this message
Chuck Short (zulcss) wrote :

This is due sqlalchemy not closing your sessions to your database properly, can you try adjusting the following settings:

DEFINE_integer('sql_pool_timeout', 30,
'seconds to wait for connection from pool before erroring')
DEFINE_integer('sql_min_pool_size', 30,
'minimum number of SQL connections to pool')
DEFINE_integer('sql_max_pool_size', 30,
'maximum number of SQL connections to pool')

Changed in nova (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Revision history for this message
Jacek Nykis (jacekn) wrote :

I still think this is a valid bug.
The problem disappeared after nova-network restart and so far it did not come back. What we would like to find out is why did this start happening. There is no proof of this but it may be ralated to long nova-network uptime.
Changing the values as you suggested could only delay the problem, there is no guarantee that the root cause is caused by low thresholds. If it was the thresholds why would the problem disappear after restart?

Also if you suggest that default values could cause problems should they not be updated?

Jacek Nykis (jacekn)
Changed in nova (Ubuntu):
status: Incomplete → Confirmed
Jacek Nykis (jacekn)
tags: added: canonistack
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.