Launching an instance error out Exception during scheduler.run_instance

Bug #1464089 reported by Vinod Nair
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
R2.20
Fix Committed
Critical
Sanju Abraham
Trunk
Fix Committed
Critical
Sanju Abraham

Bug Description

On a 2.20 46 Juno based HA install patched with oslo patches, launching instances errors out
The below traceback is seen in the nova conductor log

root@cs-scale-2:/store# cat /var/log/nova/nova-conductor.log

15-06-10 20:31:43.789 2479 INFO oslo.messaging._drivers.impl_rabbit [req-8dc23654-ae42-4101-a8ce-0969b7d428e9 ] Connected to AMQP server on 13.1.0.10:5673
2015-06-10 20:32:42.757 2479 ERROR nova.scheduler.driver [req-8dc23654-ae42-4101-a8ce-0969b7d428e9 None] Exception during scheduler.run_instance
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver Traceback (most recent call last):
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 616, in build_instances
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver request_spec, filter_properties)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 49, in select_destinations
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver context, request_spec, filter_properties)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 35, in __run_method
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver return getattr(self.instance, __name)(*args, **kwargs)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 34, in select_destinations
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver context, request_spec, filter_properties)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 108, in select_destinations
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver request_spec=request_spec, filter_properties=filter_properties)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver retry=self.retry)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver timeout=timeout, retry=retry)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 416, in send
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver retry=retry)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 405, in _send
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver result = self._waiter.wait(msg_id, timeout)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 293, in wait
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver reply, ending = self._poll_connection(msg_id, timer)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 240, in _poll_connection
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver self._raise_timeout_exception(msg_id)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 209, in _raise_timeout_exception
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver _('Timed out waiting for a reply to message ID %s.') % msg_id)
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver MessagingTimeout: Timed out waiting for a reply to message ID edbecdc84c10401283e7274338454de8.
2015-06-10 20:32:42.757 2479 TRACE nova.scheduler.driver
2015-06-10 20:32:42.761 2479 WARNING nova.scheduler.driver [req-8dc23654-ae42-4101-a8ce-0969b7d428e9 None] [instance: 019b9b36-efc1-4c40-aff9-38eaab074f4a] Setting instance to ERROR state.

rabbitmq cluster is ok, so is nova services

root@cs-scale-2:/store# rabbitmqctl cluster_status
Cluster status of node 'rabbit@cs-scale-2-ctrl' ...
[{nodes,[{disc,['rabbit@cs-scale-1-ctrl','rabbit@cs-scale-2-ctrl',
                'rabbit@cs-scale-3-ctrl']}]},
 {running_nodes,['rabbit@cs-scale-1-ctrl','rabbit@cs-scale-3-ctrl',
                 'rabbit@cs-scale-2-ctrl']},
 {cluster_name,<<"rabbit@cs-scale-1">>},
 {partitions,[]}]

root@cs-scale-2:/store# nova service-list
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| 3 | nova-scheduler | cs-scale-1 | internal | enabled | up | 2015-06-11T03:47:27.000000 | - |
| 6 | nova-console | cs-scale-1 | internal | enabled | up | 2015-06-11T03:47:28.000000 | - |
| 9 | nova-consoleauth | cs-scale-1 | internal | enabled | up | 2015-06-11T03:47:29.000000 | - |
| 12 | nova-conductor | cs-scale-1 | internal | enabled | up | 2015-06-11T03:47:28.000000 | - |
| 15 | nova-scheduler | cs-scale-2 | internal | enabled | up | 2015-06-11T03:47:23.000000 | - |
| 18 | nova-console | cs-scale-2 | internal | enabled | up | 2015-06-11T03:47:30.000000 | - |
| 21 | nova-consoleauth | cs-scale-2 | internal | enabled | up | 2015-06-11T03:47:22.000000 | - |
| 24 | nova-conductor | cs-scale-2 | internal | enabled | up | 2015-06-11T03:47:22.000000 | - |
| 27 | nova-scheduler | cs-scale-3 | internal | enabled | up | 2015-06-11T03:47:29.000000 | - |
| 30 | nova-console | cs-scale-3 | internal | enabled | up | 2015-06-11T03:47:29.000000 | - |
| 33 | nova-consoleauth | cs-scale-3 | internal | enabled | up | 2015-06-11T03:47:28.000000 | - |
| 36 | nova-conductor | cs-scale-3 | internal | enabled | up | 2015-06-11T03:47:34.000000 | - |
| 42 | nova-compute | cs-scale-4 | nova | enabled | up | 2015-06-11T03:47:33.000000 | None |
| 45 | nova-compute | cs-scale-5 | nova | enabled | up | 2015-06-11T03:47:27.000000 | None |
| 48 | nova-compute | cs-scale-6 | nova | enabled | up | 2015-06-11T03:47:33.000000 | None |
| 51 | nova-compute | cs-scale-7 | nova | enabled | up | 2015-06-11T03:47:30.000000 | None |
| 54 | nova-compute | cs-scale-8 | nova | enabled | up | 2015-06-11T03:47:27.000000 | None |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
root@cs-scale-2:/store#

Tags: ha
Vinod Nair (vinodnair)
tags: added: ha
Revision history for this message
Sanju Abraham (asanju) wrote :

There are 3 openstack controllers in the setup.

1- 10.87.141.1
2- 10.87.141.2
3- 10.87.141.3

On 1 - nova conductor was restarted after the olso messaging changes were applied..

nova 5361 2046 1 00:23 ? 00:15:32 /usr/bin/python /usr/bin/nova-conductor --config-file=/etc/nova/nova.conf

On 2 - nova conductor was not restart

nova 2331 19293 1 Jun09 ? 00:20:35 /usr/bin/python /usr/bin/nova-conductor --config-file=/etc/nova/nova.conf

On 3 - nova conductor was restarted.

nova 3510 2701 0 00:24 ? 00:02:01 /usr/bin/python /usr/bin/nova-conductor --config-file=/etc/nova/nova.conf
=========================================

The reason why there is the traceback in the log from the 2nd node is because the patch application procedure was not complete.

Please restart nova-conductor and it should be OK.

Revision history for this message
Vinod Nair (vinodnair) wrote :

CS-SCALE-2 nova conductor was infact restarted as seen from the logs

9031 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 13.1.0.10:5673
2015-06-09 17:27:53.632 2331 INFO nova.openstack.common.service [-] Starting 40 workers
2015-06-09 17:27:53.634 2331 INFO nova.openstack.common.service [-] Started child 2391
2015-06-09 17:27:53.636 2331 INFO nova.openstack.common.service [-] Started child 2392
2015-06-09 17:27:53.638 2331 INFO nova.openstack.common.service [-] Started child 2393
2015-06-09 17:27:53.640 2331 INFO nova.openstack.common.service [-] Started child 2394
2015-06-09 17:27:53.641 2392 AUDIT nova.service [-] Starting conductor node (version 2014.2.3)

Revision history for this message
Vinod Nair (vinodnair) wrote :
Download full text (13.2 KiB)

The issue seems to be happening after the system has been up for few days.. I had a system with build 2.20 build 46 and was up for few days now with the oslo patches. Initially every thing was fine but after a few days show the same symptoms .. Few of the VM's error out

Conductor trace is as below

2015-06-18 15:28:10.851 4960 ERROR oslo.messaging._drivers.impl_rabbit [req-7b1d3e78-9b21-4d82-831a-2a0488d5ab2d ] Failed to consume message from queue: Socket closed
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 659, in ensure
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit return method()
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 747, in _consume
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=poll_timeout)
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 90, in drain_events
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 299, in drain_events
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout,
2015-06-18 15:28:10.851 4960 TRACE oslo.messaging._drivers.impl_rabbit File "/u...

information type: Proprietary → Public
Revision history for this message
Vinod Nair (vinodnair) wrote :
Download full text (4.7 KiB)

This is happening even in build 2.20 Build 59 Juno

2015-06-21 11:47:35.028 21339 TRACE nova.scheduler.driver _('Timed out waiting for a reply to message ID %s.') % msg_id)
2015-06-21 11:47:35.028 21339 TRACE nova.scheduler.driver MessagingTimeout: Timed out waiting for a reply to message ID b4f230bb9df44b6ca0e54b60068370f6.
2015-06-21 11:47:35.028 21339 TRACE nova.scheduler.driver
2015-06-21 11:47:35.033 21339 WARNING nova.scheduler.driver [req-29aef659-93cc-4e50-bb32-27ff513a141a None] [instance: 28676b25-55b5-43ea-8d50-679e8cade63e] Setting instance to ERROR state.
2015-06-21 11:47:35.211 21345 ERROR nova.scheduler.driver [req-ae9e7f03-4093-4aa8-91d8-b3a137129c15 None] Exception during scheduler.run_instance
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver Traceback (most recent call last):
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 616, in build_instances
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver request_spec, filter_properties)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 49, in select_destinations
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver context, request_spec, filter_properties)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 35, in __run_method
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver return getattr(self.instance, __name)(*args, **kwargs)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 34, in select_destinations
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver context, request_spec, filter_properties)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 108, in select_destinations
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver request_spec=request_spec, filter_properties=filter_properties)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver retry=self.retry)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver timeout=timeout, retry=retry)
2015-06-21 11:47:35.211 21345 TRACE nova.scheduler.driver File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py",...

Read more...

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] contrail/juno_2014_2_3

Review in progress for https://review.opencontrail.org/11955
Submitter: Sanju (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11956
Submitter: Sanju (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11957
Submitter: Sanju (<email address hidden>)

Revision history for this message
Sanju Abraham (asanju) wrote :

Cherry picked the fix for the bug https://bugs.launchpad.net/nova/+bug/1402574 (fault-tolerance in nova-scheduler) to our juno branch (contrail/juno_2014_2_3 branch).

Review in https://review.opencontrail.org/#/c/11955/

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11955
Committed: http://github.org/Juniper/nova/commit/7e6973cdf72740eeb4fdb22bfc6e72bdbb9fd92c
Submitter: Zuul
Branch: contrail/juno_2014_2_3

commit 7e6973cdf72740eeb4fdb22bfc6e72bdbb9fd92c
Author: Sanju Abraham <email address hidden>
Date: Mon Jun 22 20:49:02 2015 -0700

Closes-Bug: #1464089. Cherry picking from the fix for nova-scheduler from https://review.openstack.org/#/c/142124/ to contrail/juno_2014_2_3 branch

Change-Id: I71c5067776fe3c61ccf16cb34068f75a643ba414

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11956
Committed: http://github.org/Juniper/contrail-packaging/commit/140ac3b211509739f2c96b20fa27470e5f4a3df4
Submitter: Zuul
Branch: R2.20

commit 140ac3b211509739f2c96b20fa27470e5f4a3df4
Author: Sanju Abraham <email address hidden>
Date: Mon Jun 22 21:17:49 2015 -0700

Closes-Bug: #1464089. Cherry picking from the fix for nova-scheduler from https://review.openstack.org/#/c/142124/ to contrail/juno_2014_2_3 branch

Change-Id: I11bcc9b0c820802cad029ab8fb1388e6d93f4a67

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11957
Committed: http://github.org/Juniper/contrail-packaging/commit/f5dc6f45f0d7966655408616bb7e085c753ee6d6
Submitter: Zuul
Branch: master

commit f5dc6f45f0d7966655408616bb7e085c753ee6d6
Author: Sanju Abraham <email address hidden>
Date: Mon Jun 22 21:20:55 2015 -0700

Closes-Bug: #1464089. Cherry picking from the fix for nova-scheduler from https://review.openstack.org/#/c/142124/ to contrail/juno_2014_2_3 branch

Change-Id: Ib1b6d844021c7eaf49b84285f8270aa9d24ab4dc

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] contrail/juno_2014_2_3

Review in progress for https://review.opencontrail.org/12023
Submitter: Sanju (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12023
Committed: http://github.org/Juniper/nova/commit/2b2668652253f8baf3e4be52a3a3ca1a04b1682a
Submitter: Zuul
Branch: contrail/juno_2014_2_3

commit 2b2668652253f8baf3e4be52a3a3ca1a04b1682a
Author: Sanju Abraham <email address hidden>
Date: Wed Jun 24 19:57:38 2015 -0700

Related-Bug: #1464089. Added an import that was missing in the cherry pick

Change-Id: I54817101079c6cef4699407430e0a9132268eadf

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.