[yoga] Tempest Octavia test failed with 500 error

Bug #2022949 reported by Jeffrey Chang

This bug report was marked for expiration 215 days ago.

This bug affects 2 people
Affects                   Status      Importance  Assigned to  Milestone
OpenStack Octavia Charm   Incomplete  Undecided   Unassigned
python-ovsdbapp (Ubuntu)  Incomplete  Undecided   Unassigned

Bug Description

SQA found several load balancer tests in tempest failing with a 500 error.

Traceback (most recent call last):
  File "/home/ubuntu/snap/fcbtest/50/.rally/verification/verifier-ef3678e1-4f48-402c-876f-6b73dcec2bc0/repo/tempest/test.py", line 206, in setUpClass
    raise value.with_traceback(trace)
  File "/home/ubuntu/snap/fcbtest/50/.rally/verification/verifier-ef3678e1-4f48-402c-876f-6b73dcec2bc0/repo/tempest/test.py", line 199, in setUpClass
    cls.resource_setup()
  File "/snap/fcbtest/50/lib/python3.10/site-packages/octavia_tempest_plugin/tests/test_base.py", line 225, in resource_setup
    cls.api_version = cls.mem_lb_client.get_max_api_version()
  File "/snap/fcbtest/50/lib/python3.10/site-packages/octavia_tempest_plugin/services/load_balancer/v2/base_client.py", line 452, in get_max_api_version
    response, body = self.get('/')
  File "/home/ubuntu/snap/fcbtest/50/.rally/verification/verifier-ef3678e1-4f48-402c-876f-6b73dcec2bc0/repo/tempest/lib/common/rest_client.py", line 322, in get
    return self.request('GET', url, extra_headers, headers,
  File "/home/ubuntu/snap/fcbtest/50/.rally/verification/verifier-ef3678e1-4f48-402c-876f-6b73dcec2bc0/repo/tempest/lib/common/rest_client.py", line 742, in request
    self._error_checker(resp, resp_body)
  File "/home/ubuntu/snap/fcbtest/50/.rally/verification/verifier-ef3678e1-4f48-402c-876f-6b73dcec2bc0/repo/tempest/lib/common/rest_client.py", line 836, in _error_checker
    raise exceptions.UnexpectedContentType(str(resp.status),
tempest.lib.exceptions.UnexpectedContentType: Unexpected content type provided
Details: 500
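
For reference, the failing request is the tempest client's GET on the load balancer API root (used to discover the maximum API version). It can be reproduced outside tempest roughly as follows, with <octavia-endpoint> standing in for the deployment's Octavia endpoint (9876 is the default API port):

$ TOKEN=$(openstack token issue -f value -c id)
$ curl -sk -H "X-Auth-Token: $TOKEN" https://<octavia-endpoint>:9876/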

The Octavia worker log shows:
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils [-] Unhandled exception: oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-5-lxd-9>
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils Traceback (most recent call last):
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/cotyledon/_utils.py", line 95, in exit_on_exception
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils yield
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/cotyledon/_service.py", line 139, in _run
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils self.run()
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/octavia/controller/queue/v1/consumer.py", line 47, in run
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils self.message_listener.start()
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/server.py", line 267, in wrapper
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/server.py", line 188, in run_once
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils post_fn = fn()
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/server.py", line 267, in <lambda>
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/server.py", line 413, in start
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils self.listener = self._create_listener()
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 150, in _create_listener
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils return self.transport._listen(self._target, 1, None)
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 139, in _listen
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils raise exceptions.InvalidTarget('A server\'s target must have '
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-5-lxd-9>
2023-06-03 01:52:08.648 18454 ERROR cotyledon._utils
2023-06-03 01:52:08.655 18178 INFO cotyledon._service_manager [-] Child 18454 exited with status 2
2023-06-03 01:52:08.654 18462 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 1.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
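
The InvalidTarget error means the worker built an RPC target with a server name but no topic, which normally comes from the rendered octavia.conf. A quick sanity check on an affected unit (a sketch; the unit name, and the assumption that the topic is rendered under [oslo_messaging], may not match the deployment):

$ juju ssh octavia/0 'sudo grep -A3 "^\[oslo_messaging\]" /etc/octavia/octavia.conf'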

A crashdump can be found at https://oil-jenkins.canonical.com/artifacts/a6c8ea10-fdf4-4928-a5d3-54c5568cb5d3/index.html

Felipe Reyes (freyes) wrote:

Relevant piece of the bundle:

  octavia:
    bindings:
      ? ''
      : oam-space
      admin: internal-space
      amqp: internal-space
      certificates: internal-space
      identity-service: internal-space
      internal: internal-space
      public: public-space
      shared-db: internal-space
    channel: yoga/stable
    charm: octavia
    num_units: 3
    options:
      haproxy-client-timeout: 1000000
      haproxy-connect-timeout: 100000
      haproxy-queue-timeout: 100000
      haproxy-server-timeout: 1000000
      lb-mgmt-controller-cacert: <redacted>
      lb-mgmt-controller-cert: <redacted>
      lb-mgmt-issuing-ca-key-passphrase: <redacted>
      lb-mgmt-issuing-ca-private-key: <redacted>
      lb-mgmt-issuing-cacert: <redacted>
      loadbalancer-topology: ACTIVE_STANDBY
      openstack-origin: cloud:focal-yoga
      region: RegionOne
      use-internal-endpoints: true
    to:
    - lxd:1
    - lxd:3
    - lxd:5

summary:
- Tempest Octavia test failed with 500 error
+ [yoga] Tempest Octavia test failed with 500 error
Felipe Reyes (freyes) wrote:

Felipe Reyes (freyes) wrote:

The InvalidTarget exception was logged from 02:16 until 02:55, and during that same time window rabbitmq can be seen dropping connections. After 02:55 the octavia-worker running on octavia/0 kept running until 05:00, which is the last logged line.

The chunk of failing rally testing has no timestamp. Where can I find the associated log? I would like to correlate it with the time window identified above.

$ grep oslo_messaging.exceptions.InvalidTarget octavia-worker.log.txt | wc -l
2760
-----------
$ grep oslo_messaging.exceptions.InvalidTarget octavia-worker.log.txt | head -n 2
2023-06-03 02:16:11.771 18362 ERROR cotyledon._utils [-] Unhandled exception: oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-1-lxd-8>
2023-06-03 02:16:11.771 18362 ERROR cotyledon._utils oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-1-lxd-8>
$ grep oslo_messaging.exceptions.InvalidTarget octavia-worker.log.txt | tail -n 2
2023-06-03 02:55:37.218 122396 ERROR cotyledon._utils [-] Unhandled exception: oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-1-lxd-8>
2023-06-03 02:55:37.218 122396 ERROR cotyledon._utils oslo_messaging.exceptions.InvalidTarget: A server's target must have topic and server names specified:<Target server=juju-b68c1b-1-lxd-8>
-----------
$ grep "oslo.messaging._drivers.impl_rabbit" octavia-worker.log.txt | head -n 5
2023-06-03 02:16:11.779 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 1.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:16:12.781 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 3.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:16:15.784 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 5.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:16:20.790 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 7.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:16:27.796 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 9.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
 $ grep "oslo.messaging._drivers.impl_rabbit" octavia-worker.log.txt | tail -n 5
2023-06-03 02:53:33.778 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 31.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:54:04.806 18366 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] Connection refused (retrying in 31.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2023-06-03 02:54:35.834 18366 ERROR oslo.messaging._drivers.impl...


Changed in charm-octavia:
status: New → Incomplete
Alexander Litvinov (alitvinov) wrote (last edit):

Started seeing the same tempest failures on the last 2 redeployments of a previously fully passing cloud.
link to logs: https://pastebin.canonical.com/p/VS8VZckNds/ (customer info)
octavia.conf: https://pastebin.canonical.com/p/s3f49GP3Pw/
bundles: https://git.launchpad.net/cpe-deployments/tree/?h=2022-04-01-OP-332677-AXS-PCB-Bootstack-1
Subscribing ~field-critical as this is blocking handover.

timestamps of failures:
2023-06-05-23:18:47 root DEBUG 2023-06-05 23:18:47.417 3987126 INFO tempestverifier [-] {104} tearDownClass (octavia_tempest_plugin.tests.api.v2.test_load_balancer.LoadBalancerAPITest) ... fail

2023-06-05-23:07:22 root DEBUG 2023-06-05 23:07:22.490 3987126 INFO tempestverifier [-] {104} octavia_tempest_plugin.tests.api.v2.test_load_balancer.LoadBalancerAPITest.test_load_balancer_delete ... fail [906.323s]

Changed in charm-octavia:
status: Incomplete → New
Jeffrey Chang (modern911) wrote:

Alexander Litvinov (alitvinov) wrote:

I just ran tempest a second time and it passed fully; on the third run loadbalancer_delete failed with the same error. So it is intermittent, but the issue exists.

Felipe Reyes (freyes) wrote:

I'm seeing this other exception, which seems more interesting from a troubleshooting perspective:

$ grep nova.exception_Remote.PortBindingFailed_Remote octavia-worker.log
<class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4).
2023-06-05 22:53:41.089 199015 ERROR octavia.network.drivers.neutron.allowed_address_pairs <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4)
2023-06-05 22:53:41.089 199015 ERROR octavia.network.drivers.neutron.allowed_address_pairs <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4).
2023-06-05 22:53:41.091 199015 ERROR octavia.network.drivers.neutron.allowed_address_pairs <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4)
2023-06-05 22:53:41.091 199015 ERROR octavia.network.drivers.neutron.allowed_address_pairs <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4).
2023-06-05 22:53:41.682 199015 ERROR octavia.controller.worker.v2.controller_worker <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4)
2023-06-05 22:53:41.682 199015 ERROR octavia.controller.worker.v2.controller_worker <class 'nova.exception_Remote.PortBindingFailed_Remote'> (HTTP 500) (Request-ID: req-cb067766-8e5c-4c61-9582-d08f1766d9f4).
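
One way to dig further is to correlate that Request-ID across the nova and neutron logs collected in the crashdump (the paths below are placeholders for wherever the crashdump is unpacked):

$ grep -r "req-cb067766-8e5c-4c61-9582-d08f1766d9f4" <crashdump>/nova-cloud-controller/ <crashdump>/neutron-api/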

After that there are failures connecting to the amphorae, which probably makes sense since the list of allowed address pairs couldn't be updated:

$ grep taskflow.exceptions.WrappedFailure octavia-worker.log
2023-06-05 22:53:43.184 199015 ERROR oslo_messaging.rpc.server [-] Exception during message handling: taskflow.exceptions.WrappedFailure: WrappedFailure: [Failure: octavia.amphorae.driver_exceptions.exceptions.AmpConnectionRetry: Could not connect to amphora, exception caught: HTTPSConnectionPool(host='fc00:5a76:ae64:bd5a:f816:3eff:fe8f:7a67', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fb07d081b40>, 'Connection to fc00:5a76:ae64:bd5a:f816:3eff:fe8f:7a67 timed out. (connect timeout=10.0)')), Failure: octavia.amphorae.driver_exceptions.exceptions.AmpConnectionRetry: Could not connect to amphora, exception caught: HTTPSConnectionPool(host='fc00:5a76:ae64:bd5a:f816:3eff:fe34:d41e', port=9443): Max retries exceeded with url: // (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb07d03c5b0>: Failed to establish a new connection: [Errno 111] Connection refused')), Failure: octavia.network.base.PlugVIPException: Error plugging amphora (compute_id: c8acf9f4-15f2-4a5e-ac7b-92c957de620a) into vip network e94cf707-e23c-405b-a582-6e1df99ca271., Failure: octavia.amphorae.driver_exceptions.exceptions.AmpConnectionRetry: Could not connect to amphora, exception caught: HTTPSConnectionPool(host='fc00:5a76:ae64:bd5a:f816:3eff:fe8f:7a67', port=9443): Max retries exceeded with url: // (Caused by NewConn...


Felipe Reyes (freyes) wrote:

@alitvinov, can you provide us with the neutron-api logs so we can understand why neutron-server returned an Error 500 when Octavia updated the list of allowed addresses?
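
One way to collect them, assuming the unit is named neutron-api/0:

$ juju ssh neutron-api/0 'sudo tar czf /tmp/neutron-logs.tgz /var/log/neutron'
$ juju scp neutron-api/0:/tmp/neutron-logs.tgz .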

Changed in charm-octavia:
status: New → Incomplete
Alexander Litvinov (alitvinov) wrote:
Changed in charm-octavia:
status: Incomplete → New
Alexander Litvinov (alitvinov) wrote:

2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers [req-97cf9779-9bb6-410a-a584-198f00f98c4f - - - - -] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=5ed74ce4-bea9-48a2-9055-e8a95cb3162d
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 498, in _call_on_drivers
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 854, in update_port_postcommit
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers self._ovn_update_port(context._plugin_context, port, original_port,
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 740, in _ovn_update_port
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers self._ovn_client.update_port(plugin_context, port,
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 724, in update_port
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 207, in lookup
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers return super().lookup(table, record, default=default, timeout=timeout,
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 208, in lookup
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers return self._lookup(table, record)
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 268, in _lookup
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers row = idlutils.row_by_value(self, rl.table, rl.column, record)
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers raise RowNotFound(table=table, col=column, match=match)
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=5ed74ce4-bea9-48a2-9055-e8a95cb3162d

Felipe Reyes (freyes) wrote:

The same kind of errors appear in the provided log at 22:53. There is an old bug 1777466 which exhibits the same kind of exceptions (RowNotFound):

$ grep "Cannot find Logical_Switch_Port" neutron-server.log
2023-06-05 22:53:23.172 205109 ERROR neutron.plugins.ml2.managers [req-a54a4176-904f-464c-8479-c9b2f8edae84 - - - - -] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=6f950852-3634-4d84-9072-81b9c31e2484
2023-06-05 22:53:23.172 205109 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=6f950852-3634-4d84-9072-81b9c31e2484
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers [req-97cf9779-9bb6-410a-a584-198f00f98c4f - - - - -] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=5ed74ce4-bea9-48a2-9055-e8a95cb3162d
2023-06-05 22:53:43.883 205108 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=5ed74ce4-bea9-48a2-9055-e8a95cb3162d
ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=cf166813-c4c9-4421-b41e-f31538fa4166

I'm adding a task for ovsdbapp since it's the component that takes care of interactions between neutron and ovn.
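
To check whether the Logical_Switch_Port row is actually absent from the OVN northbound database (as opposed to the lookup racing a concurrent deletion), it can be queried directly on an ovn-central unit. A sketch, assuming the unit name and that ovn-nbctl can reach the local NB database:

$ juju ssh ovn-central/0 'sudo ovn-nbctl find Logical_Switch_Port name=5ed74ce4-bea9-48a2-9055-e8a95cb3162d'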

Jadon Naas (jadonn) wrote:

@alitvinov Are you still seeing this issue, and is it still blocking you? It looks like it has not occurred in any further SQA tests, and I have not seen any updates in the last few weeks, so I wanted to check in.

Jadon Naas (jadonn)
Changed in charm-octavia:
status: New → Incomplete
Changed in python-ovsdbapp (Ubuntu):
status: New → Incomplete