periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-master/victoria/ussuri failing on tempest tests with SSHTimeout issue with error message "socket.timeout: timed out"

Bug #1909574 reported by Sandeep Yadav
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
tripleo | Invalid | High | Unassigned |

Bug Description

Description:-

The periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia jobs for master, victoria and ussuri have been failing tempest tests with SSHTimeout (error message "socket.timeout: timed out") since 11 Dec 2020.

Build history:-

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-master
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-victoria
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-ussuri

Logs:-

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-master/b2c4f77/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
~~~
{1} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_connectivity_between_vms_on_different_networks [375.992784s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 112, in _get_ssh_connection
        sock=proxy_chan)
      File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
        retry_on_signal(lambda: sock.connect(addr))
      File "/usr/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
        return function()
      File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
        retry_on_signal(lambda: sock.connect(addr))
    socket.timeout: timed out

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 490, in test_connectivity_between_vms_on_different_networks
        self._check_public_network_connectivity(should_connect=True)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 213, in _check_public_network_connectivity
        message, server, mtu=mtu)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 833, in check_vm_connectivity
        server=server)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 596, in get_remote_client
        linux_client.validate_authentication()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 59, in wrapper
        six.reraise(*original_exception)
      File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
        raise value
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 32, in wrapper
        return function(self, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 115, in validate_authentication
        self.ssh_client.test_connection_auth()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 216, in test_connection_auth
        connection = self._get_ssh_connection()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 128, in _get_ssh_connection
        password=self.password)
    tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.123 via SSH timed out.
    User: cirros, Password: None
~~~

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-ussuri/ca4e088/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
~~~
{0} neutron_tempest_plugin.scenario.admin.test_floatingip.FloatingIpTestCasesAdmin.test_two_vms_fips [335.920921s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 112, in _get_ssh_connection
        sock=proxy_chan)
      File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
        retry_on_signal(lambda: sock.connect(addr))
      File "/usr/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
        return function()
      File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
        retry_on_signal(lambda: sock.connect(addr))
    socket.timeout: timed out

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/admin/test_floatingip.py", line 105, in test_two_vms_fips
        servers=servers)
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/base.py", line 373, in check_remote_connectivity
        timeout=timeout, pattern=pattern))
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/base.py", line 362, in _check_remote_connectivity
        ping_remote, timeout or CONF.validation.ping_timeout, 1)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/test_utils.py", line 107, in call_until_true
        if func(*args, **kwargs):
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/base.py", line 346, in ping_remote
        pattern=pattern)
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/base.py", line 340, in ping_host
        return source.exec_command(cmd)
      File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
        return self.call(f, *args, **kw)
      File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
        do = self.iter(retry_state=retry_state)
      File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 319, in iter
        return fut.result()
      File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
        return self.__get_result()
      File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
      File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
        result = fn(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/common/ssh.py", line 178, in exec_command
        return super(Client, self).exec_command(cmd=cmd, encoding=encoding)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 158, in exec_command
        ssh = self._get_ssh_connection()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 128, in _get_ssh_connection
        password=self.password)
    tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.145 via SSH timed out.
    User: cirros, Password: None
~~~
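
In both tracebacks the first exception is raised from `sock.connect(addr)` inside paramiko, which suggests the failure happens at the TCP connect step, before any SSH authentication. A minimal sketch, using only the Python standard library, for checking whether a floating IP accepts TCP connections on port 22 at all. The helper name `tcp_port_reachable` and the default timeout are illustrative assumptions, not part of tempest or paramiko:

```python
import socket


def tcp_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises socket.timeout (an OSError subclass) when
        # the peer does not answer in time, as seen in the tracebacks above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Also covers "connection refused" and "no route to host".
        return False
```

For example, `tcp_port_reachable("192.168.24.123")` run from the undercloud node would probe the floating IP from the first failure. A `False` result only says the port did not answer; it does not say whether the guest failed to boot or the network path is broken.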

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

ovn-controller is segfaulting and dumping core:

Dec 27 15:26:41 standalone.localdomain systemd-coredump[443682]: Process 416559 (ovn-controller) of user 0 dumped core.

                                                                 Stack trace of thread 8:
                                                                 #0 0x00005595ae3cc3bf n/a (/usr/bin/ovn-controller)

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-ussuri/ca4e088/logs/undercloud/var/log/extra/journal.txt.gz

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-ussuri/ca4e088/logs/undercloud/var/log/containers/stdouts/ovn_controller.log.txt.gz

2020-12-27T15:26:40.528773825+00:00 stderr F 2020-12-27T15:26:40Z|00197|binding|INFO|Releasing lport 5e483a0b-09d4-4bda-a914-d473b45e34d5 from this chassis.
2020-12-27T15:26:40.532723714+00:00 stderr F 2020-12-27T15:26:40Z|00198|patch|WARN|Bridge 'br-tenant' not found for network 'tenant'
2020-12-27T15:26:40.564462915+00:00 stderr F 2020-12-27T15:26:40Z|00001|fatal_signal(stopwatch2)|WARN|terminating with signal 11 (Segmentation fault)

There are also errors in mysql:
2020-12-27 15:26:45 701 [ERROR] InnoDB: WSREP: referenced FK check fail: Lock wait index `subnet_id` table `ovs_neutron`.`ipallocations`

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-ussuri/ca4e088/logs/undercloud/var/log/containers/mysql/mysqld.log.txt.gz

tags: added: promotion-blocker
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Still failing with:-

https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-octavia-master/9c7ef29/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

~~~
{1} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_connectivity_between_vms_on_different_networks [204.691583s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 90, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 490, in test_connectivity_between_vms_on_different_networks
        self._check_public_network_connectivity(should_connect=True)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 213, in _check_public_network_connectivity
        message, server, mtu=mtu)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 837, in check_vm_connectivity
        msg=msg)
      File "/usr/lib/python3.6/site-packages/unittest2/case.py", line 705, in assertTrue
        raise self.failureException(msg)
    AssertionError: False is not true : Public network connectivity check failed
    Timed out waiting for 192.168.24.146 to become reachable

~~~

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Today I investigated this issue on an RDO node. It seems to me that, for some reason, when virt_type=kvm is set in nova-compute, VMs aren't spawned properly and appear to hang. Because of that they aren't reachable.
When I manually changed virt_type to qemu, VMs were spawned and reachable as expected.
I think someone from the compute DFG should take a look at this issue now.
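
To confirm which virt_type a compute node is actually using, the value can be read from the `[libvirt]` section of nova.conf. A minimal sketch, assuming the file parses as plain INI; the function name `read_virt_type` is hypothetical, and the path to nova.conf depends on the deployment, so it is passed in rather than guessed:

```python
import configparser


def read_virt_type(nova_conf_path):
    """Return the [libvirt] virt_type value from nova.conf, or None if unset."""
    # interpolation=None keeps '%' characters in values literal;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(nova_conf_path)  # a missing file is silently ignored
    return parser.get("libvirt", "virt_type", fallback=None)
```

A return value of `None` means the option is not set in that file, in which case nova falls back to its own default.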

Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/788678

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/788678

Changed in tripleo:
milestone: xena-2 → xena-3
Revision history for this message
Alan Pevec (apevec) wrote :

Nested KVM is now working in the RDO Vexxhost cloud.
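
A rough way to check KVM availability on a Linux node is to look for `/dev/kvm` and the `nested` parameter of the `kvm_intel` or `kvm_amd` kernel module. The sketch below does that; the function name `nested_kvm_status` and its `sys_root`/`dev_root` parameters are illustrative assumptions, and it is not established that this check alone would have detected the problem in this bug:

```python
import os


def nested_kvm_status(sys_root="/sys", dev_root="/dev"):
    """Report whether the kvm device exists and whether nesting is enabled."""
    status = {
        "dev_kvm": os.path.exists(os.path.join(dev_root, "kvm")),
        "nested": None,  # None: neither kvm_intel nor kvm_amd is loaded
    }
    for module in ("kvm_intel", "kvm_amd"):
        path = os.path.join(sys_root, "module", module, "parameters", "nested")
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # module not loaded on this host
        # The kernel reports the parameter as Y/N or 1/0 depending on version.
        status["nested"] = value in ("Y", "y", "1")
        break
    return status
```

On the node that runs nova-compute, `dev_kvm` shows whether KVM acceleration is exposed at all; the `nested` flag is most meaningful on the hypervisor that hosts that node.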

Changed in tripleo:
status: Triaged → Invalid