periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby is failing tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps: failed to reach VERIFY_RESIZE status and task state "None" within the required time

Bug #2019507 reported by Ronelle Landy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Unassigned |

Bug Description

periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby has been failing these tempest tests:

test_server_connectivity_cold_migration[compute,id-a4858f6c-401e-4155-9a49-d5cd053d1a2f,network,slow] fail
test_server_connectivity_cold_migration_revert[compute,id-25b188d7-0183-4b1e-a11d-15840c8e2fd6,network,slow] fail
test_server_connectivity_resize[compute,id-719eb59d-2f42-4b66-b8b1-bb1254473967,network,slow] fail

with traces:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 199, in test_server_connectivity_resize
    waiters.wait_for_server_status(self.servers_client, server['id'],
  File "/usr/lib/python3.9/site-packages/tempest/common/waiters.py", line 101, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkAdvancedServerOps:test_server_connectivity_resize) Server 5ca7ac10-01c6-4b74-9e91-63cc6ea888dd failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s). Current status: ACTIVE. Current task state: None.
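The timeout is a symptom rather than the cause: the waiter keeps polling until the server reports the expected status, and a server that Nova has rolled back to ACTIVE never gets there. A minimal sketch of that kind of polling loop follows, loosely modelled on the message above. It is not tempest's actual waiters code; the `get_server` callable, the injectable `clock`/`sleep` parameters and the local `TimeoutException` class are assumptions for illustration. `OS-EXT-STS:task_state` is the attribute name the Compute API uses for the task state.

```python
import time
from typing import Callable, Optional


class TimeoutException(Exception):
    """Raised when the server never reaches the expected state (local stand-in)."""


def wait_for_server_status(
    get_server: Callable[[], dict],
    expected_status: str,
    expected_task_state: Optional[str] = None,
    timeout: float = 300,
    interval: float = 1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_server() until status and task state both match, or time out."""
    start = clock()
    while True:
        server = get_server()
        status = server["status"]
        task_state = server.get("OS-EXT-STS:task_state")
        if status == expected_status and task_state == expected_task_state:
            return server
        # A rolled-back resize leaves the server ACTIVE, so we end up here.
        if clock() - start >= timeout:
            raise TimeoutException(
                f"failed to reach {expected_status} status and task state "
                f'"{expected_task_state}" within the required time ({timeout} s). '
                f"Current status: {status}. Current task state: {task_state}."
            )
        sleep(interval)
```

Injecting the clock and sleep functions keeps the loop testable without really waiting 300 seconds; with a server stuck in ACTIVE it raises with the same "Current status: ACTIVE" detail seen in the failing runs.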

Related logs:

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/6af54cd/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/63e1119/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/dc2d34e/logs/undercloud/var/log/tempest/stestr_results.html.gz

Ronelle Landy (rlandy) wrote :
Changed in tripleo:
milestone: none → antelope-1
importance: Undecided → Critical
status: New → Triaged
tags: added: promotion-blocker
Marios Andreou (marios-b) wrote:

I see some neutron-server errors in the controller's errors.txt [1]. I also sanity-checked a green run [2], where these errors are absent, so they seem related:

2023-05-14 00:36:28.911 ERROR /var/log/containers/neutron/server.log: 15 ERROR neutron.plugins.ml2.managers [req-af23228f-547b-4023-b933-5c0fceb32c66 - - - - -] Mechanism driver 'ovn' failed in update_port_postcommit: oslo_db.exception.DBReferenceError: (pymysql.err.IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`ovs_neutron`.`ovn_revision_numbers`, CONSTRAINT `ovn_revision_numbers_ibfk_1` FOREIGN KEY (`standard_attr_id`) REFERENCES `standardattributes` (`id`) ON DELETE SET NULL)')

Looking at the compute node [3], the resize error is indeed caused by an SSH failure, so it might be network related:

2023-05-13 23:51:46.256 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR oslo_messaging.rpc.server [req-d36d6991-2599-4b40-9586-31bbe1126f61 d4dee3e61d514d0cb74e49b3e799164b d93ab13606cb468d98bff5635c82be00 - default default] Exception during message handling: nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/6af54cd/logs/overcloud-controller-0/var/log/extra/errors.txt.gz

[2] (green run for sanity check) https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/0fa8583/logs/overcloud-controller-0/var/log/extra/errors.txt.gz

[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/6af54cd/logs/overcloud-novacompute-0/var/log/extra/errors.txt.gz

yatin (yatinkarel) wrote :

The actual error seems to be:
2023-05-14 23:53:59.322 2 INFO nova.compute.manager [req-6a2a14ed-8e72-4667-8343-b78bced3f35e 4999fb34bfb24deba84baf1dcafa7c81 9ce6b58fff5f4dcaaea1e9dca253eb00 - default default] [instance: e73c2f3a-a4e8-43f6-a328-4ce2364b1269] Setting instance back to active after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes 172.17.0.31 mkdir -p /var/lib/nova/instances/e73c2f3a-a4e8-43f6-a328-4ce2364b1269
Exit code: 1
Stdout: 'This account is currently not available.\n'
Stderr: 'Could not chdir to home directory /home/nova_migration: No such file or directory\n'
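The two output lines above point at two separate problems with the nova_migration account on the target. The stdout text is what `nologin` prints when it is used as a login shell, so the remote `mkdir` never runs and ssh exits with nologin's status of 1; the stderr line is sshd warning that the account's home directory does not exist. A small check for both conditions might look like the sketch below. The function names and the set of refusing shells are assumptions, and `check_user()` only inspects the local account database, so it would have to run wherever the migration account is defined.

```python
import os
import pwd
from typing import Callable, List

# Shells that refuse to run commands (assumption: common paths on EL-based images).
REFUSING_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def migration_account_problems(
    home: str,
    shell: str,
    dir_exists: Callable[[str], bool] = os.path.isdir,
) -> List[str]:
    """Return problems that would break `ssh <host> mkdir -p ...` for this account."""
    problems = []
    if shell in REFUSING_SHELLS:
        # sshd hands the remote command to the login shell, so nologin
        # prints its message and exits 1 instead of running mkdir.
        problems.append(f"login shell {shell} refuses remote commands")
    if not dir_exists(home):
        # sshd only warns about this ("Could not chdir to home directory").
        problems.append(f"home directory {home} does not exist")
    return problems


def check_user(name: str = "nova_migration") -> List[str]:
    """Look up a local account and report problems (raises KeyError if missing)."""
    entry = pwd.getpwnam(name)
    return migration_account_problems(entry.pw_dir, entry.pw_shell)
```

An empty list means neither symptom applies to the account being checked.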

I found https://review.opendev.org/c/openstack/tripleo-common/+/882335, which recently added the nova_migration user to the container images. It looks related, because useradd does not create the home directory here:
useradd -l -M --shell /usr/sbin/nologin --uid 989 --gid 989 nova_migration

@Bogdan, can you please check this?

Bogdan Dobrelya (bogdando) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-common/+/883156

Marios Andreou (marios-b) wrote :

Trying to test Bogdan's fix at https://review.rdoproject.org/r/c/testproject/+/48598; it includes the container build to pick up the fix.

Marios Andreou (marios-b) wrote :

Oh, I see it was already being tested at https://review.rdoproject.org/r/c/testproject/+/38646, but the bug was not updated. I will abandon my test.

Ronelle Landy (rlandy) wrote :

Latest result still shows the error:

https://logserver.rdoproject.org/56/36356/86/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/605a4fb/logs/undercloud/var/log/tempest/stestr_results.html.gz

If the change is still not being included correctly, let's merge it and retry in the periodic line. We need to unblock 17.1. If the change is included and the error persists, we have another issue.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (stable/wallaby)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-common/+/883156

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/883156
Committed: https://opendev.org/openstack/tripleo-common/commit/cbb03c005727a6226757951df4cbc341acf7baa7
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit cbb03c005727a6226757951df4cbc341acf7baa7
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 15 15:16:15 2023 +0200

    Follow-up nova_migration user creation

    Follow the same pattern as other users specific for nova use.
    Make sure the home dir is specified (and is the same as in RDO).

    As the nova_migration user is given a shell /bin/bash
    in RDO packaging, support custom shell argument for TCIB as well.

    Co-Authored-By: Takashi Kajinami <email address hidden>
    Change-Id: Icc323212222ac1edd0edd221336adc424000e50e
    Closes-bug: #2019507
    Signed-off-by: Bogdan Dobrelya <email address hidden>
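The commit message describes two changes: give the nova_migration user an explicit home directory, and let the image build pass a custom shell instead of always using /usr/sbin/nologin. As a rough illustration only, a helper that assembles the useradd command line with those two knobs could look like the sketch below. `build_useradd_cmd` and its parameters are hypothetical and are not the TCIB code from the merged change; the home path in the second call is a placeholder, not the directory RDO actually uses. With its defaults, the helper reproduces the useradd command quoted earlier in this report.

```python
import shlex
from typing import List, Optional


def build_useradd_cmd(
    name: str,
    uid: int,
    gid: int,
    home: Optional[str] = None,
    shell: str = "/usr/sbin/nologin",
    create_home: bool = False,
) -> List[str]:
    """Assemble a useradd argv (hypothetical helper, not the TCIB implementation).

    -l skips the lastlog/faillog entries, -M/-m control whether the home
    directory is created, and -d sets the home directory path.
    """
    cmd = ["useradd", "-l"]
    cmd.append("-m" if create_home else "-M")
    if home is not None:
        cmd += ["-d", home]
    cmd += ["--shell", shell, "--uid", str(uid), "--gid", str(gid), name]
    return cmd


# Defaults: no home directory created and a nologin shell, as in the bug.
print(shlex.join(build_useradd_cmd("nova_migration", 989, 989)))
# Explicit home directory and /bin/bash; the path is only a placeholder.
print(shlex.join(build_useradd_cmd("nova_migration", 989, 989,
                                   home="/path/to/nova_migration_home",
                                   shell="/bin/bash", create_home=True)))
```

Keeping the shell and home directory as parameters lets one build template serve both the locked-down default and an RDO-style account that must accept remote commands.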

tags: added: in-stable-wallaby
Sandeep Yadav (sandeepyadav93) wrote :
Changed in tripleo:
status: Triaged → Fix Released