grenade job fails randomly with Failed to shutdown nova

Bug #2020643 reported by yatin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
devstack
Fix Released
Undecided
Unassigned
grenade
Invalid
Undecided
Unassigned

Bug Description

Fails as below:-
2023-05-22 14:02:17.379 | + /opt/stack/new/grenade/inc/upgrade:ensure_services_stopped:159 : die 159 'The following services are still running: nova-api'
2023-05-22 14:02:17.379 | + /opt/stack/new/devstack/functions-common:die:264 : local exitcode=0
2023-05-22 14:02:17.379 | [Call Trace]
2023-05-22 14:02:17.379 | /opt/stack/new/grenade/projects/60_nova/shutdown.sh:30:ensure_services_stopped
2023-05-22 14:02:17.379 | /opt/stack/new/grenade/inc/upgrade:159:die
2023-05-22 14:02:17.383 | [ERROR] /opt/stack/new/grenade/inc/upgrade:159 The following services are still running: nova-api
2023-05-22 14:02:18.386 | 1 die /opt/stack/new/devstack/functions-common
2023-05-22 14:02:18.386 | 159 ensure_services_stopped /opt/stack/new/grenade/inc/upgrade
2023-05-22 14:02:18.386 | 30 main /opt/stack/new/grenade/projects/60_nova/shutdown.sh
2023-05-22 14:02:18.386 | Exit code: 1
2023-05-22 14:02:18.427 | /opt/stack/new/devstack/tools/worlddump.py:22: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
2023-05-22 14:02:18.427 | from distutils import spawn
2023-05-22 14:02:22.763 | + inc/plugin:shutdown_services:87 : die 87 'Failed to shutdown nova'
2023-05-22 14:02:22.765 | + /opt/stack/new/devstack/functions-common:die:264 : local exitcode=1
2023-05-22 14:02:22.768 | [Call Trace]
2023-05-22 14:02:22.768 | ./grenade.sh:322:shutdown_services
2023-05-22 14:02:22.768 | /opt/stack/new/grenade/inc/plugin:87:die
2023-05-22 14:02:22.772 | [ERROR] /opt/stack/new/grenade/inc/plugin:87 Failed to shutdown nova
2023-05-22 14:02:23.775 | 1 die /opt/stack/new/devstack/functions-common
2023-05-22 14:02:23.775 | 87 shutdown_services /opt/stack/new/grenade/inc/plugin
2023-05-22 14:02:23.775 | 322 main ./grenade.sh
2023-05-22 14:02:23.775 | Exit code: 1
2023-05-22 14:02:23.816 | /opt/stack/new/devstack/tools/worlddump.py:22: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
2023-05-22 14:02:23.816 | from distutils import spawn

As per opensearch[1] seen 28 times in last 6 days.

From nova-api logs:-
May 22 13:57:57.907653 np0034096442 <email address hidden>[51767]: Mon May 22 13:57:57 2023 - graceful shutdown triggered...
May 22 13:57:57.908142 np0034096442 <email address hidden>[51768]: Gracefully killing worker 1 (pid: 51768)...
May 22 13:57:57.908545 np0034096442 <email address hidden>[51769]: Gracefully killing worker 2 (pid: 51769)...
May 22 13:57:57.908568 np0034096442 systemd[1]: Stopping Devstack <email address hidden>...
May 22 13:57:58.100867 np0034096442 <email address hidden>[51769]: INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
May 22 13:57:58.100867 np0034096442 <email address hidden>[51769]: Mon May 22 13:57:58 2023 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) !!!
May 22 13:57:58.908695 np0034096442 <email address hidden>[51767]: worker 2 buried after 1 seconds
May 22 13:59:28.028620 np0034096442 systemd[1]: <email address hidden>: State 'stop-sigterm' timed out. Killing.
May 22 13:59:28.028725 np0034096442 systemd[1]: <email address hidden>: Killing process 51767 (uwsgi) with signal SIGKILL.
May 22 13:59:28.030085 np0034096442 systemd[1]: <email address hidden>: Main process exited, code=killed, status=9/KILL
May 22 13:59:28.030115 np0034096442 systemd[1]: <email address hidden>: Failed with result 'timeout'.
May 22 13:59:28.030184 np0034096442 systemd[1]: <email address hidden>: Unit process 51768 (uwsgi) remains running after unit stopped.
May 22 13:59:28.030600 np0034096442 systemd[1]: Stopped Devstack <email address hidden>.
May 22 13:59:28.031055 np0034096442 systemd[1]: <email address hidden>: Consumed 26.991s CPU time.

Example logs:-
https://954d3ddb67e757934983-a9cc155153d08dd30dfffbbf1d71d234.ssl.cf5.rackcdn.com/882865/2/check/neutron-ovs-grenade-multinode/fb542a8/controller/logs/grenade.sh_log.txt

This looks side effect of other bug https://bugs.launchpad.net/neutron/+bug/2015065 where nova-api worker get's stuck. Same symptoms from other bugs are seen when n-api fails to stop, ex below request is stuck:-
May 22 13:48:25.458107 np0034096442 <email address hidden>[51768]: DEBUG nova.api.openstack.wsgi [None req-15513090-c7b3-434e-94ba-f23d7fc40032 tempest-SecurityGroupRulesTestJSON-1483302824 tempest-SecurityGroupRulesTestJSON-1483302824-project-member] Calling method '<function Controller.__getattribute__.<locals>.version_select at 0x7fdfd652e710>' {{(pid=51768) _process_stack /opt/stack/old/nova/nova/api/openstack/wsgi.py:513}}

[1] https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22Failed%20to%20shutdown%20nova%22'),sort:!())

Revision history for this message
Lajos Katona (lajos-katona) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/886250

Changed in devstack:
status: New → In Progress
Changed in devstack:
status: In Progress → Fix Released
Changed in grenade:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.