grenade job fails randomly with Failed to shutdown nova
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
devstack |
Fix Released
|
Undecided
|
Unassigned | ||
grenade |
Invalid
|
Undecided
|
Unassigned |
Bug Description
Fails as below:-
2023-05-22 14:02:17.379 | + /opt/stack/
2023-05-22 14:02:17.379 | + /opt/stack/
2023-05-22 14:02:17.379 | [Call Trace]
2023-05-22 14:02:17.379 | /opt/stack/
2023-05-22 14:02:17.379 | /opt/stack/
2023-05-22 14:02:17.383 | [ERROR] /opt/stack/
2023-05-22 14:02:18.386 | 1 die /opt/stack/
2023-05-22 14:02:18.386 | 159 ensure_
2023-05-22 14:02:18.386 | 30 main /opt/stack/
2023-05-22 14:02:18.386 | Exit code: 1
2023-05-22 14:02:18.427 | /opt/stack/
2023-05-22 14:02:18.427 | from distutils import spawn
2023-05-22 14:02:22.763 | + inc/plugin:
2023-05-22 14:02:22.765 | + /opt/stack/
2023-05-22 14:02:22.768 | [Call Trace]
2023-05-22 14:02:22.768 | ./grenade.
2023-05-22 14:02:22.768 | /opt/stack/
2023-05-22 14:02:22.772 | [ERROR] /opt/stack/
2023-05-22 14:02:23.775 | 1 die /opt/stack/
2023-05-22 14:02:23.775 | 87 shutdown_services /opt/stack/
2023-05-22 14:02:23.775 | 322 main ./grenade.sh
2023-05-22 14:02:23.775 | Exit code: 1
2023-05-22 14:02:23.816 | /opt/stack/
2023-05-22 14:02:23.816 | from distutils import spawn
As per opensearch[1] seen 28 times in last 6 days.
From nova-api logs:-
May 22 13:57:57.907653 np0034096442 <email address hidden>[51767]: Mon May 22 13:57:57 2023 - graceful shutdown triggered...
May 22 13:57:57.908142 np0034096442 <email address hidden>[51768]: Gracefully killing worker 1 (pid: 51768)...
May 22 13:57:57.908545 np0034096442 <email address hidden>[51769]: Gracefully killing worker 2 (pid: 51769)...
May 22 13:57:57.908568 np0034096442 systemd[1]: Stopping Devstack <email address hidden>...
May 22 13:57:58.100867 np0034096442 <email address hidden>[51769]: INFO oslo.messaging.
May 22 13:57:58.100867 np0034096442 <email address hidden>[51769]: Mon May 22 13:57:58 2023 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) !!!
May 22 13:57:58.908695 np0034096442 <email address hidden>[51767]: worker 2 buried after 1 seconds
May 22 13:59:28.028620 np0034096442 systemd[1]: <email address hidden>: State 'stop-sigterm' timed out. Killing.
May 22 13:59:28.028725 np0034096442 systemd[1]: <email address hidden>: Killing process 51767 (uwsgi) with signal SIGKILL.
May 22 13:59:28.030085 np0034096442 systemd[1]: <email address hidden>: Main process exited, code=killed, status=9/KILL
May 22 13:59:28.030115 np0034096442 systemd[1]: <email address hidden>: Failed with result 'timeout'.
May 22 13:59:28.030184 np0034096442 systemd[1]: <email address hidden>: Unit process 51768 (uwsgi) remains running after unit stopped.
May 22 13:59:28.030600 np0034096442 systemd[1]: Stopped Devstack <email address hidden>.
May 22 13:59:28.031055 np0034096442 systemd[1]: <email address hidden>: Consumed 26.991s CPU time.
This looks side effect of other bug https:/
May 22 13:48:25.458107 np0034096442 <email address hidden>[51768]: DEBUG nova.api.
Changed in devstack: | |
status: | In Progress → Fix Released |
Changed in grenade: | |
status: | New → Invalid |
Opensearch result to have fresh logs: /opensearch. logs.openstack. org/_dashboards /app/discover/ ?security_ tenant= global# /?_g=(filters: !(),refreshInte rval:(pause: !t,value: 0),time: (from:now- 30d,to: now))&_ a=(columns: !(build_ status, build_name) ,filters: !(),index: '94869730- aea8-11ec- 9e6a-83741af3fd cd',interval: auto,query: (language: kuery,query: 'message: %22The% 20following% 20services% 20are%20still% 20running: %20%20nova- api%22' ),sort: !())
https:/