AIO-SX: armada pod stuck in Unknown after host-lock/unlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Angie Wang |
Bug Description
Brief Description
-----------------
After a reboot or lock/unlock of an AIO-SX, the Armada pod gets stuck in the Unknown state and does not recover.
This is the same issue as the following LPs, but here it affects the Armada pod:
https:/
https:/
Severity
--------
Medium
Steps to Reproduce
------------------
Apply the stx-openstack application to an AIO-SX
system host-lock controller-0
system host-unlock controller-0
Expected Behavior
------------------
All pods should recover and be in a ready/running state shortly after the controller recovers.
Actual Behavior
----------------
The Armada pod is stuck in the Unknown state.
Reproducibility
---------------
Intermittent - seen rarely
System Configuration
--------------------
AIO-SX
Branch/Pull Time/Commit
-----------------------
stx master
Timestamp/Logs
--------------
[2021-04-21 19:50:21,796] 314 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-
[2021-04-21 19:50:22,133] 436 DEBUG MainThread ssh.expect :: Output:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
armada armada-
Warning FailedMount 105m kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 103m kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 97m kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 37m (x22 over 101m) kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 32m (x43 over 108m) kubelet, controller-0 MountVolume.SetUp failed for volume "armada-etc" : stat /var/lib/
Warning FailedMount 18m kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 8m11s (x3 over 14m) kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 4m4s (x3 over 16m) kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 2m (x3 over 20m) kubelet, controller-0 Unable to attach or mount volumes: unmounted volumes=
Warning FailedMount 103s (x18 over 22m) kubelet, controller-0 MountVolume.SetUp failed for volume "armada-etc" : stat /var/lib/
Test Activity
-------------
Sanity
Workaround
----------
Delete the Armada pod that is stuck in the Unknown state.
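The workaround can be scripted. Below is a minimal Python sketch, not part of the reported fix, assuming the pod list comes from `kubectl get pods -n armada -o json` and treating a pod as stuck when its `status.phase` is `Unknown` (a simplification: kubectl's STATUS column can also show Unknown for other reasons). The helper names `unknown_pods` and `delete_commands` are hypothetical.

```python
from typing import Any, Dict, List


def unknown_pods(pod_list: Dict[str, Any], namespace: str = "armada") -> List[str]:
    """Return names of pods in `namespace` whose status.phase is "Unknown".

    `pod_list` is assumed to be the parsed JSON of a v1 PodList, e.g. the
    output of `kubectl get pods -n armada -o json` (an object with "items").
    """
    names = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        phase = pod.get("status", {}).get("phase")
        # Simplification: only the pod phase is checked, not status.reason.
        if meta.get("namespace") == namespace and phase == "Unknown":
            names.append(meta["name"])
    return names


def delete_commands(names: List[str], namespace: str = "armada") -> List[List[str]]:
    """Build one `kubectl delete pod` argv per stuck pod (nothing is executed)."""
    return [["kubectl", "delete", "pod", "-n", namespace, name] for name in names]
```

For example, parse the output of `subprocess.run(["kubectl", "get", "pods", "-n", "armada", "-o", "json"], capture_output=True, text=True, check=True).stdout` with `json.loads`, then pass each returned argv to `subprocess.run`. Building argv lists instead of shell strings avoids quoting problems with pod names.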
Changed in starlingx:
assignee: nobody → Angie Wang (angiewang)
description: updated
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.6.0 stx.containers
Reviewed: https://review.opendev.org/c/starlingx/integ/+/790530
Committed: https://opendev.org/starlingx/integ/commit/03665ae745babb4524e2b9b9cc0f768eaf1e8781
Submitter: "Zuul (22348)"
Branch: master
commit 03665ae745babb4524e2b9b9cc0f768eaf1e8781
Author: Angie Wang <email address hidden>
Date: Mon May 10 18:54:07 2021 -0400
Add armada namespace in k8s pod recovery
Update k8s pod recovery service to include armada namespace
so armada pod that stuck in an unknown state after host
lock/unlock or reboot could be recovered by the service.
Change-Id: Iacd92637a9b4fcaf4c0076e922e1bd739f69a584
Closes-Bug: 1928018
Signed-off-by: Angie Wang <email address hidden>
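The commit message describes the fix only as adding the armada namespace to the k8s pod recovery service. The minimal Python sketch below illustrates why that matters, assuming the service acts only on pods in an explicit namespace list. The real list and the service's actual implementation are not shown in this report; `Pod`, `pods_to_recover`, and the namespace `example-ns` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Pod(NamedTuple):
    namespace: str
    name: str
    phase: str  # simplified: the pod's status.phase


def pods_to_recover(pods: Iterable[Pod], namespaces: Iterable[str]) -> List[Pod]:
    """Return Unknown-phase pods, restricted to the given namespace list.

    Pods in namespaces outside the list are never selected, no matter
    how long they stay stuck, which is how an armada pod could be missed.
    """
    allowed = set(namespaces)
    return [p for p in pods if p.namespace in allowed and p.phase == "Unknown"]
```

With a list such as `["example-ns"]`, a stuck pod in `armada` is ignored; with `["example-ns", "armada"]`, it is selected alongside the others, mirroring the "include armada namespace" change.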