When restarting a stateful set deployment, pods get stuck waiting for a Cinder-backed PVC
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
CDK Addons | Incomplete | High | Unassigned |
Bug Description
Certain deployments cannot restart their PVC-attached pods without long wait times (between 15 minutes and 8 hours) while the new pod waits for its Cinder-backed PVC.
This is reproducible using the Nextcloud Helm chart:
1- First, install Helm.
2- Then add the official stable Helm repository:
$ helm repo add stable https:/
3- Install Nextcloud:
$ helm install nextcloud stable/nextcloud --set persistence.
4- Wait for the pod to be ready, then run the following (the deployment name should match the release name above):
$ kubectl rollout restart deployment nextcloud
5- The old pod is terminated and the new one gets stuck in ContainerCreating; describing it shows the error "Unable to attach or mount volumes: unmounted volumes=[html], unattached volumes=
In this example, the new pod remains stuck for ~15 minutes; however, our customer has other stateful sets that hit this issue with much longer wait times.
Searching the output of "kubectl cluster-info dump" for 'nextcloud' shows the following:
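To quantify the wait on a given cluster, steps 4 and 5 can be scripted so the restart is timed end to end. This is a minimal sketch, assuming kubectl is configured for the affected cluster and the deployment is named after the Helm release; the helper name time_rollout_restart is hypothetical. It relies only on kubectl rollout restart and kubectl rollout status --timeout, and prints the elapsed seconds once the rollout completes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart a deployment and print how many seconds the
# rollout took to complete. Assumes kubectl targets the affected cluster.
time_rollout_restart() {
  local deploy="$1"
  # Default timeout chosen to exceed the 8-hour worst case in this report.
  local timeout="${2:-9h}"
  local start end
  start=$(date +%s)
  kubectl rollout restart "deployment/${deploy}" >&2
  # Blocks until the new pod is ready, or fails once the timeout expires.
  if ! kubectl rollout status "deployment/${deploy}" --timeout="${timeout}" >&2; then
    echo "rollout of ${deploy} did not finish within ${timeout}" >&2
    return 1
  fi
  end=$(date +%s)
  echo "$(( end - start ))"
}
```

For example, time_rollout_restart nextcloud would restart the deployment from the steps above and report how long the replacement pod took to become ready, or give up after 9 hours.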
I0407 15:04:28.304113 1 event.go:221] Event(v1.
ResourceVersion
I0407 15:04:37.295060 1 controller.go:671] successfully created PV {GCEPersistentD
nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersisten
tack.org,
llerPublishSecr
I0407 15:04:37.295547 1 controller.go:1026] provision "canonical/
I0407 15:04:37.295927 1 controller.go:1040] provision "canonical/
I0407 15:04:37.304034 1 controller.go:1047] provision "canonical/
I0407 15:04:37.304308 1 controller.go:1088] provision "canonical/
I0407 15:04:37.304670 1 event.go:221] Event(v1.
ResourceVersion
I0407 15:07:41.677034 1 controller.go:1097] delete "pvc-031bbaec-
E0407 15:07:41.908855 1 controller.go:1120] delete "pvc-031bbaec-
9ef03607f", it's still attached to a node
W0407 15:07:41.909044 1 controller.go:726] Retrying syncing volume "pvc-031bbaec-
E0407 15:07:41.909167 1 controller.go:741] error syncing volume "pvc-031bbaec-
it's still attached to a node
I0407 15:07:41.909515 1 event.go:221] Event(v1.
:"v1", ResourceVersion
ed to a node
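The provisioner refuses to delete the volume because it is still attached to a node. For CSI volumes, Kubernetes records each attachment as a VolumeAttachment object, so listing them shows which node currently holds the volume. The sketch below assumes the PV is managed by a CSI driver such as Cinder CSI (the truncated PV spec above appears to reference one, but that is an inference); the helper names list_attachments and attachments_for_pv are hypothetical. Because the PV name is truncated in the log, the filter matches on a name prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line per VolumeAttachment as "<pv> <node> <attached>".
list_attachments() {
  kubectl get volumeattachments --no-headers \
    -o custom-columns='PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached'
}

# Hypothetical helper: reads list_attachments output on stdin and prints
# "<node> <attached>" for each attachment whose PV name starts with $1.
attachments_for_pv() {
  local prefix="$1"
  awk -v p="$prefix" 'index($1, p) == 1 { print $2, $3 }'
}
```

For example, list_attachments | attachments_for_pv pvc-031bbaec- lists the node(s) the PV from the log is attached to. An attachment still reporting true on the node where the old pod ran, while the new pod is scheduled elsewhere, would be consistent with the ContainerCreating hang described above.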
Although similar in effect, this does NOT appear to be related to LP1853566: we confirmed that the volume is mounted on the Kubernetes worker and that it has the expected symlink in /dev/disk/by-id.
$ lsblk |grep vdk
vdk 252:160 0 8G 0 disk /var/lib/
$ ls -al /dev/disk/by-id/ | grep vdk
lrwxrwxrwx 1 root root 9 Apr 7 16:42 virtio-
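The two checks above can be wrapped in small helpers to run on the worker, for example when comparing several affected volumes. This is a minimal sketch: the helper names are hypothetical, and the by-id directory is a parameter only so the symlink check can be tried against a scratch directory. It assumes the by-id entries are relative symlinks such as ../../vdk, which is how udev normally creates them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every by-id entry that points at device $1
# (e.g. vdk). $2 overrides the directory, defaulting to /dev/disk/by-id.
byid_links_for() {
  local dev="$1" dir="${2:-/dev/disk/by-id}" link
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    # by-id links are relative (../../vdk), so compare the final component.
    if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
      echo "$link"
    fi
  done
}

# Hypothetical helper: print the mount point of /dev/$1, if it is mounted.
mountpoint_of() {
  lsblk -no MOUNTPOINT "/dev/$1"
}
```

On the affected worker, mountpoint_of vdk and byid_links_for vdk should print the mount path under /var/lib/ and the virtio- link shown above, respectively.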
Changed in cdk-addons:
importance: Undecided → High
status: New → Triaged
Changed in cdk-addons:
assignee: nobody → Cory Johns (johnsca)
Changed in cdk-addons:
importance: High → Medium
Changed in cdk-addons:
assignee: Cory Johns (johnsca) → nobody
~field-high is subscribed to this