Version check is needed before injecting DevicePlugins feature gate

Bug #2038970 reported by Dagmawi Biru
This bug affects 2 people
Affects                          Status       Importance  Assigned to      Milestone
Kubernetes Control Plane Charm   In Progress  High        Chris Johnston
Kubernetes Worker Charm          In Progress  High        Chris Johnston

Bug Description

Charm: kubernetes-worker
Channel: 1.28/stable

Problem:
A newly deployed kubernetes-worker unit/node does not start up correctly because kubelet crashes
with this error:

---
"command failed" err="failed to set feature gates from initial flags-based config: unrecognized feature gate: DevicePlugins"
snap.kubelet.daemon.service: Main process exited, code=exited, status=1/FAILURE
snap.kubelet.daemon.service: Failed with result 'exit-code'.
---

The cause appears to be the following:

1. Device plugins are a generally available feature, and as of k8s 1.28 the "DevicePlugins" feature gate has been removed, so kubelet no longer recognizes it
https://github.com/kubernetes/kubernetes/pull/117656

2. However, the kubernetes-worker charm still inserts the now-unrecognized DevicePlugins gate into the kubelet config, which causes the kubelet daemon to fail to start

3. This configuration is set here:
https://github.com/charmed-kubernetes/layer-kubernetes-common/blob/e329ebda02b38d94a0ef8eebc07c88b4bda79b9b/lib/charms/layer/kubernetes_common.py#L1172

The only condition for adding this config is that the worker node is GPU-enabled. The Kubernetes version is never checked, hence the errors when kubelet tries to start on the node.

The charm needs to check that
----
A) The worker node has GPU enabled
B) The k8s version still recognizes the feature gate (i.e. it is older than 1.28)
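A minimal sketch of the two-part check above, in Python (the language of kubernetes_common.py). The function names, the accepted version-string format, and the (1, 28) cutoff constant are hypothetical and chosen for illustration; this is not the charm's actual code. The cutoff is inferred from the "unrecognized feature gate" error reported on 1.28.

```python
from typing import Dict, Tuple

# Assumption: kubelet recognizes the DevicePlugins gate only before 1.28,
# inferred from the "unrecognized feature gate" error on 1.28 nodes.
DEVICE_PLUGINS_REMOVED_IN: Tuple[int, int] = (1, 28)


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Hypothetical helper: parse '1.28.2' or 'v1.27.6' into (major, minor)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def kubelet_feature_gates(gpu_enabled: bool, k8s_version: str) -> Dict[str, bool]:
    """Hypothetical helper: return the feature gates to pass to kubelet.

    DevicePlugins is added only when BOTH conditions hold:
      A) the worker node has GPU enabled, and
      B) the kubelet version still recognizes the gate (older than 1.28).
    """
    gates: Dict[str, bool] = {}
    if gpu_enabled and parse_k8s_version(k8s_version) < DEVICE_PLUGINS_REMOVED_IN:
        gates["DevicePlugins"] = True
    return gates


def format_feature_gates(gates: Dict[str, bool]) -> str:
    """Render gates in kubelet's --feature-gates 'Name=true,...' form."""
    return ",".join(f"{name}={str(value).lower()}" for name, value in sorted(gates.items()))


if __name__ == "__main__":
    print(format_feature_gates(kubelet_feature_gates(True, "1.27.6")))  # DevicePlugins=true
    print(kubelet_feature_gates(True, "1.28.2"))  # {}
```

Comparing (major, minor) tuples keeps the check independent of patch releases, so every 1.28.x node skips the gate while 1.27.x GPU nodes still receive it.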

Dagmawi Biru (dagbiru)
summary: - Version checked is needed before injecting DevicePlugins feature gate
+ Version check is needed before injecting DevicePlugins feature gate
Chris Johnston (cjohnston) wrote :
tags: added: review-needed
Changed in charm-kubernetes-worker:
status: New → Confirmed
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: New → Confirmed
Changed in charm-kubernetes-worker:
milestone: none → 1.29
Changed in charm-kubernetes-master:
milestone: none → 1.29
tags: added: backport-needed
removed: review-needed
Adam Dyess (addyess)
tags: removed: backport-needed
Changed in charm-kubernetes-master:
status: Confirmed → Fix Committed
Changed in charm-kubernetes-worker:
status: Confirmed → Fix Committed
Changed in charm-kubernetes-master:
assignee: nobody → Chris Johnston (cjohnston)
Changed in charm-kubernetes-worker:
assignee: nobody → Chris Johnston (cjohnston)
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: Fix Committed → In Progress
Changed in charm-kubernetes-worker:
status: Fix Committed → In Progress
George Kraft (cynerva)
Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in charm-kubernetes-worker:
importance: Undecided → High