CDK 1.28 control plane on lxd running Calico needs access to /sys/fs/bpf

Bug #2034448 reported by Gustavo Sanchez
Affects                         Status        Importance  Assigned to     Milestone
Kubernetes Control Plane Charm  Fix Released  High        Mateo Florido   1.28+ck1
Kubernetes Worker Charm         Fix Released  Medium      Kevin W Monroe  1.28+ck1

Bug Description

In CK 1.27 there was no calico-node pod; the charm ran calico-node as a systemd service. In CK 1.28 the charm switched to hosting Calico via a DaemonSet, whose pods require access to /sys/fs/bpf on the host:

https://github.com/charmed-kubernetes/charm-calico/blob/main/upstream/calico/manifests/3.25.1/calico-etcd.yaml#L319
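
For context, the parts of the calico-node DaemonSet that touch /sys/fs on the host look roughly like this (an abridged sketch based on the upstream v3.25 manifest linked above; exact fields may differ):

# Abridged sketch, not the full DaemonSet spec.
containers:
  - name: calico-node
    volumeMounts:
      - name: sys-fs
        mountPath: /sys/fs/
        # Bidirectional propagation requires /sys/fs to live under a shared
        # mount on the host, which is not the case in a stock LXD container.
        mountPropagation: Bidirectional
      - name: bpffs
        mountPath: /sys/fs/bpf
        mountPropagation: Bidirectional
volumes:
  - name: sys-fs
    hostPath:
      path: /sys/fs/
      type: DirectoryOrCreate
  - name: bpffs
    hostPath:
      path: /sys/fs/bpf
      type: Directory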

So, when kubernetes-control-plane runs on LXD, calico-node logs the errors below, since the container has no write access to that mount point.

========== Error logs
Every 2.0s: kubectl get po -n kube-system -owide --sort-by .metadata.creationTimestamp infra1.cloud.rc.uab.edu: Tue Sep 5 16:23:35 2023

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
k8s-keystone-auth-54765687bb-zsv7l 1/1 Running 0 3d21h 10.128.118.66 k8s-worker-02 <none> <none>
kube-state-metrics-5b95b4459c-dv5mp 1/1 Running 2 (51m ago) 40h 10.128.95.166 dgx06 <none> <none>
coredns-6c8cf5f87b-2kmxx 1/1 Running 0 13h 10.128.95.172 dgx06 <none> <none>
metrics-server-v0.5.2-6bfd958b56-6hb8t 2/2 Running 0 13h 10.128.95.178 dgx06 <none> <none>
k8s-keystone-auth-54765687bb-dd882 1/1 Running 0 11h 10.128.155.86 dgx08 <none> <none>
calico-node-dshdc 0/1 Init:0/2 0 33m 192.168.20.125 dgx07 <none> <none>
calico-node-n7l6f 1/1 Running 0 33m 192.168.20.73 dgx06 <none> <none>
calico-node-rg72d 0/1 Init:1/2 0 33m 192.168.20.202 k8s-worker-04 <none> <none>
calico-node-rnspn 0/1 Init:0/2 0 33m 192.168.20.138 dgx08 <none> <none>
calico-node-v9mf9 0/1 Init:1/2 0 33m 192.168.20.32 k8s-worker-01 <none> <none>
calico-node-krnhj 0/1 Init:1/2 0 33m 192.168.20.168 juju-712203-2-lxd-2 <none> <none>
calico-kube-controllers-5d68647db9-d6ldl 0/1 ContainerCreating 0 33m 192.168.20.125 dgx07 <none> <none>
calico-node-bnnhb 1/1 Running 0 33m 192.168.20.103 dgx05 <none> <none>
calico-node-7zkss 1/1 Running 0 33m 192.168.20.203 k8s-worker-03 <none> <none>
calico-node-6r7c8 0/1 Init:0/2 0 33m 192.168.20.136 k8s-worker-02 <none> <none>
calico-node-wzr8r 0/1 Init:0/2 0 33m 192.168.20.93 juju-712203-1-lxd-2 <none> <none>
calico-node-nnwsm 0/1 Init:CreateContainerError 0 2m7s 192.168.20.167 juju-712203-0-lxd-3 <none> <none>

$ kubectl describe pod calico-node-nnwsm -n kube-system
# [..]
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal Scheduled 26s default-scheduler Successfully assigned kube-system/calico-node-nnwsm to juju-712203-0-lxd-3
  Normal Pulled 25s kubelet Container image "rocks.canonical.com:443/cdk/calico/cni:v3.25.1" already present on machine
  Normal Created 25s kubelet Created container install-cni
  Normal Started 25s kubelet Started container install-cni
  Warning Failed 23s kubelet Error: failed to generate container "82ccbbba0e5f5acda72b3e6fde90a966027eb021577eef02c33d8e8e8c2c3339" spec: failed to generate spec: path "/sys/fs/" is mounted on "/sys" but it is not a shared mount
  Warning DNSConfigForming 12s (x5 over 25s) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 192.168.20.11 192.168.20.14 192.168.20.15
  Normal Pulled 12s (x2 over 23s) kubelet Container image "rocks.canonical.com:443/cdk/calico/node:v3.25.1" already present on machine
  Warning Failed 12s kubelet Error: failed to generate container "ba3a8598b2f954b96d90ce520382876770a85b8b5f88ffc56d0c4ab4bc1d394d" spec: failed to generate spec: path "/sys/fs/" is mounted on "/sys" but it is not a shared mount

summary: - CDK 1.28 control plane on lxd running Calico needs access to /sys/fs
+ CDK 1.28 control plane on lxd running Calico needs access to /sys/fs/bpf
Revision history for this message
Gustavo Sanchez (gustavosr98) wrote :

This is the workaround that worked for me
https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/pull/301

I'm not sure whether this is the best approach to expose /sys/fs/bpf to the container, or whether it would be better to change the mount point used by the Calico pod:
https://github.com/charmed-kubernetes/charm-calico/blob/main/upstream/calico/manifests/3.25.1/calico-etcd.yaml#L319
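
For illustration, exposing the host path through the charm's lxd-profile.yaml would look something like the following (a sketch only; the device name is arbitrary and the linked PR is authoritative):

# Illustrative disk device that bind-mounts the host's /sys/fs/bpf
# into the container; not necessarily the exact change in the PR.
devices:
  sys-fs-bpf:
    type: disk
    source: /sys/fs/bpf
    path: /sys/fs/bpf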

Revision history for this message
Aleksandr Mikhalitsyn (mihalicyn) wrote :

Small remark: exposing /sys/fs/bpf from the host can be a security issue. For example, if a BPF iterator program is loaded on the host and then pinned to a file in /sys/fs/bpf, that file becomes accessible from inside the container, and the container can read data that is not supposed to be accessible to anyone except the host root user.

Changed in charm-kubernetes-master:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.28+ck1
Changed in charm-kubernetes-master:
assignee: nobody → Mateo Florido (mateoflorido)
Revision history for this message
Mateo Florido (mateoflorido) wrote :

Our current LXD profiles for KCP [1] and KW [2] have been adjusted to be more permissive in order to accommodate the specific needs of Calico (and other CNIs). While we prioritize offering the best performance, our CNIs often require privileged containers. Our commitment to delivering the latest features in our CNIs sometimes means relaxing strict security measures in our profiles. However, we acknowledge the potential security implications and will ensure that our documentation clearly highlights any concerns related to CK operating in privileged LXD environments.

[1] https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/main/lxd-profile.yaml
[2] https://github.com/charmed-kubernetes/charm-kubernetes-worker/blob/main/lxd-profile.yaml
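
For illustration, "more permissive" here means config keys along these lines (a sketch of the kind of settings such a profile carries; the linked files are authoritative and may differ):

# Sketch only; see [1] and [2] for the actual profiles.
config:
  security.privileged: "true"    # container processes run as real root on the host
  security.nesting: "true"       # allow nested container runtimes inside the container
  linux.kernel_modules: ip_tables,ip6_tables,nf_nat,overlay   # illustrative module list
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw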

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

PR for k8s-worker to match the PR from comment #1. Both k8s-worker and k-c-p should stay in sync with respect to their LXD profiles.

Changed in charm-kubernetes-master:
status: Triaged → Fix Committed
Changed in charm-kubernetes-worker:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Kevin W Monroe (kwmonroe)
milestone: none → 1.28+ck1
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Unfortunately we need to back this out. Juju doesn't allow disk devices in lxd-profile:

https://juju.is/docs/juju/use-lxd-profiles

And we won't require --force to deploy by default. We'll either need to find an alternative to mounting /sys/fs/bpf, or document custom profiles plus --force as a post-deployment workaround:

https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/pull/302
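
For illustration, such a post-deployment workaround could take the shape of a custom, operator-managed LXD profile that carries the mount outside the charm, e.g. via raw.lxc rather than a disk device (a sketch only, not necessarily the approach the charms ultimately took):

# Hypothetical profile, applied manually with something like:
#   lxc profile create bpf-mount && lxc profile edit bpf-mount
#   lxc profile add <container> bpf-mount
config:
  raw.lxc: |
    lxc.mount.entry = /sys/fs/bpf sys/fs/bpf none bind,create=dir,optional 0 0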

Changed in charm-kubernetes-worker:
status: In Progress → Triaged
Changed in charm-kubernetes-master:
status: Fix Committed → Triaged
Revision history for this message
John A Meinel (jameinel) wrote :

For Juju something like /sys/fs/bpf probably falls into a trust category. It is plausible that you would want to deploy something into a container that still needs *some* amount of elevated privileges to operate, but you don't want to give it full root on the host. Juju doesn't yet model finer grained privileges, but we have absolutely been discussing it. (Equivalent to the slots/plugs model of snaps where they are contained, but still given enough permissions to do what they need to get the job done, with that vetted by a Canonical assertion that this snap is safe to automatically connect its plugs.)

That is probably at least a year off. We should also discuss whether it actually makes sense to run this in a container at all, given that bad rules here can mess up the host machine.

Revision history for this message
John A Meinel (jameinel) wrote :

The main reason Juju doesn't allow mounting filesystems and block devices is that we don't want charms to assume host disk configurations (we don't want a charm's profile to assume /dev/sdb is the right place to get storage). But /sys/fs/bpf is going to exist everywhere.

Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: Triaged → In Progress
Changed in charm-kubernetes-worker:
status: Triaged → In Progress
Adam Dyess (addyess)
tags: added: backport-needed
Adam Dyess (addyess)
tags: removed: backport-needed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released