[k8s][calico][fedora-atomic] Periodically lost connection from pod to apiserver
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Magnum | Status tracked in Rocky | | |
Queens | Fix Released | High | Feilong Wang |
Rocky | Fix Released | High | Feilong Wang |
Bug Description
In my local devstack environment with k8s+calico running on fedora-atomic, I can see many restarts of kubernetes-
[fedora@
2018/03/21 01:42:25 Starting overwatch
2018/03/21 01:42:25 Using in-cluster config to connect to apiserver
2018/03/21 01:42:25 Using service account token for csrf signing
2018/03/21 01:42:25 No request provided. Skipping authorization
2018/03/21 01:42:35 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https:/
Refer to our FAQ and wiki pages for more information: https:/
When running tcpdump -i <calico interface of k8s dashboard>, I can see the k8s dashboard lose its connection every 8 minutes.
Then I ran ip monitor and got the output below:
9: cali2241f02b2c3
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 5
Deleted ff00::/8 dev cali2241f02b2c3 table local metric 256 pref medium
ff00::/8 dev cali2241f02b2c3 table local metric 256 pref medium
10: cali7c86e9585c6 inet6 fe80::8496:
valid_lft forever preferred_lft forever
192.168.25.199 dev cali68b21e04cb8 scope link
192.168.25.200 dev cali80465958db0 scope link
192.168.25.201 dev caliae3fbe26c95 scope link
192.168.25.202 dev calif7cbaf34e8b scope link
192.168.25.203 dev calif2e3ad1ce01 scope link
192.168.25.204 dev cali2241f02b2c3 scope link
192.168.25.205 dev cali7c86e9585c6 scope link
See more logs here: http://
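To spot this kind of route churn in a longer ip monitor capture, a small filter over the text output is enough. The sketch below is illustrative only: it assumes Calico's host-side interfaces are named "cali" followed by a hex suffix (as in the output above), and deleted_calico_events is a hypothetical helper, not part of any tool. The sample is taken from the excerpt above, which only shows one "Deleted" line; the full log contains more.

```python
import re
from typing import List, Tuple

# Assumed naming pattern: Calico host-side veths look like "cali2241f02b2c3".
CALI_DEV = re.compile(r"\bdev (cali[0-9a-f]+)\b")


def deleted_calico_events(monitor_output: str) -> List[Tuple[str, str]]:
    """Return (interface, line) pairs for every 'Deleted' event on a cali* device."""
    events = []
    for line in monitor_output.splitlines():
        line = line.strip()
        # ip monitor prefixes removals with "Deleted"; additions have no prefix.
        if not line.startswith("Deleted"):
            continue
        match = CALI_DEV.search(line)
        if match:
            events.append((match.group(1), line))
    return events


sample = """\
Deleted ff00::/8 dev cali2241f02b2c3 table local metric 256 pref medium
ff00::/8 dev cali2241f02b2c3 table local metric 256 pref medium
192.168.25.204 dev cali2241f02b2c3 scope link
"""

for dev, line in deleted_calico_events(sample):
    print(dev, "->", line)
```

A "Deleted" line for a cali* device followed shortly by the same route reappearing without the prefix is the delete/recreate pattern described below.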
So the pod routes are clearly being deleted and recreated. After checking with a Calico developer, this turns out to be caused by NetworkManager, which does some 'magic' dynamic interface reconfiguration intended for desktop environments and has side effects in server environments. I don't really understand why Fedora Atomic ships NetworkManager, because IIUC NetworkManager is only meant for desktop environments.
So the fix is to make NetworkManager stop managing the Calico interfaces until we can get rid of it in the Fedora Atomic image.
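One common way to do this is a NetworkManager drop-in that lists the interfaces as unmanaged-devices under the [keyfile] section. The Python sketch below only renders such a drop-in and checks which interface names it would cover. It rests on stated assumptions: the patterns cali* (per-pod veths) and tunl* (Calico IP-in-IP tunnel) are illustrative, render_nm_dropin and is_unmanaged are hypothetical helpers, and fnmatch is used as an approximation of NetworkManager's simple '*'/'?' globbing on interface names. It is not the patch that was merged for this bug.

```python
import configparser
import fnmatch
import io
from typing import List

# Assumed patterns; adjust to the interfaces your Calico setup actually creates.
UNMANAGED_PATTERNS = ["cali*", "tunl*"]


def render_nm_dropin(patterns: List[str]) -> str:
    """Render a NetworkManager conf.d drop-in marking matching interfaces unmanaged."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep the key exactly as written
    config["keyfile"] = {
        "unmanaged-devices": ";".join(f"interface-name:{p}" for p in patterns)
    }
    buf = io.StringIO()
    # Write "key=value" without spaces, the usual style for NetworkManager configs.
    config.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def is_unmanaged(ifname: str, patterns: List[str]) -> bool:
    """Approximate NetworkManager's interface-name globbing with fnmatch."""
    return any(fnmatch.fnmatchcase(ifname, p) for p in patterns)


print(render_nm_dropin(UNMANAGED_PATTERNS))
print(is_unmanaged("cali2241f02b2c3", UNMANAGED_PATTERNS))
```

The rendered text would go into a file under /etc/NetworkManager/conf.d/ (a name such as calico.conf is an assumption), followed by a NetworkManager restart (e.g. systemctl restart NetworkManager) so the setting takes effect.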
Changed in magnum:
assignee: nobody → Feilong Wang (flwang)
Is there a patch for this?