When the openvswitch service is restarted the "/var/run/openvswitch" folder which was mounted on the CNI container gets unmounted and hence the cni is unable to execute any of the ovs-vsctl commands which are required by VIFOpenSwitchDriver.
Manual solution: deleting the cni container fixes the issue as Kubernetes recreates the container and it has the "/var/run/openvswitch" properly mounted.
The automatic solution is to check for the existence of the directory "/var/run/openvswitch" and mark the cni as unhealty if the directory is missing.
The cni logs for the error:
2019-04-23 02:56:31.712 203 INFO kuryr_kubernetes.cni.main [-] Using 'CNIDaemonizedRunner'
2019-04-23 02:56:31.713 203 DEBUG kuryr_kubernetes.cni.api [-] Making request to CNI Daemon. POST http://127.0.0.1:50036/addNetwork
{'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/5204/ns/net', 'CNI_PATH': '/opt/cni/bin', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': 'True', 'CNI_CONFIG_DIR_PATH': '/etc/cni/net.d', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_BIN_DIR_PATH': '/opt/cni/bin', 'config_kuryr': {u'debug': True, u'cniVersion': u'0.3.1', u'type': u'kuryr-cni', u'kuryr_conf': u'/etc/kuryr/kuryr.conf', u'name': u'kuryr'}} _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:179
2019-04-23 02:56:31.728 220 DEBUG kuryr_kubernetes.cni.daemon.service [-] Received ADD request. CNI Params: {'CNI_IFNAME': u'eth0', 'CNI_NETNS': u'/proc/5204/ns/net', 'CNI_PATH': u'/opt/cni/bin', 'CNI_ARGS': u'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': u'True', 'CNI_CONFIG_DIR_PATH': u'/etc/cni/net.d', 'CNI_COMMAND': u'ADD', 'CNI_BIN_DIR_PATH': u'/opt/cni/bin', 'CNI_CONTAINERID': u'9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27'} _prepare_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/daemon/service.py:69
2019-04-23 02:56:32.639 29 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event MODIFIED Pod:multi-port-test-84c56cd4bd-r824b __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:66
2019-04-23 02:56:32.642 29 DEBUG kuryr_kubernetes.cni.handlers [-] Got VIFs from annotation: {'eth0': VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c')} _get_vifs /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/handlers.py:81
2019-04-23 02:56:32.645 29 DEBUG oslo_concurrency.lockutils [-] Acquired lock "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:265
2019-04-23 02:56:32.646 29 DEBUG oslo_concurrency.lockutils [-] Acquired external semaphore "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:272
2019-04-23 02:56:32.648 29 DEBUG oslo_concurrency.lockutils [-] Releasing lock "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:281
2019-04-23 02:56:32.859 220 DEBUG os_vif [-] Plugging vif VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c') plug /usr/lib/python2.7/site-packages/os_vif/__init__.py:76
2019-04-23 02:56:32.863 220 INFO oslo.privsep.daemon [-] Running privsep helper: ['privsep-helper', '--privsep_context', 'vif_plug_ovs.privsep.vif_plug', '--privsep_sock_path', '/tmp/tmpGeKC6o/privsep.sock']
2019-04-23 02:56:33.628 220 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-04-23 02:56:33.630 220 DEBUG oslo.privsep.daemon [-] Accepted privsep connection to /tmp/tmpGeKC6o/privsep.sock __init__ /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:332
2019-04-23 02:56:33.578 235 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-04-23 02:56:33.585 235 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-04-23 02:56:33.591 235 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_NET_ADMIN/CAP_NET_ADMIN/none
2019-04-23 02:56:33.592 235 INFO oslo.privsep.daemon [-] privsep daemon running as pid 235
2019-04-23 02:56:34.274 220 INFO os_vif [-] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c')
2019-04-23 02:56:34.795 220 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-04-23 02:56:35.267 220 DEBUG oslo_concurrency.processutils [-] CMD "ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr" returned: 1 in 0.472s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-04-23 02:56:35.268 220 DEBUG oslo_concurrency.processutils [-] u'ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr' failed. Not Retrying. execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:457
2019-04-23 02:56:35.269 220 ERROR kuryr_kubernetes.linux_net_utils [-] Unable to execute ['ovs-vsctl', '--', '--if-exists', 'del-port', u'tap5f00664c-5c', '--', 'add-port', u'br-int', u'tap5f00664c-5c', '--', 'set', 'Interface', u'tap5f00664c-5c', u'external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef', 'external-ids:iface-status=active', u'external-ids:attached-mac=fa:16:3e:19:34:9d', 'external-ids:vm-uuid=kuryr']. Exception: Unexpected error while running command.
Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
Exit code: 1
Stdout: u''
Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n': ProcessExecutionError: Unexpected error while running command.
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': u'eth0', 'CNI_NETNS': u'/proc/5204/ns/net', 'CNI_PATH': u'/opt/cni/bin', 'CNI_ARGS': u'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': u'True', 'CNI_CONFIG_DIR_PATH': u'/etc/cni/net.d', 'CNI_COMMAND': u'ADD', 'CNI_BIN_DIR_PATH': u'/opt/cni/bin', 'CNI_CONTAINERID': u'9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27'}: ProcessExecutionError: Unexpected error while running command.
Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
Exit code: 1
Stdout: u''
Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n'
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 81, in add
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vif = self.plugin.add(params)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 51, in add
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vifs = self._do_work(params, b_base.connect)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 138, in _do_work
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service container_id=params.CNI_CONTAINERID)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/base.py", line 100, in connect
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service driver.connect(vif, ifname, netns, container_id)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/bridge.py", line 105, in connect
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vif.address, instance_id)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/linux_net_utils.py", line 53, in create_ovs_vif_port
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service _ovs_vsctl(_create_ovs_vif_cmd(bridge, dev, iface_id, mac, instance_id))
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/linux_net_utils.py", line 34, in _ovs_vsctl
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service return processutils.execute(*full_args, run_as_root=True)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service cmd=sanitized_cmd)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service ProcessExecutionError: Unexpected error while running command.
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Exit code: 1
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Stdout: u''
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n'
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service
2019-04-23 02:56:35.277 220 INFO werkzeug [-] 127.0.0.1 - - [23/Apr/2019 02:56:35] "POST /addNetwork HTTP/1.1" 500 -
2019-04-23 02:56:35.280 203 DEBUG kuryr_kubernetes.cni.api [-] CNI Daemon returned "500 INTERNAL SERVER ERROR". _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:187
2019-04-23 02:56:35.280 203 ERROR kuryr_kubernetes.cni.api [-] CNI daemon returned error "500 INTERNAL SERVER ERROR".
2019-04-23 02:56:35.283 203 DEBUG kuryr_kubernetes.cni.api [-] CNI output: {'msg': 'Got invalid status code from CNI daemon.', 'cniVersion': '0.3.1', 'code': 100, 'details': 'Traceback (most recent call last):\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 81, in run\n vif = self._add(params)\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 154, in _add\n resp = self._make_request(\'addNetwork\', params, httplib.ACCEPTED)\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 191, in _make_request\n raise k_exc.CNIError(\'Got invalid status code from CNI daemon.\')\nCNIError: Got invalid status code from CNI daemon.\n'} _write_dict /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:55
2019-04-23 02:56:35.285 203 DEBUG kuryr_kubernetes.cni.main [-] Exiting with status 1 run /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/main.py:78
2019-04-23 02:56:35.605 37 WARNING pyroute2.ipdb.main [-] shutdown in progress
2019-04-23 02:56:35.605 37 DEBUG kuryr_kubernetes.cni.health [-] Kuryr CNI Liveness verified. liveness_status /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/health.py:151
2019-04-23 02:56:35.606 37 INFO werkzeug [-] 172.214.0.6 - - [23/Apr/2019 02:56:35] "GET /alive HTTP/1.1" 200 -
Fix proposed to branch: master /review. opendev. org/657909
Review: https:/