CNI fails on KeyError: versioned_object.namespace

Bug #1792549 reported by Luis Tomas Bolivar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kuryr-kubernetes
In Progress
Undecided
Unassigned

Bug Description

Sometimes when creating pods they get stuck on the ContainerCreating status forever.

When checking the pods description, we see the next error:
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal Scheduled 34m default-scheduler Successfully assigned test-py3/demo-1-ndqks to app-node-0.openshift.example.com
  Warning FailedCreatePodSandBox 32m kubelet, app-node-0.openshift.example.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container
"5960a8feb3f8f806cdf9376d303fbe4139cdcb31dd49181c00edfb8cc1197340" network for pod "demo-1-ndqks": NetworkPlugin cni failed to set up pod "demo-1-ndqks_test-py3" network: Got inva
lid status code from CNI daemon.; Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/api.py", line 81, in run
    vif = self._add(params)
  File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/api.py", line 154, in _add
    resp = self._make_request('addNetwork', params, httplib.ACCEPTED)
  File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/api.py", line 191, in _make_request
    raise k_exc.CNIError('Got invalid status code from CNI daemon.')
kuryr_kubernetes.exceptions.CNIError: Got invalid status code from CNI daemon.

And on the kuryr-cni pod we see that the problem is when extracting the pod annotations:
2018-09-14 09:32:41.138 33 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-0.openshift.example.com'.
2018-09-14 09:32:42.634 41 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2018-09-14 09:32:42.635 41 INFO werkzeug [-] 192.168.100.14 - - [14/Sep/2018 09:32:42] "GET /ready HTTP/1.1" 200 -
2018-09-14 09:32:44.141 33 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-0.openshift.example.com'
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher [-] Caught exception while watching.: KeyError: 'versioned_object.namespace'
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher Traceback (most recent call last):
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/watcher.py", line 190, in _watch
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher self._handler(event)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/handlers/dispatch.py", line 113, in __call__
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher self._handler(event)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/handlers/dispatch.py", line 64, in __call__
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher handler(event)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 75, in __call__
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher self.on_present(obj)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/handlers.py", line 42, in on_present
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher vifs = self._get_vifs(pod)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/handlers.py", line 79, in _get_vifs
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher state = utils.extract_pod_annotation(state_annotation)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/utils.py", line 164, in extract_pod_annotation
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher obj = objects.base.VersionedObject.obj_from_primitive(annotation)
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 406, in obj_from_primitive
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher objns = cls._obj_primitive_field(primitive, 'namespace')
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher File "/usr/local/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 355, in _obj_primitive_field
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher return primitive[key]
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher KeyError: 'versioned_object.namespace'
2018-09-14 09:32:44.197 33 ERROR kuryr_kubernetes.watcher
2018-09-14 09:32:44.199 33 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-0.openshift.example.com'.
2018-09-14 09:32:44.604 212 ERROR kuryr_kubernetes.cni.daemon.service [-] Timed out waiting for requested pod to appear in registry: Resource not ready: 'test-py3/demo-1-ndqks'.:
kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: 'test-py3/demo-1-ndqks'
2018-09-14 09:32:44.614 212 INFO werkzeug [-] 127.0.0.1 - - [14/Sep/2018 09:32:44] "POST /addNetwork HTTP/1.1" 504 -
2018-09-14 09:32:47.203 33 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/pods?fieldSelector=spec.nodeName=app-node-0.openshift.example.com'

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

I forgot to mention that the pod has the right notation:

oc get pod demo-1-rc5bd -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/deployment-config.latest-version: "1"
    openshift.io/deployment-config.name: demo
    openshift.io/deployment.name: demo-1
    openshift.io/scc: restricted
    openstack.org/kuryr-vif: '{"versioned_object.changes": ["default_vif"], "versioned_object.data":
      {"additional_vifs": {}, "default_vif": {"versioned_object.changes": ["preserve_on_delete",
      "plugin", "id", "network", "address", "vif_name", "has_traffic_filtering", "active",
      "vlan_id"], "versioned_object.data": {"active": true, "address": "fa:16:3e:a7:3a:52",
      "has_traffic_filtering": false, "id": "6c328f9b-410c-4120-b7a8-37245ddba0fd",
      "network": {"versioned_object.changes": ["mtu", "id", "should_provide_bridge",
      "should_provide_vlan", "subnets", "label", "multi_host"], "versioned_object.data":
      {"id": "e570605e-b568-4749-b7f8-935fce247e98", "label": "ns/test-py3-net", "mtu":
      1350, "multi_host": false, "should_provide_bridge": false, "should_provide_vlan":
      false, "subnets": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["gateway", "cidr", "ips", "dns",
      "routes"], "versioned_object.data": {"cidr": "10.11.10.0/24", "dns": [], "gateway":
      "10.11.10.1", "ips": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["address"], "versioned_object.data":
      {"address": "10.11.10.5"}, "versioned_object.name": "FixedIP", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "FixedIPList",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"},
      "routes": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": []}, "versioned_object.name": "RouteList", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Subnet",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]},
      "versioned_object.name": "SubnetList", "versioned_object.namespace": "os_vif",
      "versioned_object.version": "1.0"}}, "versioned_object.name": "Network", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.1"}, "plugin": "noop", "preserve_on_delete":
      false, "vif_name": "tap6c328f9b-41", "vlan_id": 1374}, "versioned_object.name":
      "VIFVlanNested", "versioned_object.namespace": "os_vif", "versioned_object.version":
      "1.0"}}, "versioned_object.name": "PodState", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}'
  creationTimestamp: 2018-09-14T10:04:43Z

Revision history for this message
Sunday Mgbogu (digitalsimboja) wrote :

Hi Luis,

I experienced this sometime last week, was thinking it was my network connectivity.
I would like to take a look.

Could you share more details how the API should handle this as expected so I can work on it.

Changed in kuryr-kubernetes:
assignee: nobody → Sunday Mgbogu (digitalsimboja)
status: New → In Progress
Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

Hi Sunday,

This is a pretty old bug report. We are no longer using annotations on the pod as we rely on KuryrPort CRD object instead. So what you hit is a different problem for sure. And most probably this bug should be closed as most probably it got fixed due to the movement to CRDs instead of annotations

Changed in kuryr-kubernetes:
assignee: Sunday Mgbogu (digitalsimboja) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.