Error when processing addNetwork

Bug #1909111 reported by liujinxin
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| kuryr-kubernetes | Confirmed | Medium | Unassigned | |

Bug Description

**What happened**:
```
Error when processing addNetwork

The kuryr-controller has finished creating the kuryrport and has updated the kuryrport CRD.
Judging by the kuryr-cni log, the kuryr-cni watcher process did not receive the update for the kuryrport CRD, which caused DaemonServer.run() to return GATEWAY_TIMEOUT.
```
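
The failure mode described above can be sketched roughly as follows. This is a minimal illustration, not the kuryr-kubernetes code: `wait_for_kuryrport`, `handle_add`, the plain-dict `registry`, and the local `ResourceNotReady` class are all hypothetical stand-ins. It assumes the request handler polls a registry that a watcher is supposed to fill, and answers with a gateway timeout when the entry never appears.

```python
import threading
import time
from http import HTTPStatus


class ResourceNotReady(Exception):
    """Local stand-in for kuryr_kubernetes.exceptions.ResourceNotReady."""


def wait_for_kuryrport(registry, key, timeout, interval=0.01):
    """Poll `registry` until `key` appears or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        entry = registry.get(key)
        if entry is not None:
            return entry
        time.sleep(interval)
    # The watcher never delivered the update for this pod.
    raise ResourceNotReady(key)


def handle_add(registry, key, timeout):
    """Hypothetical ADD handler: map a missing entry to GATEWAY_TIMEOUT."""
    try:
        vifs = wait_for_kuryrport(registry, key, timeout)
    except ResourceNotReady:
        return HTTPStatus.GATEWAY_TIMEOUT, None
    return HTTPStatus.ACCEPTED, vifs


if __name__ == "__main__":
    registry = {}
    key = "default/nginx-5b98f4fcf8-bcsgk"
    # Simulate a healthy watcher that delivers the update shortly after ADD.
    threading.Timer(0.05, registry.__setitem__, args=(key, {"eth0": "vif"})).start()
    print(handle_add(registry, key, timeout=1.0))
    # Simulate the reported case: nothing ever populates the registry.
    print(handle_add({}, key, timeout=0.1))
```

In this sketch the second call is the situation from the log: the controller side has done its work, but nothing reaches the node-local registry, so the handler can only time out.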

**What you expected to happen**:
```
Normally, `kuryr-daemon: watcher worker` gets the CRD of this kuryrport from the apiserver. The apiserver itself is certainly working fine, because other nodes are able to create pods successfully.
```
**How to reproduce it (as minimally and precisely as possible)**:
```
I don't know how to reproduce the problem, but it has occurred a few times in my environment. Several of those times, it occurred after kuryr-cni had been running for a long time.
```

**Environment**:
- Kubernetes version:
`v1.14.3`
- Kuryr-Kubernetes version:
`stable/train`
- OpenStack version:
`stable/Rocky`

- Cluster information:
```
NAME       STATUS   ROLES                                               AGE   VERSION
master-0   Ready    l3-node,master,monitor,node,openvswitch             47d   v1.14.3
master-1   Ready    l3-node,master,node,openstack-compute,openvswitch   47d   v1.14.3
master-2   Ready    l3-node,master,node,openvswitch                     47d   v1.14.3
slave-0    Ready    l3-node,node,openvswitch                            23d   v1.14.3

NAME                                READY   STATUS    RESTARTS   AGE     IP               NODE
kuryr-cni-ds-bngz6                  1/1     Running   10         16d     192.168.15.86    master-1
kuryr-cni-ds-c7zp7                  1/1     Running   1          16d     192.168.15.93    master-2
kuryr-cni-ds-cww29                  1/1     Running   0          16d     192.168.15.91    master-0
kuryr-cni-ds-xz688                  1/1     Running   0          7h53m   192.168.15.100   slave-0
kuryr-controller-5c964f789f-ndpk6   1/1     Running   1          6h19m   192.168.15.91    master-0
```
**Anything else we need to know?**:
```
nginx-5b98f4fcf8-bcsgk 0/1 ContainerCreating 0 12m <none> master-0 <none> <none>

---
apiVersion: openstack.org/v1
kind: KuryrPort
metadata:
  creationTimestamp: "2020-12-23T10:48:11Z"
  finalizers:
  - kuryr.openstack.org/kuryrport-finalizer
  generation: 3
  labels:
    kuryr.openstack.org/nodeName: master-0
  name: nginx-5b98f4fcf8-bcsgk
  namespace: default
  resourceVersion: "9924655"
  selfLink: /apis/openstack.org/v1/namespaces/default/kuryrports/nginx-5b98f4fcf8-bcsgk
  uid: 59fd71fc-450c-11eb-9ac9-fa163ebaf135
spec:
  podNodeName: master-0
  podUid: 55b97114-450c-11eb-9bab-fa163e903454
status:
  projectId: adfb76685924419a88d92d768c3d7bd1
  vifs:
    eth0:
      default: true
      vif:
        versioned_object.data:
          active: true
          address: fa:16:3e:5f:99:ba
          bridge_name: br-int
          has_traffic_filtering: false
          id: e0f04a15-c8e0-4a2d-9bc5-9ee6a041e221
...
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    routerId: 81760e19-4c4e-41c5-9c5f-ef18b832496b
  creationTimestamp: "2020-12-23T10:48:04Z"

---
kuryr-cni.log
DEBUG kuryr_kubernetes.cni.daemon.service [-] Received ADD request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/3401884/ns/net', 'CNI_PATH': '/opt/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': 'fa15ea5d899657d48ee8759a5cb2c1e63f7995b85bd2187a53ec179195df293a', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-5b98f4fcf8-bcsgk;K8S_POD_INFRA_CONTAINER_ID=fa15ea5d899657d48ee8759a5cb2c1e63f7995b85bd2187a53ec179195df293a'} _prepare_request /usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py:70
DEBUG oslo_concurrency.lockutils [-] Acquired lock "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:266
DEBUG oslo_concurrency.lockutils [-] Acquired external semaphore "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:273
DEBUG oslo_concurrency.lockutils [-] Releasing lock "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:282
ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: 'default/nginx-5b98f4fcf8-bcsgk'
```

liujinxin (scilla)
description: updated
Michal Dulko (michal-dulko-f) wrote :

I think this might happen in rare cases when etcd or kubernetes-api itself is struggling under high load. That said, we've mostly seen such problems with kuryr-controller rather than kuryr-daemon, because kuryr-daemon gets restarted on failures and the issue should then fix itself. Is that the case in your instance?

liujinxin (scilla) wrote :

Yes, kuryr-daemon restarts once its failure count reaches CONF.cni_daemon.cni_failures_count, and the issue should then fix itself.
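
As a rough illustration of that threshold behaviour (a sketch only: `FailureTracker` and `max_failures` are hypothetical names, and how kuryr-daemon actually counts failures and triggers the restart is not shown in this report), a counter that signals a restart once a configured limit is reached could look like this:

```python
class FailureTracker:
    """Count request failures and report when a restart is warranted.

    `max_failures` plays the role of CONF.cni_daemon.cni_failures_count.
    Assumption: failures accumulate and are never reset on success.
    """

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def record_failure(self):
        """Register one failure; return True once the limit is reached."""
        self.failures += 1
        return self.failures >= self.max_failures


if __name__ == "__main__":
    tracker = FailureTracker(max_failures=3)
    for attempt in range(1, 4):
        restart = tracker.record_failure()
        print(f"failure {attempt}: restart needed = {restart}")
```

With a limit of 3, only the third recorded failure asks for a restart, which is why a pod can stay in ContainerCreating for several failed ADD attempts before the daemon recovers.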

Changed in kuryr-kubernetes:
status: New → Confirmed
importance: Undecided → Medium