kuryr-kubernetes

Error when processing addNetwork

Bug #1909111 reported by liujinxin on 2020-12-23

This bug affects 1 person

Affects            Status      Importance   Assigned to   Milestone
kuryr-kubernetes   Confirmed   Medium       Unassigned

Bug Description

**What happened**:
```
Error when processing addNetwork

The kuryr-controller finished creating the KuryrPort and updated the KuryrPort CRD.
However, the kuryr-cni log suggests that the kuryr-cni watcher process never received the update event for the KuryrPort CRD, which caused DaemonServer.run() to return GATEWAY_TIMEOUT.
```
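To illustrate the failure mode described above, here is a minimal sketch of a request handler that waits for a watcher to deliver an object and gives up after a timeout. This is not kuryr-kubernetes code: `Registry`, `handle_add`, the `timeout` values and the `OK` success status are all hypothetical, and `ResourceNotReady` is only a stand-in for the exception seen in the log.

```python
import threading
from http import HTTPStatus


class ResourceNotReady(Exception):
    """Stand-in for the exception seen in the log (hypothetical here)."""


class Registry:
    """In-memory map filled by a watcher thread and read by request handlers."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def on_event(self, key, obj):
        # Called by the watcher for every update event it receives.
        with self._cond:
            self._data[key] = obj
            self._cond.notify_all()

    def wait_for(self, key, timeout):
        # Block until the watcher has stored `key`, or give up after `timeout`.
        with self._cond:
            found = self._cond.wait_for(lambda: key in self._data, timeout=timeout)
            if not found:
                raise ResourceNotReady(key)
            return self._data[key]


def handle_add(registry, key, timeout=1.0):
    # Map "the watcher never delivered the object" to a gateway timeout.
    try:
        registry.wait_for(key, timeout)
    except ResourceNotReady:
        return HTTPStatus.GATEWAY_TIMEOUT
    return HTTPStatus.OK
```

If the watcher silently stops delivering events, `on_event` is never called for the new object, so every request for it times out even though the object exists on the apiserver.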

**What you expected to happen**:
```
Normally, `kuryr-daemon: watcher worker` should receive the CRD (`got CRD`) of this KuryrPort from the apiserver. The apiserver itself is certainly working, because other nodes are able to create pods successfully.
```
**How to reproduce it (as minimally and precisely as possible)**:
```
I don't know how to reproduce the problem, but it has occurred a few times in my environment. Several of those times it occurred after kuryr-cni had been running for a long time.
```

**Environment**:
- Kubernetes version:
`v1.14.3`
- Kuryr-Kubernetes version:
`stable/train`
- OpenStack version:
`stable/Rocky`

- Cluster information:
```
NAME STATUS ROLES AGE VERSION
master-0 Ready l3-node,master,monitor,node,openvswitch 47d v1.14.3
master-1 Ready l3-node,master,node,openstack-compute,openvswitch 47d v1.14.3
master-2 Ready l3-node,master,node,openvswitch 47d v1.14.3
slave-0 Ready l3-node,node,openvswitch 23d v1.14.3

NAME READY STATUS RESTARTS AGE IP NODE
kuryr-cni-ds-bngz6 1/1 Running 10 16d 192.168.15.86 master-1
kuryr-cni-ds-c7zp7 1/1 Running 1 16d 192.168.15.93 master-2
kuryr-cni-ds-cww29 1/1 Running 0 16d 192.168.15.91 master-0
kuryr-cni-ds-xz688 1/1 Running 0 7h53m 192.168.15.100 slave-0
kuryr-controller-5c964f789f-ndpk6 1/1 Running 1 6h19m 192.168.15.91 master-0
```
**Anything else we need to know?**:
```
nginx-5b98f4fcf8-bcsgk 0/1 ContainerCreating 0 12m <none> master-0 <none> <none>

---
apiVersion: openstack.org/v1
kind: KuryrPort
metadata:
  creationTimestamp: "2020-12-23T10:48:11Z"
  finalizers:
  - kuryr.openstack.org/kuryrport-finalizer
  generation: 3
  labels:
    kuryr.openstack.org/nodeName: master-0
  name: nginx-5b98f4fcf8-bcsgk
  namespace: default
  resourceVersion: "9924655"
  selfLink: /apis/openstack.org/v1/namespaces/default/kuryrports/nginx-5b98f4fcf8-bcsgk
  uid: 59fd71fc-450c-11eb-9ac9-fa163ebaf135
spec:
  podNodeName: master-0
  podUid: 55b97114-450c-11eb-9bab-fa163e903454
status:
  projectId: adfb76685924419a88d92d768c3d7bd1
  vifs:
    eth0:
      default: true
      vif:
        versioned_object.data:
          active: true
          address: fa:16:3e:5f:99:ba
          bridge_name: br-int
          has_traffic_filtering: false
          id: e0f04a15-c8e0-4a2d-9bc5-9ee6a041e221
...
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    routerId: 81760e19-4c4e-41c5-9c5f-ef18b832496b
  creationTimestamp: "2020-12-23T10:48:04Z"

---
kuryr-cni.log
DEBUG kuryr_kubernetes.cni.daemon.service [-] Received ADD request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/3401884/ns/net', 'CNI_PATH': '/opt/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': 'fa15ea5d899657d48ee8759a5cb2c1e63f7995b85bd2187a53ec179195df293a', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-5b98f4fcf8-bcsgk;K8S_POD_INFRA_CONTAINER_ID=fa15ea5d899657d48ee8759a5cb2c1e63f7995b85bd2187a53ec179195df293a'} _prepare_request /usr/local/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py:70
DEBUG oslo_concurrency.lockutils [-] Acquired lock "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:266
DEBUG oslo_concurrency.lockutils [-] Acquired external semaphore "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:273
DEBUG oslo_concurrency.lockutils [-] Releasing lock "default/nginx-5b98f4fcf8-bcsgk" lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:282
ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: 'default/nginx-5b98f4fcf8-bcsgk'
```

liujinxin (scilla) on 2020-12-23

description: updated

Michal Dulko (michal-dulko-f) wrote on 2021-01-05:

I think this might happen in rare cases when etcd or kubernetes-api itself is suffering under high load. Nevertheless, we've mostly seen such problems with kuryr-controller rather than kuryr-daemon, because kuryr-daemon gets restarted on failures and the issue should then fix itself. Is that the case in your instance?

liujinxin (scilla) wrote on 2021-01-06:

Yes, kuryr-daemon restarts once its failure count reaches CONF.cni_daemon.cni_failures_count, and that should fix the issue.
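
As a rough illustration of the threshold behaviour mentioned here, the sketch below counts failed requests and reports when a restart is due. It is an assumption-laden toy, not the kuryr-daemon implementation: `FailureCounter` and its methods are hypothetical, and `threshold` merely plays the role of `CONF.cni_daemon.cni_failures_count`.

```python
class FailureCounter:
    """Counts failed CNI requests and flags when a restart is due."""

    def __init__(self, threshold):
        # `threshold` stands in for CONF.cni_daemon.cni_failures_count.
        self.threshold = threshold
        self.failures = 0

    def record_failure(self):
        # Call once for every request that ended in an error.
        self.failures += 1

    def restart_needed(self):
        # True once the number of failures has reached the threshold.
        return self.failures >= self.threshold
```

With a threshold of 3, the first two failed ADD requests leave the daemon running, and the third one marks it for restart.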

Michal Dulko (michal-dulko-f) on 2021-01-07

Changed in kuryr-kubernetes:
status:	New → Confirmed
importance:	Undecided → Medium
