Error when processing addNetwork
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kuryr-kubernetes | Confirmed | Medium | Unassigned |
Bug Description
**What happened**:
```
Error when processing addNetwork
The kuryr-controller completed creation of the port and updated the KuryrPort CRD.
Looking at the kuryr-cni log, it appears the kuryr-cni watcher process never received the update event for the KuryrPort CRD, causing DaemonServer.run() to return GATEWAY_TIMEOUT.
```
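To illustrate the failure mode described above, here is a minimal, simplified sketch (not the actual kuryr-kubernetes code; `FakeRegistry`, `on_kuryrport_event`, and `wait_for_vif` are hypothetical names) of how a CNI ADD handler that blocks waiting for a watcher-delivered KuryrPort event ends up answering GATEWAY_TIMEOUT when the watcher silently stops delivering events:

```python
import threading
from http import HTTPStatus

class FakeRegistry:
    """Hypothetical stand-in for the daemon's in-memory registry that a
    watcher thread populates when a KuryrPort CRD event arrives."""
    def __init__(self):
        self._ready = threading.Event()
        self.vif = None

    def on_kuryrport_event(self, vif):
        # Called by the watcher thread when the CRD update is received.
        self.vif = vif
        self._ready.set()

    def wait_for_vif(self, timeout):
        # The CNI ADD handler blocks here; if the watcher never delivers
        # the event, the wait times out and the ADD fails with 504.
        if self._ready.wait(timeout):
            return HTTPStatus.OK, self.vif
        return HTTPStatus.GATEWAY_TIMEOUT, None

# Failure path reported in this bug: the watcher is silent, so the
# ADD request times out even though the CRD was updated server-side.
registry = FakeRegistry()
status, vif = registry.wait_for_vif(timeout=0.1)
print(status == HTTPStatus.GATEWAY_TIMEOUT)  # True

# Happy path: the watcher delivers the event before the deadline.
registry2 = FakeRegistry()
registry2.on_kuryrport_event({"eth0": "fa:16:3e:5f:99:ba"})
status2, _ = registry2.wait_for_vif(timeout=0.1)
print(status2 == HTTPStatus.OK)  # True
```

The point of the sketch: when the update stalls on the watcher side, the apiserver and kuryr-controller can both be healthy and the CNI request still fails with 504.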
**What you expected to happen**:
```
Normally, the `kuryr-daemon: watcher` worker receives the KuryrPort CRD event from the apiserver. The apiserver itself appears healthy, because other nodes are able to create pods successfully.
```
**How to reproduce it (as minimally and precisely as possible)**:
```
I don't know how to reproduce the problem reliably, but it has occurred a few times in my environment, usually after kuryr-cni had been running for a long time.
```
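Since the symptom is a watch that goes quiet after long uptime, one mitigation pattern (a hedged sketch, not kuryr's actual implementation; `SelfHealingWatcher`, `idle_limit`, and `restart_watch` are illustrative names) is an idle-watchdog that re-establishes the watch stream when no event has been seen for too long:

```python
import time

class SelfHealingWatcher:
    """Sketch: re-open the CRD watch if no event has been observed
    within `idle_limit` seconds. The suspected failure mode here is a
    watch connection that dies silently without raising an error."""
    def __init__(self, idle_limit):
        self.idle_limit = idle_limit
        self.last_event = time.monotonic()
        self.restarts = 0

    def on_event(self, event):
        # Any event (including heartbeats/bookmarks) proves liveness.
        self.last_event = time.monotonic()

    def health_check(self):
        if time.monotonic() - self.last_event > self.idle_limit:
            self.restart_watch()

    def restart_watch(self):
        # In a real daemon this would re-open the HTTP watch stream
        # against the apiserver from the last known resourceVersion.
        self.restarts += 1
        self.last_event = time.monotonic()

w = SelfHealingWatcher(idle_limit=0.05)
time.sleep(0.1)   # simulate a silently dead watch: no events arrive
w.health_check()  # idle limit exceeded, so the watch is re-established
print(w.restarts)  # 1
```

In production the idle limit would be minutes rather than milliseconds, and a restarted watch should resume from the last seen resourceVersion to avoid missing events.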
**Environment**:
- Kubernetes version:
`v1.14.3`
- Kuryr-Kubernetes version:
`stable/train`
- OpenStack version:
`stable/rocky`
- Cluster information:
```
NAME STATUS ROLES AGE VERSION
master-0 Ready l3-node,
master-1 Ready l3-node,
master-2 Ready l3-node,
slave-0 Ready l3-node,
NAME READY STATUS RESTARTS AGE IP NODE
kuryr-cni-ds-bngz6 1/1 Running 10 16d 192.168.15.86 master-1
kuryr-cni-ds-c7zp7 1/1 Running 1 16d 192.168.15.93 master-2
kuryr-cni-ds-cww29 1/1 Running 0 16d 192.168.15.91 master-0
kuryr-cni-ds-xz688 1/1 Running 0 7h53m 192.168.15.100 slave-0
kuryr-controlle
```
**Anything else we need to know?**:
```
nginx-5b98f4fcf
---
apiVersion: openstack.org/v1
kind: KuryrPort
metadata:
creationTimes
finalizers:
- kuryr.openstack
generation: 3
labels:
kuryr.
name: nginx-5b98f4fcf
namespace: default
resourceVersion: "9924655"
selfLink: /apis/openstack
uid: 59fd71fc-
spec:
podNodeName: master-0
podUid: 55b97114-
status:
projectId: adfb76685924419
vifs:
eth0:
default: true
vif:
active: true
address: fa:16:3e:5f:99:ba
id: e0f04a15-
...
---
apiVersion: v1
kind: Pod
metadata:
annotations:
routerId: 81760e19-
creationTimes
---
kuryr-cni.log
DEBUG kuryr_kubernete
DEBUG oslo_concurrenc
DEBUG oslo_concurrenc
DEBUG oslo_concurrenc
ERROR kuryr_kubernete
```
Changed in kuryr-kubernetes:
description: updated
status: New → Confirmed
importance: Undecided → Medium
I think this might happen in rare cases when etcd or the kubernetes-api itself is suffering under high load. Nevertheless, we've mostly seen such problems with kuryr-controller rather than kuryr-daemon, because on failures kuryr-daemon gets restarted and the issue should fix itself. Is that the case in your instance?