Bug 1936342: kuryr-controller restarting after 3 days cluster running - pools without members

Bug #1920178 reported by Sarka Scavnicka
This bug affects 1 person
Affects: kuryr-kubernetes
Status: Fix Committed
Importance: Undecided
Assigned to: Sarka Scavnicka

Bug Description

Created attachment 1761526: kuryr controller logs

Description of problem:

After a 4.8.0-0.nightly-2021-03-05-015511 cluster had been running for 3 days, kuryr-controller started restarting because some load balancer pools have no members:

$ openstack loadbalancer pool show a8bf9c07-4cf5-4932-9182-829c16537ed9
+----------------------+-------------------------------------------------------------+
| Field                | Value                                                       |
+----------------------+-------------------------------------------------------------+
| admin_state_up       | True                                                        |
| created_at           | 2021-03-05T11:16:10                                         |
| description          |                                                             |
| healthmonitor_id     |                                                             |
| id                   | a8bf9c07-4cf5-4932-9182-829c16537ed9                        |
| lb_algorithm         | ROUND_ROBIN                                                 |
| listeners            | 80708599-b2f3-41a6-8e8b-f9538bdb0a7f                        |
| loadbalancers        | b1a3ee03-d248-4f21-b870-ac9c64546335                        |
| members              |                                                             |
| name                 | openshift-marketplace/marketplace-operator-metrics:TCP:8383 |
| operating_status     | ONLINE                                                      |
| project_id           | cb736c7b6ada44218c8ee2d9e417368f                            |
| protocol             | TCP                                                         |
| provisioning_status  | ACTIVE                                                      |
| session_persistence  | None                                                        |
| updated_at           | 2021-03-05T11:16:17                                         |
| tls_container_ref    | None                                                        |
| ca_tls_container_ref | None                                                        |
| crl_container_ref    | None                                                        |
| tls_enabled          | False                                                       |
+----------------------+-------------------------------------------------------------+
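
A quick way to spot other pools in the same state is to filter pool details for an empty members field. The following is only a minimal sketch, not how kuryr-controller itself detects the problem. It assumes each pool has already been fetched as a dictionary (for example by parsing the output of `openstack loadbalancer pool show <id> -f json`) and that the `members` field may appear as an empty string, an empty list, or be missing. The helper names `has_no_members` and `pools_without_members` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def has_no_members(pool: Dict[str, Any]) -> bool:
    """Return True when the pool's 'members' field is missing or empty.

    Assumption: depending on the client version, 'members' may be a
    string (possibly blank) or a list, so both shapes are handled.
    """
    members = pool.get("members")
    if members is None:
        return True
    if isinstance(members, str):
        return members.strip() == ""
    return len(members) == 0


def pools_without_members(pools: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the name (or the id as a fallback) of every pool with no members."""
    return [
        pool.get("name") or pool.get("id", "<unknown>")
        for pool in pools
        if has_no_members(pool)
    ]


if __name__ == "__main__":
    # First entry mirrors the pool from this report; the second is made up.
    sample = [
        {
            "id": "a8bf9c07-4cf5-4932-9182-829c16537ed9",
            "name": "openshift-marketplace/marketplace-operator-metrics:TCP:8383",
            "members": "",
        },
        {"id": "hypothetical-pool-id", "name": "example/healthy:TCP:80", "members": ["m1"]},
    ]
    for name in pools_without_members(sample):
        print(name)
```

The pool name follows the `<namespace>/<service>:<protocol>:<port>` pattern seen above, so the reported names point directly at the affected services.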

The issue was resolved by removing two services (and letting kuryr-controller recreate them):

- openshift-console-operator/metrics
- openshift-marketplace/marketplace-operator-metrics

Version-Release number of selected component (if applicable): OCP 4.8.0-0.nightly-2021-03-05-015511 on OSP 13 (2021-01-20.1), Amphora provider.

How reproducible: Unknown

Steps to Reproduce:
Install a cluster and let it run for 2-3 days.

Actual results: kuryr-controller restarting.

(shiftstack) [stack@undercloud-0 network]$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-4d76c                     1/1     Running   1          2d21h
kuryr-cni-4j58w                     1/1     Running   1          2d21h
kuryr-cni-7f6dt                     1/1     Running   0          2d21h
kuryr-cni-qg9wm                     1/1     Running   0          2d21h
kuryr-cni-qwfqw                     1/1     Running   0          2d21h
kuryr-cni-r5ddr                     1/1     Running   1          2d21h
kuryr-controller-78b7bdfdb4-tt95k   1/1     Running   914        2d18h

Expected results: kuryr-controller stable.

Additional info: kuryr-controller logs + must-gather.

Changed in kuryr-kubernetes:
assignee: nobody → Sarka Scavnicka (scavnicka)
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 4.0.0.0rc1

This issue was fixed in the openstack/kuryr-kubernetes 4.0.0.0rc1 release candidate.

Changed in kuryr-kubernetes:
status: New → Incomplete
status: Incomplete → Fix Committed