nova DB sync failed due to DNS resolution failure of mysql-router service

Bug #2033680 reported by Bas de Bruijne
This bug affects 2 people
Affects         Status      Importance  Assigned to  Milestone
OpenStack Snap  Incomplete  Low         Unassigned
microk8s        New         Unknown

Bug Description

Test run https://solutions.qa.canonical.com/testruns/1cce9709-db0c-4bb1-80cb-a3fc9652a12c (microstack on jammy, single-node), fails with the following status: https://oil-jenkins.canonical.com/artifacts/1cce9709-db0c-4bb1-80cb-a3fc9652a12c/generated/generated/sunbeam/juju_status_openstack.txt

In the debug-log I see a lot of these messages:
=====
machine-0: 07:11:16 INFO juju.kubernetes.klog Waited for 5.723907577s due to client-side throttling, not priority and fairness, request: GET:https://10.245.130.51:16443/api/v1/namespaces/openstack/pods?labelSelector=app.kubernetes.io%2Fname%3Dcinder-mysql-router
machine-0: 07:11:16 ERROR juju.apiserver.uniter resolving "": lookup : no such host
=====

Also, the nova pod logs show:
=====
(...)
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova self.dbapi_connection = connection = pool._invoke_creator(self)
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova File "/usr/lib/python3/dist-packages/sqlalchemy/engine/create.py", line 590, in connect
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova return dialect.connect(*cargs, **cparams)
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 597, in connect
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova return self.dbapi.connect(*cargs, **cparams)
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 353, in __init__
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova self.connect()
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 664, in connect
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova raise exc
2023-08-31T08:07:25.729Z [nova-scheduler] 2023-08-31 08:07:25.715 96 ERROR nova oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'nova-api-mysql-router.openstack.svc.cluster.local' ([Errno -2] Name or service not known)")
=====

I'm not sure what is causing the name resolution errors.

Logs and configs can be found here: https://oil-jenkins.canonical.com/artifacts/1cce9709-db0c-4bb1-80cb-a3fc9652a12c/index.html

James Page (james-page) wrote :

The failure to resolve hostnames within K8S is most likely in the coredns service within Kubernetes: either it's not working or it's just not up-to-date.

FWIW I have seen this issue once before but I never got to the bottom of it.

James Page (james-page) wrote :

From the coredns pod logs:

[ERROR] plugin/errors: 2 neutron-mysql-router.openstack.svc.cluster.local.maas. A: read udp 10.1.96.129:44899->10.1.24.3:53: i/o timeout
[INFO] 10.1.96.177:56308 - 46111 "AAAA IN nova-api-mysql-router.openstack.svc.cluster.local.maas. udp 72 false 512" - - 0 2.001306915s
[ERROR] plugin/errors: 2 nova-api-mysql-router.openstack.svc.cluster.local.maas. AAAA: read udp 10.1.96.129:36760->10.1.10.2:53: i/o timeout
[INFO] 10.1.96.177:60337 - 37440 "AAAA IN nova-api-mysql-router.openstack.svc.cluster.local.maas. udp 72 false 512" - - 0 2.000504262s
[ERROR] plugin/errors: 2 nova-api-mysql-router.openstack.svc.cluster.local.maas. AAAA: read udp 10.1.96.129:51755->10.1.10.3:53: i/o timeout
[INFO] 10.1.96.141:57493 - 7682 "A IN neutron-mysql-router.openstack.svc.cluster.local.maas. udp 71 false 512" - - 0 2.001294922s
[ERROR] plugin/errors: 2 neutron-mysql-router.openstack.svc.cluster.local.maas. A: read udp 10.1.96.129:33389->10.1.10.3:53: i/o timeout
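To see which upstream resolvers are timing out, the error lines above can be tallied by destination address. This is only a rough sketch: the regular expression is written against the line shape shown in this excerpt and may not match other coredns versions or plugin configurations, and the names `ERROR_RE`, `SAMPLE` and `timeouts_by_upstream` are made up for illustration.

```python
import re
from collections import Counter

# Pattern fitted to the "[ERROR] plugin/errors" lines quoted in this comment.
# It is an assumption that other coredns builds log the same shape.
ERROR_RE = re.compile(
    r"\[ERROR\] plugin/errors: \d+ (?P<name>\S+) (?P<qtype>[A-Z]+): "
    r"read udp [\d.]+:\d+->(?P<upstream>[\d.]+):\d+: i/o timeout"
)

# Lines copied from the coredns pod logs above.
SAMPLE = [
    "[ERROR] plugin/errors: 2 neutron-mysql-router.openstack.svc.cluster.local.maas. A: read udp 10.1.96.129:44899->10.1.24.3:53: i/o timeout",
    '[INFO] 10.1.96.177:56308 - 46111 "AAAA IN nova-api-mysql-router.openstack.svc.cluster.local.maas. udp 72 false 512" - - 0 2.001306915s',
    "[ERROR] plugin/errors: 2 nova-api-mysql-router.openstack.svc.cluster.local.maas. AAAA: read udp 10.1.96.129:36760->10.1.10.2:53: i/o timeout",
    "[ERROR] plugin/errors: 2 nova-api-mysql-router.openstack.svc.cluster.local.maas. AAAA: read udp 10.1.96.129:51755->10.1.10.3:53: i/o timeout",
    "[ERROR] plugin/errors: 2 neutron-mysql-router.openstack.svc.cluster.local.maas. A: read udp 10.1.96.129:33389->10.1.10.3:53: i/o timeout",
]


def timeouts_by_upstream(lines):
    """Count i/o timeout errors per upstream DNS server address."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("upstream")] += 1
    return counts


if __name__ == "__main__":
    for upstream, count in timeouts_by_upstream(SAMPLE).most_common():
        print(upstream, count)
```

In this excerpt every forwarded query times out regardless of which upstream address it was sent to, which would be consistent with the pod being unable to reach any upstream resolver rather than one resolver being down.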

James Page (james-page) wrote :

It looks like the coredns server does not have an answer for "nova-api-mysql-router.openstack.svc.cluster.local", so the query is passed upstream with the .maas suffix attached, and that upstream lookup then fails with an i/o timeout.
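One possible source of the .maas suffix is the resolver search list inside the pod rather than coredns itself. Kubernetes pods using the default ClusterFirst DNS policy get `ndots:5` in resolv.conf, so a name with fewer than five dots is tried against each search domain before being tried as-is. The sketch below imitates that expansion; the search list is an assumption (the three cluster domains plus a host-inherited `maas` domain inferred from the logs), and `candidate_queries` is a hypothetical helper, not part of any library.

```python
def candidate_queries(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order.

    Simplified model of search-list handling: names with fewer than
    `ndots` dots are tried with each search domain first, then as-is.
    """
    if name.endswith("."):
        # Already fully qualified: the search list is not applied.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched
    return searched + [absolute]


# Assumed search list for a pod in the "openstack" namespace on a MAAS host.
SEARCH = ["openstack.svc.cluster.local", "svc.cluster.local", "cluster.local", "maas"]

if __name__ == "__main__":
    for query in candidate_queries(
        "nova-api-mysql-router.openstack.svc.cluster.local", SEARCH
    ):
        print(query)
```

Under these assumptions the service name has only four dots, so the `.maas` variant is queried before the bare name, and coredns forwards it upstream because it falls outside the cluster domain.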

James Page (james-page) wrote :

Please would it be possible to get the output of other kubectl commands so we can check the status of the information Juju passes to K8S?

kubectl get svc --all-namespaces would be a good start.
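If the output is captured as JSON (`kubectl get svc --all-namespaces -o json`), a small script can check whether the service that nova failed to resolve actually exists. This is a minimal sketch: `find_service` and the sample data are made up for illustration, and the cluster IP shown is a placeholder, not a value from this test run.

```python
import json


def find_service(svc_list, namespace, name):
    """Return the clusterIP of a service from `kubectl get svc -o json` output.

    Returns None when no service with that namespace and name is listed.
    """
    for item in svc_list.get("items", []):
        meta = item.get("metadata", {})
        if meta.get("namespace") == namespace and meta.get("name") == name:
            return item.get("spec", {}).get("clusterIP")
    return None


# Hypothetical sample in the shape kubectl emits; the IP is a placeholder.
SAMPLE_JSON = """
{
  "kind": "List",
  "items": [
    {
      "metadata": {"namespace": "openstack", "name": "cinder-mysql-router"},
      "spec": {"clusterIP": "10.152.183.10"}
    }
  ]
}
"""

if __name__ == "__main__":
    services = json.loads(SAMPLE_JSON)
    for svc in ("cinder-mysql-router", "nova-api-mysql-router"):
        print(svc, find_service(services, "openstack", svc))
```

A missing entry for nova-api-mysql-router would point at the service never having been created, while a present entry would point back at DNS.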

Changed in snap-openstack:
status: New → Incomplete
summary: - Nova DB sync failed
+ nova DB sync failed due to DNS resolution failure of mysql-router
+ service
Changed in snap-openstack:
importance: Undecided → Low
Changed in microk8s:
status: Unknown → New