[bionic-ussuri] designate-mdns.service: Failed with result 'start-limit-hit'

Bug #1925233 reported by Alexander Balderson
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Designate Charm
Fix Committed
High
Francesco De Simone
designate (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

On a bionic ussuri deploy with charms next, both the designate units are stuck waiting for the designate-mdns service to start. The journal shows that the service goes up and down multiple times and fails to reset the device list.
Apr 19 10:04:12 juju-9669d8-0-lxd-3 systemd[1]: designate-mdns.service: Service hold-off time over, scheduling restart.
Apr 19 10:04:12 juju-9669d8-0-lxd-3 systemd[1]: designate-mdns.service: Scheduled restart job, restart counter is at 503.
Apr 19 10:04:12 juju-9669d8-0-lxd-3 systemd[1]: Stopped OpenStack Designate DNSaaS mdns.
Apr 19 10:04:12 juju-9669d8-0-lxd-3 systemd[1]: designate-mdns.service: Failed to reset devices.list: Operation not permitted
Apr 19 10:04:12 juju-9669d8-0-lxd-3 systemd[1]: Started OpenStack Designate DNSaaS mdns.
Apr 19 10:04:15 juju-9669d8-0-lxd-3 systemd[1]: designate-mdns.service: Main process exited, code=exited, status=1/FAILURE
Apr 19 10:04:15 juju-9669d8-0-lxd-3 systemd[1]: designate-mdns.service: Failed with result 'exit-code'.

Additionally the mdns log shows it failing to connect to sql:
2021-04-19 16:04:23.818 70854 INFO designate.mdns.base [-] Initialized mDNS xfr endpoint
2021-04-19 16:04:24.429 70854 CRITICAL designate [-] Unhandled error: oslo_db.exception.CantStartEngineError: No sql_connection parameter is established
2021-04-19 16:04:24.429 70854 ERROR designate Traceback (most recent call last):
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/bin/designate-mdns", line 10, in <module>
2021-04-19 16:04:24.429 70854 ERROR designate sys.exit(main())
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/cmd/mdns.py", line 41, in main
2021-04-19 16:04:24.429 70854 ERROR designate server = mdns_service.Service()
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 47, in __init__
2021-04-19 16:04:24.429 70854 ERROR designate self.dns_application, self.tg,
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/utils.py", line 256, in wrapper
2021-04-19 16:04:24.429 70854 ERROR designate cache.append(function(cls_instance))
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 78, in dns_application
2021-04-19 16:04:24.429 70854 ERROR designate application = handler.RequestHandler(self.storage, self.tg)
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 65, in storage
2021-04-19 16:04:24.429 70854 ERROR designate CONF['service:mdns'].storage_driver
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/storage/__init__.py", line 36, in get_storage
2021-04-19 16:04:24.429 70854 ERROR designate return cls()
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/storage/impl_sqlalchemy/__init__.py", line 40, in __init__
2021-04-19 16:04:24.429 70854 ERROR designate super(SQLAlchemyStorage, self).__init__()
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/base.py", line 72, in __init__
2021-04-19 16:04:24.429 70854 ERROR designate self.engine = session.get_engine(self.get_name())
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/session.py", line 47, in get_engine
2021-04-19 16:04:24.429 70854 ERROR designate facade = _create_facade_lazily(cfg_group)
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/session.py", line 40, in _create_facade_lazily
2021-04-19 16:04:24.429 70854 ERROR designate **conf
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1295, in __init__
2021-04-19 16:04:24.429 70854 ERROR designate slave_connection=slave_connection)
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 510, in _start
2021-04-19 16:04:24.429 70854 ERROR designate engine_args, maker_args)
2021-04-19 16:04:24.429 70854 ERROR designate File "/usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 532, in _setup_for_connection
2021-04-19 16:04:24.429 70854 ERROR designate "No sql_connection parameter is established")
2021-04-19 16:04:24.429 70854 ERROR designate oslo_db.exception.CantStartEngineError: No sql_connection parameter is established
2021-04-19 16:04:24.429 70854 ERROR designate

Test run artifacts are at https://oil-jenkins.canonical.com/artifacts/9d23156f-b697-49f0-8a04-f87452fdfc18/index.html
and the crashdump is at https://oil-jenkins.canonical.com/artifacts/9d23156f-b697-49f0-8a04-f87452fdfc18/generated/generated/openstack/juju-crashdump-openstack-2021-04-19-19.23.41.tar.gz

these logs snippets were taken from designate_0, but dessignate_1 has similar logs.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Setting to incomplete as it's not clear whether this was an infra failure or something else (e.g. a race). Subsequent test runs have passed and the new charms released.

Changed in charm-designate:
status: New → Incomplete
Revision history for this message
Michael Skalka (mskalka) wrote :
Download full text (3.8 KiB)

We have seen this again during this test run: https://solutions.qa.canonical.com/testruns/testRun/d0e3e447-d9e5-4651-827c-593e433eae21

Crashdump here: https://oil-jenkins.canonical.com/artifacts/d0e3e447-d9e5-4651-827c-593e433eae21/generated/generated/openstack/juju-crashdump-openstack-2021-06-06-18.04.08.tar.gz

designate_1/var/log/designate/designate-mdns.log:

2021-06-06 15:11:18.613 56988 CRITICAL designate [-] Unhandled error: oslo_db.exception.CantStartEngineError: No sql_connection parameter is established
2021-06-06 15:11:18.613 56988 ERROR designate Traceback (most recent call last):
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/bin/designate-mdns", line 10, in <module>
2021-06-06 15:11:18.613 56988 ERROR designate sys.exit(main())
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/cmd/mdns.py", line 41, in main
2021-06-06 15:11:18.613 56988 ERROR designate server = mdns_service.Service()
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 47, in __init__
2021-06-06 15:11:18.613 56988 ERROR designate self.dns_application, self.tg,
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/utils.py", line 256, in wrapper
2021-06-06 15:11:18.613 56988 ERROR designate cache.append(function(cls_instance))
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 78, in dns_application
2021-06-06 15:11:18.613 56988 ERROR designate application = handler.RequestHandler(self.storage, self.tg)
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/mdns/service.py", line 65, in storage
2021-06-06 15:11:18.613 56988 ERROR designate CONF['service:mdns'].storage_driver
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/storage/__init__.py", line 36, in get_storage
2021-06-06 15:11:18.613 56988 ERROR designate return cls()
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/storage/impl_sqlalchemy/__init__.py", line 40, in __init__
2021-06-06 15:11:18.613 56988 ERROR designate super(SQLAlchemyStorage, self).__init__()
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/base.py", line 72, in __init__
2021-06-06 15:11:18.613 56988 ERROR designate self.engine = session.get_engine(self.get_name())
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/session.py", line 47, in get_engine
2021-06-06 15:11:18.613 56988 ERROR designate facade = _create_facade_lazily(cfg_group)
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/designate/sqlalchemy/session.py", line 40, in _create_facade_lazily
2021-06-06 15:11:18.613 56988 ERROR designate **conf
2021-06-06 15:11:18.613 56988 ERROR designate File "/usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1295, in __init__
2021-06-06 15:11:18.613 56988 ERROR designate ...

Read more...

Changed in charm-designate:
status: Incomplete → New
Revision history for this message
Michael Skalka (mskalka) wrote :
Revision history for this message
Michael Skalka (mskalka) wrote :
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Is there any progress on this? It has been blocking ~40 hours of testruns in the last month. All occurrences can be found here:

https://solutions.qa.canonical.com/bugs/bugs/bug/1925233

Revision history for this message
Felipe Reyes (freyes) wrote : Re: [Bug 1925233] Re: [bionic-ussuri] next charm fails to start mdns service

On Mon, 2022-02-07 at 20:30 +0000, Bas de Bruijne wrote:
> Is there any progress on this? It has been blocking ~40 hours of
> testruns in the last month. All occurrences can be found here:
>
> https://solutions.qa.canonical.com/bugs/bugs/bug/1925233
>

hi Blas, I was taking a look to the crashdump file juju-crashdump-
openstack-2021-04-19-19.23.41.tar.gz[0].

The service designate-mdns started successfully 16:04

$ grep -P '(Starting mdns service|oslo_db\.exception)'
designate_0/var/log/designate/designate-mdns.log | tail -n 4
2021-04-19 16:04:24.429 70854 CRITICAL designate [-] Unhandled error:
oslo_db.exception.CantStartEngineError: No sql_connection parameter is
established
2021-04-19 16:04:24.429 70854 ERROR designate
oslo_db.exception.CantStartEngineError: No sql_connection parameter is
established
2021-04-19 16:04:28.498 71732 INFO designate.service [-] Starting mdns
service (version: 10.0.0)
2021-04-19 16:05:38.214 72577 INFO designate.service [-] Starting mdns
service (version: 10.0.0)

Before that timestamp effectively designate-mdns was failing with error
in the bug description.

The charm created the designate database at 15:57

$ grep -C 2 shared-db.setup designate_0/var/log/juju/unit-designate-
0.log
2021-04-19 15:57:47 DEBUG jujuc server.go:211 running hook tool "juju-
log" for designate/0-shared-db-relation-changed-5362817322778822974
2021-04-19 15:57:47 DEBUG juju-log shared-db:108: tracer>
tracer: set flag shared-db.setup
tracer: -- dequeue handler
reactive/designate_handlers.py:93:setup_database
2021-04-19 15:57:47 DEBUG jujuc server.go:211 running hook tool "juju-
log" for designate/0-shared-db-relation-changed-5362817322778822974

The shared-db.setup flag is set when the mysql charm replied to the
request of creating a database -
https://github.com/openstack/charm-designate/blob/master/src/reactive/designate_handlers.py#L93-L105

Then the configuration files got rendered at 16:05

$ grep -C 2 base-config.rendered designate_0/var/log/juju/unit-
designate-0.log | head -n 6
2021-04-19 16:05:14 DEBUG jujuc server.go:211 running hook tool "juju-
log" for designate/0-identity-service-relation-changed-
452563743676873630
2021-04-19 16:05:14 DEBUG juju-log identity-service:105: tracer>
tracer: set flag base-config.rendered
tracer: ++ queue handler
reactive/designate_handlers.py:144:run_db_migration
tracer: ++ queue handler
reactive/designate_handlers.py:155:sync_pool_manager_cache
--

The base-config.rendered flag indicates when the configuration files
are rendered.
https://github.com/openstack/charm-designate/blob/master/src/reactive/designate_handlers.py#L141

Is there any other symptom that leads to believe that designate-mdns
wasn't working correctly?.

[0]
https://oil-jenkins.canonical.com/artifacts/9d23156f-b697-49f0-8a04-f87452fdfc18/generated/generated/openstack/juju-crashdump-openstack-2021-04-19-19.23.41.tar.gz

Changed in charm-designate:
status: New → Incomplete
Revision history for this message
Billy Olsen (billy-olsen) wrote : Re: [bionic-ussuri] next charm fails to start mdns service

I think the key here is that the status of the application for designate-mdns is indicating that 'Services not running that should be: designate-mdns'. Unit designate/2 is still reporting (at the time of collection) that designtate-mdns is still not running.

When checking the psaux.txt listings for each of the unit, we can see that designate/0 and designate/1 are indeed running designate-mdns but unit designate/2 is not.

$ grep designate-mdns designate_?/psaux.txt
designate_0/psaux.txt:designa+ 83726 0.1 0.0 191408 80920 ? Ss 13:57 0:12 /usr/bin/python3.6 /usr/bin/designate-mdns --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-mdns.log
designate_1/psaux.txt:designa+ 79274 0.1 0.0 191408 80992 ? Ss 13:56 0:12 /usr/bin/python3.6 /usr/bin/designate-mdns --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-mdns.log

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

> Is there any other symptom that leads to believe that designate-mdns
wasn't working correctly?.

No, the automated test stops when the unit is blocked for 30 min and we don't do extra validation when it fails. As far as I know we haven't ran into this when manually building a deployment.

Revision history for this message
Felipe Reyes (freyes) wrote : Re: [Bug 1925233] Re: [bionic-ussuri] next charm fails to start mdns service

Checking designate/2.

 ~/Downloads/f1badd08-a9a6-405a-b0be-d7f49ece023c/designate_2 $ tail
var/log/designate/designate-mdns.log -n 2
2021-04-19 16:04:24.389 64928 ERROR designate
oslo_db.exception.CantStartEngineError: No sql_connection parameter is
established
2021-04-19 16:04:24.389 64928 ERROR designate

The last line logged by designate-mdns was at 16:04. While the
configuration files were rendered at 16:05:12

$ grep -C 2 base-config.rendered var/log/juju/unit-designate-2.log |
head -n 5
2021-04-19 16:05:12 DEBUG jujuc server.go:211 running hook tool "juju-
log" for designate/2-identity-service-relation-changed-
330171960584494981
2021-04-19 16:05:12 DEBUG juju-log identity-service:105: tracer>
tracer: set flag base-config.rendered
tracer: ++ queue handler
reactive/designate_handlers.py:144:run_db_migration
tracer: ++ queue handler
reactive/designate_handlers.py:155:sync_pool_manager_cache

designate-mdns failed to start because the systemd service hit start-
limit-hit

Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Failed with result 'exit-code'.
Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Service hold-off time over, scheduling restart.
Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Scheduled restart job, restart counter is at 481.
Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Failed to reset devices.list: Operation not permitted
Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Start request repeated too quickly.
Apr 19 16:04:24 juju-9669d8-4-lxd-3 systemd[1]: designate-mdns.service:
Failed with result 'start-limit-hit'.

I think this is something that should be fixed in the package, since we
may need to do adjustments to designate-mdns.service

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval

summary: - [bionic-ussuri] next charm fails to start mdns service
+ [bionic-ussuri] designate-mdns.service Failed with result 'start-limit-
+ hit'
summary: - [bionic-ussuri] designate-mdns.service Failed with result 'start-limit-
+ [bionic-ussuri] designate-mdns.service: Failed with result 'start-limit-
hit'
Revision history for this message
Felipe Reyes (freyes) wrote :

We discussed this issue with the team and the agreement is that the charm should disable/mask the service until the configuration has been rendered, this is a pattern already used in other charms. For the reactive charms there is a handler for this - https://github.com/openstack/charm-layer-openstack/blob/master/reactive/layer_openstack.py#L161-L167

Changed in charm-designate:
status: Incomplete → Triaged
Changed in designate (Ubuntu):
status: New → Invalid
Changed in charm-designate:
importance: Undecided → High
tags: added: good-first-bug
Revision history for this message
Felipe Reyes (freyes) wrote :
Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :
Changed in charm-designate:
assignee: nobody → Francesco De Simone (fdesi)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-designate (master)
Changed in charm-designate:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-designate (master)

Reviewed: https://review.opendev.org/c/openstack/charm-designate/+/848211
Committed: https://opendev.org/openstack/charm-designate/commit/1fef3793f5b4f089969a746ba8be33972ef58f10
Submitter: "Zuul (22348)"
Branch: master

commit 1fef3793f5b4f089969a746ba8be33972ef58f10
Author: Francesco de Simone <email address hidden>
Date: Wed Jun 29 20:53:16 2022 +0000

    Disable designate-mdns service until the configuration has been rendered

    As per the bug report, designate-mdns service appear to restart multiple time.
    This patch disable the service until the configuration has been rendered.

    Closes-Bug: #1925233
    Change-Id: Ib9843bda9c6967ce50cec8d721079bd70f7f3040

Changed in charm-designate:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-designate (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/charm-designate/+/872123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-designate (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/charm-designate/+/872123
Committed: https://opendev.org/openstack/charm-designate/commit/604222e924d5eb1882a72ca4ce21b77dbc530fc8
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 604222e924d5eb1882a72ca4ce21b77dbc530fc8
Author: Francesco de Simone <email address hidden>
Date: Wed Jun 29 20:53:16 2022 +0000

    Disable designate-mdns service until the configuration has been rendered

    As per the bug report, designate-mdns service appear to restart multiple time.
    This patch disable the service until the configuration has been rendered.

    Closes-Bug: #1925233
    Change-Id: Ib9843bda9c6967ce50cec8d721079bd70f7f3040
    (cherry picked from commit 1fef3793f5b4f089969a746ba8be33972ef58f10)

tags: added: in-stable-yoga
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.