Default reactive certificate handlers may crash during update-status due to dns failure

Bug #1954748 reported by Alex Kavanagh
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
OpenStack Base Layer  Fix Released  High        Unassigned
charm-ovn-central     Fix Released  High        Unassigned

Bug Description

As this trace from the ovn-central charm shows, the layer-openstack code performs DNS lookups during the update-status hook, and a failed lookup crashed the charm. Even for a reactive charm, it probably shouldn't be doing this much work in update-status, and it should certainly handle a DNS lookup failure better than by crashing:

2021-12-13 16:42:36 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-ovn-central-0/charm/reactive/layer_openstack.py", line 134, in default_request_certificates
    for cn, req in instance.get_certificate_requests().items():
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charms_openstack/charm/classes.py", line 290, in get_certificate_requests
    return cert_utils.get_certificate_request(
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/openstack/cert_utils.py", line 142, in get_certificate_request
    req.add_hostname_cn()
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/openstack/cert_utils.py", line 93, in add_hostname_cn
    'cn': get_hostname(ip),
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/network/ip.py", line 523, in get_hostname
    result = ns_query(rev)
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/network/ip.py", line 479, in ns_query
    answers = dns.resolver.query(address, rtype)
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/dns/resolver.py", line 1100, in query
    return get_default_resolver().query(qname, rdtype, rdclass, tcp, source,
  File "/var/lib/juju/agents/unit-ovn-central-0/.venv/lib/python3.8/site-packages/dns/resolver.py", line 898, in query
    raise NoNameservers(request=request, errors=errors)
dns.resolver.NoNameservers: All nameservers failed to answer the query 158.0.16.172.in-addr.arpa. IN PTR: Server 127.0.0.53 UDP port 53 answered SERVFAIL

This happened on ServerStack, which has a known propensity for DNS lookup failures.
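
For illustration, here is a minimal sketch of what "handling the lookup failure better" could look like. It is not the charmhelpers code: it uses the standard library's socket.gethostbyaddr rather than dnspython, and the names reverse_pointer_name and hostname_or_none are hypothetical. The query name in the trace (158.0.16.172.in-addr.arpa) is the PTR name for 172.16.0.158, which the first helper reproduces.

```python
import ipaddress
import logging
import socket

log = logging.getLogger(__name__)


def reverse_pointer_name(ip):
    """Return the PTR query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def hostname_or_none(ip, lookup=socket.gethostbyaddr):
    """Reverse-resolve ip; return None instead of raising on DNS failure.

    `lookup` is injectable so the failure path can be exercised without
    a real resolver (an assumption made for this sketch).
    """
    try:
        name, _aliases, _addresses = lookup(ip)
    except OSError as exc:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        log.warning("reverse lookup of %s (%s) failed: %s",
                    ip, reverse_pointer_name(ip), exc)
        return None
    return name
```

A caller would then have to decide what to do with None, for example skipping the hostname CN for that address and retrying in a later hook, rather than letting the exception end the hook.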

Frode Nordahl (fnordahl) wrote :

This looks like the default reactive handlers from layer-openstack to me?

https://github.com/openstack/charm-layer-openstack/blob/5999eeaa5d007b6ae8dc12753976ef11f76684ee/reactive/layer_openstack.py#L126-L137

Perhaps we should solve this for all of our reactive charms by guarding the default handlers with the is-update-status-hook flag?
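
A rough, self-contained sketch of the guard idea follows. It does not import charms.reactive: the active_flags set and the skip_when_flag decorator are hypothetical stand-ins for the framework's flag store and for a decorator such as when_not, and the handler body is a placeholder.

```python
import functools

# Stand-in for the framework's flag store (assumption for illustration).
active_flags = set()


def skip_when_flag(flag):
    """Decorator: do not run the handler while `flag` is set."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            if flag in active_flags:
                # Guarded: the handler does no work in this hook.
                return None
            return handler(*args, **kwargs)
        return wrapper
    return decorator


@skip_when_flag("is-update-status-hook")
def default_request_certificates():
    # Placeholder body. Per the traceback, the real handler builds
    # certificate requests and ends up doing reverse DNS lookups.
    return "requested"
```

With the flag set, the handler returns without touching DNS at all, so a resolver outage during update-status can no longer crash the hook through this path. The handler still runs in every other hook.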

Frode Nordahl (fnordahl)
summary: - ovn-central crashed during update-status due to dns failure
+ Default reactive certificate handlers may crash during update-status due
+ to dns failure
description: updated
Frode Nordahl (fnordahl) wrote :
Changed in layer-openstack:
status: New → In Progress
importance: Undecided → High
Alex Kavanagh (ajkavanagh) wrote :

I think your review is a great idea. However, I'm fairly sure it's probably not going to break reactive charms that use the logic of hoping that the update-status hook will run a handler that should have run in some other hook but didn't due to charm logic reasons and therefore get stuck. This doesn't look like one of those cases!

Frode Nordahl (fnordahl)
Changed in charm-ovn-central:
status: New → Triaged
importance: Undecided → High
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-central/+/821692

Frode Nordahl (fnordahl)
Changed in layer-openstack:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-central (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-central/+/821692
Committed: https://opendev.org/x/charm-ovn-central/commit/49e1297da538c30961417e8592ea1984103bb969
Submitter: "Zuul (22348)"
Branch: master

commit 49e1297da538c30961417e8592ea1984103bb969
Author: Frode Nordahl <email address hidden>
Date: Tue Dec 14 12:19:39 2021 +0100

    Do not execute certificate handlers in update-status hook

    The certificate handler code does a bit of work and should not
    run during the update-status hook.

    Rebuild to pull in fix merged in layer-openstack.

    Depends-On: I4a3aa544f98049c83db576f95de826038e8e1afc
    Closes-Bug: #1954748
    Change-Id: I2ee39f7a0dcd1f4a37051d8dc08e383522387f1f

Changed in charm-ovn-central:
status: In Progress → Fix Committed
Changed in charm-ovn-central:
milestone: none → 22.04
Changed in charm-ovn-central:
status: Fix Committed → Fix Released