dnsmasq on Ubuntu Jammy crashes on neutron-dhcp-agent updates

Bug #2026757 reported by Julia Kreger
This bug affects 2 people
Affects            Status        Importance   Assigned to   Milestone
Ironic             New           Critical     Unassigned
neutron            New           Low          Unassigned
dnsmasq (Ubuntu)   Status tracked in Mantic
  Jammy            Incomplete    Undecided    Unassigned
  Kinetic          Won't Fix     Undecided    Unassigned
  Lunar            Invalid       Undecided    Unassigned
  Mantic           Invalid       Undecided    Unassigned

Bug Description

The Ironic project's CI has been hitting major blocking issues while moving to Ubuntu Jammy. After some investigation we were able to isolate the problem to DHCP updates causing dnsmasq to crash on Ubuntu Jammy, which ships with dnsmasq 2.86. This sounds similar to an issue already known to the dnsmasq maintainers, where dnsmasq would crash when updates occurred due to a configuration refresh[0].

This led us to upgrade dnsmasq to the version which ships with Ubuntu Lunar, which was no better: dnsmasq still crashed on record updates whenever configuration for addresses and ports was added, changed, or removed.

We later downgraded to the version of dnsmasq shipped in Ubuntu Focal, and dnsmasq stopped crashing and appeared stable enough to utilize for CI purposes.

** Kernel log from Ubuntu Jammy Package **

[229798.876726] dnsmasq[81586]: segfault at 7c28 ip 00007f6e8313147e sp 00007fffb3d6f830 error 4 in libc.so.6[7f6e830b4000+195000]
[229798.876745] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[229805.444912] dnsmasq[401428]: segfault at dce8 ip 00007fe63bf6a47e sp 00007ffdb105b440 error 4 in libc.so.6[7fe63beed000+195000]
[229805.444933] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[230414.213448] dnsmasq[401538]: segfault at 78b8 ip 00007f12160e447e sp 00007ffed6ef2190 error 4 in libc.so.6[7f1216067000+195000]
[230414.213467] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[230465.098989] dnsmasq[402665]: segfault at c378 ip 00007f81458f047e sp 00007fff0db334a0 error 4 in libc.so.6[7f8145873000+195000]
[230465.099005] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[231787.247374] dnsmasq[402863]: segfault at 7318 ip 00007f3940b9147e sp 00007ffc8df4f010 error 4 in libc.so.6[7f3940b14000+195000]
[231787.247392] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[231844.886399] dnsmasq[405182]: segfault at dc58 ip 00007f32a29e147e sp 00007ffddedd7480 error 4 in libc.so.6[7f32a2964000+195000]
[231844.886420] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[234692.482154] dnsmasq[405289]: segfault at 67d8 ip 00007fab0c5c447e sp 00007fffd6fd8fa0 error 4 in libc.so.6[7fab0c547000+195000]
[234692.482173] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a

** Kernel log entries from Ubuntu Lunar package **

[234724.842339] dnsmasq[409843]: segfault at fffffffffffffffd ip 00007f35a147647e sp 00007ffd536038c0 error 5 in libc.so.6[7f35a13f9000+195000]
[234724.842368] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[234784.918116] dnsmasq[410019]: segfault at fffffffffffffffd ip 00007f634233947e sp 00007fff33877f20 error 5 in libc.so.6[7f63422bc000+195000]
[234784.918133] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[235022.163339] dnsmasq[410151]: segfault at fffffffffffffffd ip 00007f21dd37f47e sp 00007fff9bf416d0 error 5 in libc.so.6[7f21dd302000+195000]
[235022.163362] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[235024.831325] dnsmasq[410445]: segfault at fffffffffffffffd ip 00007f7edf02147e sp 00007ffc4fb19cd0 error 5 in libc.so.6[7f7edefa4000+195000]
[235024.831354] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[236052.793683] dnsmasq[410630]: segfault at fffffffffffffffd ip 00007f3046ca147e sp 00007ffe5583df50 error 5 in libc.so.6[7f3046c24000+195000]
[236052.793704] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
[236105.451351] dnsmasq[412107]: segfault at fffffffffffffffd ip 00007f4425bcd47e sp 00007fffd5337560 error 5 in libc.so.6[7f4425b50000+195000]
[236105.451368] Code: 98 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 92 39 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 18 39 17 00 64 48 83 3a
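
One way to compare the crashes above is to subtract the libc mapping base from the instruction pointer in each segfault line. The following is a minimal sketch, not part of the original report: the helper name parse_segfault and the returned field names are made up for illustration, and the regular expression assumes the x86-64 kernel message layout shown in the logs above (all numbers, including the error code, in hexadecimal). The error code is decoded using the low three bits of the x86 page-fault error code (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Assumed layout, taken from the kernel lines quoted above:
# dnsmasq[PID]: segfault at ADDR ip IP sp SP error ERR in OBJECT[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing one kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "pid": int(m["pid"]),
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped segment.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # bit 0: protection fault on a present page
        "write": bool(err & 2),         # bit 1: the access was a write
        "user_mode": bool(err & 4),     # bit 2: the access came from user mode
    }


if __name__ == "__main__":
    samples = [
        "[229798.876726] dnsmasq[81586]: segfault at 7c28 ip 00007f6e8313147e "
        "sp 00007fffb3d6f830 error 4 in libc.so.6[7f6e830b4000+195000]",
        "[234724.842339] dnsmasq[409843]: segfault at fffffffffffffffd ip 00007f35a147647e "
        "sp 00007ffd536038c0 error 5 in libc.so.6[7f35a13f9000+195000]",
    ]
    for sample in samples:
        info = parse_segfault(sample)
        print(info["pid"], hex(info["offset"]), hex(info["fault_addr"]))
```

Applied to the lines above, every Jammy and Lunar package entry resolves to the same offset, 0x7d47e, inside the libc mapping, which suggests both package versions fault at the same libc instruction while reading from a bad address. The faulting address and the error code differ between the two sets (4 versus 5).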

** The command line the process is launched with **

dnsmasq --no-hosts --pid-file=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/pid --dhcp-hostsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host --addn-hosts=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/addn_hosts --dhcp-optsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/opts --dhcp-leasefile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:subnet-3c1445e7-6f7d-4e62-997f-627bc53da72c,10.1.0.0,static,255.255.255.192,86400s --dhcp-option-force=option:mtu,1380 --dhcp-lease-max=64 --conf-file=/dev/null --domain=openstacklocal

** Neutron Logging **

Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.dhcp.agent [-] neutron.agent.dhcp.agent.DhcpAgentWithStateReport method _port_delete called with arguments ({'port_id': 'bdeaa43c-687c-4e60-a24e-3725d6353828', 'network_id': 'c1ca059e-350d-4d78-9330-600f7315c380', 'fixed_ips': [{'subnet_id': '3c1445e7-6f7d-4e62-997f-627bc53da72c', 'ip_address': '10.1.0.14'}, {'subnet_id': '54bc71f6-bff5-417d-9e4b-1f5f58ed6318', 'ip_address': 'fdd9:92b1:9e2c:0:5054:ff:fe44:5c9f'}], 'priority': 6},) {} {{(pid=60941) wrapper /usr/local/lib/python3.10/dist-packages/oslo_log/helpers.py:65}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.dhcp.agent [-] Calling driver for network: c1ca059e-350d-4d78-9330-600f7315c380/seg=None action: reload_allocations {{(pid=60941) _call_driver /opt/stack/neutron/neutron/agent/dhcp/agent.py:246}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:384}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] CMD "ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f" returned: 0 in 0.011s {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:422}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[8a4f2794-3b63-4f8d-9604-53dd6a4a868c]: (4, ('', '')) {{(pid=78114) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f 01:52:54:00:44:5c:9f {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:384}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.processutils [-] CMD "ip netns exec qdhcp-c1ca059e-350d-4d78-9330-600f7315c380 dhcp_release tapbb6348d9-39 10.1.0.14 52:54:00:44:5c:9f 01:52:54:00:44:5c:9f" returned: 0 in 0.011s {{(pid=78114) execute /usr/local/lib/python3.10/dist-packages/oslo_concurrency/processutils.py:422}}
Jul 10 15:26:01 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[33a91aed-bc58-48dd-b673-d4a4d5da54f6]: (4, ('', '')) {{(pid=78114) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.dhcp [-] Building host file: /opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host {{(pid=60941) _output_hosts_file /opt/stack/neutron/neutron/agent/linux/dhcp.py:956}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.utils [-] Running command: ['env', 'LC_ALL=C', 'PATH=/sbin:/usr/sbin', 'dnsmasq', '--test', '--dhcp-host=tag:foo'] {{(pid=60941) create_process /opt/stack/neutron/neutron/agent/linux/utils.py:84}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.dhcp [-] Done building host file /opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host {{(pid=60941) _output_hosts_file /opt/stack/neutron/neutron/agent/linux/dhcp.py:997}}
Jul 10 15:26:02 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo.privsep.daemon [-] privsep: reply[f3dd1224-fe8c-4fb0-8113-699e779df64e]: (4, ('', '', 0)) {{(pid=62248) _call_back /usr/local/lib/python3.10/dist-packages/oslo_privsep/daemon.py:501}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.lockutils [-] Acquiring lock "_check_child_processes" by "neutron.agent.linux.external_process.ProcessMonitor._check_child_processes" {{(pid=60941) inner /usr/local/lib/python3.10/dist-packages/oslo_concurrency/lockutils.py:404}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" acquired by "neutron.agent.linux.external_process.ProcessMonitor._check_child_processes" :: waited 0.001s {{(pid=60941) inner /usr/local/lib/python3.10/dist-packages/oslo_concurrency/lockutils.py:409}}
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid c1ca059e-350d-4d78-9330-600f7315c380 not found. The process should not have died
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: WARNING neutron.agent.linux.external_process [-] Respawning dnsmasq for uuid c1ca059e-350d-4d78-9330-600f7315c380
Jul 10 15:27:00 np0034614991 neutron-dhcp-agent[60941]: DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-c1ca059e-350d-4d78-9330-600f7315c380', 'env', 'PROCESS_TAG=dnsmasq-c1ca059e-350d-4d78-9330-600f7315c380', 'dnsmasq', '--no-hosts', '', '--pid-file=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/pid', '--dhcp-hostsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/host', '--addn-hosts=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/addn_hosts', '--dhcp-optsfile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/opts', '--dhcp-leasefile=/opt/stack/data/neutron/dhcp/c1ca059e-350d-4d78-9330-600f7315c380/leases', '--dhcp-match=set:ipxe,175', '--dhcp-userclass=set:ipxe6,iPXE', '--local-service', '--bind-dynamic', '--dhcp-range=set:subnet-3c1445e7-6f7d-4e62-997f-627bc53da72c,10.1.0.0,static,255.255.255.192,86400s', '--dhcp-option-force=option:mtu,1380', '--dhcp-lease-max=64', '--conf-file=/dev/null', '--domain=openstacklocal'] {{(pid=60941) execute_rootwrap_daemon /opt/stack/neutron/neutron/agent/linux/utils.py:108}}
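
The sequence in the log above (a reload_allocations call for the network, followed by the process monitor finding dnsmasq gone and respawning it) can be searched for mechanically in longer logs. The following is a minimal sketch under stated assumptions: the function name respawns_after_reload is hypothetical, and the regular expressions are written against the exact message wording quoted above, which may differ in other neutron releases.

```python
import re

# Patterns assumed from the neutron-dhcp-agent messages quoted above.
RELOAD_RE = re.compile(
    r"network: (?P<net>[0-9a-f-]{36})\S* action: reload_allocations"
)
RESPAWN_RE = re.compile(r"Respawning dnsmasq for uuid (?P<net>[0-9a-f-]{36})")
TIMESTAMP_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2})")


def respawns_after_reload(lines):
    """Pair each dnsmasq respawn with the last reload_allocations for that network.

    Returns a list of (network_id, reload_timestamp, respawn_timestamp) tuples;
    reload_timestamp is None when no earlier reload was seen for the network.
    """
    last_reload = {}
    events = []
    for line in lines:
        ts_match = TIMESTAMP_RE.match(line)
        timestamp = ts_match.group(1) if ts_match else None
        reload_match = RELOAD_RE.search(line)
        if reload_match:
            last_reload[reload_match["net"]] = timestamp
            continue
        respawn_match = RESPAWN_RE.search(line)
        if respawn_match:
            net = respawn_match["net"]
            events.append((net, last_reload.get(net), timestamp))
    return events
```

For the excerpt above this pairs the reload at "Jul 10 15:26:01" with the respawn at "Jul 10 15:27:00". Because the respawn is logged when the monitor runs its child-process check, the interval only bounds when dnsmasq died; it does not give the exact crash time.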

We don't believe this is a neutron bug, at least not outright, but we suspect neutron is likely encountering this issue as well, at least in any sort of exhaustive test jobs. Most of Ironic's single-test jobs would pass with this dnsmasq; it was only where we continually ran new test scenarios that we would see this issue crop up and cause failures.

In the meantime, the Ironic project will likely downgrade dnsmasq to unblock its CI.

[0]: https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2022q3/016562.html

no longer affects: dnsmasq
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
Revision history for this message
yatin (yatinkarel) wrote :

I tried setting up dnsmasq 2.87 with https://review.opendev.org/c/openstack/ironic/+/888121, using a source install and avoiding the newer package from Lunar. Some of the tests still fail sometimes, but I no longer see any dnsmasq segfault with it. Maybe someone from the Ironic team could check whether those failures are related to dnsmasq or to some other known issue.

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

Greetings Yatin!

So, the failure appears to be rooted in iPXE failing to get the complete set of data from the server. My guess is that it has something to do with spanning tree, as iPXE for Ubuntu has also changed its behavior. After your last recheck we merged a patch to disable that spanning tree behavior. I've re-rechecked your test patch to hopefully provide us an additional data point.

Revision history for this message
Miguel Lavalle (minsel) wrote :

I'll bring it up in the next weekly Neutron meeting for visibility purposes

Changed in neutron:
importance: Undecided → Low
Revision history for this message
yatin (yatinkarel) wrote :

<<< So, the failure appears to be rooted in ipxe failing to get the complete set of data from the server. My guess is that is something to do with spanning tree as iPXE for ubuntu has also changed its behavior. My feeling is this is rooted with some spanning tree behavior, which we merged a patch after your last recheck to disable. I've re-rechecked your test patch to hopefully provide us an additional data point.

Thanks Julia. Even with the spanning tree fixes, I still saw some failures in the test patch. It could be some other issue though.

Regarding the segfaults, I validated this with 2.89 from a source install[1] as well and didn't see any segfault with it. Maybe the segfault that you noticed with the Lunar dnsmasq 2.89 on Jammy is due to running packages built for Lunar on Jammy, rather than something specific to dnsmasq itself. I triggered the jobs again to see if segfaults are seen.

Based on this, I think it would be good to get Ubuntu Jammy and Kinetic updated to 2.87, or to just backport the required fix https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=d290630d31f4517ab26392d00753d1397f9a4114.

I could see a similar segfault in the neutron-linuxbridge and openvswitch jobs[4][5], reported once in syslog, but didn't see any failure due to it. But as you said, the issue is seen with some specific tests in Ironic.

For neutron, I will send a patch adding a sanity check that warns users running version 2.86 about this issue.
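
A check of the kind described above could look roughly like the following. This is a hypothetical sketch, not the code from the neutron patch: the function names and the warning text are made up for illustration, and it assumes that `dnsmasq --version` prints a banner containing "version MAJOR.MINOR", as in "Dnsmasq version 2.86".

```python
import re
import subprocess

# Version reported in this bug as crashing on Ubuntu Jammy.
AFFECTED_VERSION = (2, 86)


def parse_dnsmasq_version(output):
    """Extract (major, minor) from `dnsmasq --version` output, or None."""
    match = re.search(r"version (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_affected(version):
    """True only for the exact version named in this bug report."""
    return version == AFFECTED_VERSION


def check_installed_dnsmasq():
    """Run the local dnsmasq binary and warn if it is the affected version."""
    result = subprocess.run(
        ["dnsmasq", "--version"], capture_output=True, text=True, check=True
    )
    version = parse_dnsmasq_version(result.stdout)
    if version is not None and is_affected(version):
        print("WARNING: dnsmasq 2.86 may segfault on DHCP updates (bug 2026757)")
    return version
```

Keeping the parsing separate from the subprocess call lets the version logic be exercised without dnsmasq installed. The check matches only 2.86 because that is the only version this report ties to the crash.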

[1] https://review.opendev.org/c/openstack/ironic/+/888984
[2] https://9d1e095f1746de4d26ae-cb25c10c29ca7bf26ff09ad92a16fa62.ssl.cf1.rackcdn.com/888984/1/check/ironic-standalone/7debc89/controller/logs/syslog.txt
[3] https://c38d0c9156ee6cc9fd3b-d97b0a3b599d6de6d0673faefd2f08b5.ssl.cf1.rackcdn.com/888984/1/check/ironic-standalone-redfish/1cafda9/controller/logs/syslog.txt
[4] https://5d62e00bab1ce95c0ca0-ea10db30e23f6b883afe49ff4b1074ff.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-plugin-linuxbridge/1279e73/controller/logs/syslog.txt
[5] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ecb/859871/12/gate/neutron-tempest-plugin-openvswitch/ecbfc37/controller/logs/syslog.txt

Revision history for this message
Brian Haley (brian-haley) wrote :

So this also fails with version 2.89 that is in Lunar?

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

@yatin, it appears that with your newest patch to our CI jobs in Ironic, which uses pure upstream source (thanks, by the way!), the CI job failed in a specific scenario where we're attempting to validate that we can boot an ISO via iPXE. That being said, the logs indicate we made it far past DHCP before it failed, and it actually failed somewhere in the process of downloading the file. Why, I don't know. I can see the chunked transfers happening in the log[0] for your change[1]. You can see where iPXE fails, thinking the connection timed out, in the console log[2].

Anyway, tl;dr: this looks unrelated to this bug. Unfortunately, it is also the kind of failure we would likely need to be able to reproduce in order to investigate further.

[0]: https://9d1e095f1746de4d26ae-cb25c10c29ca7bf26ff09ad92a16fa62.ssl.cf1.rackcdn.com/888984/1/check/ironic-standalone/7debc89/controller/logs/apache/ipxe_access_log.txt
[1]: https://review.opendev.org/c/openstack/ironic/+/888984
[2]: https://9d1e095f1746de4d26ae-cb25c10c29ca7bf26ff09ad92a16fa62.ssl.cf1.rackcdn.com/888984/1/check/ironic-standalone/7debc89/controller/logs/ironic-bm-logs/node-1_console_log.txt

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

@yatin, also, on the prior change, the most recent run failed somewhere around libvirt or UEFI firmware booting. The same exact scenario test worked in an earlier revision of your change. That specific test is not doing network booting, but it got the DHCP addresses as I would expect. What seemingly failed is the virtual media configuration through the emulator. Actually, looking deeper, we didn't even try. We might have just picked the wrong node, which means the test could have had a bug. https://bugs.launchpad.net/ironic/+bug/2028279 has been opened for this.

It actually looks like it is a bug with the test itself, but again, entirely unrelated to dnsmasq.

Revision history for this message
yatin (yatinkarel) wrote :

<< For neutron i will send a patch to add a sanity check to warn users running 2.86 version about this issue.
Pushed https://review.opendev.org/c/openstack/neutron/+/889015?usp=search

@Julia, thanks for checking those failures and reporting 2028279. Will update and remove DNM from the test patch in ironic to use upstream source 2.87.

Revision history for this message
yatin (yatinkarel) wrote :

<< Will update and remove DNM from the test patch in ironic to use upstream source 2.87.
Proposed https://review.opendev.org/c/openstack/ironic/+/888121

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hello and thanks for taking the time to report this bug.

I read the discussion above and would like to clarify a few things:

1) Does the segfault happen with the dnsmasq package from Lunar/Mantic? I see tasks for both systems added to this bug (and the Mantic one is set as Confirmed), but it's not clear from the messages above whether the failure really happens there.

2) Assuming that the segfault does *not* happen in Lunar/Mantic, I can prepare a PPA with the backported patch from upstream and ask you to test it.

3) If the failure *does* happen in Lunar/Mantic, we will need to investigate it further.

FWIW, Kinetic has reached its end of standard support so I will set its task as Won't Fix.

Thank you.

Changed in dnsmasq (Ubuntu Kinetic):
status: New → Won't Fix
Revision history for this message
yatin (yatinkarel) wrote :

@sergiodj hi
<< 1) Does the segfault happen with the dnsmasq package from Lunar/Mantic? I see tasks for both systems added to this bug (and the Mantic one is set as Confirmed), but it's not clear from the messages above whether the failure really happens there.

I am not aware of any segfaults with the dnsmasq packages in Lunar/Mantic. The issue in Ubuntu Jammy is clear with the version included there, so a fix needs to be included for Jammy.
There were some segfaults seen when using Lunar packages on Jammy, but I think those could be due to other issues, so unless and until we see issues with Lunar packages on a Lunar node, we can rule this out.
The bug seems to have been confirmed by the bot, and Mantic seems to be the default tracker, so that got updated.

<< 2) Assuming that the segfault does *not* happen in Lunar/Mantic, I can prepare a PPA with the backported patch from upstream and ask you to test it.
Sounds good to me to have backport of the fix in Ubuntu Jammy

<< 3) If the failure *does* happen in Lunar/Mantic, we will need to investigate it further.
Unless and until someone confirms the issue with the version included in Lunar/Mantic, we can avoid any updates in Lunar/Mantic.

yatin (yatinkarel)
Changed in dnsmasq (Ubuntu Jammy):
status: New → Confirmed
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hello yatin,

Thanks for the reply, and apologies for the delay. I've been swamped with other work here.

Anyway, based on your feedback I went ahead and prepared an upload with the proposed patch (https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=d290630d31f4517ab26392d00753d1397f9a4114). You can find the PPA in the following link:

https://launchpad.net/~sergiodj/+archive/ubuntu/dnsmasq

Could you please give it a try and let me know if it works for you? I still haven't had the time to try and reproduce the issue locally. BTW, if you have an easy reproducer I'd appreciate it.

Thanks.

Changed in dnsmasq (Ubuntu Lunar):
status: New → Invalid
Changed in dnsmasq (Ubuntu Mantic):
status: Confirmed → Invalid
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Based on yatin's feedback, I am setting the status of dnsmasq's Lunar and Mantic tasks as Invalid. This bug only applies to Jammy.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi,

I'm marking the status of this bug to Incomplete to reflect the fact that we're waiting for information from the reporter.

@yatin, please let me know when you are able to give my PPA a try.

Thanks.

Changed in dnsmasq (Ubuntu Jammy):
status: Confirmed → Incomplete
Revision history for this message
yatin (yatinkarel) wrote :

Thanks @Sergio, I missed those PPA links earlier. I pushed a test patch[1] to validate it.

[1] https://review.opendev.org/c/openstack/ironic/+/897277

Revision history for this message
yatin (yatinkarel) wrote :

@Sergio so i still see[1][2] those segfaults with those new packages:-
ii dnsmasq-base 2.86-1.1ubuntu0.4~ppa1 amd64 Small caching DNS proxy and DHCP/TFTP server
ii dnsmasq-utils 2.86-1.1ubuntu0.4~ppa1 amd64 Utilities for manipulating DHCP leases

Can you check if that patch is really applied on those packages?

[1] https://dcd105b404f93fc08fa3-82141499a48343cfb1270dc186f5ec2f.ssl.cf5.rackcdn.com/897277/1/check/ironic-standalone/1d6d1a0/controller/logs/syslog.txt
[2]
Oct 04 06:53:17 np0035409761 kernel: dnsmasq[67022]: segfault at 80c8 ip 00007f4ef40ce3fe sp 00007fff2346dab0 error 4 in libc.so.6[7f4ef4051000+195000]
Oct 04 06:53:17 np0035409761 kernel: Code: 99 13 00 e8 04 b9 ff ff 0f 1f 40 00 f3 0f 1e fa 48 85 ff 0f 84 bb 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 12 3a 17 00 <48> 8b 47 f8 64 8b 2b a8 02 75 57 48 8b 15 98 39 17 00 64 48 83 3a
Oct 04 06:53:53 np0035409761 kernel: dnsmasq[77967]: segfault at d118 ip 00007f31c3b403fe sp 00007ffe7ed6cc40 error 4 in libc.so.6[7f31c3ac3000+195000]

Revision history for this message
Paride Legovini (paride) wrote :

Hello, I verified that Sergio's PPA contains the candidate upstream patch (upstream commit d290630d31f4517ab26392d00753d1397f9a4114). If the crash is still happening that probably wasn't the issue after all.

I see two possible ways forward here. One is classic git based debugging:

1. compile 2.86 from upstream git and verify that the crash happens
2. compile 2.89 from upstream and verify that the crash doesn't happen
3. use `git bisect` to find the commit that introduced the bug.
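
The search that `git bisect` performs in the steps above can be sketched as a plain binary search. This is only an illustration under stated assumptions: the function first_fixed and the crashes callback are hypothetical stand-ins for building a commit and running a reproducer (which this report does not yet have), and the sketch assumes the behavior flips exactly once along the history. With a crashing 2.86 as the old end and a non-crashing 2.89 as the new end, the search converges on the first commit where the crash stops; `git bisect start` accepts `--term-old` and `--term-new` for labelling such a reversed search.

```python
def first_fixed(commits, crashes):
    """Return the first commit (oldest to newest order) that no longer crashes.

    commits: list ordered oldest to newest.
    crashes: callable returning True if the given commit shows the crash.
    Assumes the oldest commit crashes, the newest does not, and the behavior
    changes exactly once in between.
    """
    low, high = 0, len(commits) - 1
    if not crashes(commits[low]) or crashes(commits[high]):
        raise ValueError("need a crashing oldest commit and a working newest commit")
    # Invariant: commits[low] crashes, commits[high] does not.
    while high - low > 1:
        mid = (low + high) // 2
        if crashes(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]
```

Each round halves the remaining range, so the number of builds to test grows with the logarithm of the number of commits between the two versions.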

If that's not practical [but we may have to bite the bullet here!], we could do some guesswork after figuring out whether the Ubuntu-packaged dnsmasq 2.87 is buggy or not. That version is no longer available in the Ubuntu archive, but it was in the archive at some point (in Lunar), and compiled debs are still available here:

https://launchpad.net/ubuntu/+source/dnsmasq/2.87-1.1/+build/24632779

So by testing those we'll be able to tell whether the bug has been fixed between 2.86-1.1ubuntu0.3 and 2.87-1.1 or not. You'll need to manually install those packages via `dpkg -i` (no need to mention that this is normally not recommended!).

I'd test some of this myself, but without a reproducer I won't be able to tell much.

Revision history for this message
Paride Legovini (paride) wrote :

The bug description says "dnsmasq on Ubuntu Jammy/Lunar crashes [...]", but IIUC it's actually only the Jammy one that crashes (the lunar-on-jammy one maybe has other issues, but doesn't exhibit the crash). Am I right? In this case please update the bug description accordingly. Thanks!

Edit: I did this myself as the Lunar bug was set to Invalid already, per comment 12.

summary: - dnsmasq on Ubuntu Jammy/Lunar crashes on neutron-dhcp-agent updates
+ dnsmasq on Ubuntu Jammy crashes on neutron-dhcp-agent updates