DNS Match failures from magpie

Bug #2028792 reported by Jeffrey Chang
This bug report is a duplicate of: Bug #2024625: DNS Forward failures.
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MAAS | Incomplete | Undecided | Unassigned |

Bug Description

From test run https://solutions.qa.canonical.com/testruns/79a06ce5-8ae8-4348-8320-6e63f489f0af, running MAAS 3.4.0 RC1 on Arm.
A similar older bug: https://bugs.launchpad.net/maas/+bug/2024625

We are seeing these DNS errors from Magpie:
magpie-oam-space/0* blocked idle 2 10.246.200.131 icmp ok, local hostname ok (sqa-lab2-node-3-arm), match dns failed: ['5'], iperf leader, mtu: 1500, local mtu ok, req...
magpie-oam-space/1 blocked idle 4 10.246.200.132 icmp ok, local hostname ok (sqa-lab2-node-2-arm), match dns failed: ['4', '5', '7'], net mtu ok: 1500, 946.0 mbit/s, ...
magpie-oam-space/2 blocked idle 6 10.246.200.135 icmp ok, local hostname ok (sqa-lab2-node-8-arm), match dns failed: ['5'], net mtu ok: 1500, 944.0 mbit/s, local mtu ...
magpie-oam-space/3 blocked idle 1 10.246.200.136 icmp ok, local hostname ok (sqa-lab2-node-7-arm), match dns failed: ['5'], net mtu ok: 1500, 946.0 mbit/s, local mtu ...
magpie-oam-space/4 blocked idle 5 10.246.200.134 icmp ok, local hostname ok (sqa-lab2-node-6-arm), match dns failed: ['5'], net mtu ok: 1500, 946.0 mbit/s, local mtu ...
magpie-oam-space/5 blocked idle 0 10.246.200.130 icmp ok, local hostname ok (sqa-lab2-node-1-arm), match dns failed: ['7'], net mtu ok: 1500, 946.0 mbit/s, local mtu ...
magpie-oam-space/6 blocked idle 7 10.246.200.137 icmp ok, local hostname ok (sqa-lab2-node-9-arm), match dns failed: ['5'], net mtu ok: 1500, 79.7 mbit/s, local mtu o...
magpie-oam-space/7 blocked idle 3 10.246.200.133 icmp ok, local hostname ok (sqa-lab2-node-4-arm), match dns failed: ['1', '3'], net mtu ok: 1500, 946.0 mbit/s, local...

I can also see ERROR entries in ccdb472c-eca4-4113-baf7-ab0143b4a427/4/baremetal/var/log/juju/unit-magpie-oam-space-1.log:
'10.246.201.9\n10.246.203.6\n10.246.202.5\n10.246.200.134\n10.246.200.100'
2023-07-25 22:46:04 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 5, Original: 10.246.200.130, Forward: Can not resolve hostname to IP '10.246.200.130\n10.246.201.6'
2023-07-25 22:46:05 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 7, Original: 10.246.200.133, Forward: Can not resolve hostname to IP '10.246.201.8\n10.246.200.133\n10.246.202.4\n10.246.203.5'
2023-07-25 22:47:17 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 2, Original: 10.246.200.135, Forward: Can not resolve hostname to IP '10.246.202.6\n10.246.200.135\n10.246.201.60\n10.246.200.120\n10.246.203.7'
2023-07-25 22:47:17 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 3, Original: 10.246.200.136, Forward: Can not resolve hostname to IP '10.246.203.8\n10.246.202.7\n10.246.200.136\n10.246.201.61'
2023-07-25 22:47:17 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 6, Original: 10.246.200.137, Forward: Can not resolve hostname to IP '10.246.203.9\n10.246.200.121\n10.246.202.8\n10.246.200.137\n10.246.201.62'
2023-07-25 22:47:18 ERROR unit.magpie-oam-space/1.juju-log server.go:316 magpie:2: Original IP and Forward MATCH FAILED for unit_id: 7, Original: 10.246.200.133, Forward: Can not resolve hostname to IP '10.246.201.8\n10.246.200.133\n10.246.202.4\n10.246.203.5'
2023-07-25 22:50:53 ERROR unit.magpie-oam-space/1.juju-log server.go:316 Original IP and Forward MATCH FAILED for unit_id: 4, Original: 10.246.200.134, Forward: Can not resolve hostname to IP '10.246.200.100\n10.246.203.6\n10.246.201.9\n10.246.200.134\n10.246.202.5'
2023-07-25 22:50:53 ERROR unit.magpie-oam-space/1.juju-log server.go:316 Original IP and Forward MATCH FAILED for unit_id: 5, Original: 10.246.200.130, Forward: Can not resolve hostname to IP '10.246.201.6\n10.246.200.130'
2023-07-25 22:50:53 ERROR unit.magpie-oam-space/1.juju-log server.go:316 Original IP and Forward MATCH FAILED for unit_id: 6, Original: 10.246.200.137, Forward: Can not resolve hostname to IP '10.246.202.8\n10.246.200.121\n10.246.203.9\n10.246.200.137\n10.246.201.62'
2023-07-25 22:56:39 ERROR unit.magpie-oam-space/1.juju-log server.go:316 Original IP and Forward MATCH FAILED for unit_id: 2, Original: 10.246.200.135, Forward: Can not resolve hostname to IP '10.246.200.135\n10.246.201.60\n10.246.200.120\n10.246.203.7\n10.246.202.6'
2023-07-25 22:56:39 ERROR unit.magpie-oam-space/1.juju-log server.go:316 Original IP and Forward MATCH FAILED for unit_id: 3, Original: 10.246.200.136, Forward: Can not resolve hostname to IP '10.246.200.136\n10.246.203.8\n10.246.202.7\n10.246.201.61'
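
In every log line above, the Original address does appear in the Forward result, alongside addresses from other subnets. The following is a minimal sketch of how such a line can be checked, under the assumption (not confirmed against the magpie source) that the failing check compares the whole lookup result to the single original address. The names parse_failure, exact_match, and membership_match are hypothetical helpers for illustration only; the regular expression is based solely on the log format quoted in this report.

```python
import re

# Pattern for the juju-log lines quoted in this report (format assumed from
# the excerpt above, not taken from the magpie source).
LINE_RE = re.compile(
    r"unit_id: (?P<unit>\d+), Original: (?P<orig>[\d.]+), "
    r"Forward: Can not resolve hostname to IP '(?P<fwd>[^']*)'"
)


def parse_failure(line):
    """Return (unit_id, original_ip, forward_ips) or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # The log prints newlines as the two characters backslash + n.
    forward = m.group("fwd").replace("\\n", "\n").split()
    return m.group("unit"), m.group("orig"), forward


def exact_match(original, forward):
    # Hypothetical strict check: the whole lookup result must equal the IP.
    return "\n".join(forward) == original


def membership_match(original, forward):
    # Hypothetical relaxed check: the IP only has to be among the results.
    return original in forward


# One of the log lines from this report, kept as a raw string.
sample = (
    r"2023-07-25 22:46:05 ERROR unit.magpie-oam-space/1.juju-log "
    r"server.go:316 magpie:2: Original IP and Forward MATCH FAILED for "
    r"unit_id: 7, Original: 10.246.200.133, Forward: Can not resolve "
    r"hostname to IP '10.246.201.8\n10.246.200.133\n10.246.202.4\n10.246.203.5'"
)

if __name__ == "__main__":
    unit, original, forward = parse_failure(sample)
    print(unit, original, forward)
    print("exact:", exact_match(original, forward))
    print("membership:", membership_match(original, forward))
```

For this sample the strict comparison fails while the membership test passes, which would be consistent with hostnames resolving to several interface addresses rather than with a missing DNS record.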

description: updated
Bill Wear (billwear) wrote:

Can you check whether the version of magpie causing this error has the fix from the previous bug? If that isn't it, can you send an sosreport? There might be other steps down the road (DNS log, turning on debugging in /etc/resolv.conf, etc.), but can we just try these two things first?

Changed in maas:
status: New → Incomplete
Jeffrey Chang (modern911) wrote:

Interesting: magpie was installed from the latest/edge channel, but I see different revisions on different architectures.
I see rev 14 on ARM and rev 25 on AMD64, and both ran yesterday.
Juju info reports rev 29 (2023-07-19) on latest/edge.

All 4 runs on ARM with 3.4.0 failed with an identical error, while the AMD64 runs are all OK.
https://solutions.qa.canonical.com/bugs/2028792 lists the 4 failed runs.
One of the passing AMD64 runs is here: https://solutions.qa.canonical.com/testruns/e06ab8f4-0f80-478e-82ab-9d5e4660c572

Update: the MAAS 3.4.0 snap test on AMD64 shows rev 24 and has an error similar to LP#2024625.
See https://solutions.qa.canonical.com/testruns/3cae3a51-56ae-4e60-b69e-5ba528086183

Alberto Donato (ack)
Changed in maas:
milestone: none → 3.4.0-rc1
Alberto Donato (ack) wrote:

Revision 29 of the charm should have the fix. Could you please run the tests again with rev 29?
