Long ironic timeouts because of ServFail DNS error

Bug #1572201 reported by Filip Hubík
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
Ironic                    Triaged  Low         Unassigned
network-manager (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Description of problem:
When ironic (undercloud) cannot obtain a reverse DNS entry for the IP assigned to br-ctlplane (it does not even receive an NXDOMAIN response in time, e.g. because the DNS server is misconfigured or there are connectivity issues), all ironic commands take very long to execute. The DNS lookup times out, but the commands still succeed.

[undercloud]: $ time ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
...
real 0m55.383s
user 0m0.248s
sys 0m0.043s

Version-Release number of selected component (if applicable):
Tested on OSP director 8

How reproducible (example with IP 10.100.100.1):
[undercloud]: $ ip a
...
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether <macaddr> brd ff:ff:ff:ff:ff:ff
    inet 10.100.100.1/24 brd 10.100.100.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
...

Configure your DNS server to not respond (even with NXDOMAIN) for 10.100.100.1:

[undercloud]: $ time host 10.100.100.1
;; connection timed out; no servers could be reached
real 0m14.005s
user 0m0.003s
sys 0m0.003s

[undercloud]: $ time dig -x 10.100.100.1
...
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 20304
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
...
;; connection timed out; no servers could be reached
real 0m21.007s
user 0m0.003s
sys 0m0.004s

[undercloud]: $ time nslookup 10.100.100.1
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; Got SERVFAIL reply from XYZ, trying next server
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
real 0m50.008s
user 0m0.002s
sys 0m0.009s
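The same delay can be measured from Python through the system resolver, which is what `host`, `dig` and `nslookup` bypass. The following is a minimal sketch, not code taken from ironic: `timed_reverse_lookup` and its `resolver` parameter are hypothetical names introduced here, and it only assumes that the affected process resolves the address with something equivalent to `socket.gethostbyaddr`.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, elapsed_seconds) for a reverse lookup of ip.

    `resolver` is a hypothetical injection point so the function can be
    exercised without a real DNS server; by default the system resolver
    is used, so /etc/resolv.conf timeouts and retries apply.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Calling `timed_reverse_lookup("10.100.100.1")` on the undercloud should show whether the elapsed time is in the same range as the timings above.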

Actual results:
In this case, each ironic command can take 20-60 seconds.

Expected results:
Ironic should have a mechanism to deal with this; commands should take milliseconds rather than tens of seconds:
[undercloud]: $ time ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
...
real 0m0.393s
user 0m0.244s
sys 0m0.041s
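One possible shape for such a mechanism is a reverse lookup bounded by a deadline, falling back to the bare IP when DNS does not answer in time. This is only an illustrative sketch and not how ironic actually handles the problem: `reverse_lookup_with_deadline`, the 1-second default and the `resolver` parameter are all assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool; lookups that miss the deadline keep running in the
# background until the system resolver gives up, so they still occupy a worker.
_executor = ThreadPoolExecutor(max_workers=4)


def reverse_lookup_with_deadline(ip, deadline=1.0, resolver=socket.gethostbyaddr):
    """Return the hostname for ip, or ip itself if DNS fails or is too slow.

    `deadline` is in seconds. `resolver` is a hypothetical injection point
    used for testing; the default is the system resolver.
    """
    future = _executor.submit(resolver, ip)
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        return future.result(timeout=deadline)[0]
    except (FutureTimeout, OSError):
        # Either the deadline passed or the resolver reported an error
        # (socket.herror / socket.gaierror); fall back to the address.
        return ip
```

The trade-off is that the caller gets an answer within roughly `deadline` seconds, while the abandoned lookup is not cancelled and still runs to completion in its worker thread.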

Originally created: https://bugzilla.redhat.com/show_bug.cgi?id=1328143

Jim Rollenhagen (jim-rollenhagen) wrote :

This was mistakenly assigned to network-manager.

Changed in network-manager (Ubuntu):
status: New → Invalid

Dmitry Tantsur (divius) wrote :

Note that this issue is likely to be fixed in Mitaka.

Changed in ironic:
status: New → Triaged
importance: Undecided → Low