sudo is slow (10 seconds) when hostname is not resolvable

Bug #1730744 reported by Scott Moser
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
systemd (Ubuntu)
New
Undecided
Unassigned

Bug Description

I'm seeing very slow 'sudo' times in a qemu guest when the host uses systemd-resolved.
If I change /etc/resolv.conf on the host to point directly to a dns server (8.8.8.8) then the problem goes away.

To recreate, you can download a cloud image and boot it under qemu with a command line like:
  cloud-localds seed.img user-data meta-data
  qemu-system-x86_64 -enable-kvm \
     -device virtio-net-pci,netdev=net00 \
     -netdev type=user,id=net00 \
     -drive file=root.img,id=disk00,if=none,index=0 \
     -device virtio-blk,drive=disk00,serial=root.img \
     -drive file=seed.img,id=disk01,if=none,index=1 \
     -device virtio-blk,drive=disk01,serial=seed.img \
     -m 768

Make sure that your hostname does not resolve (change it to 'bogus-host1').
Log into the guest and then just try sudo, you'll see something like:
# time sudo ls
sudo: unable to resolve host rooturl-trusty

real 0m10.029s
user 0m0.006s
sys 0m0.011s

Revision history for this message
Scott Moser (smoser) wrote :
Download full text (5.4 KiB)

Here is some information collected when I tried to debug.

==== Using resolved on host =====
## guest
$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
search mydomain.com bar.com

ubuntu@rooturl-trusty$ time sudo /bin/true
sudo: unable to resolve host rooturl-trusty

$ time sudo /bin/true
real 0m10.021s
user 0m0.007s
sys 0m0.004s

## host
$ grep "^[^#]" /etc/resolv.conf
nameserver 127.0.0.53
search mydomain.net lan

$ sudo tcpdump -i any udp port 53
14:51:58.038103 IP localhost.55365 > localhost.domain: 36012+ A? rooturl-trusty.mydomain.com. (45)
14:51:58.038180 IP localhost.55365 > localhost.domain: 49842+ AAAA? rooturl-trusty.mydomain.com. (45)
14:51:58.038569 IP localhost.domain > localhost.55365: 36012 NXDomain 0/0/0 (45)
14:51:58.038718 IP localhost.36879 > localhost.domain: 52853+ PTR? 3.2.0.10.in-addr.arpa. (39)
14:51:58.038848 IP localhost.domain > localhost.55365: 49842 NXDomain 0/0/0 (45)
14:51:58.039125 IP localhost.54202 > localhost.domain: 32902+ A? rooturl-trusty.bar.com. (40)
14:51:58.039154 IP localhost.54202 > localhost.domain: 58975+ AAAA? rooturl-trusty.bar.com. (40)
14:51:58.039189 IP milhouse.45718 > OpenWrt.xxxxxx.net.domain: 196+ PTR? 3.2.0.10.in-addr.arpa. (39)
14:51:58.039436 IP localhost.domain > localhost.54202: 32902 0/0/0 (40)
14:51:58.039606 IP localhost.domain > localhost.54202: 58975 0/0/0 (40)
14:51:58.039828 IP localhost.39909 > localhost.domain: 14117+ A? rooturl-trusty. (32)
14:51:58.039856 IP localhost.39909 > localhost.domain: 64237+ AAAA? rooturl-trusty. (32)
14:51:58.041042 IP OpenWrt.xxxxxx.net.domain > milhouse.45718: 196 NXDomain* 0/0/0 (39)
14:51:58.137739 IP localhost.domain > localhost.36879: 52853 0/0/0 (39)
14:51:58.138622 IP localhost.37396 > localhost.domain: 9267+ PTR? 15.2.0.10.in-addr.arpa. (40)
14:51:58.139344 IP milhouse.44398 > OpenWrt.xxxxxx.net.domain: 32522+ PTR? 15.2.0.10.in-addr.arpa. (40)
14:51:58.141103 IP OpenWrt.xxxxxx.net.domain > milhouse.44398: 32522 NXDomain* 0/0/0 (40)
14:51:58.386028 IP localhost.domain > localhost.37396: 9267 0/0/0 (40)
14:51:58.633518 IP localhost.domain > localhost.39909: 14117 ServFail 0/0/0 (32)
14:52:03.043476 IP localhost.39909 > localhost.domain: 14117+ A? rooturl-trusty. (32)
14:52:03.043544 IP localhost.39909 > localhost.domain: 64237+ AAAA? rooturl-trusty. (32)
14:52:03.634060 IP localhost.domain > localhost.39909: 14117 ServFail 0/0/0 (32)

## in the guest at the time of the above, doing a tcpdump shows
# tcpdump -i any udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:51:58.039452 IP 10.0.2.15.35832 > 10.0.2.3.domain: 49842+ AAAA? rooturl-trusty.mydomain.com. (45)
19:51:58.040165 IP 10.0.2.15.36362 > 10.0.2.3.domain: 52853+ PTR? 3.2.0.10.in-addr.arpa. (39)
19:51:58.040220 IP 10.0.2.3.domain > 10.0.2.15.35832: 36012 NXDomain 0/0/0 (45)
19:51:58.040469 IP 10.0.2.3.domain > 10.0.2.15.35832: 49842 NXDomain 0/0/0 (45)
19:51:58.040594 IP 10.0.2.15.55332 > 10.0.2.3.domain: 3290...

Read more...

Revision history for this message
Scott Moser (smoser) wrote :

This is more easily recreated with lxc.

Given a host with systemd-resolv:

$ lxc launch ubuntu-daily:xenial x1
$ sleep 10 # let it boot
$ lxc exec x1 -- grep '^[^#]' /etc/resolv.conf
nameserver 10.75.205.1
search lxd

$ lxc exec x1 -- hostname bogus-host
$ lxc exec x1 -- bash -c 'time sudo /bin/true'
sudo: unable to resolve host bogus-host: Connection timed out

real 0m20.042s
user 0m0.005s
sys 0m0.014s

$ lxc exec x1 -- bash -c 'time host bogus-host 10.75.205.1'
Using domain server:
Name: 10.75.205.1
Address: 10.75.205.1#53
Aliases:

Host bogus-host not found: 2(SERVFAIL)

real 0m0.780s
user 0m0.011s
sys 0m0.015s

$ lxc exec x1 -- bash -c 'time hostname --fqdn'
hostname: Temporary failure in name resolution

real 0m10.009s
user 0m0.002s
sys 0m0.000s

Revision history for this message
Scott Moser (smoser) wrote :

Somewhat interestingly, I'm not sure why I can't just reproduce this by:

$ sudo bash -c 'oname=$(hostname);
  trap "hostname $oname" EXIT;
  hostname my-bogus-hostname;
  hostname;
  time sudo /bin/true;' ; hostname

but it doesnt reproduce this way.

Revision history for this message
Steve Langasek (vorlon) wrote :

What does systemd-resolve --status show in this scenario?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I do wonder if we need libnss-myhostname to be installed. I hoped that with the stub resolver, we do not. But it depends if the hostname change made it into resolved state or not.

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

Output of recreate script:
$ /tmp/lp-1730744-reproduce-lxc
=== launch container testme50 ===
Creating testme50
Starting testme50
=== wait for boot ===
.................................1 degraded 7.00
=== set hostname to bogus value: testme50-foo ===
=== show slow sudo ===
sudo: unable to resolve host bogus-testme50: Connection timed out
real 20.02
user 0.00
sys 0.01
FAIL: 20s
=== delete-testme50 ===

Revision history for this message
Steve Langasek (vorlon) wrote :

This looks like the same as the just-diagnosed https://launchpad.net/bugs/1739672

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.