bind9 slow response after netplan apply
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
bind9 (Ubuntu) | Confirmed | High | Unassigned |
netplan.io (Ubuntu) | Incomplete | Undecided | Unassigned |
Bug Description
System:
VM running on ESXI 6.0
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Package:
bind9:
Installed: 1:9.11.
Candidate: 1:9.11.
Version table:
*** 1:9.11.
500 http://
100 /var/lib/
1:
500 http://
1:
500 http://
3. Expected to happen: After issuing the command "sudo netplan apply" with no network changes, bind should continue to run as it had.
The problem has occurred twice: once when netplan apply was run to reapply the config, and once during a "Daily apt upgrade and clean activities" run.
4. Within 2 minutes of the netplan apply completing, we started seeing timeouts and long response times from the server. We have two identical builds currently running as caching servers for a large network; both were built on the same day and both have experienced the issue. These servers are under heavy load, answering hundreds of queries per second or more. The bind logs and the syslog show no indication that a maximum number of connections, maximum open files, or any other limit was reached. However, we do see dropped packets, external monitoring of the bind service begins to flap, and manual testing shows some queries timing out.
Issuing a "sudo systemctl restart bind9" instantly resolves the issue.
If there is any other information you need, please let me know. I am unsure of where to look, as the named log, kernel log, and syslog are all clear of errors during the timeout issue.
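Since the logs show nothing, one way to gather evidence is to record query round-trip times directly against the server before and after running netplan apply. The following is only a rough sketch using the Python standard library; the function names `build_query` and `time_query` are made up for illustration, and it assumes the server answers plain UDP queries on port 53. It does not validate the reply, it only measures whether and how quickly one arrives.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query packet for an A record (hypothetical helper)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server, name, timeout=2.0, port=53):
    """Return the round-trip time in seconds, or None if the query timed out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name), (server, port))
        try:
            sock.recvfrom(512)  # any reply counts; the content is not checked
        except socket.timeout:
            return None
        return time.monotonic() - start
```

Calling `time_query("127.0.0.1", "example.com")` in a loop on the affected host (the address and name here are placeholders) and logging each result with a timestamp would show whether latency degrades gradually or abruptly after the netplan apply.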
What does /etc/resolv.conf look like? Is it using bind9 as the nameserver, or systemd-resolved (127.0.0.53)? You might not even be using bind9 directly, but via systemd-resolved, which changes how this should be debugged.
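To answer the question above quickly, the nameserver entries can be pulled out of /etc/resolv.conf and checked for the systemd stub address. This is a minimal sketch; the helper names `nameservers` and `uses_stub_resolver` are hypothetical, and it assumes the usual resolv.conf layout of one `nameserver <address>` entry per line.

```python
def nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_stub_resolver(servers):
    """True if the systemd stub listener address (127.0.0.53) is configured."""
    return "127.0.0.53" in servers
```

For example, `uses_stub_resolver(nameservers(open("/etc/resolv.conf").read()))` returning True would suggest local lookups go through the systemd stub rather than straight to bind9.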