dnsmasq fails when the ARP cache is full
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
dnsmasq (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Test setup:
OS: Ubuntu 16.04
Hardware: D15_v2 VM on Azure
Steps to reproduce:
1) sudo apt-get install dnsmasq
2) sudo sysctl -w net.ipv4.
3) sudo sysctl -w net.ipv4.
4) sudo sysctl -w net.ipv4.
5) dig @127.0.0.1 google.com
Result:
~$ dig @127.0.0.1 google.com
../../.
../../.
../../.
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
However, an external DNS server still works fine (dig @8.8.8.8 google.com, for example).
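The local-versus-external comparison can also be scripted, which may help when checking many hosts. The following is only a rough sketch, not part of the original report: `build_query` and `probe` are hypothetical helper names, it sends a single plain UDP A-record query, and it does not validate the reply the way dig does.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (illustrative helper)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str = "google.com", timeout: float = 2.0) -> str:
    """Send one UDP query to `server` and report whether anything came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)  # Any reply counts; the content is not checked.
            return "answered"
        except socket.timeout:
            return "timed out"
        except OSError as exc:
            return f"error: {exc}"
```

Calling `probe("127.0.0.1")` and `probe("8.8.8.8")` on an affected machine would be expected to show the same split as the dig commands, assuming outbound UDP port 53 is allowed.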
We discovered this because the default maximum ARP cache size is 1024 and we run a large cluster with a lot of intra-cluster network traffic. Increasing the size of the ARP cache solves the problem, but it seems dnsmasq should still work, just more slowly, as other applications do (curl, for example, just takes longer to connect).
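To see how close a host is to the limit, something like the sketch below may be useful. It assumes the 1024 limit mentioned above corresponds to the Linux `gc_thresh3` neighbour-table setting, which this report does not state explicitly, and that `/proc/net/arp` is a fair approximation of the IPv4 neighbour table. The function names are hypothetical.

```python
from pathlib import Path

# Standard Linux procfs locations; the link to this bug is an assumption.
ARP_TABLE = Path("/proc/net/arp")
GC_THRESH3 = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def count_arp_entries(table_text: str) -> int:
    """Count entries in /proc/net/arp-style text (first line is a header)."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def cache_usage(table_text: str, limit: int) -> float:
    """Return the fraction of the limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return count_arp_entries(table_text) / limit


if __name__ == "__main__":
    try:
        text = ARP_TABLE.read_text()
        limit = int(GC_THRESH3.read_text().strip())
        print(f"{count_arp_entries(text)} of {limit} entries "
              f"({cache_usage(text, limit):.0%})")
    except (OSError, ValueError) as exc:
        # Not Linux, or procfs is unavailable or unreadable.
        print(f"could not read ARP cache state: {exc}")
```

A usage figure that stays near 100% while local lookups time out would be consistent with the behaviour described here, though it would not by itself prove the cause.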
Hi Christopher,
thanks for the report and the nice steps to reproduce.
I can absolutely confirm your finding.
I checked up to the latest dnsmasq in the current development release (Artful).
That is 2.77 from 01-Jun-2017, so really not too old :-)
I appreciate the quality of this bug report and I'm sure it'll be helpful to others experiencing the same issue. However, I checked and neither Ubuntu nor Debian carries any patches on top of upstream dnsmasq.
Therefore this sounds like an upstream bug to me. The best route to getting it fixed in Ubuntu (and everywhere else) would be to file an upstream bug, if you're able to do that.
If you do end up filing an upstream bug, please link to it from here - that would be awesome. Thanks in advance!