invoking dhclient3 with -1 causes issue if no dhcp server available
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
isc-dhcp (Ubuntu) | Fix Released | High | Stéphane Graber |
Precise | Fix Released | High | Stéphane Graber |
Quantal | Fix Released | High | Stéphane Graber |
Bug Description
[rationale]
A patch was designed to fix this bug back in precise but because of where it was put in debian/
In quantal, the reverting code in debian/rules is now gone, so the patch has been applied ever since the 4.2 merge, and no problems with it have been reported so far.
[test case]
1) Start dhclient -1 <interface> on a working network
2) Unplug the network cable or stop the DHCP server
3) Wait for the lease to expire
4) Check that dhclient tries to get a new lease and, when that fails, keeps trying
The original behaviour of -1 would make step 4) try just a single time, then give up, causing dhclient to remove all addresses and exit on a machine that was unable to reach its DHCP server for at least the lease expiry time.
[regression potential]
This change definitely causes a slight change in behaviour, though based on this bug report and others, it's believed to be the behaviour most of our users want from -1.
The change itself has been applied to quantal without any regression and was tested on 12.04 in the past (before I messed up the ordering in the final upload ...).
The code change itself just makes "-1" use the same renewal behaviour as when called without "-1" (but still follows the standard "-1" behaviour for the first request).
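The three behaviours described above can be compared with a small toy model. This is only an illustrative sketch, not dhclient code: the function `run_client`, its parameters and the state names are hypothetical, and each list entry simply stands for one attempt to obtain or renew a lease.

```python
def run_client(server_answers, one_try=False, patched=False):
    """Toy model of dhclient lease attempts (hypothetical, not dhclient code).

    server_answers: one boolean per attempt, True if the DHCP server replies.
    one_try:        models invoking dhclient with "-1".
    patched:        models the changed "-1" renewal behaviour.
    Returns (final_state, attempts_made).
    """
    ever_bound = False
    state = "init"
    attempts = 0
    for answered in server_answers:
        attempts += 1
        if answered:
            ever_bound = True
            state = "bound"
            continue
        # Failed attempt. Original "-1" exits on any failure; patched "-1"
        # exits only if the very first request fails.
        if one_try and (not patched or not ever_bound):
            return "exited", attempts
        state = "retrying"
    return state, attempts


# Lease obtained, then the server is unreachable for two attempts, then back.
outage = [True, False, False, True]
print(run_client(outage, one_try=True, patched=False))  # original -1
print(run_client(outage, one_try=True, patched=True))   # patched -1
print(run_client(outage))                                # without -1
```

In this model the original "-1" client exits at the first failed renewal, while the patched "-1" client and the client started without "-1" both recover once the server answers again. A failure on the very first request still makes a "-1" client exit in both the original and the patched case.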
In bug 838968, we modified ifupdown to invoke dhclient3 with '-1' as a parameter [1], and subsequently changed the default timeout of dhclient in isc-dhcp3 from 60 seconds to 300 seconds [2].
The reason for this is that we now have a reliable "static-
That event is used by cloud-init and other things that depend on network.
The fallout of this is that if for some reason a server (or cloud instance, or anything really) boots and does not obtain a DHCP address within 5 minutes, it will give up forever. The previous behavior was to keep trying forever.
This scenario isn't terribly unrealistic. A power failure could take out a DHCP server and force an fsck on it, while another server comes up 5 minutes before the DHCP server is back.
Issue was originally raised in #openstack-dev by rmk around 2012-04-05T06:42:19 [3]
--
[1] http://
[2] http://
[3] http://
Related bugs:
* bug 838968: static-network-up event does not wait for interfaces to have an address
The problem is actually more prevalent than just boot time. In fact, if the DHCP server goes away at any point and the lease expires, dhclient will time out and exit without ever retrying. The result is that the system in question basically falls off the network, with the only way to recover being manual intervention. To be clear, this means recovery requires physical access or remote management capabilities.
It's straightforward to reproduce the problem. Configure an Ubuntu system as a DHCP client and set the lease time on the server to something short, like 60 seconds. After the system has successfully booted and grabbed an IP, shut down the DHCP server. Within 2 minutes, the DHCP client will time out, lose its IP address and drop off the network with no chance of recovering automatically.