init.d controlled services launch before all interfaces are up, thus failing to start

Bug #580319 reported by Michael R. Head
This bug affects 29 people
Affects           Status         Importance   Assigned to   Milestone
dhcp3 (Ubuntu)    Invalid        Medium       Unassigned
  Lucid           Won't Fix      Medium       Unassigned
upstart (Ubuntu)  Fix Released   Medium       Clint Byrum
  Lucid           Won't Fix      Undecided    Unassigned

Bug Description

My server's (primarily a NAT box/802.11g hotspot) network configuration:

Ubuntu release: 10.04 LTS
External interface: eth3
Internal interface: br0 (bridging eth2 and wlan0)

I run dhcp3-server on my internal interface to supply private (192.168.*) IP addresses to wired and wireless devices throughout my apartment.

I recently upgraded this server from 8.04 LTS to 10.04 LTS. After the upgrade I've faced some problems with the startup process. The issue I'm reporting here has to do with dhcp3-server failing to start during boot.

From what I can tell, upstart begins bringing up br0, which takes perhaps 15 seconds. While br0 is still coming up (and still has an address of 0.0.0.0), the dhcp3-server init script runs. Because 0.0.0.0 is not a declared network, dhcpd fails to start and reports:
May 13 22:29:40 firewall dhcpd: No subnet declaration for br0 (0.0.0.0).
May 13 22:29:40 firewall dhcpd: ** Ignoring requests on br0. If this is not what
May 13 22:29:40 firewall dhcpd: you want, please write a subnet declaration
May 13 22:29:40 firewall dhcpd: in your dhcpd.conf file for the network segment
May 13 22:29:40 firewall dhcpd: to which interface br0 is attached. **
May 13 22:29:40 firewall dhcpd:
May 13 22:29:40 firewall dhcpd:
May 13 22:29:40 firewall dhcpd: Not configured to listen on any interfaces!

It seems that dhcpd should start (or restart) when upstart brings up the interfaces it is configured to listen on.
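
The failure mode described here can be detected from a script before dhcpd is started. A minimal sketch, assuming the output of `ip -4 -o addr show dev br0` has been captured as text; the helper names and the sample address are hypothetical and are not part of dhcp3-server or upstart:

```python
import re


def ipv4_addresses(ip_addr_output):
    """Return the IPv4 addresses found in `ip -4 -o addr show dev IFACE` output."""
    return re.findall(r"\binet (\d+\.\d+\.\d+\.\d+)", ip_addr_output)


def interface_ready(ip_addr_output):
    """True once the interface has at least one address other than 0.0.0.0."""
    return any(addr != "0.0.0.0" for addr in ipv4_addresses(ip_addr_output))


# Illustrative sample in the one-line-per-address format; the address is made up.
sample_up = "5: br0    inet 192.168.1.1/24 brd 192.168.1.255 scope global br0"
sample_down = ""  # no address assigned yet, as during the bridge's start-up

print(interface_ready(sample_up))
print(interface_ready(sample_down))
```

A wrapper could poll `interface_ready` and start dhcpd only once it returns True, which is roughly what the routing-table workaround further down the thread does.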

Tags: server-nro
Michael R. Head (burner) wrote :

Relevant section of /var/log/syslog showing failure of dhcp3 server to start

Chuck Short (zulcss) wrote :

Thanks for the bug report, I'll take a closer look at this.

Regards
chuck

Changed in dhcp3 (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Ulrik Hørlyk Hjort (ulrik-hoerlyk-hjort) wrote :

I made the following (hmm ... ugly) workaround on my system until an update for dhcp3 is available:

Install the following Python script as a service in init.d. The script spins until the bridge interface is up and running (it polls the routing table) and then starts the DHCP server. Remember to change the bridge interface name in the line:

      find_res = res.find("dev br0")

if yours is different from br0.

/Ulrik

========================================================================
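#!/usr/bin/python

import os
import subprocess
import time

cmd = "ip route ls"

find_res = -1

# Poll the routing table once a second until a route through the bridge appears.
while find_res == -1:

      p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, close_fds=True)

      # decode() keeps the substring search working whether the output is str or bytes
      res = p.communicate()[0].decode()

      find_res = res.find("dev br0")

      time.sleep(1)

# The bridge now has a route, so start the DHCP server.
os.system("/etc/init.d/dhcp3-server start")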

Dan Dofton (ddofton) wrote :

My work-around for this problem was to remove network-manager, then configure /etc/network/interfaces.

Nick_Hill (nick-nickhill) wrote :

Perhaps the best way to fix this is to make a /etc/init/dhcp3-server.conf file for upstart, and remove /etc/rc2.d/Sxxdhcp3-server.

Ian McMichael (ian-sigma-uk) wrote :

Confirmed as still being an issue on a fresh installation of Maverick (10.10) RC AMD64 server.

Ian McMichael (ian-sigma-uk) wrote :

As another proposed workaround, I have found that editing /etc/init/rc-sysinit.conf to change the line:

    start on filesystem and net-device-up IFACE=lo

to:

    start on filesystem and net-device-up IFACE=br0

also appears to produce the desired result by delaying running of legacy /etc/init.d scripts until after the bridge interface comes up. I am not sure if this will have any other undesirable effects and clearly it would not work on a system without a bridge configured.

The best solution still has to be to migrate dhcp3-server to upstart but I hope this helps someone else out in the interim.
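
The edit described in this comment could be scripted roughly as follows. This is a sketch only: the function name is hypothetical, it operates on the file contents as a string rather than touching /etc/init/rc-sysinit.conf directly, and it assumes the start condition appears exactly as quoted above.

```python
def retarget_start_condition(conf_text, iface):
    """Return conf_text with the loopback start condition pointed at iface."""
    old = "start on filesystem and net-device-up IFACE=lo"
    new = "start on filesystem and net-device-up IFACE=" + iface
    lines = []
    for line in conf_text.splitlines():
        # Replace only the exact start condition; leave every other line alone.
        lines.append(new if line.strip() == old else line)
    return "\n".join(lines) + "\n"


# Hypothetical excerpt standing in for the real file.
sample = "# excerpt\nstart on filesystem and net-device-up IFACE=lo\n"

print(retarget_start_condition(sample, "br0"))
```

Working on a string makes it easy to diff the result against the original before writing anything back, which seems prudent for a file that gates the whole of runlevel 2.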

KBios (kbios) wrote :

Please note that simply making an upstart config file starting on "started networking" will NOT work, as that signal is emitted before bridges are brought up. The problem may be solved by combining the upstart conversion with Ian McMichael's solution, so that dhcp3 starts once net-device-up has been emitted for the interfaces specified in /etc/default/dhcp3-server.

Changed in dhcp3 (Ubuntu Maverick):
status: New → Confirmed
importance: Undecided → Medium
Rubens de Souza Matos Júnior (rubens-matos) wrote :

The workaround pointed out by Ian McMichael worked for me, using Ubuntu Server 10.04 (Lucid).

Changed in dhcp3 (Ubuntu Natty):
status: Confirmed → Triaged
Changed in dhcp3 (Ubuntu Maverick):
status: Confirmed → Triaged
rafalmag (rafalmag) wrote :

I wrote this upstart script:

description "Dhcp server"
author "Rafal Magda"

start on filesystem and net-device-up IFACE=br0
stop on runlevel [!2345]

respawn

expect fork
exec dhcpd3

Put it in /etc/init/dhcp.conf
dhcpd will be started once interface br0 is up.

The interface ("br0") should be the same as in /etc/default/dhcp3-server. Maybe there is a way to put it there automatically... ?
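
One way to derive the start condition automatically might be to read the interface list from /etc/default/dhcp3-server and generate the stanza from it. A sketch, assuming that file sets a shell variable of the form `INTERFACES="br0"` (an assumption about its layout); the function names are hypothetical:

```python
import re


def parse_interfaces(defaults_text):
    """Extract interface names from an uncommented INTERFACES="..." assignment."""
    match = re.search(r'^\s*INTERFACES="([^"]*)"', defaults_text, re.MULTILINE)
    return match.group(1).split() if match else []


def start_on_stanza(ifaces):
    """Build a 'start on' line requiring net-device-up for every interface."""
    conditions = ["filesystem"] + ["net-device-up IFACE=" + name for name in ifaces]
    return "start on " + " and ".join(conditions)


# Hypothetical file contents for illustration.
sample = '# interfaces dhcpd should serve\nINTERFACES="br0"\n'

print(start_on_stanza(parse_interfaces(sample)))
```

With more than one interface listed, the generated condition waits for all of them, at the cost of never starting if one of them fails to come up.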

Ben Selinger (blistovmhz) wrote :

Confirmed here as well.

Dave Walker (davewalker)
Changed in dhcp3 (Ubuntu Natty):
milestone: none → natty-alpha-3
Clint Byrum (clint-fewbar) wrote :

Seems to me that the appropriate time to start dhcpd is when all of the devices it is configured on are up.

The issue here is that /etc/rc2.d/S* are run after this condition is met:

start on filesystem and net-device-up IFACE=lo

Because ifup -a is run in parallel with mountall, this is essentially a race between ifup -a bringing up its interfaces and all filesystems mounting. It probably works fine with a regular ethernet interface but, as the reporter says, this takes a bit longer for a bridge.

We probably do need to move dhcp server to upstart, *or* rethink the start on for rc-sysinit. In fact, since lo is always configured when ifup -a is run, and 'started networking' is emitted after ifup -a exits with success, it would make sense to change rc-sysinit to this:

start on filesystem and started networking

This would probably solve a lot of issues with server network services that need all interfaces to be up and configured. On the desktop, this could slow the boot by the time between emitting 'net-device-up' for lo and ifup -a exiting, which I would suspect is in the ballpark of "a few milliseconds" at the most. However, there are plenty of other things happening in parallel at this point, so it's not a total loss.

This is a somewhat radical change, and probably not going to be SRU-able. For the stable releases, it may be better for users to modify /etc/init/rc-sysinit.conf themselves to enact this behavior now.
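
The race can be illustrated with a toy timing model. This is a sketch, not a measurement: all timings are invented for illustration except the roughly 15 seconds the reporter observed for the bridge, and it assumes, as this comment describes, that 'started networking' is emitted only after ifup -a has finished.

```python
def runlevel2_start(filesystem_ready, event_times):
    """Earliest time rc-sysinit may start: 'filesystem' AND every listed event."""
    return max([filesystem_ready] + list(event_times))


# Illustrative timings in seconds since boot (hypothetical values).
filesystem_ready = 2.0
lo_up = 0.5
bridge_up = 15.0
networking_started = bridge_up  # assumption: emitted once ifup -a exits

current = runlevel2_start(filesystem_ready, [lo_up])
proposed = runlevel2_start(filesystem_ready, [networking_started])

print(current < bridge_up)    # init.d scripts run while br0 is still coming up
print(proposed >= bridge_up)  # init.d scripts wait until the bridge is up
```

Under the current condition the slower of two fast events decides when /etc/rc2.d/S* run, so anything slower than both, such as a bridge, loses the race.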

Clint Byrum (clint-fewbar) wrote :

Adding upstart task since rc-sysinit.conf is owned by the upstart package, and IMO, it's running too soon.

Martin Pitt (pitti) wrote :

If we are going to change the rc-sysinit startup condition, this needs to happen before beta-1.

Changed in upstart (Ubuntu Natty):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
milestone: none → ubuntu-11.04-beta-1
Changed in dhcp3 (Ubuntu Natty):
milestone: natty-alpha-3 → ubuntu-11.04-beta-1
Brian Murray (brian-murray) wrote :

Adding a Lucid task as I've run into this on Lucid.

Changed in dhcp3 (Ubuntu Lucid):
importance: Undecided → Medium
status: New → Triaged
Changed in upstart (Ubuntu Natty):
assignee: Canonical Foundations Team (canonical-foundations) → Robbie Williamson (robbie.w)
assignee: Robbie Williamson (robbie.w) → James Hunt (jamesodhunt)
status: New → Triaged
importance: Undecided → Medium
Dave Walker (davewalker)
tags: added: server-nro
Dave Walker (davewalker) wrote :

Does this specific issue also relate to isc-dhcp ?

Clint Byrum (clint-fewbar) wrote : Re: [Bug 580319] Re: dhcp3-server launches before upstart brings all interface, thus failing to start

I spoke with Keybuk briefly about changing the startup conditions for
runlevel 2, and he warned that there may be one unforeseen circumstance.

It's possible right now to have a sysvinit script that supports a network
interface coming up. Meaning that there may be an interface that will
spin while coming up until a sysvinit daemon starts, blocking ifup -a
from completing (and therefore delaying 'started networking'). This can
happen because we start runlevel 2 as soon as we have a loopback device,
so ifup continues beyond that in parallel with sysvinit.

So, we may need to review all ifup pre/post scripts to make sure they
don't wait on sysvinit scripts, and if they do, change them to fork into
the background to do so. We'd also need to document this behavioral
change in the release notes so people can test any custom setups before
upgrade.

On Mon, 2011-03-14 at 23:37 +0000, Dave Walker wrote:
> Does this specific issue also relate to isc-dhcp ?
>

carloslp (carloslp) wrote : Re: dhcp3-server launches before upstart brings all interface, thus failing to start

I am running a server with Ubuntu 10.04 LTS and I switched eth0 into br0 in order to be able to use it as a bridge for an OpenVPN TAP device.

Since I made the change, dhcpd refuses to start at boot time with this message. Not only does dhcp fail; Samba also stops working.

Both the nmbd and smbd daemons (from Samba) don't bind the TCP ports to the interface correctly at boot time, and a manual restart is required.

Colin Watson (cjwatson)
Changed in upstart (Ubuntu Natty):
milestone: ubuntu-11.04-beta-1 → ubuntu-11.04-beta-2
Changed in dhcp3 (Ubuntu Natty):
milestone: ubuntu-11.04-beta-1 → ubuntu-11.04-beta-2
Colin Watson (cjwatson)
Changed in dhcp3 (Ubuntu Natty):
milestone: ubuntu-11.04-beta-2 → ubuntu-11.04
Changed in upstart (Ubuntu Natty):
milestone: ubuntu-11.04-beta-2 → ubuntu-11.04
Dave Walker (davewalker) wrote :

Please can someone confirm if they have encountered this with isc-dhcp (v4)?

Thanks.

Dave Walker (davewalker) wrote :

dhcp3 is no longer in Natty archive, it's been replaced by a transitional package to isc-dhcp (v4), marking Natty task invalid for dhcp3.

Changed in dhcp3 (Ubuntu Natty):
status: Triaged → Invalid
carloslp (carloslp) wrote :

And what happens with users of Ubuntu 10.04 LTS???

I am running a server with Ubuntu 10.04 *LTS*, which I thought was supported for up to 5 years, and now you are telling me that you will not fix this because you are replacing dhcp3 with dhcp4 in the next Ubuntu release? Isn't this crazy?

Clint Byrum (clint-fewbar) wrote :

Carlos, apologies if our status change has alarmed you. The status is invalid in Natty only, because it's been removed from Natty.

Notice that it is status == "Triaged" in Lucid, which means we plan to fix it. I've also targeted it at 10.04.3, which is the next stable point release for 10.04.

This is actually good news for an update to 10.04 because it means the fix can be done since it is "resolved" in the development release. We always make sure that the dev release is resolved first so that the issue does not regress in future releases.

Note that you can work around the issue by changing /etc/init/rc-sysinit.conf to be

start on filesystem and started networking

But of course, *TEST* that configuration fully before putting it into production!

Changed in dhcp3 (Ubuntu Lucid):
milestone: none → ubuntu-10.04.3
carloslp (carloslp) wrote :

That finally fixed the issue. I have tested it and now dhcp3-server starts properly at boot. Thanks!

Nevertheless, the smbd daemon still fails to bind to the interface and I had to start it manually.

Colin Watson (cjwatson) wrote :

<cjwatson> jhunt: is anything happening with bug 580319 for 11.04?
<jhunt> not at the present. I spoke to Daviey earlier and since we are unsure if it occurs with isc-dhcp (v4) at this stage, we don't think it's in the running.
<jhunt> Presumably the milestone might need changing?
<cjwatson> jhunt: yes, that sounds like a good idea if it's no longer believed to occur
<cjwatson> probably simply unmilestone it

Changed in dhcp3 (Ubuntu Natty):
milestone: ubuntu-11.04 → none
Changed in upstart (Ubuntu Natty):
milestone: ubuntu-11.04 → none
summary: - dhcp3-server launches before upstart brings all interface, thus failing
- to start
+ dhcp3-server and other init.d controled services launch before upstart
+ brings all interface, thus failing to start
Changed in upstart (Ubuntu Lucid):
status: New → Confirmed
Changed in upstart (Ubuntu Maverick):
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote : Re: dhcp3-server and other init.d controled services launch before upstart brings all interface, thus failing to start

So, there is some progress on this bug. A new event called 'static-network-up' was added to oneiric today, which will only be emitted once all 'auto' interfaces from /etc/network/interfaces are up.

The next step is to delay /etc/init/rc-sysinit.conf until this event is emitted, and then also provide some kind of fail-safe for this event to be emitted when some of the interfaces cannot be brought up.
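
The condition behind 'static-network-up' can be modelled roughly as follows. This is a simplified sketch and not the ifupdown implementation: it only looks at 'auto' lines in /etc/network/interfaces-style text, ignores every other stanza type, and uses hypothetical function names.

```python
def auto_interfaces(interfaces_text):
    """Collect the names listed on 'auto' lines of interfaces-style text."""
    names = []
    for line in interfaces_text.splitlines():
        words = line.split()
        if words and words[0] == "auto":
            names.extend(words[1:])
    return names


def static_network_up(interfaces_text, up_ifaces):
    """True once every 'auto' interface is in the set of interfaces that are up."""
    return set(auto_interfaces(interfaces_text)) <= set(up_ifaces)


# Hypothetical configuration resembling the reporter's setup.
sample = """\
auto lo
iface lo inet loopback

auto br0
iface br0 inet static
"""

print(static_network_up(sample, {"lo"}))
print(static_network_up(sample, {"lo", "br0"}))
```

The model also shows why a fail-safe is needed: if br0 never comes up, the condition stays false forever and nothing that waits on it would start.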

Changed in upstart (Ubuntu):
assignee: James Hunt (jamesodhunt) → Clint Byrum (clint-fewbar)
Clint Byrum (clint-fewbar) wrote :

Reassigning to me, as I have been working on the fix.

Changed in upstart (Ubuntu):
status: Triaged → In Progress
milestone: none → ubuntu-11.10-beta-1
Clint Byrum (clint-fewbar) wrote :

milestoned to beta 1 to make sure this happens before beta.

summary: - dhcp3-server and other init.d controled services launch before upstart
- brings all interface, thus failing to start
+ init.d controled services launch before all interfaces are up, thus
+ failing to start
summary: - init.d controled services launch before all interfaces are up, thus
+ init.d controlled services launch before all interfaces are up, thus
failing to start
Scott Moser (smoser) wrote :

Just for reference, there was code committed under bug 810044 to ifupdown that is expected to be used to fix this issue.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.3-0ubuntu6

---------------
upstart (1.3-0ubuntu6) oneiric; urgency=low

  [ Steve Langasek ]
  * Fix maintainer field to be compliant with policy definition

  [ Clint Byrum ]
  * conf/rc.conf: document events that are emitted by sysvinit
    jobs to quiet 'initctl check-config'
  * extra/conf/upstart-udev-bridge.conf: narrow definition to
    only the events actually emitted. (LP: #819928)
  * debian/conf/failsafe.conf: new job for critical services to
    start on.
  * conf/rc-sysinit.conf: start after static-network-up or failsafe
    so that runlevel 2 is only entered with all static net interfaces
    up. (LP: #580319)
 -- Clint Byrum <email address hidden> Wed, 10 Aug 2011 08:44:43 -0500

Changed in upstart (Ubuntu):
status: In Progress → Fix Released
Changed in dhcp3 (Ubuntu Lucid):
milestone: ubuntu-10.04.3 → ubuntu-10.04.4
Martin Pitt (pitti) wrote :

Dropping 10.04.4 milestone as per discussion with Clint (not safe enough to SRU yet).

Changed in dhcp3 (Ubuntu Lucid):
milestone: ubuntu-10.04.4 → none
Adolfo Jayme Barrientos (fitojb) wrote :

(Untargetting end-of-life releases)

no longer affects: dhcp3 (Ubuntu Maverick)
no longer affects: dhcp3 (Ubuntu Natty)
no longer affects: upstart (Ubuntu Maverick)
no longer affects: upstart (Ubuntu Natty)
Rolf Leggewie (r0lf) wrote :

Lucid has reached the end of its life and is no longer receiving any updates. Marking the Lucid task for this ticket as "Won't Fix".

Changed in dhcp3 (Ubuntu Lucid):
status: Triaged → Won't Fix
Rolf Leggewie (r0lf)
Changed in upstart (Ubuntu Lucid):
status: Confirmed → Won't Fix