[SRU] DHCP Cluster crashes after a few hours
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
DHCP | New | Unknown | |
bind9-libs (Ubuntu) | Fix Released | High | Jorge Niedbalski |
Focal | Fix Released | High | Jorge Niedbalski |
Groovy | Fix Released | High | Jorge Niedbalski |
Bug Description
[Description]
isc-dhcp-server uses libisc-export1105 (from the bind9-libs package) to handle socket events when configured in peer mode (master/secondary). A sequence of messages dispatched by the master that requires acknowledgment from its peers can leave a socket in a pending-send state. If a timer or a subsequent write request is then scheduled on that socket before the data has been fully flushed and the pending_send flag has been reset to 0, the !sock->pending_send assertion fails on the next write attempt.
When this race condition occurs, the process aborts with the following stack trace:
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/
2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=
3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_
4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_
(gdb) frame 2
#2 0x00007fb4dec85985 in isc_assertion_
cond=
(gdb) bt
#1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
#2 0x00007fb4dec85985 in isc_assertion_
cond=
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6
#4 process_fd (writeable=
#5 process_fds (writefds=
#6 watcher (uap=0x7fb4de6d
#7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_
#8 0x00007fb4deba4103 in clone () at ../sysdeps/
(gdb) frame 3
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6
4041 in ../../.
(gdb) p sock->pending_send
$2 = 1
[TEST CASE]
1) Install isc-dhcp-server on two Focal machines.
2) Configure peer/cluster mode as follows:
Primary configuration: https:/
Secondary configuration: https:/
3) Run dhcpd as follows on both machines:
# dhcpd -f -d -4 -cf /etc/dhcp/
4) Leave the cluster running for a long period (2h) until the crash/race condition is reproduced.
[REGRESSION POTENTIAL]
* The fix prevents the assertion from firing in the dispatch_send
path. Later upstream versions of isc-dhcp lack this logic and have removed the flag entirely, so removing the need for this
assertion at process_fd shouldn't be problematic.
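The invariant behind the assertion, and the effect of no longer enforcing it, can be illustrated with a small model. This is only a sketch under stated assumptions: `struct model_sock`, `dispatch_strict`, `dispatch_tolerant`, and `send_done` are hypothetical names invented for illustration and are not the libisc types or the actual patch. The strict variant reports where an assertion on `!pending_send` would fire; the tolerant variant treats an already pending send as a no-op, which is one plausible way to avoid the abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; NOT the libisc socket type. */
struct model_sock {
    bool pending_send; /* a send event is queued and has not completed yet */
    int  dispatched;   /* number of send events actually queued */
};

enum dispatch_result {
    DISPATCH_QUEUED,       /* a send event was queued */
    DISPATCH_WOULD_ASSERT, /* strict model: the assertion would fire here */
    DISPATCH_SKIPPED       /* tolerant model: a send is already in flight */
};

/* Strict model: dispatching while pending_send is set is treated as fatal. */
static enum dispatch_result dispatch_strict(struct model_sock *s) {
    assert(s != NULL);
    if (s->pending_send)
        return DISPATCH_WOULD_ASSERT; /* the real code would abort here */
    s->pending_send = true;
    s->dispatched++;
    return DISPATCH_QUEUED;
}

/* Tolerant model (assumption): a second dispatch is skipped, not fatal. */
static enum dispatch_result dispatch_tolerant(struct model_sock *s) {
    assert(s != NULL);
    if (s->pending_send)
        return DISPATCH_SKIPPED; /* the in-flight send is left to finish */
    s->pending_send = true;
    s->dispatched++;
    return DISPATCH_QUEUED;
}

/* Completion of the queued send clears the flag again. */
static void send_done(struct model_sock *s) {
    assert(s != NULL);
    s->pending_send = false;
}
```

In this model, calling `dispatch_strict` twice without an intervening `send_done` reproduces the state seen in the gdb session (`pending_send` equal to 1 at dispatch time), while `dispatch_tolerant` in the same sequence queues only one send event.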
Related branches
- Bryce Harrington (community): Approve
- Canonical Server: Pending requested
Diff: 203 lines (+62/-7), 3 files modified
debian/changelog (+33/-1)
debian/control (+7/-1)
debian/rules (+22/-5)
tags: | added: focal rls-ff-incoming |
Changed in dhcp: | |
status: | Unknown → New |
Changed in bind9-libs (Ubuntu Focal): | |
status: | New → In Progress |
Changed in bind9-libs (Ubuntu Groovy): | |
status: | New → In Progress |
Changed in isc-dhcp (Ubuntu Focal): | |
status: | New → In Progress |
Changed in isc-dhcp (Ubuntu Groovy): | |
status: | Confirmed → In Progress |
Changed in bind9-libs (Ubuntu Focal): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in bind9-libs (Ubuntu Groovy): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in isc-dhcp (Ubuntu Focal): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in isc-dhcp (Ubuntu Groovy): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
summary: | DHCP Cluster crashes after a few hours → [SRU] DHCP Cluster crashes after a few hours |
description: | updated |
description: | updated |
Changed in bind9-libs (Ubuntu Focal): | |
importance: | Undecided → High |
no longer affects: | isc-dhcp (Ubuntu) |
no longer affects: | isc-dhcp (Ubuntu Focal) |
no longer affects: | isc-dhcp (Ubuntu Groovy) |
Changed in bind9-libs (Ubuntu Groovy): | |
importance: | Undecided → High |
Status changed to 'Confirmed' because the bug affects multiple users.