Activity log for bug #1872118

Date Who What changed Old value New value Message
2020-04-10 18:19:24 Andrew Welham bug added bug
2020-04-27 14:48:27 Launchpad Janitor isc-dhcp (Ubuntu): status New Confirmed
2020-05-14 18:23:50 Robert Stroetgen bug added subscriber Robert Stroetgen
2020-05-15 19:39:26 Stephan Spees bug added subscriber Stephan Spees
2020-05-31 07:02:16 Ross bug added subscriber Ross
2020-06-05 03:32:30 atom bug added subscriber atom
2020-06-15 15:40:40 Brian Murray tags focal rls-ff-incoming
2020-06-19 16:56:33 Konstantin bug added subscriber Konstantin
2020-06-21 20:51:50 Peter A. Bigot bug added subscriber Peter A. Bigot
2020-07-31 11:42:23 Richard Laager bug added subscriber Richard Laager
2020-08-01 20:31:40 Richard Laager bug watch added https://gitlab.isc.org/isc-projects/dhcp/issues/128
2020-08-01 20:31:40 Richard Laager bug task added dhcp
2020-08-01 23:17:52 Bug Watch Updater dhcp: status Unknown New
2020-08-04 01:55:43 Jorge Niedbalski nominated for series Ubuntu Groovy
2020-08-04 01:55:43 Jorge Niedbalski bug task added isc-dhcp (Ubuntu Groovy)
2020-08-04 01:55:43 Jorge Niedbalski nominated for series Ubuntu Focal
2020-08-04 01:55:43 Jorge Niedbalski bug task added isc-dhcp (Ubuntu Focal)
2020-08-04 01:56:32 Jorge Niedbalski bug task added bind9-libs (Ubuntu)
2020-08-04 03:01:05 Jorge Niedbalski bug watch added https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430065
2020-08-04 16:14:22 Jorge Niedbalski bug added subscriber Jorge Niedbalski
2020-08-05 17:36:54 Andrew Welham attachment added _usr_sbin_dhcpd.0.crash https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1872118/+attachment/5398988/+files/_usr_sbin_dhcpd.0.crash
2020-08-06 20:02:07 Richard Laager bug watch added https://gitlab.isc.org/isc-projects/dhcp/-/issues/121
2020-08-11 17:49:59 Jorge Niedbalski bind9-libs (Ubuntu Focal): status New In Progress
2020-08-11 17:50:03 Jorge Niedbalski bind9-libs (Ubuntu Groovy): status New In Progress
2020-08-11 17:50:08 Jorge Niedbalski isc-dhcp (Ubuntu Focal): status New In Progress
2020-08-11 17:50:11 Jorge Niedbalski isc-dhcp (Ubuntu Groovy): status Confirmed In Progress
2020-08-11 17:50:14 Jorge Niedbalski bind9-libs (Ubuntu Focal): assignee Jorge Niedbalski (niedbalski)
2020-08-11 17:50:17 Jorge Niedbalski bind9-libs (Ubuntu Groovy): assignee Jorge Niedbalski (niedbalski)
2020-08-11 17:50:20 Jorge Niedbalski isc-dhcp (Ubuntu Focal): assignee Jorge Niedbalski (niedbalski)
2020-08-11 17:50:26 Jorge Niedbalski isc-dhcp (Ubuntu Groovy): assignee Jorge Niedbalski (niedbalski)
2020-08-11 19:15:47 Jorge Niedbalski summary DHCP Cluster crashes after a few hours [SRU] DHCP Cluster crashes after a few hours
2020-08-11 19:38:25 Jorge Niedbalski description I have a pair of DHCP serevrs running in a cluster on ubuntu 20.04, All worked perfectly until recently, when they started stopping with code=killed, status=6/ABRT. This is being fixed by https://bugs.launchpad.net/bugs/1870729 However now one stops after a few hours with the following errors. One can stay on line but not both. Syslog shows Apr 10 17:20:15 dhcp-primary sh[6828]: ../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, back trace Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ?? Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ?? Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ?? Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ?? Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ?? nothing in kern.log apport.log shows ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, signal 6, core limit 0, dump mode 2 ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid with dump mode of 2 ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: /usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf") ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report /var/crash/_usr_sbin_dhcpd.0.crash /var/crash/_usr_sbin_dhcpd.0.crash shows ProblemType: Crash Architecture: amd64 CrashCounter: 1 Date: Fri Apr 10 17:20:15 2020 DistroRelease: Ubuntu 20.04 ExecutablePath: /usr/sbin/dhcpd ExecutableTimestamp: 1586210315 ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ProcEnviron: Error: [Errno 13] Permission denied: 'environ' ProcMaps: Error: [Errno 13] Permission denied: 'maps' ProcStatus: Name: dhcpd Umask: 0022 State: D (disk sleep) Tgid: 6828 Ngid: 0 Pid: 6828 PPid: 1 TracerPid: 0 Uid: 113 113 113 113 Gid: 118 118 118 118 FDSize: 128 Groups: NStgid: 6828 NSpid: 6828 NSpgid: 6828 NSsid: 6828 VmPeak: 236244 kB VmSize: 170764 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 12064 kB VmRSS: 12064 kB RssAnon: 5940 kB RssFile: 6124 kB RssShmem: 0 kB VmData: 30792 kB VmStk: 132 kB VmExe: 592 kB VmLib: 5424 kB VmPTE: 76 kB VmSwap: 0 kB HugetlbPages: 0 kB CoreDumping: 1 THP_enabled: 1 Threads: 4 SigQ: 0/7609 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000000001000 SigCgt: 0000000180000000 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: 0000003fffffffff CapAmb: 0000000000000000 NoNewPrivs: 0 Seccomp: 0 Speculation_Store_Bypass: thread vulnerable Cpus_allowed: 3 Cpus_allowed_list: 0-1 Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000 0000,00000000,00000000,00000001 Mems_allowed_list: 0 voluntary_ctxt_switches: 111 nonvoluntary_ctxt_switches: 144 Signal: 6 Uname: Linux 5.4.0-21-generic x86_64 UserGroups: [Description] isc-dhcp-server uses libisc-export (coming from bin9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. If this race condition happens, the following stacktrace will be hit: (gdb) info threads Id Target Id Frame * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52 3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364 4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183 (gdb) frame 2 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 (gdb) bt #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054 #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211 #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397 #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) frame 3 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 4041 in ../../../../lib/isc/unix/socket.c (gdb) p sock->pending_send $2 = 1 [TEST CASE] 1) Install isc-dhcp-server in 2 focal machine(s). 2) Configure peer/cluster mode as follows: Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/ Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/ 2) Run dhcpd as follows in both machine(s) # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4 3) Leave the cluster running for a long (2h) period until the crash/race condition is reproduced. [REGRESSION POTENTIAL] * The fix will prevent the assertion to happen in the dispatch_send path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this assertion at process_fd shouldn't be problematic.
2020-08-11 19:40:05 Jorge Niedbalski description [Description] isc-dhcp-server uses libisc-export (coming from bin9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. If this race condition happens, the following stacktrace will be hit: (gdb) info threads Id Target Id Frame * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52 3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364 4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183 (gdb) frame 2 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 (gdb) bt #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054 #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211 #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397 #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) frame 3 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 4041 in ../../../../lib/isc/unix/socket.c (gdb) p sock->pending_send $2 = 1 [TEST CASE] 1) Install isc-dhcp-server in 2 focal machine(s). 2) Configure peer/cluster mode as follows: Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/ Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/ 2) Run dhcpd as follows in both machine(s) # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4 3) Leave the cluster running for a long (2h) period until the crash/race condition is reproduced. [REGRESSION POTENTIAL] * The fix will prevent the assertion to happen in the dispatch_send path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this assertion at process_fd shouldn't be problematic. [Description] isc-dhcp-server uses libisc-export (coming from bind9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. If this race condition happens, the following stacktrace will be hit: (gdb) info threads   Id Target Id Frame * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183 (gdb) frame 2 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 (gdb) bt #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054 #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211 #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397 #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) frame 3 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 4041 in ../../../../lib/isc/unix/socket.c (gdb) p sock->pending_send $2 = 1 [TEST CASE] 1) Install isc-dhcp-server in 2 focal machine(s). 2) Configure peer/cluster mode as follows:    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/ 2) Run dhcpd as follows in both machine(s) # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4 3) Leave the cluster running for a long (2h) period until the crash/race condition is reproduced. [REGRESSION POTENTIAL] * The fix will prevent the assertion to happen in the dispatch_send path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this assertion at process_fd shouldn't be problematic.
2020-08-11 19:49:26 Jorge Niedbalski attachment added lp-1872118-groovy.debdiff https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1872118/+attachment/5400760/+files/lp-1872118-groovy.debdiff
2020-08-11 19:49:46 Jorge Niedbalski attachment added lp-1872118-focal.debdiff https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1872118/+attachment/5400761/+files/lp-1872118-focal.debdiff
2020-08-11 20:27:56 Ubuntu Foundations Team Bug Bot tags focal rls-ff-incoming focal patch rls-ff-incoming
2020-08-11 20:28:03 Ubuntu Foundations Team Bug Bot bug added subscriber Ubuntu Sponsors Team
2020-08-12 16:50:02 Gianfranco Costamagna isc-dhcp (Ubuntu Focal): status In Progress Invalid
2020-08-12 16:50:05 Gianfranco Costamagna isc-dhcp (Ubuntu Groovy): status In Progress Incomplete
2020-08-12 16:50:09 Gianfranco Costamagna isc-dhcp (Ubuntu Groovy): status Incomplete Invalid
2020-08-12 22:39:13 Launchpad Janitor bind9-libs (Ubuntu Groovy): status In Progress Fix Released
2020-08-13 11:34:57 Gianfranco Costamagna bind9-libs (Ubuntu Focal): importance Undecided High
2020-08-14 09:57:27 Timo Aaltonen bind9-libs (Ubuntu Focal): status In Progress Fix Committed
2020-08-14 09:57:30 Timo Aaltonen bug added subscriber Ubuntu Stable Release Updates Team
2020-08-14 09:57:32 Timo Aaltonen bug added subscriber SRU Verification
2020-08-14 09:57:38 Timo Aaltonen tags focal patch rls-ff-incoming focal patch rls-ff-incoming verification-needed verification-needed-focal
2020-08-17 00:02:07 Richard Laager tags focal patch rls-ff-incoming verification-needed verification-needed-focal focal patch rls-ff-incoming verification-done verification-done-focal
2020-08-17 20:28:09 Jorge Niedbalski bug task deleted isc-dhcp (Ubuntu)
2020-08-17 20:28:16 Jorge Niedbalski bug task deleted isc-dhcp (Ubuntu Focal)
2020-08-17 20:28:21 Jorge Niedbalski bug task deleted isc-dhcp (Ubuntu Groovy)
2020-08-17 20:29:50 Jorge Niedbalski bind9-libs (Ubuntu Groovy): importance Undecided High
2020-08-18 09:33:57 ghyster bug added subscriber ghyster
2020-08-21 20:52:57 Mark Foster (ExtraHop) bug added subscriber Mark Foster (ExtraHop)
2020-08-25 18:19:20 Launchpad Janitor bind9-libs (Ubuntu Focal): status Fix Committed Fix Released
2020-08-25 18:19:26 Brian Murray removed subscriber Ubuntu Stable Release Updates Team
2020-08-25 19:03:38 Andrew Benson removed subscriber Andrew Benson
2020-10-27 18:12:17 Brian Murray description [Description] isc-dhcp-server uses libisc-export (coming from bind9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. If this race condition happens, the following stacktrace will be hit: (gdb) info threads   Id Target Id Frame * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183 (gdb) frame 2 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 (gdb) bt #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054 #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211 #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397 #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) frame 3 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 4041 in ../../../../lib/isc/unix/socket.c (gdb) p sock->pending_send $2 = 1 [TEST CASE] 1) Install isc-dhcp-server in 2 focal machine(s). 2) Configure peer/cluster mode as follows:    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/ 2) Run dhcpd as follows in both machine(s) # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4 3) Leave the cluster running for a long (2h) period until the crash/race condition is reproduced. [REGRESSION POTENTIAL] * The fix will prevent the assertion to happen in the dispatch_send path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this assertion at process_fd shouldn't be problematic. [Description] isc-dhcp-server uses libisc-export1105 (coming from bind9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. If this race condition happens, the following stacktrace will be hit: (gdb) info threads   Id Target Id Frame * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183 (gdb) frame 2 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 (gdb) bt #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79 #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054 #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211 #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397 #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) frame 3 #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041 4041 in ../../../../lib/isc/unix/socket.c (gdb) p sock->pending_send $2 = 1 [TEST CASE] 1) Install isc-dhcp-server in 2 focal machine(s). 2) Configure peer/cluster mode as follows:    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/ 2) Run dhcpd as follows in both machine(s) # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4 3) Leave the cluster running for a long (2h) period until the crash/race condition is reproduced. [REGRESSION POTENTIAL] * The fix will prevent the assertion to happen in the dispatch_send path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this assertion at process_fd shouldn't be problematic.
2021-07-28 22:54:06 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/bind9-libs/+git/bind9-libs/+merge/406332