[Bug 1872118] Re: DHCP Cluster crashes after a few hours
Jorge Niedbalski
1872118 at bugs.launchpad.net
Tue Aug 4 03:01:05 UTC 2020
Hello,
I checked the backtrace of a crashed dhcpd running on 4.4.1-2.1ubuntu5.
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex at entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364
4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183
(gdb) frame 2
#2 0x00007fb4dec85985 in isc_assertion_failed (file=file at entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line at entry=3361, type=type at entry=isc_assertiontype_insist,
cond=cond at entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
(gdb) bt
#1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
#2 0x00007fb4dec85985 in isc_assertion_failed (file=file at entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line at entry=3361, type=type at entry=isc_assertiontype_insist,
cond=cond at entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
#4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
#5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
#6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
#7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) frame 3
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
4041 in ../../../../lib/isc/unix/socket.c
(gdb) p sock->pending_send
$2 = 1
The code is crashing on this assertion: https://gitlab.isc.org/isc-projects/bind9/-/blob/v9_11_3/lib/isc/unix/socket.c#L3364
This was already reported and marked as fixed in debian (?) via [0]
""Now if a wakeup event occurres the socket would be dispatched for
processing regardless which kind of event (timer?) triggered the wakeup.
At least I did not find any sanity checks in process_fds() except
SOCK_DEAD(sock).
This leads to the following situation: The sock is not dead yet but it
is still pending when it is dispatched again.
I would now check sock->pending_send before calling dispatch_send().This
would at least prevent the assertion failure - well knowing that the
situation described above ( not dead but still pending and alerting ) is
not a very pleasant one - until someone comes up with a better solution.
"""
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430065#20
** Follow up questions:
0) The reproducer doesn't seems consistent and seems to be related to a race
condition associated with a internal timer/futex.
1) Can anyone confirm that a pristine upstream 4.4.1 doesn't reproduces the issue?
** Bug watch added: Debian Bug tracker #430065
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430065
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1872118
Title:
DHCP Cluster crashes after a few hours
Status in DHCP:
New
Status in bind9-libs package in Ubuntu:
New
Status in isc-dhcp package in Ubuntu:
Confirmed
Status in bind9-libs source package in Focal:
New
Status in isc-dhcp source package in Focal:
New
Status in bind9-libs source package in Groovy:
New
Status in isc-dhcp source package in Groovy:
Confirmed
Bug description:
I have a pair of DHCP serevrs running in a cluster on ubuntu 20.04, All worked perfectly until recently, when they started stopping with code=killed, status=6/ABRT.
This is being fixed by
https://bugs.launchpad.net/bugs/1870729
However now one stops after a few hours with the following errors. One
can stay on line but not both.
Syslog shows
Apr 10 17:20:15 dhcp-primary sh[6828]: ../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, back trace
Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ??
Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ??
Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ??
Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ??
Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ??
nothing in kern.log
apport.log shows
ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, signal 6, core limit 0, dump mode 2
ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid with dump mode of 2
ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: /usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf")
ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report /var/crash/_usr_sbin_dhcpd.0.crash
/var/crash/_usr_sbin_dhcpd.0.crash shows
ProblemType: Crash
Architecture: amd64
CrashCounter: 1
Date: Fri Apr 10 17:20:15 2020
DistroRelease: Ubuntu 20.04
ExecutablePath: /usr/sbin/dhcpd
ExecutableTimestamp: 1586210315
ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf
ProcEnviron: Error: [Errno 13] Permission denied: 'environ'
ProcMaps: Error: [Errno 13] Permission denied: 'maps'
ProcStatus:
Name: dhcpd
Umask: 0022
State: D (disk sleep)
Tgid: 6828
Ngid: 0
Pid: 6828
PPid: 1
TracerPid: 0
Uid: 113 113 113 113
Gid: 118 118 118 118
FDSize: 128
Groups:
NStgid: 6828
NSpid: 6828
NSpgid: 6828
NSsid: 6828
VmPeak: 236244 kB
VmSize: 170764 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 12064 kB
VmRSS: 12064 kB
RssAnon: 5940 kB
RssFile: 6124 kB
RssShmem: 0 kB
VmData: 30792 kB
VmStk: 132 kB
VmExe: 592 kB
VmLib: 5424 kB
VmPTE: 76 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 1
THP_enabled: 1
Threads: 4
SigQ: 0/7609
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 3
Cpus_allowed_list: 0-1
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000
0000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 111
nonvoluntary_ctxt_switches: 144
Signal: 6
Uname: Linux 5.4.0-21-generic x86_64
UserGroups:
To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions
More information about the foundations-bugs
mailing list