slapd enters an infinite loop on the sched_yield syscall
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openldap (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Incomplete | Medium | Unassigned |
Bug Description
On a production server, slapd sometimes becomes unresponsive: some threads loop in the sched_yield syscall and consume all CPU.
To recover, slapd must be restarted.
Nothing related is reported in the log file.
All similar issues in the upstream OpenLDAP project are old and already fixed.
So this issue may affect only the Ubuntu package.
It occurs randomly, so I have no steps to reproduce.
OS: Bionic
OpenLDAP version:
libldap-2.4-2:amd64 2.4.45+
libldap-common 2.4.45+
slapd 2.4.45+
Modules loaded:
olcModuleLoad: {0}back_bdb
olcModuleLoad: {1}syncprov
olcModuleLoad: {2}back_monitor
olcModuleLoad: {3}memberof.la
olcModuleLoad: {4}refint.la
olcModuleLoad: {5}rwm
olcModuleload: {6}back_ldap
The backend is BDB. slapd runs in single-master, multi-slave mode.
Changed in openldap (Ubuntu):
status: Incomplete → New
information type: Public → Public Security
Changed in openldap (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
Thank you for taking the time to file a bug report.
This one looks like a rabbit hole :-(. I've also found many (very) old reports of similar problems, but they all appear to have been fixed a while ago (before Bionic was released). I even found a possible patch (from 2005) to fix the issue, and was able to determine that Bionic's openldap already carries an improved version of the patch (unsurprisingly). I've also found an old Launchpad bug (#15270) and the related Debian bug (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=255276), which report the same problem as you; the problem is marked as having been fixed in Debian (also back in 2005).
I am a bit surprised that you're experiencing this problem on Bionic. I understand that it is hard to provide steps for reproducing this problem, but I would like to ask you to provide as much information as you can, please. For example:
- Your full openldap configuration (please remove any confidential bits, of course).
- Any log messages from slapd or related services.
- If you can, please install the debug symbols for openldap/slapd and run "gdb -p $PROCESS_PID" (where "$PROCESS_PID" is slapd's PID), then run a "bt" command and attach the output to this bug.
- More information about what is going on in the system when the problem happens. For example, I've read that this might happen when the system load is high; do you notice that as well?
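In case it helps, here is a minimal sketch of how the backtrace could be captured non-interactively while slapd is stuck. It assumes gdb and the slapd debug symbols are installed and that the script runs with enough privilege to attach to the process (typically root). The helper names `build_gdb_command` and `capture_backtraces` are purely illustrative, and `thread apply all bt` is used instead of a plain `bt` on the assumption that the spinning threads are not necessarily the main thread.

```python
import shutil
import subprocess


def build_gdb_command(pid):
    """Return a gdb invocation that dumps every thread's backtrace and exits."""
    return [
        "gdb",
        "-p", str(pid),                 # attach to the running process
        "-batch",                       # run the -ex commands, then detach and exit
        "-ex", "thread apply all bt",   # backtrace of all threads, not just the current one
    ]


def capture_backtraces(pid, out_path):
    """Run gdb against pid, write its output to out_path, return gdb's exit code."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found in PATH")
    with open(out_path, "w") as out:
        proc = subprocess.run(
            build_gdb_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,   # keep gdb warnings together with the backtraces
        )
    return proc.returncode
```

For example, `capture_backtraces(1234, "slapd-bt.txt")` would write the backtraces to `slapd-bt.txt`, where `1234` stands in for slapd's actual PID and the file name is arbitrary. The resulting file can then be attached to this bug.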
Meanwhile, I will mark this bug as Incomplete. Feel free to revert its status back to New once you provide more info. Thanks!