Signals and Solaris
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
SBCL | Triaged | Medium | Unassigned | |
Bug Description
Solaris (both SPARC and x86) can present signal handlers with a
wrong ucontext_t structure when receiving more than one signal at a
time. The correct structure can be found by following
ucontext->uc_link.
This presents particular problems on SPARC, because SBCL uses a trap
there to signal that it needs to call a C allocation routine, and it
uses the preceding OR instruction to encode the arguments. Running
sb-sprof and consing at the same time is bound to make the allocation
trap handler receive a bad ucontext.
The C test in http://
I was able to get sb-sprof running by following uc_link and comparing
the value of the PC register with siginfo->si_addr; if they match,
that's the current context.
But this is extremely fishy: I haven't found any documentation saying
what the correct thing to do is. And some signal handlers receive NULL
as siginfo, so there's nothing to compare against.
Potentially, any usage of context on Solaris is susceptible to this
problem (or even on other OSes).
Attached is a patch which makes sb-sprof work.
Hello!
I work on the illumos project, the open source continuation of OpenSolaris.
I've had a look at your test program, and read through your description of the
signal handling behaviour that you're seeing.
I think there are a few things going on here, so I'll try and lay out a few
suggestions and ask a few questions. I've put some links to our online manual
pages at the end.
1. I'm not sure under what conditions you'll receive a NULL siginfo, but
the sigaction(2) manual page definitely suggests that it might be NULL
sometimes. Do you recall which specific signals (e.g., SIGCHLD)
you were handling when siginfo was NULL?
2. As you've noted, when multiple signals arrive at around the same time,
their delivery may overlap. When signals overlap, we do not always
completely unwind the signal handling machinery in libc before
delivering subsequent signals. In these cases, the context object
may refer to the state of the signal delivery parts of libc which
were interrupted by the nested signal, rather than to the part of
your main program that was interrupted.
In ucontext.h(3HEAD), the "uc_link" member from the context object is
described as follows:
The uc_link member is a pointer to the context that to be resumed
when this context returns. If uc_link is equal to 0, this context
is the main context and the process exits when this context returns.
It sounds like when handling SIGPROF, you're interested in that "main
context", rather than in any of the signal handling code. This context
represents the state we preserved when taking the first of the coincident
signals, and in order to make sure you find it you always need to walk up
the context chain until "uc_link" is NULL.
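The walk described in point 2 can be sketched in C roughly as follows. This is only an illustration, not code from SBCL or from libc: the function name `outermost_context` is made up for this example, and it assumes the handler was installed with SA_SIGINFO so that its third argument can be cast to `ucontext_t *`.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/*
 * Hypothetical helper: follow uc_link until it is NULL and return the
 * outermost ("main") context. Returns NULL if no context was supplied.
 */
ucontext_t *outermost_context(ucontext_t *uc)
{
    if (uc == NULL)
        return NULL;

    /* Each uc_link points at the context that resumes when this one returns. */
    while (uc->uc_link != NULL)
        uc = uc->uc_link;

    return uc;
}
```

In a SIGPROF handler this would be called as `outermost_context((ucontext_t *)ctx)`, where `ctx` is the handler's third argument. With three contexts linked a -> b -> c, the call on `&a` returns `&c`.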
3. As noted in siginfo.h(3HEAD), "si_addr" is populated with the address
of the faulting instruction for SIGILL signals. That's not true of other
signals, though; e.g., it isn't true for SIGPROF and setitimer(2).
For SPARC systems where you are using an undefined instruction to
generate a trap for allocation, using "si_addr" when handling SIGILL is
definitely the right way to find the instruction in question. If nested
signal delivery has occurred, the context object you get in your SIGILL
handler will not match up with the "si_addr" value.
The context in the nested delivery case will restore execution to the
previously running signal handler, rather than to your main program.
If you only need the program counter value, use "si_addr". If you
need the rest of the context to correctly handle the trap, you'll have
to walk the "uc_link" chain out to find it. There are two options:
- Walk up to a context where the program counter matches "si_addr".
- Walk up to the main context, where "uc_link" is NULL.
The option that is most correct will depend on the structure of your
program; e.g., do you expect to generate a SIGILL in any of the
signal handlers, or just in the main program?
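The two options in point 3 could be combined along the following lines. Treat this as a sketch under assumptions rather than a recommended implementation: `context_for_fault`, `pc_reader`, and the `demo_*` names are invented for this example, and reading the saved program counter is left to a caller-supplied function because it is platform specific (on Solaris and illumos it would presumably read `uc_mcontext.gregs[REG_PC]`). Falling back to the outermost context when nothing matches, or when there is no siginfo, is a choice made by this sketch and not something the manual pages prescribe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

/* Hypothetical accessor type: returns the saved program counter of a context. */
typedef uintptr_t (*pc_reader)(const ucontext_t *uc);

/*
 * Walk the uc_link chain starting at uc. Return the first context whose
 * saved PC equals si_addr. If si_addr is NULL (for example because the
 * handler got a NULL siginfo) or nothing matches, return the outermost
 * context instead. Returns NULL only when uc itself is NULL.
 */
ucontext_t *context_for_fault(ucontext_t *uc, const void *si_addr,
                              pc_reader read_pc)
{
    ucontext_t *last = NULL;

    for (; uc != NULL; uc = uc->uc_link) {
        if (si_addr != NULL && read_pc(uc) == (uintptr_t)si_addr)
            return uc;
        last = uc;
    }
    return last;
}

/* Demo only: three linked contexts and a fake PC accessor. */
static ucontext_t demo_chain[3];

/* Pretends that context i was interrupted at address 0x1000 + 4*i. */
static uintptr_t demo_pc(const ucontext_t *uc)
{
    return (uintptr_t)0x1000 + 4u * (uintptr_t)(uc - demo_chain);
}

/* Returns 1 if the walk behaves as described above, 0 otherwise. */
int demo_walk(void)
{
    demo_chain[0].uc_link = &demo_chain[1];
    demo_chain[1].uc_link = &demo_chain[2];
    demo_chain[2].uc_link = NULL;

    /* si_addr matches the middle context. */
    if (context_for_fault(&demo_chain[0], (const void *)(uintptr_t)0x1004,
                          demo_pc) != &demo_chain[1])
        return 0;

    /* No siginfo available: fall back to the outermost context. */
    if (context_for_fault(&demo_chain[0], NULL, demo_pc) != &demo_chain[2])
        return 0;

    /* si_addr matches nothing in the chain: same fallback. */
    if (context_for_fault(&demo_chain[0], (const void *)(uintptr_t)0x2000,
                          demo_pc) != &demo_chain[2])
        return 0;

    return 1;
}
```

A SIGILL handler would pass `info ? info->si_addr : NULL` as the second argument. If a SIGILL can also be raised inside a signal handler, the fallback may pick the wrong context, so a stricter variant that reports "no match" could be preferable there.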
4. To reiterate, the context that signal handlers receive is only
guaranteed to be right for one purpose: the restoration of execution
state from before ...