Hello!
I work on the illumos project, the open source continuation of OpenSolaris.
I've had a look at your test program, and read through your description of the
signal handling behaviour that you're seeing.
I think there are a few things going on here, so I'll try and lay out a few
suggestions and ask a few questions. I've put some links to our online manual
pages at the end.
1. I'm not sure under what conditions you'll receive a NULL siginfo, but
the sigaction(2) manual page does suggest that it can be NULL
sometimes. Do you recall which specific signals (e.g., SIGCHLD)
you were handling when siginfo was NULL?
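Whatever the answer turns out to be, it is probably worth having the handler tolerate a NULL siginfo and note which signal it happened for. Here is a rough sketch of what I mean. The names careful_handler(), install_careful_handler() and null_siginfo_signal are made up for illustration; only sigaction(2) and SA_SIGINFO are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: last signal delivered with a NULL siginfo. */
static volatile sig_atomic_t null_siginfo_signal;   /* 0 = none seen */

void
careful_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void) ucontext;

    if (info == NULL) {
        /* Remember which signal it was, and touch nothing else. */
        null_siginfo_signal = sig;
        return;
    }

    /* From here on it is safe to read info->si_code, info->si_addr, etc. */
}

/* Install the handler above for "sig"; returns 0 on success, -1 on error. */
int
install_careful_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof (sa));
    sa.sa_sigaction = careful_handler;
    sa.sa_flags = SA_SIGINFO;
    (void) sigemptyset(&sa.sa_mask);

    return (sigaction(sig, &sa, NULL));
}
```

Checking null_siginfo_signal from the main program after a run should tell you which signal, if any, arrived without a siginfo.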
2. As you've noted, when multiple signals arrive at around the same time,
their delivery may overlap. When signals overlap, we do not always
completely unwind the signal handling machinery in libc before
delivering subsequent signals. In these cases, the context object
may refer to the state of the signal delivery parts of libc which
were interrupted by the nested signal, rather than to the part of
your main program that was interrupted.
In ucontext.h(3HEAD), the "uc_link" member from the context object is
described as follows:
The uc_link member is a pointer to the context that [is] to be resumed
when this context returns. If uc_link is equal to 0, this context
is the main context and the process exits when this context returns.
It sounds like when handling SIGPROF, you're interested in that "main
context", rather than in any of the signal handling code. This context
represents the state we preserved when taking the first of the coincident
signals, and in order to make sure you find it you always need to walk up
the context chain until "uc_link" is NULL.
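A minimal sketch of that walk might look like the following. The name find_main_context() is my own invention for illustration; the only thing it relies on is the "uc_link" member described above.

```c
#include <stddef.h>
#include <ucontext.h>

/*
 * Hypothetical helper: follow "uc_link" from the context handed to a
 * signal handler until reaching the context whose "uc_link" is NULL,
 * i.e. the main context.  Returns NULL only if "uc" itself is NULL.
 */
const ucontext_t *
find_main_context(const ucontext_t *uc)
{
    if (uc == NULL)
        return (NULL);

    while (uc->uc_link != NULL)
        uc = uc->uc_link;

    return (uc);
}
```

In a SIGPROF handler installed with SA_SIGINFO, you would pass the third handler argument (cast to a ucontext_t pointer) to this function and sample the result, rather than sampling the argument directly. The function only reads memory, so I would expect it to be fine to call from a handler.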
3. As noted in siginfo.h(3HEAD), "si_addr" is populated with the address
of the faulting instruction for SIGILL signals. That isn't true of all
signals, though; e.g., it isn't true for a SIGPROF generated by
setitimer(2).
For SPARC systems where you are using an undefined instruction to
generate a trap for allocation, using "si_addr" when handling SIGILL is
definitely the right way to find the instruction in question. However, if
nested signal delivery has occurred, the context object you get in your
SIGILL handler will not match up with the "si_addr" value: the context in
the nested delivery case will restore execution to the previously running
signal handler, rather than to your main program.
If you only need the program counter value, use "si_addr". If you
need the rest of the context to correctly handle the trap, you'll have
to walk the "uc_link" chain out to find it. There are two options:
- Walk up to a context where the program counter matches "si_addr".
- Walk up to the main context, where "uc_link" is NULL.
The option that is most correct will depend on the structure of your
program; e.g., do you expect to generate a SIGILL in any of the
signal handlers, or just in the main program?
4. To reiterate, the context that signal handlers receive is only
guaranteed to be right for one purpose: the restoration of execution
state from before we started handling the signal. Any nested
execution state is stored in the context chain (via "uc_link").
If your program needs to reason about the state in the context object,
it also needs to handle the case where the relevant context is not
at the head of the chain.
I think this is true of any vaguely POSIX system, not just illumos.
Depending on the design of the signal machinery on a particular
operating system, software may experience the edge cases more or
less frequently, but portable software probably needs to handle
them all.
If this is unclear, or if I can help in some way, please let me know!
Manual page references:
https://illumos.org/man/2/sigaction
https://illumos.org/man/2/setitimer
https://illumos.org/man/3HEAD/siginfo.h
https://illumos.org/man/3HEAD/ucontext.h