After investigating this, the root cause of the timeouts turns out to be GDB using the wrong type of breakpoint: it inserts a 4-byte ARM breakpoint where a 2-byte Thumb breakpoint is needed, which leads to unexpected results.
The reason this happens is a bit more complex, though. GDB has a couple of mechanisms for tracking the loading and unloading of shared libraries in dynamically-linked binaries: one via _dl_debug_state and r_brk, and one via stap probes.
Up until Ubuntu 18.04 (glibc 2.27), GDB could not use the stap probes mechanism because it ran into a bug when parsing the stap probe expressions. The check therefore failed and GDB fell back to the old _dl_debug_state/r_brk mechanism.
The _dl_debug_state/r_brk mechanism works because ld.so has an entry for _dl_debug_state in its .dynsym section. ld.so is completely stripped of mapping symbols (another way to tell ARM and Thumb code apart), which are only available in the debug symbols file. Even so, GDB can still tell whether _dl_debug_state is ARM or Thumb code, because the ELF symbol itself carries that information. That's why this fallback mechanism works.
On Ubuntu 20.04, with glibc 2.31, GDB no longer runs into problems with stap probes, so it uses this mechanism instead of the old _dl_debug_state/r_brk one.
Both mechanisms work by having GDB insert breakpoints at specific locations so that shared library events can be tracked. In the stap probes case, however, there are no real symbols at those locations.
What we have instead is metadata containing the name of each probe and its address. That address falls within some function; for example, the init_start and init_complete probe points fall within dl_main. The probe points do not seem to carry any information about whether the code there is ARM or Thumb.
As before, the mapping symbols would tell us the mode, but ld.so is stripped and doesn't carry them. GDB could also look at the ELF symbol of the function the probe sits in, except that those symbols (which are not considered special in any way) have been stripped as well. So the ARM/Thumb information is gone entirely and GDB can no longer make the correct decision.
GDB therefore defaults to assuming ARM mode when picking the breakpoint, which is wrong for Thumb code.
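For illustration, here is how that ELF symbol information can be decoded. This is a minimal sketch assuming the ARM ELF ABI convention that a function symbol (STT_FUNC) with bit 0 of its value set denotes Thumb code, while the actual entry address has that bit cleared. The helper name decode_arm_func_symbol is hypothetical and is not GDB's code.

```python
from typing import Optional, Tuple

STT_FUNC = 2  # ELF symbol type value for functions


def decode_arm_func_symbol(st_value: int, st_info: int) -> Tuple[int, Optional[bool]]:
    """Return (code address, is_thumb) for a 32-bit ARM ELF symbol.

    For STT_FUNC symbols, bit 0 of st_value marks Thumb code and the real
    entry address has that bit cleared. Other symbol types carry no mode
    information, so is_thumb is None for them.
    """
    if (st_info & 0xF) != STT_FUNC:  # ELF32_ST_TYPE(st_info)
        return st_value, None
    return st_value & ~1, bool(st_value & 1)


# Made-up example: a global function symbol (st_info 0x12) with an odd value.
print(decode_arm_func_symbol(0xF3C9, 0x12))
```

With the odd value 0xF3C9 the sketch reports a Thumb function starting at 0xF3C8. The addresses here are invented for the example.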
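To show what this probe metadata looks like, the sketch below decodes the descriptor of a single SystemTap SDT note (section .note.stapsdt, note type 3) from a 32-bit little-endian object such as the armhf ld.so. It assumes the usual SDT layout: probe PC, base address and semaphore address (4 bytes each on 32-bit targets), followed by NUL-terminated provider, probe name and argument strings. StapProbe and parse_stapsdt_desc are hypothetical names, and the argument string in the usage example is made up.

```python
import struct
from dataclasses import dataclass


@dataclass
class StapProbe:
    provider: str
    name: str
    pc: int    # plain probe address; nothing here says ARM or Thumb
    args: str


def parse_stapsdt_desc(desc: bytes) -> StapProbe:
    # Three 32-bit little-endian addresses: probe PC, .stapsdt.base, semaphore.
    pc, _base, _semaphore = struct.unpack_from("<III", desc, 0)
    # The rest is "provider\0name\0args\0".
    provider, name, args = desc[12:].split(b"\0")[:3]
    return StapProbe(provider.decode(), name.decode(), pc, args.decode())


desc = struct.pack("<III", 0x1234, 0x5000, 0) + b"rtld\0init_start\0-4@r0 4@r1\0"
print(parse_stapsdt_desc(desc))
```

In this layout there is no field for the instruction-set mode, so a consumer gets a name and an address and has to find the mode somewhere else.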
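The decision described above can be modelled with a small sketch. It is a deliberately simplified model, not GDB's implementation: breakpoint_size and its helpers are hypothetical. The model consults mapping symbols ($a for ARM, $t for Thumb, $d for data) first, then the Thumb bit of a containing function symbol, and otherwise falls back to ARM.

```python
import bisect
from typing import List, Optional, Tuple

ARM_BKPT_SIZE = 4    # size of an ARM-mode breakpoint instruction
THUMB_BKPT_SIZE = 2  # size of a 16-bit Thumb breakpoint instruction


def mode_from_mapping_symbols(addr: int, mapping: List[Tuple[int, str]]) -> Optional[str]:
    """mapping: (address, '$a' | '$t' | '$d') pairs sorted by address.

    The last mapping symbol at or below addr decides the mode.
    """
    addrs = [a for a, _ in mapping]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    return {"$a": "arm", "$t": "thumb"}.get(mapping[i][1])


def mode_from_func_symbols(addr: int, funcs: List[Tuple[int, int]]) -> Optional[str]:
    """funcs: (st_value, size) of STT_FUNC symbols; bit 0 of st_value marks Thumb."""
    for value, size in funcs:
        start = value & ~1
        if start <= addr < start + size:
            return "thumb" if value & 1 else "arm"
    return None


def breakpoint_size(addr: int, mapping: List[Tuple[int, str]],
                    funcs: List[Tuple[int, int]]) -> int:
    mode = mode_from_mapping_symbols(addr, mapping) or mode_from_func_symbols(addr, funcs)
    # With no information at all (stripped ld.so, bare probe address), assume ARM.
    return THUMB_BKPT_SIZE if mode == "thumb" else ARM_BKPT_SIZE
```

With both lists empty, which matches the stripped ld.so plus a bare probe address, the model picks the 4-byte ARM breakpoint. A single surviving Thumb function symbol around the probe, for example breakpoint_size(0x1010, [], [(0x1001, 0x100)]), is enough to flip it to the 2-byte Thumb one.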
There are two possible solutions:
1 - Fall back to using _dl_debug_state/r_brk for armhf in GDB. GDB's maintainers consider this a bad option, because it means using an outdated mechanism instead of the better interface.
2 - Don't strip the glibc/ld.so function symbols that contain stap probes.
Right now, these are the functions that contain probes and that GDB wants to breakpoint in a special way:
dl_main, _dl_map_object_from_fd, lose, dl_open_worker and _dl_close_worker
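One way to check whether a given ld.so build keeps these symbols is to scan the symbol table dump from readelf -sW. The sketch below assumes readelf's usual column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name). It also assumes that readelf prints the raw symbol value, so Thumb functions show an odd address. For each requested name it reports whether a FUNC symbol is present and in which mode. check_probe_functions is a hypothetical helper, not part of any tool.

```python
from typing import Dict, Iterable, Optional


def check_probe_functions(readelf_output: str, wanted: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each wanted name to 'thumb', 'arm', or None if no FUNC symbol is found."""
    result: Dict[str, Optional[str]] = {name: None for name in wanted}
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: "42: 00001235 120 FUNC LOCAL DEFAULT 11 lose"
        if len(fields) < 8 or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        value, sym_type = int(fields[1], 16), fields[3]
        name = fields[7].split("@")[0]  # drop version suffixes such as @@GLIBC_PRIVATE
        if sym_type == "FUNC" and name in result:
            result[name] = "thumb" if value & 1 else "arm"
    return result


# Made-up readelf -sW excerpt for illustration.
sample = """
Symbol table '.symtab' contains 3 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00001235   120 FUNC    LOCAL  DEFAULT   11 lose
     2: 00002000    64 FUNC    GLOBAL DEFAULT   11 _dl_close_worker@@GLIBC_PRIVATE
"""
print(check_probe_functions(sample, ["lose", "_dl_close_worker", "dl_open_worker"]))
```

A None entry means the symbol is missing from the dump, which is the situation that leaves GDB without any mode information for probes inside that function.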