c stack?) arm - GDB corrupted stack frame - How to debug?

3 Answers

Those bogus adresses (0x00000002 and the like) are actually PC values, not SP values. Now, when you get this kind of SEGV, with a bogus (very small) PC address, 99% of the time it's due to calling through a bogus function pointer. Note that virtual calls in C++ are implemented via function pointers, so any problem with a virtual call can manifest in the same way.

An indirect call instruction just pushes the PC after the call onto the stack and then sets the PC to the target value (bogus in this case), so if this is what happened, you can easily undo it by manually popping the PC off the stack. In 32-bit x86 code you just do:

(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4

With 64-bit x86 code you need

(gdb) set $pc = *(void **)$rsp
(gdb) set $rsp = $rsp + 8

Then, you should be able to do a bt and figure out where the code really is.

The other 1% of the time, the error will be due to overwriting the stack, usually by overflowing an array stored on the stack. In this case, you might be able to get more clarity on the situation by using a tool like valgrind

info no

I have the following stack trace. Is it possible to make out anything useful from this for debugging?

Program received signal SIGSEGV, Segmentation fault.
0x00000002 in ?? ()
(gdb) bt
#0  0x00000002 in ?? ()
#1  0x00000001 in ?? ()
#2  0xbffff284 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Where to start looking at the code when we get a Segmentation fault, and the stack trace is not so useful?

NOTE: If I post the code, then the SO experts will give me the answer. I want to take the guidance from SO and find the answer myself, so I'm not posting the code here. Apologies.

Assuming that the stack pointer is valid...

It may be impossible to know exactly where the SEGV occurs from the backtrace -- I think the first two stack frames are completely overwritten. 0xbffff284 seems like a valid address, but the next two aren't. For a closer look at the stack, you can try the following:

gdb$ x/32ga $rsp

or a variant (replace the 32 with another number). That will print out some number of words (32) starting from the stack pointer of giant (g) size, formatted as addresses (a). Type 'help x' for more info on format.

Instrumenting your code with some sentinel 'printf''s may not be a bad idea, in this case.

If it's a stack overwrite, the values may well correspond to something recognisable from the program.

For example, I just found myself looking at the stack

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000000342d in ?? ()
#2  0x0000000000000000 in ?? ()

and 0x342d is 13357, which turned out to be a node-id when I grepped the application logs for it. That immediately helped narrow down candidate sites where the stack overwrite might have occurred.