linux - x64 - glibc scanf Segmentation faults when called from a function that doesn't align RSP
When compiling the code below:
    global main
    extern printf, scanf

    section .data
       msg: db "Enter a number: ",10,0
       format:db "%d",0

    section .bss
       number resb 4

    section .text
    main:
       mov rdi, msg
       mov al, 0
       call printf

       mov rsi, number
       mov rdi, format
       mov al, 0
       call scanf

       mov rdi,format
       mov rsi,[number]
       inc rsi
       mov rax,0
       call printf

       ret
using:

    nasm -f elf64 example.asm -o example.o
    gcc -no-pie -m64 example.o -o example

and then running ./example, it prints "Enter a number: " but then crashes with: Segmentation fault (core dumped)
So printf works fine but scanf does not. What am I doing wrong with scanf?
Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call. Or better, push/pop a dummy register, or save/restore a call-preserved register like RBP.
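A minimal sketch of that fix applied to the program from the question, assuming the same build commands (nasm -f elf64, then gcc -no-pie). Pushing RBP here is just one way to move RSP by 8 bytes; sub rsp, 8 / add rsp, 8 would work equally well. The 32-bit load into esi and the explicit zero return value are small additions of mine, not part of the original code.

```nasm
global main
extern printf, scanf

section .data
    msg:    db "Enter a number: ", 10, 0
    format: db "%d", 0

section .bss
    number: resb 4

section .text
main:
    push rbp            ; entry: RSP % 16 == 8; this 8-byte push makes it 16-byte aligned

    mov  rdi, msg
    mov  al, 0          ; no FP args passed in XMM registers
    call printf

    mov  rsi, number    ; address where scanf stores the int
    mov  rdi, format
    mov  al, 0
    call scanf          ; RSP is now aligned, so aligned stores inside scanf don't fault

    mov  rdi, format
    mov  esi, [number]  ; assumption: load only the 4 bytes that were reserved
    inc  esi
    mov  al, 0
    call printf

    pop  rbp            ; undo the push so ret finds the return address
    xor  eax, eax       ; assumption: return 0 from main as the exit status
    ret
```

The push must be balanced by the pop before ret; otherwise ret would jump to the saved RBP value instead of the return address.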
On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See:

- Printing floating point numbers from x86-64 seems to require %rbp to be saved
- main and stack alignment
- Calling printf in x86_64 using GNU assembler

This is an ABI requirement which you used to be able to get away with violating when there weren't any FP args for printf. But not any more.
gcc's code-gen for glibc scanf now depends on 16-byte stack alignment even when AL == 0.

It seems to have auto-vectorized copying 16 bytes somewhere in _IO_vfscanf, which regular scanf calls after spilling its register args to the stack. (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points.)
I downloaded Ubuntu 18.04's libc6 binary package and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).

I can repro your bug with that glibc, and also, it turns out, with the system glibc 2.27-3 on my Arch Linux desktop.
With GDB, I ran it on your program after doing set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu. The disassembly window looks like this at the point where it received SIGSEGV:
      0x7ffff786b49a <_IO_vfscanf+602>  cmp    r12b,0x25
      0x7ffff786b49e <_IO_vfscanf+606>  jne    0x7ffff786b3ff <_IO_vfscanf+447>
      0x7ffff786b4a4 <_IO_vfscanf+612>  mov    rax,QWORD PTR [rbp-0x460]
      0x7ffff786b4ab <_IO_vfscanf+619>  add    rax,QWORD PTR [rbp-0x458]
      0x7ffff786b4b2 <_IO_vfscanf+626>  movq   xmm0,QWORD PTR [rbp-0x460]
      0x7ffff786b4ba <_IO_vfscanf+634>  mov    DWORD PTR [rbp-0x678],0x0
      0x7ffff786b4c4 <_IO_vfscanf+644>  mov    QWORD PTR [rbp-0x608],rax
      0x7ffff786b4cb <_IO_vfscanf+651>  movzx  eax,BYTE PTR [rbx+0x1]
      0x7ffff786b4cf <_IO_vfscanf+655>  movhps xmm0,QWORD PTR [rbp-0x608]
    > 0x7ffff786b4d6 <_IO_vfscanf+662>  movaps XMMWORD PTR [rbp-0x470],xmm0
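So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, the movaps store to [rbp-0x470] faults, because movaps requires a 16-byte aligned memory operand.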
I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.
Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. A 16-byte vector can be an argument to a variadic function, not just a scalar floating-point value, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.