linux - x64 - glibc scanf Segmentation faults when called from a function that doesn't align RSP

When compiling the code below:

global main
extern printf, scanf

section .data
   msg: db "Enter a number: ",10,0
   format:db "%d",0

section .bss
   number resb 4

section .text
main:
   mov rdi, msg
   mov al, 0
   call printf

   mov rsi, number
   mov rdi, format
   mov al, 0
   call scanf

   mov rdi,format
   mov rsi,[number]
   inc rsi
   mov rax,0
   call printf
   ret


nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example

and then run

./example

it runs and prints Enter a number: but then crashes and prints Segmentation fault (core dumped)

So printf works fine but scanf does not. What am I doing wrong with scanf?

Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.

Or better, push/pop a dummy register, e.g. push rdx / pop rcx, or save/restore a call-preserved register like RBP.
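
As a minimal sketch, here is the program from the question with the sub rsp, 8 / add rsp, 8 fix applied, assuming the same nasm and gcc -no-pie build commands. The 32-bit load of number, the xor eax, eax idiom for zeroing AL, and the zero return value are my own adjustments for illustration, not something the question or this answer prescribes.

```nasm
; Build as in the question:
;   nasm -f elf64 example.asm -o example.o
;   gcc -no-pie -m64 example.o -o example
global main
extern printf, scanf

section .data
   msg:    db "Enter a number: ",10,0
   format: db "%d",0

section .bss
   number: resb 4

section .text
main:
   sub  rsp, 8          ; entry: RSP % 16 == 8; after this, RSP % 16 == 0

   mov  rdi, msg
   xor  eax, eax        ; AL = 0: no vector-register args
   call printf

   mov  rsi, number     ; scanf stores a 4-byte int here
   mov  rdi, format
   xor  eax, eax
   call scanf

   mov  rdi, format
   mov  esi, [number]   ; load only the 4 bytes that were reserved
   inc  esi
   xor  eax, eax
   call printf

   add  rsp, 8          ; undo the adjustment before returning
   xor  eax, eax        ; return 0 from main
   ret
```

The only change that matters for the crash is the sub / add pair around the calls; everything else could stay as it was in the question.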

On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See "Printing floating point numbers from x86-64 seems to require %rbp to be saved", "main and stack alignment", and "Calling printf in x86_64 using GNU assembler". This is an ABI requirement that you used to be able to get away with violating when there weren't any FP args for printf, but not any more.
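
To make the arithmetic concrete, this small sketch traces RSP modulo 16 through a function that uses the push/pop variant. The string, its label, and the choice of puts are hypothetical and exist only for illustration; it assumes the same nasm -f elf64 and gcc -no-pie build as the question.

```nasm
global main
extern puts

section .rodata
   hello: db "stack is aligned",0

section .text
main:                   ; entry: RSP % 16 == 8 (the caller's call pushed 8 bytes)
   push rbp             ; RSP % 16 == 0; also saves a call-preserved register
   mov  rdi, hello
   call puts            ; on entry to puts: RSP % 16 == 8, as the ABI expects
   pop  rbp             ; RSP % 16 == 8 again, same as on entry
   xor  eax, eax        ; return 0 from main
   ret
```

Any odd number of 8-byte pushes (or an equivalent sub rsp) before the first call restores the alignment.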

gcc's code-gen for glibc scanf now depends on 16-byte stack alignment even when AL == 0.

It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack (see footnote 1). (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)

I downloaded Ubuntu 18.04's libc6 binary package and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).

I can reproduce your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf and, it turns out, also with the system glibc 2.27-3 on my Arch Linux desktop.

In GDB, I loaded your program, did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu, then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:

    0x7ffff786b49a <_IO_vfscanf+602>        cmp    r12b,0x25
    0x7ffff786b49e <_IO_vfscanf+606>        jne    0x7ffff786b3ff <_IO_vfscanf+447>
    0x7ffff786b4a4 <_IO_vfscanf+612>        mov    rax,QWORD PTR [rbp-0x460]
    0x7ffff786b4ab <_IO_vfscanf+619>        add    rax,QWORD PTR [rbp-0x458]
    0x7ffff786b4b2 <_IO_vfscanf+626>        movq   xmm0,QWORD PTR [rbp-0x460]
    0x7ffff786b4ba <_IO_vfscanf+634>        mov    DWORD PTR [rbp-0x678],0x0
    0x7ffff786b4c4 <_IO_vfscanf+644>        mov    QWORD PTR [rbp-0x608],rax
    0x7ffff786b4cb <_IO_vfscanf+651>        movzx  eax,BYTE PTR [rbx+0x1]
    0x7ffff786b4cf <_IO_vfscanf+655>        movhps xmm0,QWORD PTR [rbp-0x608]
  > 0x7ffff786b4d6 <_IO_vfscanf+662>        movaps XMMWORD PTR [rbp-0x470],xmm0

So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
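
The difference between an aligned and an unaligned 16-byte store can be sketched in isolation. This standalone program is my own illustration, not code from glibc: it does one movaps to a 16-byte-aligned stack address and one movups to an address that is 8 bytes off. The commented-out line shows the store that would fault. It assumes the same nasm -f elf64 and gcc -no-pie build as the question.

```nasm
global main

section .text
main:                        ; entry: RSP % 16 == 8
   sub    rsp, 24            ; RSP % 16 == 0, with 24 bytes of scratch space
   pxor   xmm0, xmm0         ; some 16-byte value to store
   movaps [rsp], xmm0        ; address % 16 == 0: aligned store, fine
   movups [rsp+8], xmm0      ; address % 16 == 8: fine only because movups allows it
   ; movaps [rsp+8], xmm0    ; would raise #GP, which Linux reports as SIGSEGV
   add    rsp, 24
   xor    eax, eax           ; exit status 0
   ret
```

Compilers prefer movaps for stack slots when the ABI guarantees alignment, which is why a misaligned caller only crashes once the callee happens to use such a store.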

I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.

Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.
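
For comparison, a call with AL != 0 might look like the following sketch, which passes one double to printf with the stack aligned first. The labels fmt and val and the value 1.5 are hypothetical; the build is assumed to be the same nasm -f elf64 and gcc -no-pie as in the question.

```nasm
global main
extern printf

section .data
   fmt: db "%f",10,0
   val: dq 1.5

section .text
main:
   sub   rsp, 8           ; RSP % 16 == 0 before the call
   mov   rdi, fmt
   movsd xmm0, [val]      ; first FP arg goes in XMM0
   mov   eax, 1           ; AL = upper bound on vector registers used (here exactly 1)
   call  printf           ; AL != 0, so the callee may spill XMM regs with aligned stores
   add   rsp, 8
   xor   eax, eax         ; return 0 from main
   ret
```

Dropping the sub / add pair here is the case that has always been liable to crash, long before the AL == 0 case started faulting.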