# assembly fast 64 - Why does GCC use multiplication by a strange number in implementing integer division?

Dividing by 5 is the same as multiplying 1/5, which is again the same as multiplying by 4/5 and shifting right 2 bits. The value concerned is `CCCCCCCCCCCCD`

in hex, which is the binary representation of 4/5 if put after a hexadecimal point (i.e. the binary for four fifths is `0.110011001100`

recurring - see below for why). I think you can take it from here! You might want to check out fixed point arithmetic (though note it's rounded to an integer at the end.

As to why, multiplication is faster than division, and when the divisor is fixed, this is a faster route.

See Reciprocal Multiplication, a tutorial for a detailed writeup about how it works, explaining in terms of fixed-point. It shows how the algorithm for finding the reciprocal works, and how to handle signed division and modulo.

Let's consider for a minute why `0.CCCCCCCC...`

(hex) or `0.110011001100...`

binary is 4/5. Divide the binary representation by 4 (shift right 2 places), and we'll get `0.001100110011...`

which by trivial inspection can be added the original to get `0.111111111111...`

, which is obviously equal to 1, the same way `0.9999999...`

in decimal is equal to one. Therefore, we know that `x + x/4 = 1`

, so `5x/4 = 1`

, `x=4/5`

. This is then represented as `CCCCCCCCCCCCD`

in hex for rounding (as the binary digit beyond the last one present would be a `1`

).

I've been reading about `div`

and `mul`

assembly operations, and I decided to see them in action by writing a simple program in C:

### File division.c

```
#include <stdlib.h>
#include <stdio.h>
int main()
{
size_t i = 9;
size_t j = i / 5;
printf("%zu\n",j);
return 0;
}
```

And then generating assembly language code with:

```
gcc -S division.c -O0 -masm=intel
```

But looking at generated `division.s`

file, it doesn't contain any div operations! Instead, it does some kind of black magic with bit shifting and magic numbers. Here's a code snippet that computes `i/5`

:

```
mov rax, QWORD PTR [rbp-16] ; Move i (=9) to RAX
movabs rdx, -3689348814741910323 ; Move some magic number to RDX (?)
mul rdx ; Multiply 9 by magic number
mov rax, rdx ; Take only the upper 64 bits of the result
shr rax, 2 ; Shift these bits 2 places to the right (?)
mov QWORD PTR [rbp-8], rax ; Magically, RAX contains 9/5=1 now,
; so we can assign it to j
```

What's going on here? Why doesn't GCC use div at all? How does it generate this magic number and why does everything work?

Here is link to a document of an algorithm that produces the values and code I see with Visual Studio (in most cases) and that I assume is still used in GCC for division of a variable integer by a constant integer.

http://gmplib.org/~tege/divcnst-pldi94.pdf

In the article, a uword has N bits, a udword has 2N bits, n = numerator, d = denominator = divisor, ℓ is initially set to ceil(log2(d)), shpre is pre-shift (used before multiply) = e = number of trailing zero bits in d, shpost is post-shift (used after multiply), prec is precision = N - e = N - shpre. The goal is to optimize calculation of n/d using a pre-shift, multiply, and post-shift.

Scroll down to figure 6.2, which defines how a udword multiplier (max size is N+1 bits), is generated, but doesn't clearly explain the process. I'll explain this below.

Figure 4.2 and figure 6.2 show how the multiplier can be reduced to a N bit or less multiplier for most divisors. Equation 4.5 explains how the formula used to deal with N+1 bit multipliers in figure 4.1 and 4.2 was derived.

Going back to Figure 6.2. The numerator can be larger than a udword only when divisor > 2^(N-1) (when ℓ == N), in this case the optimized replacement for n/d is a compare (if n>=d, q = 1, else q = 0), so no multiplier is generated. The initial values of mlow and mhigh will be N+1 bits, and two udword/uword divides can be used to produce each N+1 bit value (mlow or mhigh). Using X86 in 64 bit mode as an example:

```
; upper 8 bytes of numerator = 2^(ℓ) = (upper part of 2^(N+ℓ))
; lower 8 bytes of numerator for mlow = 0
; lower 8 bytes of numerator for mhigh = 2^(N+ℓ-prec) = 2^(ℓ+shpre) = 2^(ℓ+e)
numerator dq 2 dup(?) ;16 byte numerator
divisor dq 1 dup(?) ; 8 byte divisor
; ...
mov rcx,divisor
mov rdx,0
mov rax,numerator+8 ;upper 8 bytes of numerator
div rcx ;after div, rax == 1
mov rax,numerator ;lower 8 bytes of numerator
div rcx
mov rdx,1 ;rdx:rax = N+1 bit value = 65 bit value
```

You can test this with GCC. You're already seen how j = i/5 is handled. Take a look at how j = i/7 is handled (which should be the N+1 bit multiplier case).