# Why the performance difference between numpy.zeros and numpy.zeros_like?

Modern operating systems allocate memory virtually, i.e., memory is given to a process only when it is first used. `zeros` obtains memory from the operating system so that the OS zeroes it when it is first used. `zeros_like`, on the other hand, fills the allocated memory with zeros itself. Both ways require about the same amount of work; it's just that with `zeros_like` the zeroing is done upfront, whereas `zeros` ends up doing it on the fly.

Technically, in C the difference is calling `calloc` vs. `malloc` + `memset`.
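
A minimal sketch of how to see both effects (not from the original answer; exact numbers depend on your machine, OS and allocator) is to time the allocation and the first write separately:

```
import timeit

import numpy as np

shape = (12488, 7588, 3)   # same shape as in the question, ~270 MiB of uint8

# Allocation: zeros gets zeroed pages lazily from the OS,
# zeros_like writes the zeros itself right away.
t_zeros = timeit.timeit(lambda: np.zeros(shape, np.uint8), number=10)
t_zeros_like = timeit.timeit(lambda: np.zeros_like(np.empty(shape, np.uint8)),
                             number=10)
print(f"zeros      : {t_zeros:.5f} s")
print(f"zeros_like : {t_zeros_like:.5f} s")

# First write: the zeros array now pays for faulting its pages in.
x = np.zeros(shape, np.uint8)
y = np.zeros_like(np.empty(shape, np.uint8))
print("first fill of zeros array      :", timeit.timeit(lambda: x.fill(1), number=1))
print("first fill of zeros_like array :", timeit.timeit(lambda: y.fill(1), number=1))
```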

I finally found a performance bottleneck in my code but am confused as to what the reason is. To solve it I changed all my calls of `numpy.zeros_like` to instead use `numpy.zeros`. But why is `zeros_like` sooooo much slower?

For example (note the `e-05` on the `zeros` call):

```
>>> timeit.timeit('np.zeros((12488, 7588, 3), np.uint8)', 'import numpy as np', number = 10)
5.2928924560546875e-05
>>> timeit.timeit('np.zeros_like(x)', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
1.4402990341186523
```

But then, strangely, writing to an array created with `zeros` is noticeably slower than writing to an array created with `zeros_like`:

```
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number = 10)
0.4310588836669922
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros_like(np.zeros((12488, 7588, 3), np.uint8))', number = 10)
0.33325695991516113
```

My guess is that `zeros` is using some CPU trick and not actually writing to the memory to allocate it. This is done on the fly when it's written to. But that still doesn't explain the massive discrepancy in array creation times.

I'm running Mac OS X Yosemite with the current numpy version:

```
>>> numpy.__version__
'1.9.1'
```

My timings in IPython (with its simpler `timeit` interface) are:

```
In [57]: timeit np.zeros_like(x)
1 loops, best of 3: 420 ms per loop
In [58]: timeit np.zeros((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 15.1 µs per loop
```

When I look at the code with IPython (`np.zeros_like??`) I see:

```
res = empty_like(a, dtype=dtype, order=order, subok=subok)
multiarray.copyto(res, 0, casting='unsafe')
```

while `np.zeros` is a black box - pure compiled code.

Timings for `empty` are:

```
In [63]: timeit np.empty_like(x)
100000 loops, best of 3: 13.6 µs per loop
In [64]: timeit np.empty((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 14.9 µs per loop
```

So the extra time in `zeros_like` is in that `copyto`.
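
As a sanity check (my own sketch, not part of the original answer), the same cost shows up if you reproduce `zeros_like` by hand with the `empty_like` + `copyto` sequence from the source shown above:

```
import numpy as np

x = np.zeros((12488, 7588, 3), np.uint8)

# Manual equivalent of np.zeros_like(x), per the Python source shown above:
res = np.empty_like(x)                # fast: memory is left uninitialized
np.copyto(res, 0, casting='unsafe')   # slow part: explicitly writes 0 to every element
```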

In my tests, the difference in assignment times (`x[...] = 1`) is negligible.

My guess is that `zeros`, `ones`, and `empty` are all early compiled creations. `empty_like` was added as a convenience, just drawing shape and type info from its input. `zeros_like` was written with more of an eye toward easy programming maintenance (reusing `empty_like`) than for speed.

`np.ones` and `np.full` also use the `np.empty ... copyto` sequence, and show similar timings.
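
A quick way to check that claim yourself (my own sketch, not from the original answer) is to time the constructors side by side:

```
import timeit
import numpy as np

shape = (12488, 7588, 3)
for stmt in ("np.zeros(shape, np.uint8)",    # expected: microseconds
             "np.ones(shape, np.uint8)",     # expected: comparable to zeros_like
             "np.full(shape, 0, np.uint8)"):
    t = timeit.timeit(stmt, globals={"np": np, "shape": shape}, number=10)
    print(f"{stmt:32s} {t:.4f} s")
```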

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/array_assign_scalar.c appears to be the file that copies a scalar (such as `0`) to an array. I don't see a use of `memset` there.

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c has calls to `malloc` and `calloc`.

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c is the source for `zeros` and `empty`. Both call `PyArray_NewFromDescr_int`, but one ends up using `npy_alloc_cache_zero` and the other `npy_alloc_cache`.

`npy_alloc_cache` in `alloc.c` calls `alloc`. `npy_alloc_cache_zero` calls `npy_alloc_cache` followed by a `memset`. The code in `alloc.c` is further complicated by a THREAD option.

More on the `calloc` vs. `malloc` + `memset` difference at:
Why malloc+memset is slower than calloc?

But with caching and garbage collection, I wonder whether the `calloc`/`memset` distinction applies.

This simple test with the `memory_profiler` package supports the claim that `zeros` and `empty` allocate memory 'on the fly', while `zeros_like` allocates everything up front:

```
N = (1000, 1000)
M = (slice(None, 500, None), slice(500, None, None))

Line #    Mem usage    Increment   Line Contents
================================================
     2   17.699 MiB    0.000 MiB   @profile
     3                             def test1(N, M):
     4   17.699 MiB    0.000 MiB       print(N, M)
     5   17.699 MiB    0.000 MiB       x = np.zeros(N)       # no memory jump
     6   17.699 MiB    0.000 MiB       y = np.empty(N)
     7   25.230 MiB    7.531 MiB       z = np.zeros_like(x)  # initial jump
     8   29.098 MiB    3.867 MiB       x[M] = 1              # jump on usage
     9   32.965 MiB    3.867 MiB       y[M] = 1
    10   32.965 MiB    0.000 MiB       z[M] = 1
    11   32.965 MiB    0.000 MiB       return x,y,z
```