Why the performance difference between numpy.zeros and numpy.zeros_like?
Modern operating systems allocate memory virtually, i.e., memory is given to a process only when it is first used.
zeros obtains memory from the operating system in such a way that the OS zeroes it when it is first used.
zeros_like, on the other hand, fills the allocated memory with zeros by itself. Both ways require about the same amount of work; it's just that with zeros_like the zeroing is done upfront, whereas zeros ends up doing it on the fly.
Technically, in C the difference is calling calloc versus malloc followed by memset.
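To see the upfront versus on-the-fly split for yourself, a rough timing sketch along these lines may help. The helper names (time_call, compare_zeroing) are made up for illustration, and np.empty followed by fill(0) is only a stand-in for "allocate, then zero by hand"; the actual numbers depend heavily on the OS, the allocator and the array size.

```python
import time

import numpy as np


def time_call(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_zeroing(shape=(2000, 2000), dtype=np.uint8):
    """Time creation and first write for lazily and eagerly zeroed arrays."""
    # Lazy path: np.zeros may get pages that the OS zeroes on first touch.
    lazy, t_lazy_create = time_call(lambda: np.zeros(shape, dtype))

    # Eager path (illustrative stand-in): allocate, then write zeros ourselves.
    def eager_alloc():
        arr = np.empty(shape, dtype)
        arr.fill(0)
        return arr

    eager, t_eager_create = time_call(eager_alloc)

    # First write touches every page; on the lazy array this is where
    # the deferred zeroing cost would show up, if it shows up at all.
    _, t_lazy_touch = time_call(lambda: lazy.fill(1))
    _, t_eager_touch = time_call(lambda: eager.fill(1))

    return {
        "lazy_create": t_lazy_create,
        "eager_create": t_eager_create,
        "lazy_first_write": t_lazy_touch,
        "eager_first_write": t_eager_touch,
    }


if __name__ == "__main__":
    for label, seconds in compare_zeroing().items():
        print(f"{label:>18}: {seconds:.6f} s")
```

If the explanation above holds on your system, lazy_create should be tiny and lazy_first_write somewhat larger than eager_first_write, with the two totals roughly equal.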
I finally found a performance bottleneck in my code but am confused as to what the reason is. To solve it I changed all my calls of numpy.zeros_like to instead use numpy.zeros. But why is zeros_like sooooo much slower?
For example (note the e-05 on the zeros timing):
>>> timeit.timeit('np.zeros((12488, 7588, 3), np.uint8)', 'import numpy as np', number=10)
5.2928924560546875e-05
>>> timeit.timeit('np.zeros_like(x)', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number=10)
1.4402990341186523
But then, strangely, writing to an array created with zeros is noticeably slower than writing to an array created with zeros_like:
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros((12488, 7588, 3), np.uint8)', number=10)
0.4310588836669922
>>> timeit.timeit('x[100:-100, 100:-100] = 1', 'import numpy as np; x = np.zeros_like(np.zeros((12488, 7588, 3), np.uint8))', number=10)
0.33325695991516113
My guess is zeros is using some CPU trick and not actually writing to the memory to allocate it. This is done on the fly when it's written to. But that still doesn't explain the massive discrepancy in array creation times.
I'm running Mac OS X Yosemite with the current numpy version:
>>> numpy.__version__
'1.9.1'
My timings in IPython are (with a simpler timeit interface):
In : timeit np.zeros_like(x)
1 loops, best of 3: 420 ms per loop
In : timeit np.zeros((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 15.1 µs per loop
When I look at the code with IPython (np.zeros_like??) I see:
res = empty_like(a, dtype=dtype, order=order, subok=subok)
multiarray.copyto(res, 0, casting='unsafe')
np.zeros is a blackbox - pure compiled code.
In : timeit np.empty_like(x)
100000 loops, best of 3: 13.6 µs per loop
In : timeit np.empty((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 14.9 µs per loop
So the extra time in zeros_like is in that copyto call.
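The two lines quoted from the NumPy 1.9 source can be reproduced with public functions. The sketch below is an illustration of that allocate-then-fill sequence, not NumPy's own code; zeros_like_sketch is a made-up name, and np.copyto is used here as the public counterpart of multiarray.copyto.

```python
import numpy as np


def zeros_like_sketch(a, dtype=None, order="K", subok=True):
    """Illustrative re-creation of the empty_like + copyto sequence."""
    # Step 1: allocate an uninitialised array matching a's shape and dtype.
    # This is the cheap part (microseconds in the timings above).
    res = np.empty_like(a, dtype=dtype, order=order, subok=subok)
    # Step 2: broadcast the scalar 0 into every element. This writes to
    # all of the memory immediately, which is where the time goes.
    np.copyto(res, 0, casting="unsafe")
    return res


if __name__ == "__main__":
    x = np.ones((4, 3), np.uint8)
    z = zeros_like_sketch(x)
    print(z.shape, z.dtype, bool(z.any()))
```

Timing the two steps separately on a large array should show nearly all of the cost in the np.copyto line.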
In my tests, the difference in assignment times (x=1) is negligible.
My guess is that zeros and empty are early compiled creations. empty_like was added as a convenience, just drawing shape and type info from its input. zeros_like was written with more of an eye toward easy programming maintenance (reusing empty_like) than for speed.
np.full also uses the np.empty ... copyto sequence, and shows similar timings.
appears to be the file that copies a scalar (such as 0) to an array. I don't see a use of
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c has calls to
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c - source for zeros and empty. Both call PyArray_NewFromDescr_int, but one ends up using npy_alloc_cache_zero and the other npy_alloc_cache followed by a memset. The code in alloc.c is further complicated by a THREAD option.
More on the calloc versus malloc+memset difference at: Why malloc+memset is slower than calloc?
But with caching and garbage collection, I wonder whether the calloc/memset distinction applies.
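One way to check the distinction outside NumPy is to call the C allocator directly through ctypes. This is only a sketch under assumptions: it assumes a Unix-like system where ctypes.CDLL(None) exposes the C library, the helper names are invented for illustration, and it says nothing about what NumPy's own allocation cache does on top.

```python
import ctypes
import time

# Assumption: Unix-like platform where the C library symbols are
# reachable from the current process via CDLL(None).
libc = ctypes.CDLL(None)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def time_calloc(nbytes):
    """Time a single calloc; return (seconds, value of first byte)."""
    start = time.perf_counter()
    ptr = libc.calloc(1, nbytes)
    elapsed = time.perf_counter() - start
    if not ptr:
        raise MemoryError("calloc failed")
    try:
        first_byte = ctypes.c_ubyte.from_address(ptr).value
    finally:
        libc.free(ptr)
    return elapsed, first_byte


def time_malloc_memset(nbytes):
    """Time malloc followed by memset(0); return (seconds, first byte)."""
    start = time.perf_counter()
    ptr = libc.malloc(nbytes)
    if not ptr:
        raise MemoryError("malloc failed")
    libc.memset(ptr, 0, nbytes)
    elapsed = time.perf_counter() - start
    try:
        first_byte = ctypes.c_ubyte.from_address(ptr).value
    finally:
        libc.free(ptr)
    return elapsed, first_byte


def compare_c_allocators(nbytes=256 * 1024 * 1024):
    """Collect timings for both allocation strategies."""
    t_calloc, b_calloc = time_calloc(nbytes)
    t_memset, b_memset = time_malloc_memset(nbytes)
    return {
        "calloc_seconds": t_calloc,
        "malloc_memset_seconds": t_memset,
        "calloc_first_byte": b_calloc,
        "malloc_memset_first_byte": b_memset,
    }


if __name__ == "__main__":
    print(compare_c_allocators())
```

For large requests calloc typically returns much faster than malloc+memset, because the zeroing is deferred to first use; for small requests served from the allocator's own free lists the gap may disappear.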
This simple test with the memory_profiler package supports the claim that zeros and empty allocate memory 'on-the-fly', while zeros_like allocates everything up front:
N = (1000, 1000)
M = (slice(None, 500, None), slice(500, None, None))

Line #    Mem usage    Increment   Line Contents
================================================
     2   17.699 MiB    0.000 MiB   @profile
     3                             def test1(N, M):
     4   17.699 MiB    0.000 MiB       print(N, M)
     5   17.699 MiB    0.000 MiB       x = np.zeros(N)          # no memory jump
     6   17.699 MiB    0.000 MiB       y = np.empty(N)
     7   25.230 MiB    7.531 MiB       z = np.zeros_like(x)     # initial jump
     8   29.098 MiB    3.867 MiB       x[M] = 1                 # jump on usage
     9   32.965 MiB    3.867 MiB       y[M] = 1
    10   32.965 MiB    0.000 MiB       z[M] = 1
    11   32.965 MiB    0.000 MiB       return x,y,z
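A similar experiment can be run without the profiler package by reading the process's peak resident set size from the standard library. This is a rough sketch, not the script that produced the report above: it assumes a Unix-like system (the resource module), the names peak_rss and trace_peak_rss are invented here, and because ru_maxrss is a high-water mark, a step shows no increase if the process had already peaked higher earlier.

```python
import resource

import numpy as np


def peak_rss():
    """Peak resident set size so far (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def trace_peak_rss(shape=(1000, 1000)):
    """Run the same steps as the profiled test and record peak-RSS growth."""
    region = (slice(None, 500), slice(500, None))  # same slice as M above
    arrays = {}
    report = []

    def step(label, action):
        before = peak_rss()
        action()
        report.append((label, peak_rss() - before))

    # Creation steps: only zeros_like is expected to touch memory here.
    step("x = np.zeros(shape)", lambda: arrays.update(x=np.zeros(shape)))
    step("y = np.empty(shape)", lambda: arrays.update(y=np.empty(shape)))
    step("z = np.zeros_like(x)",
         lambda: arrays.update(z=np.zeros_like(arrays["x"])))

    # First writes: x and y are expected to grow on use, z is not.
    for name in ("x", "y", "z"):
        step(f"{name}[region] = 1",
             lambda name=name: arrays[name].__setitem__(region, 1))

    return arrays, report


if __name__ == "__main__":
    _, report = trace_peak_rss()
    for label, delta in report:
        print(f"{label:<22} +{delta}")
```

Run it in a fresh interpreter so earlier allocations do not mask the increments.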