# python - join - numpy: Split a 1D array of NaN-separated chunks into a list of chunks

## python array split (2)

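``````
import numpy as np
nan = np.nan

def using_clump(a):
    # The function body is missing from the source. This reconstruction is an
    # assumption based on the function name: mask the NaNs, then take one
    # slice per contiguous run of unmasked values.
    a = np.asarray(a, dtype=float)
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

x = [nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

In [56]: using_clump(x)
Out[56]:
[array([ 1.,  2.,  3.]),
 array([ 10.,  11.]),
 array([ 23.,   1.]),
 array([ 7.,  8.])]
``````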
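To make the clump idea easier to follow, here is a small sketch of the intermediate values such a split works with. It assumes the approach is built on `np.ma.masked_invalid` and `np.ma.clump_unmasked`, which the function name suggests but the original text does not confirm. The variable names are illustrative only.

```python
import numpy as np

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]

arr = np.asarray(x, dtype=float)

# masked_invalid masks every NaN (and inf) entry; mask is True where masked.
masked = np.ma.masked_invalid(arr)
mask = masked.mask.tolist()

# clump_unmasked returns one slice per contiguous run of unmasked entries.
slices = np.ma.clump_unmasked(masked)
runs = [(int(s.start), int(s.stop)) for s in slices]

# Indexing the plain array with each slice gives the chunks.
chunks = [arr[s] for s in slices]

print(runs)
print([chunk.tolist() for chunk in chunks])
```

For the sample input, the runs cover the index ranges (2, 5), (7, 9), (12, 14) and (15, 17), with the stop index exclusive.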

A benchmark comparing `using_clump` and `using_groupby`:

``````
import itertools as IT
import numpy as np

groupby = IT.groupby

def using_groupby(a):
    # Group consecutive items by whether they are finite; keep only the finite runs.
    return [list(v) for k,v in groupby(a,np.isfinite) if k]
``````

``````
In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop

In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop
``````

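The advantage of `using_clump` grows with larger inputs. On the 17,000-element list below it is roughly 10 times faster (5.69 ms versus 60 ms):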

``````
In [9]: x = x*1000   # x is a list, so this repeats it 1000 times (17,000 elements)
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop

In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
``````
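The timings above come from IPython's `%timeit`. For anyone who wants to rerun the comparison as a plain script, the following is a rough sketch using the standard `timeit` module. The body of `using_clump` here is an assumption (mask the NaNs, then slice out each unmasked run), and the repeat count is arbitrary. Absolute numbers will differ by machine and NumPy version.

```python
import itertools
import timeit

import numpy as np


def using_clump(a):
    # Assumed implementation: mask NaNs, then take each contiguous unmasked run.
    a = np.asarray(a, dtype=float)
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]


def using_groupby(a):
    # Group consecutive items by finiteness and keep only the finite runs.
    return [list(v) for k, v in itertools.groupby(a, np.isfinite) if k]


nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8] * 1000

# Sanity check: both functions should produce the same chunks.
clump_chunks = [chunk.tolist() for chunk in using_clump(x)]
groupby_chunks = using_groupby(x)
same_result = clump_chunks == groupby_chunks

repeats = 5  # arbitrary; raise it for steadier numbers
clump_seconds = timeit.timeit(lambda: using_clump(x), number=repeats) / repeats
groupby_seconds = timeit.timeit(lambda: using_groupby(x), number=repeats) / repeats

print("same result:", same_result)
print("using_clump:   %.3f ms per call" % (clump_seconds * 1e3))
print("using_groupby: %.3f ms per call" % (groupby_seconds * 1e3))
```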

``[nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]``

``[[1,2,3], [10,11], [23,1], [7,8]]``

But... it is painfully slow...

Is there perhaps a better idea?

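``````
import numpy as np
from itertools import groupby

nan = np.nan  # the np.NaN alias was removed in NumPy 2.0; np.nan works in all versions
a = np.array([nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8])
# groupby yields runs of consecutive items that share the same key; keep only the finite runs.
# tolist() converts to plain Python floats so the printed result matches the comment below.
result = [list(v) for k,v in groupby(a.tolist(),np.isfinite) if k]
print(result) #[[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
``````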
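In case the `groupby` one-liner is hard to read, here is a short sketch of what it iterates over before the `if k` filter is applied. The shortened input list and the variable names are made up for illustration. `groupby` only groups consecutive items, which is exactly what is needed here.

```python
from itertools import groupby

import numpy as np

nan = np.nan
values = [nan, nan, 1.0, 2.0, 3.0, nan, nan, 10.0, 11.0]

# Each pair is (key, run): key is the np.isfinite result shared by the run.
pairs = [(bool(key), list(run)) for key, run in groupby(values, np.isfinite)]

# Summarize each run as (is_finite, length) so the NaN runs are easy to see.
summary = [(key, len(run)) for key, run in pairs]

# Keeping only the runs whose key is True drops the NaN separators.
finite_runs = [run for key, run in pairs if key]

print(summary)
print(finite_runs)
```

The summary alternates between NaN runs (key `False`) and finite runs (key `True`), which is why filtering on the key alone is enough to recover the chunks.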