Python: a fast algorithm for finding the indices at which multiple arrays share the same values
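Finally cracked it with a vectorized solution! It was an interesting problem. The task is to tag each pair of values (or tuple, when the list holds more than two arrays) taken from the corresponding elements of the arrays in the list, and then to label each such pair by its uniqueness among the other pairs. We can therefore use np.unique, exploiting all of its optional arguments, and finish with a little extra work to keep the final output in order of first appearance. The implementation is basically done in three stages: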
import numpy as np

# Stack as a 2D array, with each pair from values as one column.
# Convert each column, treated as an indexing tuple, to its linear index.
# (np.ravel_multi_index expects non-negative integer values.)
arr = np.vstack(values)
idx = np.ravel_multi_index(arr, arr.max(1) + 1)
# Do the heavy work with np.unique to give us:
# 1. Starting indices of unique elems,
# 2. Array that has unique IDs for each element in idx, and
# 3. Group ID counts
_, unq_start_idx, unqID, count = np.unique(
    idx, return_index=True, return_inverse=True, return_counts=True)
# Best part happens here: use mask to ignore the repeated elems and re-tag
# each unqID using argsort() of masked elements from idx
mask = ~np.isin(unqID, np.where(count > 1)[0])
mask[unq_start_idx] = True
out = idx[mask].argsort()[unqID]
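To make the three stages concrete, here is a small trace on a toy input. The two arrays below are made up for illustration and are not from the original post; the steps are the same as in the snippet above, with the intermediate values shown as comments.

```python
import numpy as np

# Hypothetical toy input: two arrays, so each column is one (a, b) pair.
values = [np.array([1, 0, 1, 2, 0]),
          np.array([3, 3, 3, 1, 3])]

# Stage 1: one linear index per column.
arr = np.vstack(values)                           # shape (2, 5)
idx = np.ravel_multi_index(arr, arr.max(1) + 1)   # dims are (3, 4)
print(idx)            # [7 3 7 9 3]

# Stage 2: np.unique reports first occurrences, group IDs and counts.
_, unq_start_idx, unqID, count = np.unique(
    idx, return_index=True, return_inverse=True, return_counts=True)
print(unq_start_idx)  # [1 0 3]
print(unqID)          # [1 0 1 2 0]
print(count)          # [2 2 1]

# Stage 3: keep one representative per group, then renumber the groups
# in order of first appearance.
mask = ~np.isin(unqID, np.where(count > 1)[0])
mask[unq_start_idx] = True
out = idx[mask].argsort()[unqID]
print(out)            # [0 1 0 2 1]
```

The pair (1, 3) appears first and gets group 0, (0, 3) gets group 1, and (2, 1) gets group 2, which matches the numbering the loop-based code would produce.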
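Runtime test
Let's compare the proposed vectorized approach against the original code. Since the proposed code gives us only the group IDs, for a fair benchmark we trim from the original code the parts that are not needed to produce them. Here are the function definitions: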
def groupify(values):  # Original code
    group = np.zeros((len(values[0]),), dtype=np.int64) - 1
    next_hash = 0
    matching = np.ones((len(values[0]),), dtype=bool)
    while any(group == -1):
        matching[:] = (group == -1)
        first_ungrouped_idx = np.where(matching)[0][0]
        for curr_id, value_array in enumerate(values):
            needed_value = value_array[first_ungrouped_idx]
            matching[matching] = value_array[matching] == needed_value
        # Assign all of the found elements to a new group
        group[matching] = next_hash
        next_hash += 1
    return group
def groupify_vectorized(values):  # Proposed code
    arr = np.vstack(values)
    idx = np.ravel_multi_index(arr, arr.max(1) + 1)
    _, unq_start_idx, unqID, count = np.unique(
        idx, return_index=True, return_inverse=True, return_counts=True)
    mask = ~np.isin(unqID, np.where(count > 1)[0])
    mask[unq_start_idx] = True
    return idx[mask].argsort()[unqID]
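For a quick independent check of the vectorized function, one option is a plain dictionary-based reference that labels each column tuple in order of first appearance. The helper name groupify_reference and the random test input are assumptions made for this sketch, not part of the original answer. One caveat worth keeping in mind: because the vectorized version goes through np.ravel_multi_index, it needs non-negative integer values, and the product of the per-array ranges has to fit in the index dtype; the dictionary version has no such restriction but is much slower on large inputs.

```python
import numpy as np

def groupify_vectorized(values):
    # Same steps as the proposed function in the text.
    arr = np.vstack(values)
    idx = np.ravel_multi_index(arr, arr.max(1) + 1)
    _, unq_start_idx, unqID, count = np.unique(
        idx, return_index=True, return_inverse=True, return_counts=True)
    mask = ~np.isin(unqID, np.where(count > 1)[0])
    mask[unq_start_idx] = True
    return idx[mask].argsort()[unqID]

def groupify_reference(values):
    # Hypothetical helper: give each distinct column tuple the next free
    # group ID the first time it is seen.
    seen = {}
    out = np.empty(len(values[0]), dtype=np.int64)
    for i, key in enumerate(zip(*values)):
        key = tuple(int(v) for v in key)
        out[i] = seen.setdefault(key, len(seen))
    return out

# Small random input (assumed sizes, chosen only to keep the check fast).
rng = np.random.default_rng(0)
values = list(rng.integers(0, 3, size=(3, 50)))

agree = np.array_equal(groupify_vectorized(values), groupify_reference(values))
print(agree)
```

If the two disagree on some input, the first thing to check is whether that input contains negative values or ranges large enough to overflow the linear index.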
Runtime results on a list of large arrays:
In [345]: # Input list with random elements
     ...: values = [item for item in np.random.randint(10,40,(10,10000))]
In [346]: np.allclose(groupify(values),groupify_vectorized(values))
Out[346]: True
In [347]: %timeit groupify(values)
1 loops, best of 3: 4.02 s per loop
In [348]: %timeit groupify_vectorized(values)
100 loops, best of 3: 3.74 ms per loop