Vectorized NumPy linspace for multiple start and stop values (NumPy vectorized programming)

25-02-28 10

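This article covers a vectorized NumPy linspace for multiple start and stop values (vectorized programming with NumPy). It also collects related material: a vectorized KMeans implementation for updating cluster centroids, referencing the index when vectorizing in NumPy, numpy.dot as part of a vectorized operation, and a detailed guide to numpy.linspace.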

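Contents:

Vectorized NumPy linspace for multiple start and stop values (NumPy vectorized programming)
Vectorized KMeans implementation for updating cluster centroids (NumPy)
Referencing the index when vectorizing in NumPy
numpy.dot as part of a vectorized operation
numpy.linspace usage in detail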

Vectorized NumPy linspace for multiple start and stop values (NumPy vectorized programming)

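I need to create a 2D array in which each row can start and end with a different number. Assume the first and last elements of each row are given, and all other elements are interpolated according to the row length. As a simple example, I want to create a 3x3 array whose rows all start at 0 but whose last elements are specified by W, as follows: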

array([[ 0.,  1.,  2.],
       [ 0.,  2.,  4.],
       [ 0.,  3.,  6.]])

Is there a better way than the following?

D = np.ones((3,3))*np.arange(0,3)
D = D/D[:,-1]
W = np.array([2,4,6]) # last element of each row assumed given
Res = (D.T*W).T

Answer 1

Here is an approach that uses broadcasting:

def create_ranges(start, stop, N, endpoint=True):
    if endpoint==1:
        divisor = N-1
    else:
        divisor = N
    steps = (1.0/divisor) * (stop - start)
    return steps[:,None]*np.arange(N) + start[:,None]

Sample run:

In [22]: # Setup start, stop for each row and no. of elems in each row
    ...: start = np.array([1,4,2])
    ...: stop  = np.array([6,7,6])
    ...: N = 5
    ...:

In [23]: create_ranges(start, stop, 5)
Out[23]:
array([[ 1.  ,  2.25,  3.5 ,  4.75,  6.  ],
       [ 4.  ,  4.75,  5.5 ,  6.25,  7.  ],
       [ 2.  ,  3.  ,  4.  ,  5.  ,  6.  ]])

In [24]: create_ranges(start, stop, 5, endpoint=False)
Out[24]:
array([[ 1. ,  2. ,  3. ,  4. ,  5. ],
       [ 4. ,  4.6,  5.2,  5.8,  6.4],
       [ 2. ,  2.8,  3.6,  4.4,  5.2]])
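As a point of comparison, newer NumPy releases (1.16 and later, to the best of my knowledge) let np.linspace itself accept array-valued start and stop, with an axis argument that controls where the samples are placed. The sketch below reuses the start and stop values from the sample run and checks the result against the same broadcasting formula; the variable names rows and expected are introduced here only for illustration.

```python
import numpy as np

start = np.array([1, 4, 2])
stop = np.array([6, 7, 6])
N = 5

# axis=-1 places the N samples along the last axis: one row per (start, stop) pair.
rows = np.linspace(start, stop, N, axis=-1)

# The same computation written out with broadcasting (endpoint=True case).
steps = (stop - start) / (N - 1)
expected = start[:, None] + steps[:, None] * np.arange(N)

print(rows.shape)                   # (3, 5)
print(np.allclose(rows, expected))  # True
```

On older NumPy versions that lack array-valued start and stop, the broadcasting function above remains the way to go.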

Let's leverage multiple cores!

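For large data, we can use the numexpr module to take advantage of multiple cores and to be more memory efficient, which in turn improves performance: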

import numexpr as ne

def create_ranges_numexpr(start, stop, N, endpoint=True):
    if endpoint==1:
        divisor = N-1
    else:
        divisor = N
    s0 = start[:,None]
    s1 = stop[:,None]
    r = np.arange(N)
    return ne.evaluate('((1.0/divisor) * (s1 - s0))*r + s0')

Vectorized KMeans implementation for updating cluster centroids (NumPy)

Updating the centers

Using boolean array indexing and computation along an axis, you can loop explicitly over the clusters only, instead of looping over every data point.

K = 8
for k in range(K):
    centers[k] = X[label==k].mean(axis=0)
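If even the loop over clusters is unwanted, one possible alternative is to accumulate per-cluster sums with np.add.at and divide by the counts from np.bincount. This is only a sketch: the data X, the assignment labels, and the names sums, counts, centers_loop and centers_vec are made up for illustration, and the division assumes no cluster is empty.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
X = rng.normal(size=(12, 2))   # hypothetical data: 12 points, 2 features
labels = np.arange(12) % K     # hypothetical assignment; every cluster is non-empty

# Loop over clusters, as in the snippet above.
centers_loop = np.empty((K, X.shape[1]))
for k in range(K):
    centers_loop[k] = X[labels == k].mean(axis=0)

# Loop-free alternative: accumulate per-cluster sums, then divide by the counts.
sums = np.zeros((K, X.shape[1]))
np.add.at(sums, labels, X)                 # unbuffered add: rows sharing a label accumulate
counts = np.bincount(labels, minlength=K)  # number of points in each cluster
centers_vec = sums / counts[:, None]       # assumes counts > 0 for every cluster

print(np.allclose(centers_loop, centers_vec))  # True
```

For a small K the explicit loop is usually just as fast and easier to read, so this variant mainly pays off when there are many clusters.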

Updating the labels

This can also be done by looping over all the clusters:

distances = np.empty(shape=(X.shape[0],K))
for k in range(K):
    distances[:,k] = np.sqrt(np.sum((X - centers[k])**2,axis=1))
labels = distances.argmin(axis=1)

But because a matrix multiplication computes all pairwise dot products, the same thing can also be done without an explicit loop.

squared_distances = np.sum(centers**2,axis=1) + (np.sum(X**2,axis=1) - 2*centers @ X.T).T
squared_distances[np.isclose(squared_distances,0)] = 0  # self-distance can become slightly negative with this method (floating point precision problem)
distances = np.sqrt(squared_distances)
labels = distances.argmin(axis=1)
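The loop-free version relies on the expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2. The following sketch checks that expansion against the explicit loop on random data. The arrays X and centers are invented for the test, and np.maximum is used here as a simple way to clamp tiny negative values, which is a variation on the np.isclose step above rather than the same code.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
X = rng.normal(size=(20, 3))        # hypothetical data: 20 points, 3 features
centers = rng.normal(size=(K, 3))   # hypothetical cluster centers

# Reference: loop over the clusters.
d_loop = np.empty((X.shape[0], K))
for k in range(K):
    d_loop[:, k] = np.sqrt(np.sum((X - centers[k]) ** 2, axis=1))

# Expansion: ||x||^2 - 2 x.c + ||c||^2, evaluated for all pairs at once.
sq = (X ** 2).sum(axis=1)[:, None] - 2 * X @ centers.T + (centers ** 2).sum(axis=1)[None, :]
sq = np.maximum(sq, 0)              # clamp small negatives caused by rounding
d_mat = np.sqrt(sq)

print(np.allclose(d_loop, d_mat))                                # True
print(np.array_equal(d_loop.argmin(axis=1), d_mat.argmin(axis=1)))  # True
```

Since the square root is monotonic, taking argmin on the squared distances directly gives the same labels and saves the sqrt call.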

Referencing the index when vectorizing in NumPy

How can the loop index be referenced when vectorizing in NumPy?

I have a couple of for loops that I would like to vectorize to improve performance. They operate on 1 x N matrices.

for y in range(1,len(array[0]) + 1):
        array[0,y - 1] =  np.floor(np.nanmean(otherArray[0,((y-1)*3):((y-1)*3+3)]))

for i in range(len(array[0])):
        array[0,int((i-1)*L+1)] = otherArray[0,i]

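The operations depend on the array index supplied by the for loop. Is there a way to access the index when using numpy.vectorize, so that I can rewrite these loops as vectorized functions?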

Solution

First loop:

import numpy as np

array = np.zeros((1,10))
otherArray = np.arange(30).reshape(1,-1)

print(f"array = \n{array}")
print(f"otherArray = \n{otherArray}")

# Original loop: floor of the mean of each consecutive group of 3 elements
for y in range(1,len(array[0]) + 1):
    array[0,y - 1] = np.floor(np.nanmean(otherArray[0,((y-1)*3):((y-1)*3+3)]))

print(f"array = \n{array}")

# Vectorized equivalent: reshape into rows of 3 and average along each row
array = np.floor(np.nanmean(otherArray.reshape(-1,3),axis = 1)).reshape(1,-1)

print(f"array = \n{array}")

Output:

array = 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
otherArray = 
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29]]
array = 
[[ 1.  4.  7. 10. 13. 16. 19. 22. 25. 28.]]
array = 
[[ 1.  4.  7. 10. 13. 16. 19. 22. 25. 28.]]

Second loop:

array = np.zeros((1,10))
otherArray = np.arange(10,dtype = float).reshape(1,-1)
L = 1

print(f"array = \n{array}")
print(f"otherArray = \n{otherArray}")

# Original loop: with L = 1 the target index (i-1)*L+1 is simply i
for i in range(len(otherArray[0])):
    array[0,int((i-1)*L+1)] = otherArray[0,i]

print(f"array = \n{array}")

# So for L = 1 the loop reduces to a plain copy
array = otherArray

print(f"array = \n{array}")

Output:

array = 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
otherArray = 
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
array = 
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
array = 
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]

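It looks like the first loop is trying to compute a moving average, here taken over consecutive, non-overlapping windows of three elements. It would be better to do it like this: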

import numpy as np


window_width = 3
arr = np.arange(12)

out = np.floor(np.nanmean(arr.reshape(-1,window_width),axis=-1))

print(out)
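Note that the reshape trick averages non-overlapping blocks. If a sliding (overlapping) moving average is what is actually wanted, one option is sliding_window_view from numpy.lib.stride_tricks, which I believe is available from NumPy 1.20 onward. The sketch below contrasts the two on the same arr; block_means and sliding_means are names chosen for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window_width = 3
arr = np.arange(12)

# Non-overlapping blocks: 12 elements -> 4 means (1, 4, 7, 10).
block_means = arr.reshape(-1, window_width).mean(axis=-1)

# Overlapping windows: 12 elements -> 10 means (1, 2, ..., 10).
sliding_means = sliding_window_view(arr, window_width).mean(axis=-1)

print(block_means)
print(sliding_means)
```

The reshape version requires the array length to be a multiple of window_width, whereas the sliding version works for any length of at least window_width.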

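As for your second loop, I am not sure what it is meant to do. Are you trying to copy values from otherArray into array at some offset? I suggest you look at NumPy's slicing features.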
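Taking the second loop literally, the target positions (i-1)*L+1 can be computed for all i at once and used as an index array, which reproduces the loop without iterating. The sketch below assumes a stride L = 2 and an 8-element target, both invented for illustration. Note that for i = 0 the index is 1-L, which is negative when L > 1 and therefore wraps around to the end of the array, exactly as it does in the loop.

```python
import numpy as np

L = 2                                                      # hypothetical stride
otherArray = np.arange(1, 5, dtype=float).reshape(1, -1)   # [[1., 2., 3., 4.]]
n = otherArray.shape[1]

# Loop version, copied from the question.
array_loop = np.zeros((1, 8))
for i in range(n):
    array_loop[0, int((i - 1) * L + 1)] = otherArray[0, i]

# Vectorized version: compute every target index up front, then assign in one step.
idx = (np.arange(n) - 1) * L + 1    # [-1, 1, 3, 5]; -1 refers to the last column
array_vec = np.zeros((1, 8))
array_vec[0, idx] = otherArray[0]

print(np.array_equal(array_loop, array_vec))  # True
```

If the wrap-around at i = 0 is not intended, a plain strided slice such as array[0, 1::L] may be closer to what was meant, but that changes which source elements are copied.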

numpy.dot as part of a vectorized operation

In [31]: a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6],[6,7],[7,8]]) #shape is (5,7)
    ...: b = np.array([[11],[12],[11],[11]]) #shape is (5,1)
    ...: c = np.array([[10],[20],[30],[40],[50],[60],[70]]) #shape is (7,1)
In [32]: a.shape,b.shape,c.shape
Out[32]: ((7,2),(5,1),(7,1))

Note that a.shape does not match the comment in the code.

In [33]:     iBatchSize = a.shape[0]
    ...:     iFeatureCount = a.shape[1]
    ...: 
    ...:     result = np.zeros((iBatchSize,1))
    ...: 
    ...:     for i in range(iBatchSize):
    ...:         for j in range(iFeatureCount):
    ...:             result [i] = 10 + (b[i][0] * (np.dot(c.T,a[i]) + b))
    ...: 
Traceback (most recent call last):
  File "<ipython-input-33-717691add3dd>",line 8,in <module>
    result [i] = 10 + (b[i][0] * (np.dot(c.T,a[i]) + b))
  File "<__array_function__ internals>",line 6,in dot
ValueError: shapes (1,7) and (2,) not aligned: 7 (dim 1) != 2 (dim 0)

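np.dot is what raises that error. It expects the last dimension of the first argument to match the second-to-last (or only) dimension of the second argument: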

In [34]: i
Out[34]: 0
In [35]: c.T.shape
Out[35]: (1,7)
In [37]: a[i].shape
Out[37]: (2,)

This dot works:

In [38]: np.dot(c.T,a).shape    # (1,7) with (7,2) => (1,2)
Out[38]: (1,2)

====

With the correct a,

10 + (b[i][0] * (np.dot(c.T,a[i]) + b))

is a (5,1) array (because of the + b), which cannot be stored in result[i].

===

A simple dot of a and c produces a (5,1) array, which can be combined with b (using +, *, or both); the result is again a (5,1) array:

In [68]: np.dot(a,c).shape
Out[68]: (5,1)
In [69]: b*(np.dot(a,c)+b)
Out[69]: 
array([[15521],[16944],[15521],[15521]])
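To make the shapes concrete, here is a self-contained sketch under stated assumptions: a is taken to be a (5,7) array whose rows are all 1..7, and b a (5,1) array, neither of which is shown in full in the original post. The loop implements one plausible reading of the intended per-row computation (using the scalar b[i] rather than the whole b), and the single expression after it is the vectorized equivalent.

```python
import numpy as np

a = np.tile(np.arange(1, 8), (5, 1))                      # (5, 7), assumed contents
b = np.array([[11], [12], [11], [11], [11]])              # (5, 1), assumed contents
c = np.array([[10], [20], [30], [40], [50], [60], [70]])  # (7, 1), from the post

# Per-row loop: one scalar per sample, using b[i] rather than the whole b.
result_loop = np.zeros((a.shape[0], 1))
for i in range(a.shape[0]):
    result_loop[i] = 10 + b[i, 0] * (np.dot(a[i], c[:, 0]) + b[i, 0])

# Vectorized: (5,7) @ (7,1) -> (5,1), then elementwise operations with b.
result_vec = 10 + b * (a @ c + b)

print(result_vec.shape)                      # (5, 1)
print(np.allclose(result_loop, result_vec))  # True
```

The inner loop over features in the original code is unnecessary, because the dot product already sums over the feature axis.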

numpy.linspace usage in detail

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

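Returns evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.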

See also:

arange
Similar to linspace, but uses a step size (instead of the number of samples).

logspace
Samples uniformly distributed in log space.

When endpoint is set to False:
>>> import numpy as np
>>> np.linspace(1, 10, 10)
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
>>> np.linspace(1, 10, 10, endpoint = False)
array([ 1. , 1.9, 2.8, 3.7, 4.6, 5.5, 6.4, 7.3, 8.2, 9.1])
In [4]: np.linspace(1, 10, 10, endpoint = False, retstep= True)
Out[4]: (array([ 1. , 1.9, 2.8, 3.7, 4.6, 5.5, 6.4, 7.3, 8.2, 9.1]), 0.9)

>>> np.linspace(2.0, 3.0, num=5)
array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ])
>>> np.linspace(2.0, 3.0, num=5, endpoint=False)
array([ 2. ,  2.2,  2.4,  2.6,  2.8])
>>> np.linspace(2.0, 3.0, num=5, retstep=True)
(array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ]), 0.25)
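The step sizes in these examples follow directly from the endpoint setting: the spacing is (stop - start) / (num - 1) when the endpoint is included and (stop - start) / num when it is excluded. The short sketch below checks that relationship, rebuilds the endpoint=False samples by hand, and shows how logspace relates to linspace; the variable names are chosen for this example only.

```python
import numpy as np

start, stop, num = 2.0, 3.0, 5

samples, step = np.linspace(start, stop, num=num, retstep=True)
samples_open, step_open = np.linspace(start, stop, num=num, endpoint=False, retstep=True)

# endpoint=True divides the interval into num - 1 gaps, endpoint=False into num gaps.
print(np.isclose(step, (stop - start) / (num - 1)))   # True
print(np.isclose(step_open, (stop - start) / num))    # True

# The endpoint=False samples are just start + k * step for k = 0 .. num - 1.
manual = start + step_open * np.arange(num)
print(np.allclose(samples_open, manual))              # True

# logspace (base 10 by default) raises 10 to linearly spaced exponents.
print(np.allclose(np.logspace(0, 2, 3), 10.0 ** np.linspace(0, 2, 3)))  # True
```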

>>> import matplotlib.pyplot as plt
>>> N = 8
>>> y = np.zeros(N)
>>> x1 = np.linspace(0, 10, N, endpoint=True)
>>> x2 = np.linspace(0, 10, N, endpoint=False)
>>> plt.plot(x1, y, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(x2, y + 0.5, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.ylim([-0.5, 1])
(-0.5, 1)
>>> plt.show()
