mvpa2.measures.rsa.cdist

mvpa2.measures.rsa.cdist(XA, XB, metric='euclidean', p=2, V=None, VI=None, w=None)

Computes the distance between each pair of the two collections of inputs.
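The core contract is that row i of the result holds the distances from XA[i] to every row of XB. Below is a minimal pure-Python sketch of that contract. It uses plain nested lists instead of ndarrays, and `cdist_ref` and `euclidean` are hypothetical helper names. It only mirrors the documented behavior (float conversion, the m_A by m_B shape, the ValueError on mismatched columns) and is not the library's C implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean (2-norm) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_ref(XA: Sequence[Vector], XB: Sequence[Vector],
              dist: Callable[[Vector, Vector], float] = euclidean
              ) -> List[List[float]]:
    """Hypothetical reference version: Y[i][j] = dist(XA[i], XB[j])."""
    # All rows of both inputs must share a single column count n.
    widths = {len(row) for row in XA} | {len(row) for row in XB}
    if len(widths) > 1:
        raise ValueError("XA and XB must have the same number of columns")
    # Inputs are converted to float, as documented for XA and XB.
    rows_a = [[float(x) for x in row] for row in XA]
    rows_b = [[float(x) for x in row] for row in XB]
    # m_A rows (one per point in XA) by m_B columns (one per point in XB).
    return [[dist(a, b) for b in rows_b] for a in rows_a]


Y = cdist_ref([[0, 0], [3, 4]], [[0, 0], [6, 8], [3, 0]])
print(Y)  # [[0.0, 10.0, 3.0], [5.0, 5.0, 4.0]]
```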
The following are common calling conventions:
Y = cdist(XA, XB, 'euclidean')
Computes the distance between each point in XA and each point in XB, using the Euclidean distance (2-norm) as the distance metric. The points are arranged as $n$-dimensional row vectors in the matrices XA and XB.

Y = cdist(XA, XB, 'minkowski', p)
Computes the distances using the Minkowski distance $\|u - v\|_p$ ($p$-norm), where $p \geq 1$.

Y = cdist(XA, XB, 'cityblock')
Computes the city block or Manhattan distance between the points.
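The first three conventions belong to one family. With the Minkowski distance $\|u - v\|_p = \left(\sum_i |u_i - v_i|^p\right)^{1/p}$, setting $p = 2$ gives the Euclidean distance and $p = 1$ gives the city block distance. The pure-Python sketch below makes that relationship explicit. It is not the library's code, and the helper names are hypothetical.

```python
from typing import Sequence


def minkowski(u: Sequence[float], v: Sequence[float], p: float = 2) -> float:
    """(sum_i |u_i - v_i| ** p) ** (1 / p), defined here only for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def cityblock(u, v):
    # Manhattan distance: the p = 1 case.
    return minkowski(u, v, p=1)


def euclidean(u, v):
    # Euclidean distance: the p = 2 case.
    return minkowski(u, v, p=2)


u, v = [0.0, 0.0], [3.0, 4.0]
print(cityblock(u, v))  # 7.0
print(euclidean(u, v))  # 5.0
```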

Y = cdist(XA, XB, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is

$$\sqrt{\sum_i \frac{(u_i - v_i)^2}{V[i]}}.$$

V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.
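Here is a pure-Python sketch of the standardized Euclidean distance. The default for V is an assumption: the sketch estimates it as the per-column sample variance (denominator $N - 1$) of the rows of XA and XB stacked together. Recent SciPy versions appear to compute it this way, but check your installed version if the exact default matters. `column_variances` and `seuclidean` are hypothetical names.

```python
import math
import statistics
from typing import List, Sequence


def column_variances(XA, XB) -> List[float]:
    """Assumed default V: per-column sample variance of the stacked rows."""
    rows = list(XA) + list(XB)
    # statistics.variance uses the N - 1 denominator and needs >= 2 rows.
    return [statistics.variance(col) for col in zip(*rows)]


def seuclidean(u: Sequence[float], v: Sequence[float],
               V: Sequence[float]) -> float:
    """sqrt(sum_i (u_i - v_i) ** 2 / V[i])."""
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(u, v, V)))


XA = [[0.0, 0.0], [2.0, 10.0]]
XB = [[1.0, 5.0]]
V = column_variances(XA, XB)
print(V)  # [1.0, 25.0]
print(seuclidean(XA[0], XB[0], V))  # sqrt(2)
print(seuclidean(XA[1], XB[0], V))  # sqrt(2)
```

Dividing by V[i] puts every coordinate on a comparable scale. In this example, a difference of 5 in the high-variance second column counts exactly as much as a difference of 1 in the first.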

Y = cdist(XA, XB, 'sqeuclidean')
Computes the squared Euclidean distance $\|u - v\|_2^2$ between the vectors.

Y = cdist(XA, XB, 'cosine')
Computes the cosine distance between vectors u and v,

$$1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$

where $\|\cdot\|_2$ is the 2-norm of its argument, and $u \cdot v$ is the dot product of $u$ and $v$.

Y = cdist(XA, XB, 'correlation')
Computes the correlation distance between vectors u and v. This is

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$

where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$.

Y = cdist(XA, XB, 'hamming')
Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the matrices XA and XB can be of type boolean.

Y = cdist(XA, XB, 'jaccard')
Computes the Jaccard distance between the points. Given two vectors u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree, among the positions where at least one of them is non-zero.

Y = cdist(XA, XB, 'chebyshev')
Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum absolute difference between their respective elements. More precisely, the distance is given by

$$d(u, v) = \max_i |u_i - v_i|.$$

Y = cdist(XA, XB, 'canberra')
Computes the Canberra distance between the points. The Canberra distance between two points u and v is

$$d(u, v) = \sum_i \frac{|u_i - v_i|}{|u_i| + |v_i|}.$$

Y = cdist(XA, XB, 'braycurtis')
Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is

$$d(u, v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}.$$

Y = cdist(XA, XB, 'mahalanobis', VI=None)
Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u - v) \, V^{-1} \, (u - v)^T}$, where $V^{-1}$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.

Y = cdist(XA, XB, 'yule')
Computes the Yule distance between the boolean vectors (see the yule function documentation).

Y = cdist(XA, XB, 'matching')
Synonym for 'hamming'.

Y = cdist(XA, XB, 'dice')
Computes the Dice distance between the boolean vectors (see the dice function documentation).

Y = cdist(XA, XB, 'kulsinski')
Computes the Kulsinski distance between the boolean vectors (see the kulsinski function documentation).

Y = cdist(XA, XB, 'rogerstanimoto')
Computes the Rogers-Tanimoto distance between the boolean vectors (see the rogerstanimoto function documentation).

Y = cdist(XA, XB, 'russellrao')
Computes the Russell-Rao distance between the boolean vectors (see the russellrao function documentation).

Y = cdist(XA, XB, 'sokalmichener')
Computes the Sokal-Michener distance between the boolean vectors (see the sokalmichener function documentation).

Y = cdist(XA, XB, 'sokalsneath')
Computes the Sokal-Sneath distance between the vectors (see the sokalsneath function documentation).

Y = cdist(XA, XB, 'wminkowski')
Computes the weighted Minkowski distance between the vectors (see the wminkowski function documentation).

Y = cdist(XA, XB, f)
Computes the distance between all pairs of vectors in XA and XB using the user-supplied 2-arity function f. For example, the Euclidean distance between the vectors could be computed as follows (with NumPy imported as np):

dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))

Note that you should avoid passing a reference to one of the distance functions defined in this library. For example:

dm = cdist(XA, XB, sokalsneath)

would calculate the pairwise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called $m_A \cdot m_B$ times, which is inefficient. The optimized C version is more efficient, and you call it using the following syntax:

dm = cdist(XA, XB, 'sokalsneath')
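The user-supplied function is invoked once for every (XA[i], XB[j]) pair, so it runs $m_A \cdot m_B$ times. The sketch below makes that count visible. A hypothetical pure-Python `pairwise` loop stands in for cdist(XA, XB, f), and several of the formulas above (cosine, correlation, Chebyshev, Canberra, Bray-Curtis) are written as 2-arity callables of the kind f expects. All names are illustrative, and skipping 0/0 terms in `canberra` is a choice of this sketch.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def pairwise(XA, XB, f):
    """Hypothetical stand-in for cdist(XA, XB, f): one call per row pair."""
    return [[f(a, b) for b in XB] for a in XA]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def norm2(x: Vector) -> float:
    return math.sqrt(dot(x, x))


def cosine(u, v):
    # 1 - u.v / (||u||_2 ||v||_2)
    return 1.0 - dot(u, v) / (norm2(u) * norm2(v))


def correlation(u, v):
    # Cosine distance of the mean-centred vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def chebyshev(u, v):
    # max_i |u_i - v_i|
    return max(abs(a - b) for a, b in zip(u, v))


def canberra(u, v):
    # sum_i |u_i - v_i| / (|u_i| + |v_i|); 0/0 terms are skipped here.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if abs(a) + abs(b) > 0)


def braycurtis(u, v):
    # sum_i |u_i - v_i| / sum_i |u_i + v_i|
    return (sum(abs(a - b) for a, b in zip(u, v))
            / sum(abs(a + b) for a, b in zip(u, v)))


class CallCounter:
    """Wraps a 2-arity function and counts how often it is called."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, u, v):
        self.calls += 1
        return self.f(u, v)


XA = [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0]]                   # m_A = 2
XB = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]  # m_B = 3
counted = CallCounter(chebyshev)
Y = pairwise(XA, XB, counted)
print(counted.calls)  # 6, that is m_A * m_B
print(cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))       # close to 0: parallel
print(correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # close to 2: anti-correlated
```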

Parameters:
XA : ndarray
An $m_A$ by $n$ array of $m_A$ original observations in an $n$-dimensional space. Inputs are converted to float type.
XB : ndarray
An $m_B$ by $n$ array of $m_B$ original observations in an $n$-dimensional space. Inputs are converted to float type.
metric : str or callable, optional
The distance metric to use. If a string, the distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'wminkowski', 'yule'. If a callable, it must take two vectors u and v and return their distance, as in Y = cdist(XA, XB, f). Default is 'euclidean'.
w : ndarray, optional
The weight vector (for weighted Minkowski).
p : scalar, optional
The p-norm to apply (for Minkowski, weighted and unweighted). Default is 2.
V : ndarray, optional
The variance vector (for standardized Euclidean).
VI : ndarray, optional
The inverse of the covariance matrix (for Mahalanobis).
Returns:
Y : ndarray
An $m_A$ by $m_B$ distance matrix is returned. For each $i$ and $j$, the metric dist(u=XA[i], v=XB[j]) is computed and stored in the $ij$-th entry.
Raises:
ValueError
An exception is thrown if XA and XB do not have the same number of columns.

Examples
Find the Euclidean distances between four 2-D coordinates:
>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])
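XA and XB are the same here, so entry [i][j] is the distance from coords[i] to coords[j], and the matrix is symmetric with a zero diagonal. Below is a quick pure-Python cross-check of two printed entries, assuming Python 3.8 or later for `math.dist`.

```python
import math

coords = [(35.0456, -85.2672),
          (35.1174, -89.9711),
          (35.9728, -83.9422),
          (36.1667, -86.7833)]

# Entry [i][j] is the Euclidean distance between coords[i] and coords[j].
Y = [[math.dist(a, b) for b in coords] for a in coords]

print(round(Y[0][1], 4))  # 4.7044
print(round(Y[0][2], 4))  # 1.6172
```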
Find the Manhattan distance from a 3-D point to the corners of the unit cube:
>>> import numpy as np
>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])
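The result has shape (8, 1) because there are $m_A = 8$ corners and $m_B = 1$ query point. The sketch below reproduces the same column in pure Python. It relies on `itertools.product` yielding the corners in the same order as the array `a` above.

```python
from itertools import product

# The eight corners of the unit cube, in the same order as the array `a` above.
corners = [list(c) for c in product([0, 1], repeat=3)]
b = [[0.1, 0.2, 0.4]]

# One column per row of b: an 8 x 1 result, like the cdist call above.
Y = [[sum(abs(x - y) for x, y in zip(c, q)) for q in b] for c in corners]
print([round(row[0], 1) for row in Y])
# [0.7, 0.9, 1.3, 1.5, 1.5, 1.7, 2.1, 2.3]
```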