Distance measurements

Distance measurements are very important because they are used in clustering and nearest neighbour algorithms. The SciPy library implements the common distance computations in its scipy.spatial.distance module. The most common ones are outlined below:

Better figures, formulas and examples coming soon!

1. Numeric Attributes

Euclidean Distance: the ordinary straight-line distance between two points.

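Given two points p = (p_1, ..., p_n) and q = (q_1, ..., q_n), the Euclidean distance is the square root of the sum of the squared coordinate differences:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$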
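A minimal sketch of the Euclidean formula in plain Python, using only the standard library so it does not assume SciPy is installed. The function name euclidean_distance is illustrative, not a library API; SciPy's own version is scipy.spatial.distance.euclidean.

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

On Python 3.8 and later, the standard library's math.dist(p, q) computes the same quantity and can serve as a cross-check.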

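Manhattan Distance: the distance a cab would drive between two points on a city grid, where it can only travel along the streets rather than in a straight line. For two points p and q, it is the sum of the absolute coordinate differences:

$$d(p, q) = \sum_{i=1}^{n} |q_i - p_i|$$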
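A minimal sketch of the Manhattan distance in plain Python. The function name manhattan_distance is illustrative; in SciPy the equivalent is called scipy.spatial.distance.cityblock.

```python
from typing import Sequence


def manhattan_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute coordinate differences ("city block" distance)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Each term is how far the cab drives along one axis of the grid.
    return sum(abs(qi - pi) for pi, qi in zip(p, q))


print(manhattan_distance((0, 0), (3, 4)))  # 7
```

For the same two points, the straight-line (Euclidean) distance is 5, while the cab has to cover 3 blocks in one direction plus 4 in the other, giving 7.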

2. Discrete (Categorical) Attributes

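Values are compared directly: the similarity is 1 if the values are the same and 0 if they are different (equivalently, the distance is 0 for a match and 1 for a mismatch).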
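A minimal sketch of this per-attribute match rule, assuming each record is a tuple of categorical values. Both function names are hypothetical. Summing the per-attribute distances over a record gives the number of mismatching attributes (the Hamming distance); note that SciPy's scipy.spatial.distance.hamming returns the fraction of mismatches rather than the count.

```python
from typing import Hashable, Sequence


def match_similarity(a: Hashable, b: Hashable) -> int:
    # 1 if the two categorical values are identical, 0 otherwise.
    return 1 if a == b else 0


def mismatch_count(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Number of attributes whose values differ (per-attribute distance = 1 - similarity)."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(1 - match_similarity(a, b) for a, b in zip(x, y))


print(mismatch_count(("red", "small", "round"), ("red", "large", "round")))  # 1
```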

3. Itemsets (binary attributes)

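Jaccard similarity: the size of the intersection of two sets divided by the size of their union. The Jaccard distance is one minus the similarity:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad d_J(A, B) = 1 - J(A, B)$$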
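A minimal sketch using Python sets as itemsets. The function names are illustrative, and treating two empty sets as identical (similarity 1.0) is an assumed convention, since the formula is undefined when the union is empty. SciPy's scipy.spatial.distance.jaccard works on boolean vectors instead and returns the distance, not the similarity.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A intersect B| / |A union B| for two itemsets."""
    union = a | b
    if not union:
        # Assumed convention: two empty itemsets count as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    # Distance is the complement of the similarity.
    return 1.0 - jaccard_similarity(a, b)


basket1 = {"bread", "milk", "eggs"}
basket2 = {"bread", "milk", "butter"}
print(jaccard_similarity(basket1, basket2))  # 0.5
```

The two baskets share 2 items out of 4 distinct items in total, hence 2 / 4 = 0.5.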

4. Text data/vector angles

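Cosine similarity: the cosine of the angle between two vectors, computed from their dot product and their lengths. It ranges from -1 to 1, and from 0 to 1 for non-negative vectors such as word counts:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$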
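A minimal sketch of cosine similarity for two equal-length numeric vectors. The function name is illustrative; SciPy's scipy.spatial.distance.cosine returns the cosine distance (1 minus this value) rather than the similarity. Raising an error for a zero vector is a choice made here, since the angle is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity((1, 0), (0, 1)))  # 0.0 (perpendicular vectors)
print(cosine_similarity((1, 2, 3), (2, 4, 6)))  # about 1.0 (same direction)
```

Because only the angle matters, a vector and any positive multiple of it score about 1.0, which is why the measure suits text, where a long and a short document can use words in the same proportions.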

A three-part tutorial on tf-idf and cosine similarity
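To connect tf-idf with cosine similarity, here is a minimal sketch that weights each term by its raw count times log(N / df), where N is the number of documents and df is how many documents contain the term. This is only one common tf-idf variant (libraries differ in smoothing and normalisation), and whitespace tokenisation plus the function names are simplifying assumptions, not the tutorial's method.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors (term -> weight) using count * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # A term found in every document gets weight log(1) = 0.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: an all-zero vector is treated as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # positive: shares "the", "sat", "on"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no terms in common
```

Without stemming, "cats" and "cat" are different terms, so the first and third documents share nothing; real pipelines usually add preprocessing for this.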

Distance Measurements - February 19, 2015 - Andrew Andrade