Distance measures are central to clustering and nearest-neighbour algorithms, which group or classify points by how close they are to each other. The SciPy library implements the common distance computations in its scipy.spatial.distance module. The most common ones are outlined below:
1. Numeric Attributes
Euclidean Distance: the ordinary straight-line distance between two points.
Given two points p and q with n coordinates each:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
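A minimal sketch of Euclidean distance in plain Python, assuming points are given as equal-length sequences of numbers. The function name `euclidean` is illustrative, not SciPy's API; SciPy's `scipy.spatial.distance.euclidean` (and, since Python 3.8, the standard library's `math.dist`) should give the same result.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# A 3-4-5 right triangle: the points are 3 apart in x and 4 apart in y.
print(euclidean([0, 0], [3, 4]))  # 5.0
```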
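Manhattan Distance: the distance a cab would drive between two points on a city grid, i.e. the sum of the absolute differences of their coordinates. For points p and q with n coordinates each:
$$d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$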
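A minimal sketch of Manhattan (city-block) distance in plain Python, again assuming equal-length numeric sequences. The name `manhattan` is hypothetical; SciPy's `scipy.spatial.distance.cityblock` computes the same quantity.

```python
from typing import Sequence


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """City-block distance: sum of the absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Driving from (0, 0) to (3, 4) on a grid means 3 blocks east and 4 blocks north.
print(manhattan([0, 0], [3, 4]))  # 7
```

For the same pair of points the cab covers 7 blocks, while the straight-line (Euclidean) distance is only 5.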
2. Nominal (categorical) Attributes
Similarity is 1 if the two values are the same and 0 if they are different.
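A minimal sketch of the 1-if-same, 0-if-different rule for categorical values. Extending it to records with several categorical attributes by averaging the per-attribute matches (the simple matching coefficient) is an assumption added here for illustration, and the function names `match` and `simple_matching` are hypothetical. For encoded vectors, SciPy's `scipy.spatial.distance.hamming` returns the fraction of positions that differ, which is one minus this average.

```python
from typing import Sequence


def match(a: object, b: object) -> int:
    """Similarity of two categorical values: 1 if the same, 0 if different."""
    return 1 if a == b else 0


def simple_matching(x: Sequence[object], y: Sequence[object]) -> float:
    """Hypothetical extension: average per-attribute match over several attributes."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    if not x:
        raise ValueError("records must have at least one attribute")
    return sum(match(a, b) for a, b in zip(x, y)) / len(x)


print(match("red", "red"))   # 1
print(match("red", "blue"))  # 0
# Colour and shape match, size does not: 2 of 3 attributes agree.
print(simple_matching(["red", "small", "round"], ["red", "large", "round"]))
```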
3. Itemsets (binary attributes)
4. Text data/vector angles
Three-part tutorial on tf-idf and cosine similarity
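A minimal sketch of cosine similarity, assuming each document has already been turned into a term-weight vector (for example tf-idf weights) over the same vocabulary. The similarity is the cosine of the angle between the two vectors: their dot product divided by the product of their lengths. The toy vectors and the name `cosine_similarity` are made up for illustration. Note that SciPy's `scipy.spatial.distance.cosine` returns the cosine distance, i.e. 1 minus this similarity.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical term-weight vectors over a 3-word vocabulary.
doc_a = [1.0, 1.0, 0.0]
doc_b = [2.0, 2.0, 0.0]  # same direction as doc_a, just longer
doc_c = [0.0, 0.0, 3.0]  # shares no terms with doc_a

print(cosine_similarity(doc_a, doc_b))  # approximately 1.0 (same direction)
print(cosine_similarity(doc_a, doc_c))  # 0.0 (orthogonal)
```

Because only the angle matters, a long document and a short one with the same word proportions score as identical, which is why cosine similarity is a common choice for comparing texts of different lengths.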