Support vector machines

Linear models for classification

Like logistic regression, other linear models for classification, such as Support Vector Machines (SVMs), are designed for problems with a binary class label and numeric feature variables. Suppose we have the following labeled dataset with two numeric features, credit score and credit limit, and a binary class label, loan, denoting whether the given customer was able to pay off their loan:

credit score   credit limit   loan
100            500            n
200            500            n
100            1000           n
400            500            y
500            500            y
300            1000           y
400            1000           y
500            1000           y

Now draw a scatter plot with credit score on the x-axis and credit limit on the y-axis. Notice that the following line divides the points with loan=yes from those with loan=no:

(15/4) x + y - 1500 = 0

This is the linear model that we can use for classification. To predict the class label of a new data point, we plug its x and y values (i.e., the credit score and credit limit) into the left-hand side of the above equation and check whether the result is below or above zero. If it is below zero, the new data point lies below the dividing line, and we predict loan=no. Otherwise, we predict loan=yes.

For example, for (x, y) = (credit score, credit limit) = (500, 500), we get 15/4 * 500 + 500 - 1500 = 875 > 0, so we predict loan=yes. For (x, y) = (credit score, credit limit) = (100, 500), we get 15/4 * 100 + 500 - 1500 = -625 < 0, so we predict loan=no.
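
The decision rule and the two worked examples above can be sketched in a few lines of Python. This is only an illustration: the function names decision_value and predict_loan and the variable names are assumptions made for this sketch, not part of the original text. The block also checks that the line puts every row of the table on the correct side.

```python
# Coefficients of the dividing line (15/4) x + y - 1500 = 0 from the text.
A, B, C = 15 / 4, 1.0, -1500.0


def decision_value(credit_score, credit_limit):
    """Left-hand side of the line equation for one data point."""
    return A * credit_score + B * credit_limit + C


def predict_loan(credit_score, credit_limit):
    """Return 'n' if the value is below zero, otherwise 'y'."""
    return "n" if decision_value(credit_score, credit_limit) < 0 else "y"


# The labeled dataset from the table: (credit score, credit limit, loan).
training_data = [
    (100, 500, "n"), (200, 500, "n"), (100, 1000, "n"),
    (400, 500, "y"), (500, 500, "y"), (300, 1000, "y"),
    (400, 1000, "y"), (500, 1000, "y"),
]

# The two worked examples from the text.
print(decision_value(500, 500), predict_loan(500, 500))  # 875.0 y
print(decision_value(100, 500), predict_loan(100, 500))  # -625.0 n

# Check that every training row is predicted with its own label.
all_correct = all(
    predict_loan(score, limit) == label
    for score, limit, label in training_data
)
print(all_correct)  # True
```

A point that lands exactly on the line (value equal to zero) is assigned loan=yes here, following the "otherwise" branch of the rule.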

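Many different lines divide the two class values, so an interesting question is how to select the best one. Support vector machines answer it by choosing the line with the largest margin, that is, the greatest distance to the nearest data points of each class.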
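
To make the comparison between candidate lines concrete, the sketch below measures the margin of two separating lines, taking the margin to be the perpendicular distance from the line to the closest training point. The first line is the one used in the text. The second, 5x + y - 2000 = 0, is a hypothetical alternative picked for this illustration and does not appear in the original. The helper names separates and margin are likewise assumptions, and the features are used in their raw units without any scaling.

```python
import math

# The labeled dataset from the table: (credit score, credit limit, loan).
training_data = [
    (100, 500, "n"), (200, 500, "n"), (100, 1000, "n"),
    (400, 500, "y"), (500, 500, "y"), (300, 1000, "y"),
    (400, 1000, "y"), (500, 1000, "y"),
]


def separates(a, b, c, data):
    """True if a*x + b*y + c is negative for every 'n' row and positive for every 'y' row."""
    return all(
        (a * x + b * y + c < 0) if label == "n" else (a * x + b * y + c > 0)
        for x, y, label in data
    )


def margin(a, b, c, data):
    """Smallest perpendicular distance from any data point to the line a*x + b*y + c = 0."""
    norm = math.hypot(a, b)
    return min(abs(a * x + b * y + c) for x, y, _ in data) / norm


line_from_text = (15 / 4, 1, -1500)   # (15/4) x + y - 1500 = 0
alternative_line = (5, 1, -2000)      # hypothetical: 5x + y - 2000 = 0

margin_text = margin(*line_from_text, training_data)
margin_alt = margin(*alternative_line, training_data)

print(separates(*line_from_text, training_data))    # True
print(separates(*alternative_line, training_data))  # True
print(round(margin_text, 1))  # 32.2
print(round(margin_alt, 1))   # 98.1
```

Both lines classify the table correctly, but the alternative keeps the nearest points roughly three times farther away, which is the sense in which a maximum-margin method would prefer it.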

Support Vector Machines - February 19, 2015 - Andrew Andrade