Support vector machines
Linear models for classification
In addition to logistic regression, other linear models for classification, such as Support Vector Machines (SVMs), are designed for problems with a binary class label and numeric feature variables. Suppose we have the following labeled dataset with two numeric features - credit score and credit limit - and a binary class label, loan, denoting whether the given customer was able to pay off their loan:
| credit score | credit limit | loan |
|--------------|--------------|------|
| 100 | 500 | n |
| 200 | 500 | n |
| 100 | 1000 | n |
| 400 | 500 | y |
| 500 | 500 | y |
| 300 | 1000 | y |
| 400 | 1000 | y |
| 500 | 1000 | y |
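If you want to follow along in code, the table could be written as NumPy arrays as in the sketch below (the names X and y are our own illustrative choices, not part of the example):

```python
# A minimal sketch of the toy dataset as NumPy arrays; assumes NumPy is installed.
import numpy as np

# Each row: [credit score, credit limit]
X = np.array([
    [100,  500],
    [200,  500],
    [100, 1000],
    [400,  500],
    [500,  500],
    [300, 1000],
    [400, 1000],
    [500, 1000],
])

# Class label loan: "n" = could not pay off the loan, "y" = could
y = np.array(["n", "n", "n", "y", "y", "y", "y", "y"])
```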
Now draw a scatter plot with credit score on the x axis and credit limit on the y axis. Notice that the following line divides the points with loan=yes from those with loan=no:
(15/4) x + y - 1500 = 0
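As a rough plotting sketch, assuming matplotlib and NumPy are available, the scatter plot and the dividing line could be drawn like this (all variable names are illustrative):

```python
# Sketch: scatter plot of the toy data plus the dividing line 15/4 x + y - 1500 = 0.
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[100, 500], [200, 500], [100, 1000], [400, 500],
              [500, 500], [300, 1000], [400, 1000], [500, 1000]])
loan = np.array(["n", "n", "n", "y", "y", "y", "y", "y"])

# Plot the two classes with different markers.
for label, marker in [("n", "x"), ("y", "o")]:
    pts = X[loan == label]
    plt.scatter(pts[:, 0], pts[:, 1], marker=marker, label=f"loan={label}")

# The dividing line, rewritten as y = 1500 - (15/4) x.
xs = np.linspace(100, 300, 50)
plt.plot(xs, 1500 - (15 / 4) * xs, "k--", label="dividing line")

plt.xlabel("credit score")
plt.ylabel("credit limit")
plt.legend()
plt.show()
```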
This dividing line is the linear model that we can use for classification. To predict the class label of a new data point, we plug its x and y values (i.e., its credit score and credit limit) into the above equation and check whether the result is below or above zero. If it is below zero, the point lies below the dividing line and we predict loan=no. Otherwise, we predict loan=yes.
For example, for (x,y) = (credit score, credit limit) = (500, 500), we get 15/4 * 500 + 500 - 1500 = 875 > 0, so we predict loan=yes. For (x,y) = (credit score, credit limit) = (100, 500), we get 15/4 * 100 + 500 - 1500 = -625 < 0, so we predict loan=no.
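The same rule can be written as a tiny function; the name predict_loan below is our own and simply encodes the decision rule described above:

```python
# Minimal sketch of the decision rule based on the line 15/4 x + y - 1500 = 0.

def predict_loan(credit_score, credit_limit):
    """Predict 'n' if the point lies below the dividing line, 'y' otherwise."""
    value = (15 / 4) * credit_score + credit_limit - 1500
    return "n" if value < 0 else "y"

# Reproduce the two worked examples from the text.
print(predict_loan(500, 500))  # 15/4 * 500 + 500 - 1500 =  875 > 0 -> 'y'
print(predict_loan(100, 500))  # 15/4 * 100 + 500 - 1500 = -625 < 0 -> 'n'
```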
An interesting question is how to select the best line that divides the two classes. This is exactly what a Support Vector Machine does: among all lines that separate the classes, it picks the one with the largest margin, i.e., the greatest distance to the closest points of either class.
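As a rough sketch of how this could look in practice, assuming scikit-learn is installed, a linear SVM could be fit to the toy data as follows (the large C value approximates a hard margin on this separable data; all names are illustrative):

```python
# Sketch: fit a linear SVM to the toy data and inspect the learned line.
import numpy as np
from sklearn.svm import SVC

X = np.array([[100, 500], [200, 500], [100, 1000], [400, 500],
              [500, 500], [300, 1000], [400, 1000], [500, 1000]])
y = np.array(["n", "n", "n", "y", "y", "y", "y", "y"])

clf = SVC(kernel="linear", C=1e5)
clf.fit(X, y)

# The learned line has the form w0 * credit_score + w1 * credit_limit + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"{w[0]:.4f} * credit_score + {w[1]:.4f} * credit_limit + {b:.4f} = 0")

print(clf.predict([[500, 500], [100, 500]]))  # expected: ['y' 'n']
```

The learned coefficients need not match the hand-picked line above exactly; any line that separates the two classes is a valid classifier, but the SVM's choice is the one that maximizes the margin.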
Further Reading
Support Vector Machines and Margins simplified
Support Vector Machine Full
Support Vector Machines