Support vector machines

Linear models for classification

Besides logistic regression, other linear models for classification, such as Support Vector Machines (SVMs), are designed for problems with a binary class label and numeric features. Suppose we have the following labeled dataset with two numeric features, credit score and credit limit, and a binary class label, loan, indicating whether the customer was able to pay off their loan:

credit score	credit limit	loan
100	500	n
200	500	n
100	1000	n
400	500	y
500	500	y
300	1000	y
400	1000	y
500	1000	y

Now draw a scatter plot with credit score on the x-axis and credit limit on the y-axis. Notice that the following line separates the points with loan=yes from those with loan=no:

15/4 x + y - 1500 = 0
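The claim that this line separates the two classes can be checked mechanically. The sketch below is a minimal example that assumes the table is typed in as a Python list; the names DATA, f, and separates are hypothetical. It evaluates 15/4 x + y - 1500 at every training point using exact fractions and confirms that every loan=no point gives a negative value and every loan=yes point a positive one.

```python
from fractions import Fraction

# Training data from the table: (credit score, credit limit, loan)
DATA = [
    (100, 500, "n"), (200, 500, "n"), (100, 1000, "n"),
    (400, 500, "y"), (500, 500, "y"), (300, 1000, "y"),
    (400, 1000, "y"), (500, 1000, "y"),
]


def f(x, y):
    """Left-hand side of 15/4 x + y - 1500 = 0, computed exactly with fractions."""
    return Fraction(15, 4) * x + y - 1500


def separates(data):
    """True if every loan=n point gives f < 0 and every loan=y point gives f > 0."""
    for x, y, label in data:
        value = f(x, y)
        if label == "y" and value <= 0:
            return False
        if label == "n" and value >= 0:
            return False
    return True


for x, y, label in DATA:
    print(x, y, label, f(x, y))
print("separates:", separates(DATA))
```

The eight values, in table order, are -625, -250, -125, 500, 875, 625, 1000, and 1375, so the three loan=no points fall below the line and the five loan=yes points fall above it.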

This is the linear model we can use for classification. To predict the class label of a new data point, we plug its x and y values (the credit score and credit limit) into the left-hand side of the equation above and check whether the result is below or above zero. If it is below zero, the new data point lies below the dividing line, and we predict loan=no. Otherwise, we predict loan=yes.

For example, for (x, y) = (credit score, credit limit) = (500, 500), we get 15/4 * 500 + 500 - 1500 = 875 > 0, so we predict loan=yes. For (x, y) = (credit score, credit limit) = (100, 500), we get 15/4 * 100 + 500 - 1500 = -625 < 0, so we predict loan=no.
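The prediction rule and the two worked examples can be reproduced in a few lines of Python. In this sketch, predict is a hypothetical helper name; it returns both the predicted label and the value of the left-hand side, and it treats a value of exactly zero (a point lying on the line) as loan=yes, following the "Otherwise" in the rule.

```python
def predict(credit_score, credit_limit):
    """Classify a customer with the linear model 15/4 x + y - 1500 (hypothetical helper)."""
    value = 15 / 4 * credit_score + credit_limit - 1500
    # Negative means below the line -> "n"; zero or positive -> "y"
    label = "y" if value >= 0 else "n"
    return label, value


print(predict(500, 500))  # ('y', 875.0)
print(predict(100, 500))  # ('n', -625.0)
```

Returning the raw value alongside the label makes it easy to see how far a point is from the boundary in units of the model's left-hand side.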

Many different lines separate these two groups of points, so an interesting question is how to select the best line that divides the two classes.
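One way to make "best" precise, and the criterion that SVMs build on, is the margin: the distance from the line to the closest training point, with larger margins preferred. The minimal sketch below compares the line above with one alternative separating line, x - 250 = 0, which is an arbitrary choice made here purely for illustration. It uses the hypothetical helper margin, writes each line as w[0] x + w[1] y + b = 0, and only compares these two candidates; it does not search for the best line.

```python
import math

# Training data from the table: (credit score, credit limit, loan)
DATA = [
    (100, 500, "n"), (200, 500, "n"), (100, 1000, "n"),
    (400, 500, "y"), (500, 500, "y"), (300, 1000, "y"),
    (400, 1000, "y"), (500, 1000, "y"),
]


def margin(w, b, data):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0,
    or None if some point is on the wrong side of (or exactly on) the line."""
    norm = math.hypot(w[0], w[1])
    distances = []
    for x, y, label in data:
        value = w[0] * x + w[1] * y + b
        sign = 1 if label == "y" else -1  # loan=y should be positive, loan=n negative
        if sign * value <= 0:
            return None
        distances.append(abs(value) / norm)  # point-to-line distance
    return min(distances)


candidates = {
    "15/4 x + y - 1500 = 0": ((15 / 4, 1), -1500),
    "x - 250 = 0 (illustrative alternative)": ((1, 0), -250),
}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, DATA), 1))
```

Both lines classify all eight training points correctly, but the closest point is about 32.2 units from the first line and 50.0 units from the second, so the two lines are not equally good by this measure. Note that these distances depend on how the two features are scaled.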

Support vector machines

Further Reading