Model evaluation is one of the most important parts of the model development process. Its purpose is to find the model that best represents the data and to estimate how well that model will perform on future data. Model evaluation involves the following concepts:
- Metrics for evaluation (this post)
- Cross Validation
- Bias vs Variance and Learning Curves
- Parameter Tuning and Grid Search
- Ensembling and combining models
- Boosting and building models on sampled data
The first concept to understand is the evaluation metric. A metric is a measure of “how good” a model's performance is. In this post, we will cover the metrics for evaluation of classification, regression and clustering (in this order).
The basic visualization of the performance of a classification algorithm is the confusion matrix, also known as a contingency table or an error matrix. It shows the number of correct and incorrect predictions made by the classification model compared to the actual outcomes (target values) in the data. The matrix is NxN, where N is the number of target values (classes). The performance of such models is commonly evaluated using the counts in this matrix.
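As a rough illustration of how such an NxN matrix is tallied, here is a minimal sketch in plain Python. The function name `confusion_matrix`, the toy labels, and the convention that rows are actual classes and columns are predicted classes are all assumptions made for this example; some tools use the opposite orientation.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return an NxN list of lists of counts.

    Assumed convention: rows are actual classes, columns are predicted
    classes, both in the order given by `labels`.
    """
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical toy data, for illustration only.
actual    = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "cat", "cat", "bird", "dog", "dog"]
labels    = ["bird", "cat", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
```

With this orientation, every off-diagonal cell counts one specific kind of mistake, for example an actual "dog" predicted as "cat".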
The following table displays the most basic 2x2 confusion matrix for two classes (Positive and Negative) and related metrics.
Accuracy: the proportion of the total number of predictions that were correct.
Positive Predictive Value or Precision: the proportion of predicted positive cases that are actually positive.
Negative Predictive Value: the proportion of predicted negative cases that are actually negative.
Sensitivity or Recall: the proportion of actual positive cases that are correctly identified.
Specificity: the proportion of actual negative cases that are correctly identified.
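The definitions above can be sketched in a few lines of Python, computed from the four cells of a 2x2 confusion matrix: true positives (tp), false positives (fp), false negatives (fn) and true negatives (tn). The function name `binary_metrics` and the counts are made up for illustration, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute basic metrics from the cells of a 2x2 confusion matrix.

    Assumes each denominator is non-zero (for example, at least one
    positive prediction); a real implementation should guard against
    division by zero.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all predictions
        "precision": tp / (tp + fp),      # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "recall": tp / (tp + fn),         # sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that precision and negative predictive value divide by the number of predicted cases, while recall and specificity divide by the number of actual cases.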
While these are the most basic metrics you can use for classification, I highly recommend reading the Wikipedia article on the confusion matrix and related classification metrics.