Principal component analysis (PCA) is a standard statistical procedure to convert a set of possibly correlated variables into a (typically smaller) set of linearly uncorrelated variables by using a coordinate transformation.
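As a rough illustration of the coordinate transformation, the sketch below handles only the two-variable case, where PCA reduces to rotating the centered data by a single angle. The function names `pca_2d` and `sample_cov` are hypothetical helpers introduced here, not part of any library, and the closed-form angle is an assumption that applies to two variables only.

```python
import math

def sample_cov(us, vs):
    """Sample covariance of two equally long sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / (n - 1)

def pca_2d(xs, ys):
    """Rotate two centered variables so the resulting components are uncorrelated.

    Hypothetical helper: the closed-form rotation angle is valid for
    two variables only; general PCA uses an eigendecomposition.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]          # center each variable
    cy = [y - my for y in ys]
    sxx = sample_cov(xs, xs)
    syy = sample_cov(ys, ys)
    sxy = sample_cov(xs, ys)
    # Angle of the axis of largest variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [a * c + b * s for a, b in zip(cx, cy)]    # first component
    pc2 = [-a * s + b * c for a, b in zip(cx, cy)]   # second component
    return pc1, pc2
```

After the rotation, the sample covariance between `pc1` and `pc2` is zero up to rounding, and `pc1` carries at least as much variance as `pc2`, so `pc2` is the component one would drop to obtain the smaller set.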


R squared: the coefficient of determination, which measures the proportion of the variance in the predicted variable that is accounted for by the regression built using the predictors (code metrics combined with static analysis fault density).
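A minimal sketch of the usual computation, assuming the standard definition of one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper; assumes y_true is not constant.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives 1, while a model that always predicts the mean of the observed values gives 0.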


Mean squared error (MSE) is an unbiased estimate of the error variance of the regression.
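A sketch under the assumption that the regression is an ordinary least squares fit with an intercept. In that setting the residual sum of squares is divided by the residual degrees of freedom (n - p - 1), not by n, which is what makes the estimate unbiased. The function name `regression_mse` and the parameter `n_predictors` are hypothetical.

```python
def regression_mse(y_true, y_pred, n_predictors):
    """Residual mean square: SS_res / (n - p - 1).

    Hypothetical helper; assumes a linear model with an intercept
    and n_predictors (p) predictors.
    """
    n = len(y_true)
    dof = n - n_predictors - 1   # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return ss_res / dof
```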

ROC curve

The receiver operating characteristic (ROC) curve is a popular tool for evaluating classifier performance. It is created by plotting the true positive rate against the false positive rate at various threshold settings.
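The construction can be sketched as follows, assuming binary labels (1 for positive, 0 for negative), real-valued classifier scores, and at least one instance of each class. The function name `roc_points` is hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    Hypothetical helper: the threshold is lowered one distinct score
    at a time, and instances with tied scores are admitted together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]   # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every instance whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

The returned list starts at (0, 0) and ends at (1, 1). A classifier that ranks every positive instance above every negative one passes through (0, 1).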


Area under the curve (AUC) equals the probability that the classifier scores a randomly chosen positive instance higher than a randomly chosen negative instance.

The larger the AUC, the better the classification model discriminates between the two classes.
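The probabilistic reading of AUC can be checked directly by comparing every positive instance with every negative one. This is a sketch, assuming binary labels (1 for positive, 0 for negative) and the common convention that a tied pair counts as half. The function name `auc_pairwise` is hypothetical.

```python
def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Hypothetical helper; ties count as 0.5. Quadratic in the number
    of instances, so suitable only for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to a ranking that places every positive instance above every negative one.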