Algorithms

Explanation

LinearSVC

Support vector machines (SVMs) fall into two main categories: support vector classification (SVC) and support vector regression (SVR). An SVM is a learning system that operates in a high-dimensional feature space. Its main objective is to identify the maximum-margin hyperplane and use it as the final decision boundary.
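The maximum-margin idea can be sketched as follows. This is a minimal illustration that minimises the regularised hinge loss by subgradient descent; it is not the solver used by LinearSVC. The function names, the toy data, and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions made for the example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss (labels are -1 or +1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: push the hyperplane away.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely outside the margin: only apply the regulariser.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

Calling `svm_predict(w, b, [3.0, 3.0])` then reports which side of the learned boundary a new point lies on.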

Logistic Regression

Logistic regression predicts the probability of an outcome that can take only two values (i.e., a dichotomy). It produces a logistic curve, which is limited to values between 0 and 1. Logistic regression is similar to linear regression, but the curve is constructed using the natural logarithm of the odds of the target variable rather than the probability itself. Moreover, the predictors do not have to be normally distributed or have equal variance in each group.
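The relationship between the linear predictor, the log-odds, and the probability can be sketched as below. The intercept and coefficient values are hypothetical and chosen only for illustration; no model fitting is shown.

```python
import math


def sigmoid(z):
    """Logistic curve: maps any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Natural logarithm of the odds p / (1 - p); the inverse of sigmoid."""
    return math.log(p / (1.0 - p))


def predict_proba(x, intercept, coef):
    """Probability of the positive class for a single predictor x."""
    return sigmoid(intercept + coef * x)


# Hypothetical fitted parameters (assumed, not estimated from data).
intercept, coef = -1.5, 0.8
p = predict_proba(2.0, intercept, coef)
```

The model is linear on the log-odds scale: `log_odds(p)` recovers `intercept + coef * 2.0`.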

Naive Bayesian

The Naive Bayesian classifier is based on Bayes' theorem with an assumption of independence between predictors. Bayes' theorem provides a way of calculating the posterior probability from the class prior and the likelihood of the observed predictors. A Naive Bayesian model is useful for very large datasets. Despite its simplicity, the classifier often performs surprisingly well and is widely used because it can outperform more sophisticated classification methods.
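A minimal sketch of the posterior calculation for binary predictors follows. Laplace smoothing (`alpha`) is an addition that the text does not mention. The data, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each predictor."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, predictor index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def posterior(row, class_counts, feature_counts, alpha=1.0, n_values=2):
    """Posterior via Bayes' theorem, treating predictors as independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior
        for i, value in enumerate(row):
            seen = feature_counts[(label, i)][value]
            score *= (seen + alpha) / (count + alpha * n_values)  # smoothed likelihood
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical binary predictors and class labels.
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = train_naive_bayes(rows, labels)
post = posterior((1, 1), *model)
```

The independence assumption is what allows the per-predictor likelihoods to be multiplied together.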

K Neighbors Classifier

K nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure. KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the 1970s. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function.
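The majority vote can be sketched in a few lines. Euclidean distance is assumed here as the distance function, and the data and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign the class most common among the k stored cases closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored cases.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

There is no training step: the stored cases are the model, and all the work happens at prediction time.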

Decision Tree

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
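The recursive splitting can be sketched as below for classification. Gini impurity is assumed as the split criterion, since the text does not name one. The dictionary-based tree layout, the data, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the subset is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels):
    """Recursively split the data into smaller subsets until each subset is pure."""
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # identical rows with different labels: fall back to majority
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t, left, right = best
    return {
        "feature": f,
        "threshold": t,  # decision node: go left if row[f] <= t
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left]),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right]),
    }


def tree_predict(tree, row):
    """Walk from the root through decision nodes until a leaf node is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical data with two numeric predictors.
rows = [(2.0, 1.0), (3.0, 1.5), (6.0, 1.0), (7.0, 3.0), (8.0, 3.5)]
labels = ["no", "no", "yes", "yes", "yes"]
tree = build_tree(rows, labels)
```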

Random Forest

The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners.
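The bagging step can be sketched as follows. For brevity, the tree learner here is a single-threshold stump on one numeric feature, and the random feature selection that a full random forest also performs is omitted. The names, data, and seed are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Tiny tree learner: one threshold on a single numeric feature."""
    majority = Counter(ys).most_common(1)[0][0]
    best = (len(ys) + 1, None, majority, majority)  # errors, threshold, left, right
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best[1:]


def stump_predict(stump, x):
    t, l_lab, r_lab = stump
    if t is None:  # no usable split was found in this bootstrap sample
        return l_lab
    return l_lab if x <= t else r_lab


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Bootstrap aggregating: train each learner on n rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def bagged_predict(forest, x):
    """Aggregate the individual learners by majority vote."""
    votes = Counter(stump_predict(s, x) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional data.
xs = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
ys = ["a"] * 5 + ["b"] * 5
forest = fit_bagged(xs, ys)
```

Each learner sees a slightly different resampled dataset, and the vote averages out their individual errors.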

AdaBoost

Adaptive boosting (AdaBoost) is a machine learning meta-algorithm that improves classifier accuracy by giving more weight to previously misclassified instances.
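The reweighting loop can be sketched as below. It uses discrete AdaBoost with labels in {-1, +1} and single-threshold stumps as weak learners. The choice of weak learner, the toy data, and the function names are assumptions made for the example.

```python
import math


def best_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= t else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this weak learner
        ensemble.append((alpha, t, polarity))
        preds = [polarity if x <= t else -polarity for x in xs]
        # Misclassified instances (y * p == -1) are multiplied by exp(+alpha).
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, x):
    """Weighted vote of all weak learners."""
    score = sum(alpha * (pol if x <= t else -pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can classify perfectly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this data no individual stump is error-free, but the weighted combination of three stumps classifies every training instance correctly.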