1) Kernel functions, such as the Gaussian radial basis function or polynomial kernels, help separate classes that are not linearly separable.

2) Works well as a linear classifier.

3) Performs well on datasets that have many attributes.

4) Guarantees an optimal (maximum-margin) separation of the data.

5) Works well with heterogeneously distributed data points.
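The kernel functions named in point 1 can be sketched in a few lines of Python. This is an illustrative sketch, not an implementation from the text; the hyperparameter names `gamma`, `degree`, and `c` are conventional choices, not taken from the source:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel: (x . z + c)^degree."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + c) ** degree
```

Either function returns a similarity score that an SVM can use in place of a plain dot product, which is what lets a linear separator in the kernel-induced feature space handle classes that are not linearly separable in the original space.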

1) Computationally intensive when dealing with unlabeled datasets.

2) Limited in speed and size during both the training and testing phases of the algorithm.

3) Slow when it comes to selecting the kernel function parameters.
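Point 3 follows from how kernel parameters are usually chosen: an exhaustive grid search with cross-validation, where the number of model fits is the product of the grid sizes times the number of folds. A minimal sketch of that cost count (the grid values shown are hypothetical, not from the text):

```python
def count_fits(param_grid, n_folds=5):
    """Number of model fits for an exhaustive grid search with
    k-fold cross-validation: one fit per (combination, fold) pair."""
    n_combos = 1
    for values in param_grid.values():
        n_combos *= len(values)
    return n_combos * n_folds

# Example grid over the two usual RBF-SVM parameters (illustrative values):
grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
```

With this grid, `count_fits(grid)` already requires 80 separate trainings, and each added parameter multiplies the total, which is why kernel parameter selection dominates SVM tuning time.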


1) Easy to implement and understand.

2) Has high computational complexity at classification time, because the Euclidean distance between the input feature vector and every example in the database must be calculated. However, it has no training phase; all the computation falls in the classification phase.

3) Works well with heterogeneously distributed data points.

1) It scales badly when there are a million labeled examples in the dataset.

2) Finding the K nearest neighbors takes a long time when the dataset contains a million labeled examples.


1) With more units in the hidden layer, the network has greater capacity to represent the patterns in the training data.

1) A hidden layer with a high number of units costs the network generalization power (it overfits the training data).

2) More data lengthens NN training times to unacceptable levels, making large datasets highly impractical to work with.

3) Parameter tuning is a time-consuming procedure.
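The trade-off in the points above, capacity versus generalization and training time, tracks the parameter count, which grows with the number of hidden units. A minimal sketch for a one-hidden-layer fully connected network (the layer sizes used in the example are hypothetical):

```python
def mlp_param_count(n_inputs, n_hidden, n_outputs):
    """Trainable parameters in a one-hidden-layer fully connected
    network: weights plus a bias for each unit in each layer.
    Capacity, overfitting risk, and training cost all grow with
    n_hidden."""
    hidden = (n_inputs + 1) * n_hidden   # input->hidden weights + biases
    output = (n_hidden + 1) * n_outputs  # hidden->output weights + biases
    return hidden + output
```

For example, going from 50 to 100 hidden units on a 10-feature, 3-class problem roughly doubles the parameter count, and every parameter adds to both the representational capacity and the tuning burden.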


1) The generalization error converges as the number of trees increases.

2) It is not easy to overfit to one particular feature; however, overfitting to the training data remains a problem.

3) Often achieves results faster.

1) Larger input datasets lengthen classification times.
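The forest behavior described above rests on majority voting over many trees: every tree is queried for every input, so classification time grows with the forest and with the data, while the aggregated vote is what keeps any single feature from dominating. A toy sketch, with trivial one-threshold "stumps" standing in for real decision trees (the features and thresholds are made up for illustration):

```python
from collections import Counter

def make_stump(feature_idx, threshold):
    """A trivial 'tree': a single threshold test on one feature."""
    def stump(x):
        return "pos" if x[feature_idx] > threshold else "neg"
    return stump

def forest_predict(trees, x):
    """Majority vote over all trees. Every tree is evaluated for
    every input, so prediction time scales with the forest size."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Illustrative three-stump "forest" over two features.
trees = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(0, 2.0)]
```

No single stump decides the output on its own, which mirrors why a forest is hard to overfit to one particular feature even though the ensemble as a whole can still overfit the training data.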