Author and Year | Dataset | Machine Learning Algorithm | Features | Challenges

Author and Year: Alam et al., 2018; Asada et al., 2020; Radhika et al., 2019
Dataset: UCI Machine Learning Repository and Data World (lung cancer dataset)
Machine Learning Algorithm: Support Vector Machine (SVM)

Features:
- Efficient in high-dimensional spaces
- Lower probability of over-fitting
- Handles both linear and nonlinear data

Challenges:
- Training time is long on large datasets
- Difficult to interpret and understand, owing to the variable weights and their individual impact
- The variable weights are not constant, so the contribution of each variable to the output varies
- Performance declines when the target classes overlap

- Not suitable for very large datasets; performance degrades as the dataset grows
- High computational complexity
- Prone to over-fitting
- Does not work well when the dataset is noisy
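A minimal sketch of SVM classification with scikit-learn may clarify the points above; it uses synthetic data as a stand-in, since the surveyed lung cancer datasets are not reproduced here.

```python
# Illustrative SVM sketch on synthetic data (NOT the surveyed lung cancer
# datasets). Shows the kernel choice that lets SVM handle nonlinear data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 300 samples, 20 features, binary labels
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; the RBF kernel handles nonlinear
# boundaries, while kernel="linear" would suit linearly separable data
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"SVM test accuracy: {accuracy:.2f}")
```

Training time grows quickly with the number of samples here, which is the "long training time on large datasets" challenge noted above.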

Author and Year: Bharati et al., 2019
Dataset: Lung cancer dataset
Machine Learning Algorithm: Random Forest

Features:
- Can solve both regression and classification problems
- Handles missing values automatically
- Resistant to over-fitting; mitigates the over-fitting problem of a single decision tree

Challenges:
- Very complex, and building many decision trees takes more time
- Expensive, because training deeper trees requires more storage space
- Computationally expensive training and inference
- Missing-data imputation
- Hard to build classifiers that are both accurate and computationally efficient for medical applications
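A short Random Forest sketch, again on synthetic stand-in data, illustrates the trade-off listed above: averaging many trees curbs single-tree over-fitting at the cost of training and inference time.

```python
# Illustrative Random Forest sketch on synthetic data (NOT the surveyed
# lung cancer dataset). The ensemble averages many decision trees to
# reduce the over-fitting of any single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More trees -> smoother averaging, but higher training/inference cost
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Per-feature importances (normalised to sum to 1) hint at which
# inputs drive the predictions
importances = clf.feature_importances_
print(f"RF test accuracy: {accuracy:.2f}")
```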

Author and Year: Günaydin et al., 2019; Radhika et al., 2019; Pradeep & Naveen, 2018
Dataset: Lung cancer dataset
Machine Learning Algorithm: Naïve Bayes (NB)

Features:
- Easy to understand, with an efficient training algorithm
- The order of training instances has no effect on training
- Useful across multiple domains
- Handles both discrete and continuous data
- Usable for both binary and multi-class classification
- Not sensitive to irrelevant features

Challenges:
- Feature interactions cannot be modelled
- Assumes attributes are statistically independent
- Assumes numerical attributes follow a normal distribution
- Redundant attributes mislead classification
- Attribute and class frequencies affect accuracy
- Computationally intensive, especially for models with many variables
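The two core assumptions listed above (conditional independence of features, and normally distributed numeric attributes) are exactly what scikit-learn's `GaussianNB` encodes; a sketch on synthetic stand-in data:

```python
# Illustrative Gaussian Naïve Bayes sketch on synthetic data (NOT the
# surveyed lung cancer dataset). GaussianNB assumes features are
# conditionally independent given the class and normally distributed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)  # a single pass: instance order does not matter
accuracy = clf.score(X_test, y_test)
print(f"NB test accuracy: {accuracy:.2f}")
```

When the independence assumption is badly violated (e.g. strongly correlated or redundant attributes), accuracy suffers, which is the "redundant attributes mislead classification" challenge above.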

Author and Year: Maleki et al., 2020; Chinmayi, Aarsha & Sagi, 2020; Kavit et al., 2018
Dataset: Lung cancer dataset
Machine Learning Algorithm: K-Nearest Neighbour (KNN)

Features:
- Training is very quick
- Easy and simple to implement
- Tolerant of noisy instances and instances with missing attribute values
- Works on the principle that samples that are close in feature space are likely to belong to the same class

Challenges:
- Requires more memory space
- The testing procedure is slow, and the method is very sensitive to noise
- Noisy and irrelevant features degrade accuracy
- Computational complexity grows as the number of attributes increases
- A lazy algorithm that defers computation to prediction time, so it needs more time to run
- High dimensionality of the feature space and imbalance between the target classes hurt performance
- Inaccurate or mislabeled training data introduces noise into ML training
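The "quick training, slow testing" profile above comes from KNN being a lazy learner; a sketch on synthetic stand-in data (not the surveyed lung cancer dataset) makes this concrete:

```python
# Illustrative KNN sketch on synthetic data. KNN is "lazy": fit() only
# stores the training set, and all distance computation is deferred to
# prediction time, which is why testing is the slow step.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k=5 neighbours: majority vote among the nearest training samples
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)             # cheap: essentially stores the data
accuracy = clf.score(X_test, y_test)  # costly: distances to stored samples
print(f"KNN test accuracy: {accuracy:.2f}")
```

Because every stored sample participates in each prediction, memory use and query time both grow with the training set and with the number of attributes, matching the challenges listed above.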