Author and Year | Dataset | Machine Learning Algorithm | Features | Challenges

Author and Year: Alam et al., 2018; Asada et al., 2020; Radhika et al., 2019
Dataset: UCI Machine Learning Repository and Data World (lung cancer dataset)
Machine Learning Algorithm: Support Vector Machine (SVM)

Features:
- Efficient in high-dimensional spaces
- Lower probability of over-fitting
- Handles both linear and nonlinear data

Challenges:
- Training time is long on large datasets
- Difficult to interpret and understand, owing to the variable weights and their individual impact
- The variable weights are not constant, so the contribution of each variable to the output varies
- Performance declines when the target classes overlap

- Not suitable for very large datasets; performance degrades as the dataset grows
- High computational complexity
- Prone to over-fitting
- Does not work well when the dataset is noisy
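A minimal sketch of SVM classification with scikit-learn may clarify the points above; it uses synthetic data as a stand-in, since the surveyed lung cancer datasets are not reproduced here.

```python
# Illustrative SVM sketch on synthetic data (NOT the surveyed lung cancer
# datasets). Shows the kernel choice that lets SVM handle nonlinear data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 300 samples, 20 features, binary labels
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; the RBF kernel handles nonlinear
# boundaries, while kernel="linear" would suit linearly separable data
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"SVM test accuracy: {accuracy:.2f}")
```

Training time grows quickly with the number of samples here, which is the "long training time on large datasets" challenge noted above.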

Author and Year: Bharati et al., 2019
Dataset: Lung cancer dataset
Machine Learning Algorithm: Random Forest

Features:
- Can solve both regression and classification problems
- Handles missing values automatically
- Resistant to over-fitting; mitigates the over-fitting problem of a single decision tree

Challenges:
- Very complex, and building many decision trees takes more time
- Expensive, because training deeper trees requires more storage space
- Computationally expensive training and inference
- Missing-data imputation
- Hard to build classifiers that are both accurate and computationally efficient for medical applications
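A short Random Forest sketch, again on synthetic stand-in data, illustrates the trade-off listed above: averaging many trees curbs single-tree over-fitting at the cost of training and inference time.

```python
# Illustrative Random Forest sketch on synthetic data (NOT the surveyed
# lung cancer dataset). The ensemble averages many decision trees to
# reduce the over-fitting of any single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More trees -> smoother averaging, but higher training/inference cost
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Per-feature importances (normalised to sum to 1) hint at which
# inputs drive the predictions
importances = clf.feature_importances_
print(f"RF test accuracy: {accuracy:.2f}")
```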

Author and Year: Günaydin et al., 2019; Radhika et al., 2019; Pradeep & Naveen, 2018
Dataset: Lung cancer dataset
Machine Learning Algorithm: Naïve Bayes (NB)

Features:
- Easy to understand, with an efficient training algorithm
- The order of training instances has no effect on training
- Useful across multiple domains
- Handles both discrete and continuous data
- Usable for both binary and multi-class classification
- Not sensitive to irrelevant features

Challenges:
- Feature interactions cannot be modelled
- Assumes attributes are statistically independent
- Assumes numerical attributes follow a normal distribution
- Redundant attributes mislead classification
- Attribute and class frequencies affect accuracy
- Computationally intensive, especially for models with many variables
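The two core assumptions listed above (conditional independence of features, and normally distributed numeric attributes) are exactly what scikit-learn's `GaussianNB` encodes; a sketch on synthetic stand-in data:

```python
# Illustrative Gaussian Naïve Bayes sketch on synthetic data (NOT the
# surveyed lung cancer dataset). GaussianNB assumes features are
# conditionally independent given the class and normally distributed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)  # a single pass: instance order does not matter
accuracy = clf.score(X_test, y_test)
print(f"NB test accuracy: {accuracy:.2f}")
```

When the independence assumption is badly violated (e.g. strongly correlated or redundant attributes), accuracy suffers, which is the "redundant attributes mislead classification" challenge above.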

Author and Year: Maleki et al., 2020; Chinmayi, Aarsha & Sagi, 2020; Kavit et al., 2018
Dataset: Lung cancer dataset
Machine Learning Algorithm: K-Nearest Neighbour (KNN)

Features:
- Training is very quick
- Easy and simple to implement
- Tolerant of noisy instances and instances with missing attribute values
- Works on the principle that samples that are close in feature space are likely to belong to the same class

Challenges:
- Requires more memory space
- The testing procedure is slow, and the method is very sensitive to noise
- Noisy and irrelevant features degrade accuracy
- Computational complexity grows as the number of attributes increases
- A lazy algorithm that defers computation to prediction time, so it needs more time to run
- High dimensionality of the feature space and imbalance between the target classes hurt performance
- Inaccurate or mislabeled training data introduces noise into ML training
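The "quick training, slow testing" profile above comes from KNN being a lazy learner; a sketch on synthetic stand-in data (not the surveyed lung cancer dataset) makes this concrete:

```python
# Illustrative KNN sketch on synthetic data. KNN is "lazy": fit() only
# stores the training set, and all distance computation is deferred to
# prediction time, which is why testing is the slow step.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k=5 neighbours: majority vote among the nearest training samples
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)             # cheap: essentially stores the data
accuracy = clf.score(X_test, y_test)  # costly: distances to stored samples
print(f"KNN test accuracy: {accuracy:.2f}")
```

Because every stored sample participates in each prediction, memory use and query time both grow with the training set and with the number of attributes, matching the challenges listed above.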