Author and Year | Dataset | Machine Learning Algorithm | Features | Challenges |
--- | --- | --- | --- | --- |
Alam et al., 2018; Asada et al., 2020; Radhika et al., 2019 | UCI Machine Learning Repository and Data World (lung cancer dataset) | Support Vector Machine (SVM) | - Efficient in high-dimensional spaces - Lower probability of over-fitting - Can handle both linear and nonlinear data | - Training time is long on large datasets - Difficult to interpret, since variable weights are not constant and each variable's contribution to the output therefore varies - Performance declines when the target classes overlap - Not suitable for very large datasets; performance degrades as data grows - High computational complexity - Prone to over-fitting - Does not work well when the dataset is noisy |
Bharati et al., 2019 | Lung cancer dataset | Random Forest | - Can solve both regression and classification problems - Handles missing values automatically - Resists over-fitting, mitigating the over-fitting problem of a single decision tree | - Very complex; building many decision trees takes more time - Expensive, because training deeper trees requires more storage space - Computationally expensive training and inference - Requires missing-data imputation - Hard to build classifiers that are both accurate and computationally efficient for medical applications |
Günaydin et al., 2019; Radhika et al., 2019; Pradeep & Naveen, 2018 | Lung cancer dataset | Naïve Bayes (NB) | - Easy to understand, with an efficient training algorithm - Order of training instances has no effect on training - Useful across multiple domains - Handles both discrete and continuous data - Can be used for both binary and multi-class classification - Not sensitive to irrelevant features | - Feature interactions cannot be integrated - Assumes attributes are statistically independent - Assumes numerical attributes are normally distributed - Redundant attributes mislead classification - Attribute and class frequencies affect accuracy - Computationally intensive, especially for models with many variables |
Maleki et al., 2020; Chinmayi, Aarsha & Sagi, 2020; Kavit et al., 2018 | Lung cancer dataset | K-Nearest Neighbour (KNN) | - Training is very quick - Easy and simple to implement - Tolerant of noisy instances or instances with missing attribute values - Works on the concept that samples nearby in feature space are likely to belong to the same class | - Requires more memory space - The testing procedure is quite slow and very sensitive to noise - Noisy and irrelevant features degrade accuracy - Computational complexity grows as the number of attributes increases - A lazy algorithm that defers work to query time, so prediction needs more time to run - High dimensionality of the feature space and imbalance in the sizes of the target classes reduce performance - Inaccurate or mislabelled training data introduces noise into ML training |
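The SVM row above notes that SVM learns a decision boundary between classes. A full SVM solver (margin maximization with kernels) is too long to sketch here; as a rough stand-in, the sketch below learns the same kind of linear separating hyperplane with the classic perceptron rule, on a synthetic two-feature toy set (not the UCI lung cancer data) with hypothetical "benign"/"malignant" clusters. It illustrates a linear decision boundary only, without the SVM's margin maximization.

```python
def fit_perceptron(X, y, epochs=20):
    """Learn a linear boundary w.x + b = 0 with the perceptron rule; y must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:        # misclassified: nudge the boundary toward xi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                updated = True
        if not updated:                # linearly separable data converges early
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Synthetic toy data: -1 = "benign" cluster, +1 = "malignant" cluster
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.9, 5.1), (5.2, 5.0), (5.0, 4.8)]
y = [-1, -1, -1, 1, 1, 1]
w, b = fit_perceptron(X, y)
print(predict(w, b, (1.0, 1.0)), predict(w, b, (5.0, 5.0)))   # -1 1
```

Because the two toy clusters are linearly separable, the loop converges in a few epochs; on overlapping classes it would cycle, which mirrors the table's note that SVM performance declines when target classes overlap.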
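The Random Forest row describes an ensemble of decision trees that mitigates a single tree's over-fitting. A minimal sketch of that idea, assuming synthetic toy data rather than the cited lung cancer dataset: bag one-level decision trees ("stumps") on bootstrap resamples and predict by majority vote.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """One-level decision tree: pick the (feature, threshold) split with fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(yl != l_lab for yl in left) + sum(yr != r_lab for yr in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:                     # degenerate sample: fall back to majority class
        maj = Counter(y).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def fit_forest(X, y, n_trees=15, seed=0):
    """Bag stumps on bootstrap resamples; the ensemble predicts by majority vote."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]      # bootstrap sample with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]

# Synthetic toy data with hypothetical class labels
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.9, 5.1), (5.2, 5.0), (5.0, 4.8)]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
forest = fit_forest(X, y)
print(forest((1.0, 1.0)), forest((5.0, 5.0)))
```

The nested loops over features and thresholds make each tree's training cost grow with data size, which is the computational expense the Challenges column points to.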
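The Naïve Bayes row lists two assumptions: attributes are statistically independent, and numerical attributes are normally distributed. A minimal Gaussian NB sketch on synthetic toy data (not the cited dataset) makes both visible: the per-feature mean/variance estimates encode the normality assumption, and summing per-feature log-likelihoods encodes independence.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a class prior plus per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model, n = {}, len(X)
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # floor the variance to avoid division by zero on constant features
        vars_ = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(cols, means)]
        model[c] = (len(rows) / n, means, vars_)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior; the sum over features IS
    the independence assumption from the table."""
    def log_post(prior, means, vars_):
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, vars_):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=lambda c: log_post(*model[c]))

# Synthetic toy data with hypothetical class labels
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.9, 5.1), (5.2, 5.0), (5.0, 4.8)]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.0, 1.0)))   # benign
```

If two features were redundant copies of each other, the sum would count their evidence twice, which is why the Challenges column warns that redundant attributes mislead classification.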
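The KNN row states the core idea directly: samples nearby in feature space are likely to belong to the same class, training is trivial, and all the cost falls on the (lazy) query step. A minimal sketch on synthetic toy points, assuming Euclidean distance and hypothetical "benign"/"malignant" labels:

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.
    There is no training step: every distance is computed at query time."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train, labels)
    )
    top_k = [y for _, y in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Synthetic toy data: two well-separated 2-feature clusters
train = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train, labels, (1.1, 0.9)))   # benign
print(knn_predict(train, labels, (5.0, 5.0)))   # malignant
```

The full scan over `train` inside `knn_predict` is exactly the slow testing procedure and memory cost the Challenges column describes: the whole training set must be kept and re-examined for every query.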