Shannon Entropy of 2 Dimensional (2D) Gabor wavelet | Song, et al. [22] | 2016 | The image is filtered by a 2D Gabor wavelet and the features are extracted. This wavelet effectively captures the image texture and edge properties at different scales and orientations. | 1. Detection error rate 2. Entropy value | Advantage 1. Effectively captures the changes in image texture Disadvantage 1. Reduced detection accuracy | |
Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) | Deepa, et al. [23] | 2012 | The transformation of facial images into the frequency domain is used to reduce image redundancy. Five images are randomly selected for training and another five are selected for testing. | 1. Recognition rate 2. Training time 3. Testing time 4. Euclidean distance | Advantage 1. Dimensionality reduction 2. Higher recognition rate Disadvantage 1. Reduced sustainability 2. A simple classifier is used | |
Enhanced histogram features method | Song, et al. [25] | 2015 | Perturbed Quantization (PQ) is applied to double-compressed JPEG images. The global, local and dual histogram features of the DCT coefficients and their differences are calculated. | 1. True positive rate 2. False positive rate 3. Detection accuracy | Advantage 1. High detection accuracy Disadvantage 1. Constrained for complex image regions | |
Machine Learning Based Classification Techniques | ||||||
Ensemble Classifier | Kodovsky, et al. [27] | 2012 | The ensemble classifier provides fast construction of the steganography detector. The steganalyst is allowed to work on a high-dimensional feature space with a large dataset. | 1. Detection error 2. Median (MED) 3. Median Absolute Deviation (MAD) | Advantage 1. Improved accuracy Disadvantage 1. Increased computational complexity | |
Ensemble based Extreme Learning Machine (EN-ELM) | Liu and Wang [28] | 2010 | The decisions are made by a cross-validation scheme. The Discrete Cosine Transform (DCT) reduces the dimensionality. | 1. Classification accuracy 2. Testing accuracy 3. Training time | Advantage 1. Alleviates overfitting 2. Higher testing accuracy Disadvantage 1. Increased training time 2. Increased computational burden | |
Extreme Learning Machine (ELM) classifier | Huang, et al. [29] | 2010 | The ELM is based on the Karush-Kuhn-Tucker (KKT) conditions. All training data are linearly separable in the ELM feature space. ELM is less sensitive to the learning parameters. | 1. Testing rate 2. Training time 3. Testing deviation | Advantage 1. Easily implemented 2. Minimized testing error Disadvantage 1. Average testing accuracy | |
Differential Evolution based Extreme Learning Machine (DE-ELM) | Bazi, et al. [31] | 2014 | An automatic solution based on the Differential Evolution (DE) algorithm is developed in association with the ELM classifier. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the data. | 1. Overall Accuracy (OA) standard deviation 2. Average Accuracy (AA) standard deviation 3. Sensitivity | Advantage 1. Faster solution Disadvantage 1. Reduced classification accuracy | |
Support Vector Machine (SVM) classifier | Shankar, et al. [36] | 2012 | The block dependency features (inter- and intra-block features) are used for the classification of steganography. | 1. Classification percentage 2. Embedding percentage | Advantage 1. Faster solution Disadvantage 1. The message length is not detected | |
Hinge loss function based cognitive ensemble of Extreme Learning Machines (ELM) | Sachnev, et al. [32] | 2015 | The classifier performance depends on the choice of classifier and the weightage given to each classifier. The quality of the extracted features defines the performance of the binary classifier. | 1. Number of hidden neurons 2. Testing efficiency | Advantage 1. Improved classification performance 2. Best testing efficiency Disadvantage 1. Geometric features are not recognized | |
Bayesian Ensemble Classifier | Li, et al. [35] | 2013 | A high-dimensional feature vector is calculated from each JPEG image in a training set. The feature vectors are trained by sub-classifiers and integrated to make the final decision. | 1. Average computation time 2. Detection percentage | Advantage 1. Low computational complexity Disadvantage 1. Increased number of feature vectors 2. Requires a large training set | |
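To make the tabulated techniques concrete, the Gabor-wavelet entropy feature of Song, et al. [22] can be sketched as follows. This is a minimal illustration, not the authors' implementation: it generates the real part of a 2D Gabor kernel, convolves it with a small synthetic texture, and computes the Shannon entropy of the filter responses at four orientations; the kernel size, wavelength, and bin count are assumed values chosen only for the demo.

```python
import math

def gabor_kernel(ksize=7, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
    """Real part of a 2D Gabor kernel at orientation theta and wavelength lam."""
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            row.append(g * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel

def convolve2d(image, kernel):
    """'Valid'-mode 2D correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

def shannon_entropy(values, bins=16):
    """Shannon entropy (bits) of a histogram of filter responses."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Synthetic 16x16 "texture": diagonal stripes.
img = [[(i + j) % 4 for j in range(16)] for i in range(16)]
feats = []
for theta in (0, math.pi / 4, math.pi / 2, 3 * math.pi / 4):
    resp = convolve2d(img, gabor_kernel(theta=theta))
    feats.append(shannon_entropy([v for row in resp for v in row]))
print(feats)  # one entropy value per orientation
```

Because oriented textures respond differently to each orientation of the filter bank, the resulting entropy vector summarizes how texture energy is distributed across orientations.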
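The DCT-based dimensionality reduction used by Deepa, et al. [23] can likewise be sketched. Assuming a square grayscale image, the orthonormal DCT-II basis is built explicitly and only the low-frequency top-left block of coefficients is kept as the feature vector; the 32x32 image size and the 8x8 retained block are illustrative choices, not values from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2_features(image, keep=8):
    """2-D DCT of a square image; keep the keep x keep low-frequency block."""
    n = image.shape[0]
    c = dct_matrix(n)
    coeffs = c @ image @ c.T          # separable 2-D DCT
    return coeffs[:keep, :keep].ravel()

rng = np.random.default_rng(0)
face = rng.random((32, 32))           # stand-in for a 32x32 face image
feat = dct2_features(face, keep=8)
print(feat.shape)                     # 1024 pixels reduced to 64 features
```

Keeping only low frequencies discards the redundant high-frequency detail, which is the redundancy-reduction effect the row above describes.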
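The core ELM training procedure underlying the rows on Huang, et al. [29] and the ELM variants can be sketched in a few lines: input weights and biases are drawn at random and never trained, and the output weights are solved in closed form with the Moore-Penrose pseudoinverse. The toy dataset, hidden-layer size, and sigmoid activation below are assumptions for illustration only.

```python
import numpy as np

def elm_train(X, y, hidden=32, seed=0):
    """ELM: random hidden layer, output weights by least squares (pinv)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input weights, untrained
    b = rng.normal(size=hidden)                 # random biases, untrained
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # sigmoid hidden activations
    beta = np.linalg.pinv(H) @ y                # closed-form output weights
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy binary task: class = sign of the sum of two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X.sum(axis=1) > 0, 1.0, -1.0)
model = elm_train(X, y)
acc = np.mean(np.sign(elm_predict(X, model)) == y)
print(acc)
```

The absence of iterative weight tuning is what gives ELM the short training time and low sensitivity to learning parameters noted in the table; variants such as EN-ELM and DE-ELM wrap this core in cross-validation or evolutionary search.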
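Finally, the ensemble construction of Kodovsky, et al. [27] can be sketched as Fisher linear discriminant (FLD) base learners, each trained on a random subset of the feature dimensions, combined by majority vote. The synthetic "cover vs. stego" data, the number of base learners, and the subspace dimensionality are all assumed demo values, not the paper's settings.

```python
import numpy as np

def fld_train(X0, X1):
    """Fisher linear discriminant: w = Sw^-1 (mu1 - mu0), midpoint threshold."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0.T) + np.cov(X1.T)            # within-class scatter
    w = np.linalg.pinv(Sw) @ (mu1 - mu0)
    t = w @ (mu0 + mu1) / 2.0
    return w, t

def ensemble_train(X0, X1, n_learners=11, d_sub=4, seed=0):
    """Each base FLD sees only a random subset of the features."""
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(n_learners):
        idx = rng.choice(X0.shape[1], size=d_sub, replace=False)
        learners.append((idx, fld_train(X0[:, idx], X1[:, idx])))
    return learners

def ensemble_predict(X, learners):
    """Majority vote over the base learners' binary decisions."""
    votes = np.zeros(len(X))
    for idx, (w, t) in learners:
        votes += (X[:, idx] @ w > t).astype(float)
    return (votes > len(learners) / 2).astype(int)

# Toy "cover vs. stego" features: class 1 is shifted in every dimension.
rng = np.random.default_rng(2)
X0 = rng.normal(0.0, 1.0, size=(300, 10))
X1 = rng.normal(0.8, 1.0, size=(300, 10))
model = ensemble_train(X0, X1)
pred = ensemble_predict(np.vstack([X0, X1]), model)
truth = np.array([0] * 300 + [1] * 300)
acc = np.mean(pred == truth)
print(acc)
```

Training many cheap low-dimensional discriminants instead of one classifier on the full feature space is what lets the steganalyst scale to high-dimensional features, at the cost of the extra computation noted in the table.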