Researchers | Data Mining Technique | Data Set | Type of Malware | Type of Approach |
Chen et al. [16] | Data mining classifier algorithms: ANNs (Artificial Neural Networks, using JavaNNS) and symbolic rule extraction (J48 classifier). | 60 malicious files: 30 from the virus group and 30 from the worm group. | One family, with a total of 60 malicious samples, 30 each for the virus and worm categories. | Extraction of hex sequences from virus and worm files. Multiple sequence alignment using T-Coffee was applied to the extracted hex sequences for the data mining process. |
Kumar et al. [44] | Data mining classifier algorithm: IBK (k-nearest neighbours classifier). | Existing dataset: 323 malicious files, a combination of viruses and worms. New dataset: 323 malicious files, a combination of viruses and worms. | Virus and worm. | Extraction of hex sequences from viral files and conversion of the hex sequences into ASCII sequences. Multiple sequence alignment was applied to the converted ASCII sequences for the data mining process. |
Prabha et al. [45] | Data mining classifier algorithms: J48, KNN (k-nearest neighbours), Naïve Bayes. | 100 binaries, of which 90 were benign and 10 were malware. | 15 subfamilies, with a total of 1056 malicious viral samples. | Extraction of hex dumps; extraction of byte sequences as n-grams of different sizes. |
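
The hex-dump and byte n-gram extraction listed in the last row can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the function names `hex_dump` and `byte_ngrams` and the six-byte sample are hypothetical, chosen only to show how a binary is turned into hex n-gram counts that a classifier could then use as features.

```python
from collections import Counter


def hex_dump(data: bytes) -> str:
    """Return the content of a binary as a lowercase hex string."""
    return data.hex()


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count overlapping byte n-grams, each keyed by its hex representation.

    Illustrative helper (an assumption, not a cited method): a window of
    n bytes slides over the data one byte at a time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


# Hypothetical six-byte sample; real input would be the bytes of a file,
# e.g. pathlib.Path(some_file).read_bytes().
sample = bytes.fromhex("4d5a90004d5a")

dump = hex_dump(sample)
bigrams = byte_ngrams(sample, 2)
unigrams = byte_ngrams(sample, 1)
```

Varying `n` gives the "different sizes" of n-grams; the resulting counts per file would typically be assembled into a feature vector before training classifiers such as J48, KNN, or Naïve Bayes.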