| Researchers | Data Mining | Data Set | Type of Malware | Type of Approach |
|---|---|---|---|---|
| Chen et al. [16] | Data mining classifier algorithms: artificial neural networks (ANNs) using JavaNNS, and symbolic rule extraction using the J48 classifier. | 60 malicious files, 30 belonging to the virus group and 30 to the worm group. | One family, with a total of 60 malicious samples, 30 each for the virus and worm categories. | Extraction of hex sequences from virus and worm files, followed by multiple sequence alignment of the extracted hex sequences with T-Coffee for the data mining process. |
| Kumar et al. [44] | Data mining classifier algorithm: IBK (k-nearest neighbours classifier). | Existing dataset: 323 malicious files with a combination of viruses and worms. New upcoming dataset: 323 malicious files with a combination of viruses and worms. | Virus and worm. | Extraction of hex sequences from viral files and conversion of the hex sequences into ASCII sequences, followed by multiple sequence alignment of the converted ASCII sequences for the data mining process. |
| Prabha et al. [45] | Data mining classifier algorithms: J48, KNN (k-nearest neighbours), Naïve Bayes. | 100 binaries, of which 90 were benign and 10 were malware binaries. | 15 subfamilies, with a total of 1056 malicious viral samples. | Extraction of hex dumps; extraction of byte sequences as n-grams of different sizes. |
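
The feature-extraction steps named in the comparison above (hex sequence extraction, hex-to-ASCII conversion, and byte n-grams of different sizes) can be illustrated with a minimal Python sketch. It is not the code used by any of the cited authors. The function names are hypothetical, the sliding window is assumed to advance one byte at a time, and non-printable bytes are assumed to be rendered as "." in the ASCII conversion, since the cited works do not specify these details here.

```python
from collections import Counter


def hex_dump(data: bytes) -> str:
    """Return the bytes as a lowercase hex string, two characters per byte."""
    return data.hex()


def hex_to_ascii(hex_string: str) -> str:
    """Convert a hex string to ASCII text.

    Assumption: bytes outside the printable ASCII range become '.'.
    """
    raw = bytes.fromhex(hex_string)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def byte_ngrams(data: bytes, n: int) -> list:
    """Slide a window of n bytes over the data, one byte at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def ngram_counts(data: bytes, n: int) -> Counter:
    """Count how often each n-gram occurs, keyed by its hex form."""
    return Counter(gram.hex() for gram in byte_ngrams(data, n))


if __name__ == "__main__":
    # Illustrative bytes only, not taken from any dataset in the table.
    sample = bytes.fromhex("4d5a90004d5a")
    print(hex_dump(sample))
    print(hex_to_ascii(hex_dump(sample)))
    for size in (1, 2, 3):
        print(size, ngram_counts(sample, size).most_common(2))
```

The resulting n-gram counts would typically be turned into fixed-length feature vectors before being passed to a classifier such as J48, k-nearest neighbours, or Naïve Bayes; that step is omitted here.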