Features/Steps | Experiment I (variable length data mining) | Experiment II (variable length data mining) | Experiment III (equal length data mining) |
Hex to DNA conversion | For the process of pairwise sequence alignment only. | For the processes of data mining and pairwise sequence alignment. | For the processes of multiple sequence alignment, data mining and pairwise sequence alignment. |
Multiple sequence alignment for the process of data mining | No | No | Yes |
Conversion of variable length sequences into equal length sequences | By appending the letter “x” to the end of each sequence until all sequences were of equal length. | By appending the letter “X” to the end of each sequence until all sequences were of equal length. | By multiple sequence alignment. All gaps introduced by the alignment were replaced with “X”. |
Total number of attributes for the process of data mining | 24,565 | 49,129 | 93,438 |
Total number of labels for the process of data mining | 17 (hex labels: a–f, 0–9 and x) | Five (DNA labels: A, T, G, C and X) | Five (DNA labels: A, T, G, C and X) |
File size of the ARFF file | 2.49 MB | 3.87 MB | 7.38 MB |
Total time taken to generate NNge results by Weka | 2 minutes and 32 seconds | 6 minutes and 13 seconds | 32 minutes and 28 seconds |
Time taken to build model | 0.62 seconds | 0.73 seconds | 1.23 seconds |
Correctly classified instances (accuracy, %) | 22/22 (100.00%) | 0/22 (0.00%) | 22/22 (100.00%) |
Incorrectly classified instances (inaccuracy, %) | 0/22 (0.00%) | 22/22 (100.00%) | 0/22 (0.00%) |
Kappa statistic | 1 | −1 | 1 |
Mean absolute error | 0 | 1 | 0 |
Root mean squared error | 0 | 1 | 0 |
Relative absolute error (%) | 0.00% | 200.00% | 0.00% |
Root relative squared error (%) | 0.00% | 200.00% | 0.00% |
Total number of instances | 22 | 22 | 22 |
Total number of rules generated | Two (one for malicious class and one for non-malicious class) | Two (one for malicious class and one for non-malicious class) | Three (one for malicious class and two for non-malicious class) |
Sequence lengths of extracted hex/DNA data (first-level consensuses) from NNge rules | Malicious (hex): 123,338; Non-Malicious (hex): 37,249 | Malicious (DNA): 132,103; Non-Malicious (DNA): 41,670 | Malicious (DNA): 161,495; Non-Malicious 1 (DNA): 59,740; Non-Malicious 2 (DNA): 11,860 |
Total number of pairwise alignments performed | Six (three each for malicious and non-malicious classes) | Six (three each for malicious and non-malicious classes) | Nine (three each for malicious, non-malicious 1 and non-malicious 2 classes) |
Total number of meta-signatures (C1HEX, C2HEX) generated | Nine (four for malicious class and five for non-malicious class) | 14 (nine for malicious class and five for non-malicious class) | 48 (31 for malicious class, nine for non-malicious class 1 and eight for non-malicious class 2) |
Total number of unique meta-signatures (C1HEX, C2HEX) | Five | Five | 43 |
Total number of common meta-signatures (C1HEX, C2HEX) | Four | Nine | Five |
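
The padding step used in Experiments I and II (appending “x” or “X” until every sequence reaches the length of the longest one) can be illustrated with a minimal Python sketch. The function name `pad_to_equal_length` and the toy sequences are hypothetical and chosen only for illustration; they are not the authors' tooling or data, and the sketch does not cover the multiple sequence alignment used in Experiment III.

```python
def pad_to_equal_length(sequences, pad_char="X"):
    """Right-pad every sequence with pad_char up to the longest length.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    # Length of the longest sequence; 0 when the input list is empty.
    target = max((len(s) for s in sequences), default=0)
    return [s + pad_char * (target - len(s)) for s in sequences]


# Toy inputs, invented for this example only.
hex_seqs = ["4d5a90", "7f454c46", "ca"]
dna_seqs = ["ATGC", "GGCATT", "TA"]

# Experiment I style: hex sequences padded with lowercase "x".
padded_hex = pad_to_equal_length(hex_seqs, pad_char="x")
# Experiment II style: DNA sequences padded with uppercase "X".
padded_dna = pad_to_equal_length(dna_seqs, pad_char="X")

# Distinct symbols in the padded DNA data; each padded position would
# become one nominal attribute drawing its value from this label set.
dna_labels = sorted(set("".join(padded_dna)))

print(padded_hex)
print(padded_dna)
print(dna_labels)
```

After padding, every sequence has the same length, so each position can serve as one attribute; with DNA input the label set is the five symbols A, T, G, C and X listed in the table.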