| Features/Steps | Experiment I (variable length data mining) | Experiment II (variable length data mining) | Experiment III (equal length data mining) |
| --- | --- | --- | --- |
| Hex to DNA conversion | For the process of pairwise sequence alignment only. | For the processes of data mining and pairwise sequence alignment. | For the processes of multiple sequence alignment, data mining and pairwise sequence alignment. |
| Multiple sequence alignment for the process of data mining | No | No | Yes |
| Conversion of variable length sequences into equal length sequences | By adding the letter “x” towards the end of each sequence until all the variable length sequences were of equal lengths. | By adding the letter “X” towards the end of each sequence until all the variable length sequences were of equal lengths. | By the process of multiple sequence alignment. All the gaps introduced by the process of alignment were substituted by “X”. |
| Total number of attributes for the process of data mining | 24,565 | 49,129 | 93,438 |
| Total number of labels for the process of data mining | 17 (hex labels: a-f, 0-9 and x) | Five (DNA labels: A, T, G, C and X) | Five (DNA labels: A, T, G, C and X) |
| File size of the ARFF file | 2.49 MB | 3.87 MB | 7.38 MB |
| Total time taken to generate NNge results by Weka | 2 minutes and 32 seconds | 6 minutes and 13 seconds | 32 minutes and 28 seconds |
| Time taken to build model | 0.62 second | 0.73 second | 1.23 seconds |
| Correctly classified instances (%): accuracy | 22/22 (100.00%) | 0/22 (0.00%) | 22/22 (100.00%) |
| Incorrectly classified instances (%): inaccuracy | 0/22 (0.00%) | 22/22 (100.00%) | 0/22 (0.00%) |
| Kappa statistic | 1 | −1 | 1 |
| Mean absolute error | 0 | 1 | 0 |
| Root mean squared error | 0 | 1 | 0 |
| Relative absolute error (%) | 0.00% | 200.00% | 0.00% |
| Root relative squared error (%) | 0.00% | 200.00% | 0.00% |
| Total number of instances | 22 | 22 | 22 |
| Total number of rules generated | Two (one for malicious class and one for non-malicious class) | Two (one for malicious class and one for non-malicious class) | Three (one for malicious class and two for non-malicious class) |
| Sequence lengths of extracted hex/DNA data (first-level consensuses) from NNge rules | Malicious (hex): 123,338; Non-malicious (hex): 37,249 | Malicious (DNA): 132,103; Non-malicious (DNA): 41,670 | Malicious (DNA): 161,495; Non-malicious 1 (DNA): 59,740; Non-malicious 2 (DNA): 11,860 |
| Total number of pairwise alignments performed | Six (three each for malicious and non-malicious classes) | Six (three each for malicious and non-malicious classes) | Nine (three each for malicious, non-malicious 1 and non-malicious 2 classes) |
| Total number of meta-signatures (C1HEX, C2HEX) generated | Nine (four for malicious class and five for non-malicious class) | 14 (nine for malicious class and five for non-malicious class) | 48 (31 for malicious class, nine for non-malicious class 1 and eight for non-malicious class 2) |
| Total number of unique meta-signatures (C1HEX, C2HEX) | Five | Five | 43 |
| Total number of common meta-signatures (C1HEX, C2HEX) | Four | Nine | Five |
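The padding step used in Experiments I and II (appending “x” or “X” until all sequences share one length) can be sketched as follows. This is only an illustration: the function name `pad_to_equal_length` and the toy sequences are hypothetical and do not come from the experiments themselves.

```python
def pad_to_equal_length(sequences, pad_char="X"):
    """Right-pad every sequence with pad_char up to the longest length.

    Hypothetical helper: mirrors the 'add the letter X towards the end'
    step described in the table, not the authors' actual tooling.
    """
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    # Length of the longest sequence (0 if the list is empty).
    target = max((len(s) for s in sequences), default=0)
    # str.ljust appends pad_char on the right until len == target.
    return [s.ljust(target, pad_char) for s in sequences]


# Toy DNA fragments (illustrative only, not data from the experiments).
dna = ["ATGC", "AT", "ATGCGT"]
padded = pad_to_equal_length(dna)            # DNA case: pad with "X"

# Toy hex fragments padded with lowercase "x", as in Experiment I.
hex_padded = pad_to_equal_length(["a1", "f"], pad_char="x")
```

After padding, every position in a sequence can serve as one attribute, which is presumably why the pad symbol appears as an extra label (17 hex labels, five DNA labels) in the table.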
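The kappa values in the table (1 at 22/22 correct, −1 at 0/22 correct) are consistent with Cohen's kappa computed on a two-class confusion matrix. The sketch below assumes an even 11/11 split between malicious and non-malicious instances; that split is an assumption made for illustration and is not stated in the table, and `cohen_kappa` is a hypothetical helper rather than Weka's own code.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# ASSUMPTION: 22 instances split 11/11 between the two classes.
all_correct = [[11, 0], [0, 11]]   # 22/22 correct, as in Experiments I and III
all_wrong = [[0, 11], [11, 0]]     # 0/22 correct, as in Experiment II

kappa_correct = cohen_kappa(all_correct)
kappa_wrong = cohen_kappa(all_wrong)
```

Under this assumed split, chance agreement is 0.5, so perfect classification gives (1 − 0.5)/(1 − 0.5) = 1 and complete misclassification gives (0 − 0.5)/(1 − 0.5) = −1, matching the reported values.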