Veetil and Gao, 2013

Real-time Intrusion Detection System by using Hadoop and Naive Bayes Classification

Packets per second, packets per minute

10% KDD intrusion detection dataset, Live network stream packets as training data

1) Snort

2) Tshark

3) D3

1) Increased parallelism due to the Naive Bayes algorithm

2) Using a Hadoop-based Naive Bayes algorithm increases training speed, implying faster detection rates

3) High detection rate of over 434 network packets per minute

1) The approach was compared against a previous approach rather than tested on new attacks

1) The technique may not perform well in a distributed environment, since it is ineffective in a heterogeneous cluster

Cepheli, Buyukcorak, and Kurt, 2016

Hybrid Intrusion Detection System (H-IDS) for DDOS attacks

Protocol frequencies, packet sizes, packet inter-arrival times

DARPA 2000 dataset, real training data from a past penetration test of a commercial bank in Turkey

1) Gaussian Mixture Model

2) SNORT

1) Combines the power of anomaly-based and signature-based techniques for more accurate detection

2) Combining anomaly and rule-based detection reduces detection delays

3) Easily integrates as a module with other IDS

1) Cannot detect complex DDoS attacks

2) Cannot detect internally generated attacks

1) Training data does not reflect real network data, implying reduced performance

Singh, Guntuku, Thakur, and Hota, 2014

Using Random Forests for Big Data Analytics in Peer-to-Peer Botnet detection

Packet buffer sizes

CAIDA datasets.

84,030 instances of mixed traffic

1) Hadoop

2) Mahout

3) MapReduce

4) Tshark using Libpcap library

1) Usable for predictive data modeling, as Mahout ensures high data accuracy and time efficiency

2) Eases detection of peer-to-peer attacks owing to the ability to process high-bandwidth traffic in real time with a 30-second delay

1) High computational costs due to the use of MapReduce jobs

2) Cannot run with non-distributed classifiers due to the large space required by data and JVM

1) Inability to block traffic from botnets or isolate compromised machines

Korad, Kadam, Deore, Jadhav, and Patil, 2016

Using Hadoop on Live Network to detect DDOS

Packet file sizes and packet pairs

Simulation of live HTTP GET, UDP, TCP, and ICMP packets.

Masked timestamp

1) Hadoop

2) Wireshark

1) Ability to handle and analyze petabytes of data with ease

2) Hadoop clustering helps harness the processing power of many computers as one

3) Ease of management and parameter setting through a web interface

1) Cannot be used to detect internal attacks, such as those arising from memory corruption

2) High computational costs from combining multiple nodes

1) Ineffective with few nodes due to the high computational costs

Jia, Ma, Huang, Lin, and Sun, 2016

Novel Real-Time DDoS Attack Detection Mechanism Based on MDRA Algorithm in Big Data

Precision rate, TNR, memory resource, computing complexity, and time cost

Knowledge Discovery and Data Mining (KDD) Cup 1999 data set for training and testing. The data set is real

1) High precision, with a True Negative Rate (TNR) of almost 100%

2) Reduced CPU computation cost

3) Reduced memory consumption compared to MCA based techniques

4) Detects network DDoS attacks in real time

1) The technique only identifies abnormal network traffic after it has been predefined

1) Since the approach is theoretical, it may not be possible to ascertain its effectiveness