Veetil and Gao, 2013 | Real-time Intrusion Detection System by using Hadoop and Naive Bayes Classification | Packets per second, packets per minute | 10% KDD intrusion detection dataset, live network stream packets as training data | 1) Snort 2) Tshark 3) D3 | 1) Increased parallelism due to the Naive Bayes algorithm 2) The Hadoop-based Naive Bayes algorithm increases training speed, implying faster detection 3) High detection rate of over 434 network packets per minute | 1) This approach compared its performance to a previous approach rather than testing new attacks | 1) The technique may not perform well in a distributed environment since it is ineffective in a heterogeneous cluster |
Cepheli, Buyukcorak, and Kurt, 2016 | Hybrid Intrusion Detection System (H-IDS) for DDoS attacks | Protocol frequencies, packet sizes, packet inter-arrival times | DARPA 2000 dataset, real training data from a past penetration test of a commercial bank in Turkey | 1) Gaussian Mixture Model 2) Snort | 1) Combines the power of anomaly-based and signature-based techniques for more accurate detection 2) Combining anomaly and rule-based detection reduces detection delays 3) Easily integrates as a module with other IDS | 1) Cannot detect complex DDoS attacks 2) Cannot detect internally generated attacks | 1) Training data does not reflect real network data, implying reduced performance |
Singh, Guntuku, Thakur, and Hota, 2014 | Using Random Forests for Big Data Analytics in Peer-to-Peer Botnet detection | Packet buffer sizes | CAIDA datasets; 84,030 instances of mixed traffic | 1) Hadoop 2) Mahout 3) MapReduce 4) Tshark using the Libpcap library | 1) Usable for predictive data modeling, as Mahout ensures high data accuracy and time efficiency 2) Eases detection of peer-to-peer attacks through its ability to process high-bandwidth traffic in real time with a 30-second delay | 1) High computational costs due to the use of MapReduce jobs 2) Cannot run with non-distributed classifiers due to the large space required by the data and the JVM | 1) Inability to block traffic from botnets or isolate compromised machines |
Korad, Kadam, Deore, Jadhav, and Patil, 2016 | Using Hadoop on Live Network to detect DDoS | Packet file sizes and packet pairs | Simulated live HTTP GET, UDP, TCP, and ICMP packets with masked timestamps | 1) Hadoop 2) Wireshark | 1) Ability to handle and analyze petabytes of data with ease 2) Hadoop clustering helps harness the processing power of many computers as one 3) Ease of management and parameter setting through a web interface | 1) Cannot be used to detect internal attacks such as those arising from memory corruption 2) High computational costs from combining multiple nodes | 1) Ineffective with few nodes due to the high computational costs |
Jia, Ma, Huang, Lin, and Sun, 2016 | Novel Real-Time DDoS Attack Detection Mechanism Based on MDRA Algorithm in Big Data | Precision rate, TNR, memory resource, computing complexity, and time cost | Knowledge Discovery and Data Mining (KDD) Cup 1999 data set, used for training and testing; the data set is real | | 1) High precision, with a True Negative Rate (TNR) of almost 100% 2) Reduced CPU computation cost 3) Reduced memory consumption compared to MCA-based techniques 4) Detects network DDoS attacks in real time | 1) The technique identifies abnormal network traffic only after it has been predefined | 1) Since the approach is theoretical, it may not be possible to ascertain its effectiveness |