Project Title: An Ensemble Framework for Network Intrusions
Team Members
Dr. G. Padmavathi, Dean - PSCS, Professor, Department of Computer Science
Ms. A. Roshni, Research Assistant, Centre for Cyber Intelligence, DST - CURIE - AI
Ms. V. Kanimozhi, Master of Computer Application
Project Summary
The amount of data that moves through a network at any given time is referred to as network traffic, also called data traffic or simply traffic. According to a report by Grand View Research, the global network traffic analysis market is predicted to grow at a compound annual growth rate of 9.7% from 2021 to 2028, reaching USD 5.69 billion by 2028. The COVID-19 pandemic and the accompanying lockdowns and restrictions enforced in many parts of the world have had only a minor influence on network traffic analysis.
Aims: To propose an ensemble learning framework that detects different attack types. This project develops supervised machine learning models to detect anomalies in network traffic from the CIC-IDS2018 dataset.
Method: The detection of anomalies in network traffic using a supervised machine learning approach comprises five phases. Phase 1 is data acquisition. Phase 2 is data preprocessing, which transforms the dataset (CIC-IDS2018) and resamples its majority and minority attack classes. In Phase 3, embedded feature selection methods are used to select the important features. In Phase 4, supervised machine learning models are developed with ensemble methods: bagging (Random Forest, Decision Tree, Bagging Classifier) and boosting (Adaptive Boosting, Extreme Gradient Boosting, Light Gradient Boosting, Histogram-based Gradient Boosting). In Phase 5, the output of each algorithm is evaluated with performance measures such as precision, recall, F1 score, and accuracy. Some models give better accuracy than others. The entire project is developed in Python.
Results: With the first method, Random Forest feature selection, the best accuracy among the bagging models was obtained by the Decision Tree model, with a 96% accuracy score.
Among the boosting models under the same feature selection, the Light Gradient Boosting model and the Histogram-based Gradient Boosting model both reached a 96% accuracy score. With the second method, Gradient Boosting feature selection, the highest accuracy among the boosting models was again obtained by the Light Gradient Boosting and Histogram-based Gradient Boosting models, each with a 96% accuracy score. These ensemble methods and models achieve better detection rates for multi-class attack classification.