- Project Overview
- Features
- Architecture
- Installation
- Datasets
- Models
- Web Interface
- Docker Usage
- API Documentation
- Contributing
- License
- References
This project implements a machine learning-based intrusion detection system for Industrial Control Systems (ICS) in power systems. The system can classify power system events into three categories:
- Normal Operations: Regular power system behavior
- Natural Events: Faults and maintenance activities
- Cyber Attacks: Malicious activities including data injection, command injection, and relay setting changes
The system uses multiple machine learning algorithms to detect and classify cyber attacks in real time, providing security monitoring for industrial power control systems.
- Multi-class Classification: Supports binary (2-class), triple (3-class), and multi-class (37-class) classification
- Real-time Detection: Processes CSV data files for immediate threat assessment
- Multiple ML Algorithms: Implements Random Forest, Neural Networks (MLP), K-Nearest Neighbors, Decision Trees, Logistic Regression, AdaBoost, Gradient Boosting, and Extra Trees
- Web Interface: User-friendly Flask-based web application for easy data upload and analysis
- Docker Support: Containerized deployment for simple setup
- Feature Engineering: Preprocessing that includes feature selection (RFECV) and scaling
- Comprehensive Evaluation: Detailed performance metrics and confusion matrices
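The confusion matrices and per-class metrics mentioned above can be illustrated with a dependency-free sketch. It follows the same convention as `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predicted labels); the three label names are hypothetical, and the project itself presumably computes these metrics with scikit-learn rather than with this code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix with rows = true label, columns = predicted label, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(matrix, labels):
    """Compute per-class (precision, recall) from a square confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])  # row sum
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        scores[label] = (precision, recall)
    return scores


if __name__ == "__main__":
    labels = ["Attack", "Natural", "NoEvents"]  # hypothetical label names
    y_true = ["Attack", "Attack", "Natural", "NoEvents", "Attack", "Natural"]
    y_pred = ["Attack", "Natural", "Natural", "NoEvents", "Attack", "Attack"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(cm)  # [[2, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(precision_recall(cm, labels))
```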
The system is organized into the following components:
- Data Preprocessing: Handles missing values, outliers, and feature scaling
- Feature Engineering: Creates domain-specific features for power system measurements
- Model Training: Multiple ML algorithms with hyperparameter optimization
- Prediction Engine: Real-time classification of new data
- Web Interface: User-friendly interface for data upload and results display
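The preprocessing stage can be sketched without third-party libraries. This illustration covers only missing values and scaling (outlier handling is omitted), and it rests on assumptions: rows are lists of floats, NaN and infinite values count as missing and are replaced with the column median, and features are then standardized to zero mean and unit variance. The project's own `preprocess` and `vectorize_df` may work differently.

```python
import math
import statistics


def impute_non_finite(rows):
    """Replace NaN/inf entries with the median of the finite values in that column."""
    columns = list(zip(*rows))
    medians = []
    for col in columns:
        finite = [v for v in col if math.isfinite(v)]
        medians.append(statistics.median(finite) if finite else 0.0)
    return [
        [v if math.isfinite(v) else medians[j] for j, v in enumerate(row)]
        for row in rows
    ]


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - means[j]) / stds[j] for j, v in enumerate(row)] for row in rows]


if __name__ == "__main__":
    raw = [
        [1.0, 10.0],
        [float("nan"), 20.0],
        [3.0, float("inf")],
    ]
    clean = impute_non_finite(raw)
    print(clean)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
    print(standardize(clean))
```

In a real pipeline the medians, means, and standard deviations would be computed on the training data only and then reused for new files, so that test data does not leak into the scaling.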
Prerequisites:
- Python 3.9 or higher (the pinned scikit-learn==1.5.0 does not support Python 3.8)
- pip package manager
- Git (for cloning the repository)
- Clone the repository:
git clone https://github.com/VictoKu1/IndustrialControlSystemCyberAttackDetectingCourse.git
cd IndustrialControlSystemCyberAttackDetectingCourse
- Upgrade pip:
pip install --upgrade pip
- Install dependencies:
pip install -r requirements.txt
The project requires the following Python packages:
- scikit-learn==1.5.0: Machine learning algorithms
- pickle-mixin: Model serialization
- matplotlib: Data visualization
- pandas: Data manipulation
- numpy: Numerical computing
- flask: Web framework (for the web interface)
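After installation, a quick script can confirm that the packages above are installed under their distribution names. This helper is hypothetical and not shipped with the repository; it uses `importlib.metadata` from the standard library and enforces only the one version pin listed (scikit-learn==1.5.0).

```python
from importlib import metadata

# Distribution names from the requirements list; None means "any version".
REQUIREMENTS = {
    "scikit-learn": "1.5.0",
    "pickle-mixin": None,
    "matplotlib": None,
    "pandas": None,
    "numpy": None,
    "flask": None,
}


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, pinned in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if pinned is not None and installed != pinned:
            problems.append(f"{name}: found {installed}, expected {pinned}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All requirements satisfied.")
```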
Your CSV file should contain power system measurements with columns including:
- Apparent impedance measured at each relay (R1-PA:Z, R2-PA:Z, etc.)
- Voltage phase angles (R1-PA1:VH, R1-PA2:VH, etc.)
- Current phase angles (R1-PA4:IH, R1-PA5:IH, etc.)
- Voltage phase magnitudes (R1-PM1:V, R1-PM2:V, etc.)
- Current phase magnitudes (R1-PM4:I, R1-PM5:I, etc.)
- Marker column for classification labels
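Before uploading a file, its header can be checked against the column families listed above. A minimal sketch: the regular expressions are inferred from the example column names only, the lowercase `marker` name is an assumption, and any additional columns in the real files are ignored.

```python
import csv
import io
import re

# Column-name patterns inferred from the examples listed above (assumption);
# any other columns in a real file are ignored by this check.
EXPECTED_GROUPS = {
    "relay measurement": re.compile(r"R\d+-PA:Z"),
    "voltage phase angle": re.compile(r"R\d+-PA\d+:VH"),
    "current phase angle": re.compile(r"R\d+-PA\d+:IH"),
    "voltage phase magnitude": re.compile(r"R\d+-PM\d+:V"),
    "current phase magnitude": re.compile(r"R\d+-PM\d+:I"),
}


def missing_column_groups(header):
    """Return the expected column groups (plus 'marker') absent from a header row."""
    names = [col.strip() for col in header]
    missing = [
        group
        for group, pattern in EXPECTED_GROUPS.items()
        if not any(pattern.fullmatch(name) for name in names)
    ]
    if "marker" not in (name.lower() for name in names):  # assumed label column name
        missing.append("marker")
    return missing


def check_csv(text):
    """Parse only the header row of CSV text and report missing column groups."""
    header = next(csv.reader(io.StringIO(text)), [])
    return missing_column_groups(header)


if __name__ == "__main__":
    sample = "R1-PA1:VH,R1-PM1:V,R1-PA4:IH,R1-PM4:I,R1-PA:Z,marker\n1,2,3,4,5,Natural\n"
    print(check_csv(sample))  # []
    print(check_csv("R1-PA1:VH,marker\n"))
```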
The project includes three different dataset configurations:
Binary dataset:
- Location: Class/binaryAllNaturalPlusNormalVsAttacks/
- Description: 37 event scenarios grouped as either attack (28 events) or normal operations (9 events: the 8 natural events plus the no-event scenario)
- Use Case: Binary classification for attack detection
Triple dataset:
- Location: Class/triple/
- Description: 37 event scenarios grouped into 3 classes:
  - Attack events (28 events)
  - Natural events (8 events)
  - No events (1 event)
- Use Case: Triple classification for detailed event analysis
Multiclass dataset:
- Location: Class/multiclass/
- Description: Each of the 37 event scenarios is treated as its own class
- Use Case: Fine-grained classification for identifying specific events
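Because all three configurations describe the same 37 scenarios, a fine-grained label can be collapsed into the coarser schemes. The scenario identifiers and their categories below are hypothetical placeholders, as is the `Normal` label name for the binary scheme; the real mapping comes from the original dataset documentation.

```python
# Hypothetical scenario IDs and categories, for illustration only.
SCENARIO_CATEGORY = {
    "scenario_a": "Attack",
    "scenario_b": "Attack",
    "scenario_c": "Natural",
    "scenario_d": "NoEvents",
}


def to_triple(labels, mapping=SCENARIO_CATEGORY):
    """Map fine-grained scenario labels to Attack / Natural / NoEvents."""
    return [mapping[label] for label in labels]


def to_binary(labels, mapping=SCENARIO_CATEGORY):
    """Group natural and no-event scenarios together as 'Normal', mirroring the binary dataset."""
    return ["Attack" if mapping[label] == "Attack" else "Normal" for label in labels]


if __name__ == "__main__":
    y = ["scenario_a", "scenario_c", "scenario_d", "scenario_b"]
    print(to_triple(y))  # ['Attack', 'Natural', 'NoEvents', 'Attack']
    print(to_binary(y))  # ['Attack', 'Normal', 'Normal', 'Attack']
```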
The system implements multiple machine learning algorithms:
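- Random Forest Classifier: Ensemble of decision trees trained on bootstrap samples, with their predictions averaged
- Neural Networks (MLP): Multilayer perceptron trained with backpropagation
- K-Nearest Neighbors: Assigns each sample the majority label of its nearest neighbors in feature space
- Decision Trees: Interpretable tree-based model
- Logistic Regression: Linear classification
- AdaBoost: Boosting ensemble that reweights misclassified samples at each round
- Gradient Boosting: Ensemble that adds trees sequentially, each fitted to the remaining errors (loss gradient) of the current model
- Extra Trees: Extremely randomized trees, which draw candidate split thresholds at random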
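The algorithms listed above can be compared side by side with scikit-learn. This sketch uses a synthetic dataset from `make_classification` instead of the real power system data, and the hyperparameters are illustrative defaults, not the project's tuned values; scale-sensitive models (MLP, KNN, logistic regression) are wrapped with `StandardScaler`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """One estimator per listed algorithm, with illustrative (untuned) settings."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=random_state)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=random_state),
    }


def compare(X, y, cv=5):
    """Return the mean cross-validated accuracy for each model name."""
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in build_models().items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for power system features (NOT the real dataset).
    X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                               n_classes=3, random_state=0)
    for name, score in sorted(compare(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {score:.3f}")
```

The tree-based models are left unscaled because their splits are unaffected by monotonic rescaling of a feature.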
- Binary Classification: High accuracy for attack vs. normal detection
- Multi-class Classification: Detailed classification of specific attack types
- Feature Selection: RFECV (Recursive Feature Elimination with Cross-Validation) for optimal feature selection
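RFECV can be applied as follows. This is a minimal sketch: the random forest estimator, `step=1`, three stratified folds, the synthetic data, and the placeholder feature names are all assumptions for illustration. RFECV needs an estimator that exposes `coef_` or `feature_importances_`, which `RandomForestClassifier` does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold


def select_features(X, y, feature_names, random_state=0):
    """Fit RFECV with a random forest and return (kept feature names, fitted selector)."""
    selector = RFECV(
        estimator=RandomForestClassifier(n_estimators=50, random_state=random_state),
        step=1,  # eliminate one feature per round
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=random_state),
        scoring="accuracy",
        min_features_to_select=1,
    )
    selector.fit(X, y)
    kept = [name for name, keep in zip(feature_names, selector.support_) if keep]
    return kept, selector


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                               n_redundant=2, random_state=0)
    names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
    kept, selector = select_features(X, y, names)
    print(selector.n_features_, kept)
```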
The web interface provides a user-friendly way to interact with the detection system:
- File Upload: Drag-and-drop or click-to-upload CSV files
- Real-time Processing: Immediate analysis and classification
- Results Display: Clear presentation of classification results
- Session Management: Secure handling of uploaded data
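The upload flow described above can be sketched as a small Flask endpoint. This is not the project's actual app: the `/upload` route, the `file` form field, the 16 MB limit, and the `classify_rows` placeholder (which stands in for the trained model) are all hypothetical.

```python
import csv
import io

from flask import Flask, jsonify, request

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024  # reject uploads over 16 MB


def classify_rows(rows):
    """Hypothetical placeholder for the real model; labels every row 'Unknown'."""
    return ["Unknown" for _ in rows]


@app.route("/upload", methods=["POST"])  # hypothetical route name
def upload():
    uploaded = request.files.get("file")  # hypothetical form field name
    if uploaded is None or not uploaded.filename.lower().endswith(".csv"):
        return jsonify(error="please upload a .csv file"), 400
    text = uploaded.read().decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    return jsonify(rows=len(rows), predictions=classify_rows(rows))
```

Saved as, for example, `app.py`, it can be started with `flask --app app run` (Flask 2.2 or later).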
- Build the Docker image:
docker build -t attack_detection_ui .
- Run the container:
docker run -it attack_detection_ui
The Dockerfile includes:
- Python 3.8 base image
- All required dependencies
- Automatic repository cloning
- Pre-configured execution environment
For Docker usage, datasets should be accessible via URL in the format:
https://raw.githubusercontent.com/VictoKu1/IndustrialControlSystemCyberAttackDetectingCourse/master/Class/binaryAllNaturalPlusNormalVsAttacks/data1.csv
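A small helper can assemble such raw URLs instead of typing them by hand. This is a hypothetical sketch: only `data1.csv` under `binaryAllNaturalPlusNormalVsAttacks` is confirmed by the example URL above, so other folder and file names are assumptions, and `load_dataset` needs pandas and network access.

```python
from urllib.parse import quote

RAW_BASE = (
    "https://raw.githubusercontent.com/VictoKu1/"
    "IndustrialControlSystemCyberAttackDetectingCourse/master/Class"
)


def dataset_url(configuration, filename):
    """Build a raw GitHub URL for one CSV inside a dataset configuration folder."""
    return f"{RAW_BASE}/{quote(configuration)}/{quote(filename)}"


def load_dataset(configuration, filename):
    """Download one CSV into a DataFrame (requires pandas and network access)."""
    import pandas as pd  # imported lazily so dataset_url works without pandas

    return pd.read_csv(dataset_url(configuration, filename))


if __name__ == "__main__":
    print(dataset_url("binaryAllNaturalPlusNormalVsAttacks", "data1.csv"))
```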
The core detection logic is organized around the following functions:
def calculate(test_df):
    """
    Main function for cyber attack detection.

    Args:
        test_df (pandas.DataFrame): Input data with power system measurements

    Returns:
        list: Classification results for each data point
    """


def preprocess(df):
    """Feature engineering for power system data"""


def vectorize_df(df):
    """Data scaling and encoding"""


def remove_irrelevant_features(df):
    """Clean and prepare data for modeling"""
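The functions above can be exercised end to end once a trained model is available. The sketch below assumes the model is serialized with Python's `pickle` (the requirements list pickle-mixin for model serialization); `MajorityClassModel` and `calculate_sketch` are hypothetical stand-ins, not the project's `calculate`, and rows are plain dictionaries rather than a pandas DataFrame to keep the example dependency-free.

```python
import os
import pickle
import tempfile


class MajorityClassModel:
    """Hypothetical stand-in for a trained classifier: predicts the most common label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def save_model(model, path):
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    # Only unpickle trusted files: pickle can execute arbitrary code while loading.
    with open(path, "rb") as fh:
        return pickle.load(fh)


def calculate_sketch(rows, model, feature_names):
    """Hypothetical analogue of calculate(): select features, then predict one label per row."""
    X = [[float(row[name]) for name in feature_names] for row in rows]
    return list(model.predict(X))


if __name__ == "__main__":
    model = MajorityClassModel().fit([[0.0], [1.0], [2.0]], ["Attack", "Attack", "Natural"])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        save_model(model, path)
        restored = load_model(path)
    print(calculate_sketch([{"R1-PM1:V": "1.5"}], restored, ["R1-PM1:V"]))  # ['Attack']
```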
We welcome contributions to improve the project:
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Make your changes and add tests
- Commit your changes:
git commit -am 'Add feature'
- Push to the branch:
git push origin feature-name
- Submit a pull request
For development, install additional dependencies:
pip install jupyter notebook matplotlib seaborn
This project is part of academic research at Ariel University, Israel. Please refer to the university's academic integrity policies for usage guidelines.
- Original Dataset Source: https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets
Uttam Adhikari, Shengyi Pan, and Tommy Morris, in collaboration with Raymond Borges and Justin Beaver of Oak Ridge National Laboratory (ORNL), created three datasets that include measurements related to normal, disturbance, control, and cyber attack behaviors of an electric transmission system. Measurements in the datasets include synchrophasor measurements and data logs from Snort, a simulated control panel, and relays.
The power system datasets have been used for multiple works related to power system cyber-attack classification.
- Industrial Control System Traffic Datasets For Intrusion Detection Research
- Measuring the Risk of Cyber Attack in Industrial Control Systems
- An Ensemble Deep Learning-Based Cyber-Attack Detection in Industrial Control System
Note: This project is designed for educational and research purposes. For production deployment in critical infrastructure, additional security measures and validation should be implemented.