This repository contains a Jupyter Notebook for analyzing and testing categorical features of the UNSW-NB15 dataset, a benchmark dataset widely used in network security and intrusion detection research.
- Focused on categorical preprocessing and evaluation for the UNSW dataset.
- Explores encoding techniques, feature separation, and classification performance.
- Provides a reproducible framework for testing categorical handling methods on security-related data.
unsw-categorical-final-separate-test.ipynb # Main notebook with preprocessing and experiments
data/ # (Optional) Folder for dataset storage
results/ # (Optional) Folder for experiment outputs
README.md # Project documentation
Make sure you have the following installed before running the notebook:
- Python 3.8+
- Jupyter Notebook or JupyterLab
- Required libraries:
  - pandas
  - numpy
  - scikit-learn
  - matplotlib
  - seaborn
Install dependencies with:
pip install -r requirements.txt
(If no requirements.txt is provided, install the libraries manually.)
- Clone the repository:
    git clone https://github.com/YOUR_USERNAME/unsw-categorical-analysis.git
    cd unsw-categorical-analysis
- Launch Jupyter Notebook:
    jupyter notebook
- Open unsw-categorical-final-separate-test.ipynb and run the cells in order.
- Data Cleaning: handling missing values, categorical separation.
- Encoding Methods: one-hot encoding, label encoding, frequency encoding.
- Model Training & Testing: supervised learning models on processed categorical features.
- Evaluation: accuracy, precision, recall, and F1-score metrics.
- Insights into the effect of different categorical encoding techniques.
- Model performance benchmarks on the UNSW-NB15 dataset.
- Framework extendable to other cybersecurity datasets.
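
As a rough illustration of the three encoding methods named above (one-hot, label, and frequency encoding), here is a minimal pandas sketch. It is not taken from the notebook: the helper names (`one_hot_encode`, `label_encode`, `frequency_encode`) are hypothetical, and the toy values are made up. Only the column names `proto` and `service` are borrowed from UNSW-NB15's categorical fields.

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT from UNSW-NB15.
df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", "tcp", "arp"],
    "service": ["http", "dns", "-", "http", "-"],
})


def one_hot_encode(frame, columns):
    """Replace each listed column with one 0/1 indicator column per category."""
    return pd.get_dummies(frame, columns=columns, dtype=int)


def label_encode(series):
    """Map each category to an integer code (categories are sorted first)."""
    return series.astype("category").cat.codes


def frequency_encode(series):
    """Map each category to its relative frequency within the series."""
    frequencies = series.value_counts(normalize=True)
    return series.map(frequencies)


one_hot = one_hot_encode(df, ["proto"])
labels = label_encode(df["proto"])
freqs = frequency_encode(df["proto"])

print(one_hot)
print(labels.tolist())
print(freqs.tolist())
```

In a train/test setting, the category-to-code and category-to-frequency mappings would normally be learned on the training split and then applied to the test split, so that the test data does not leak into the encoding. How the notebook itself handles this is not shown here.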
This project was inspired by the need to better understand how categorical data preprocessing impacts intrusion detection and cybersecurity analytics.
Contributions are welcome! If you'd like to extend the project (e.g., new encoding techniques, model architectures, or visualizations), please fork the repo and submit a PR.
This project is licensed under the MIT License. Feel free to use, modify, and share with attribution.