⏱️ A machine learning project that predicts flight arrival delays and classifies flights as delayed or on-time from delay factors such as departure delay and NAS delay.
- 📉 Regression Modeling using Linear Regression to predict numeric arrival delay
- ✅ Classification Modeling with Decision Tree to detect delayed flights (accuracy: 98%)
- 📊 Visualization Dashboard with scatter plots, feature importance, and confusion matrix
- 🧼 Data cleaning, encoding, and feature engineering for improved model performance
- 🧪 Evaluation with R², MAE, and MSE for regression, and accuracy and F1-score for classification
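
The cleaning and encoding step listed above might look roughly like the following. This is a minimal sketch, not the project's actual code: `Dep_Delay` and `NAS_Delay` appear in the results below, but the `Airline` and `Arr_Delay` column names, the `Is_Delayed` label, and the 15-minute threshold are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_flights(df: pd.DataFrame, delay_threshold: float = 15.0) -> pd.DataFrame:
    """Drop incomplete rows, add a binary delay label, one-hot encode the airline."""
    # Remove rows that are missing any of the numeric delay columns.
    out = df.dropna(subset=["Dep_Delay", "NAS_Delay", "Arr_Delay"]).copy()
    # Assumed label: a flight is "delayed" when it arrives more than
    # `delay_threshold` minutes late (the threshold is hypothetical).
    out["Is_Delayed"] = (out["Arr_Delay"] > delay_threshold).astype(int)
    # One-hot encode the (hypothetical) categorical airline column.
    return pd.get_dummies(out, columns=["Airline"], dtype=int)


# Tiny made-up frame, used only to show the transformation.
raw = pd.DataFrame({
    "Airline": ["AA", "DL", "AA", "UA"],
    "Dep_Delay": [5.0, 40.0, np.nan, 0.0],
    "NAS_Delay": [0.0, 10.0, 3.0, 0.0],
    "Arr_Delay": [2.0, 55.0, 12.0, -4.0],
})
cleaned = clean_flights(raw)
print(cleaned)
```

The row with a missing `Dep_Delay` is dropped, and each airline code becomes its own 0/1 column, so the result can be passed directly to scikit-learn estimators.
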
Component | Tool/Library |
---|---|
Language | Python 3.10 |
ML Models | LinearRegression, DecisionTreeClassifier |
Data Handling | pandas, NumPy |
Visualization | Matplotlib, Seaborn |
Evaluation Metrics | scikit-learn (MAE, R², accuracy, F1-score) |
git clone https://github.com/akasha456/Flight-Delay-Detection
cd Flight-Delay-Detection
pip install -r requirements.txt
flowchart TD
A[Load Flight Dataset] --> B[Preprocess & Clean Data]
B --> C[Train Regression Model]
B --> D[Train Classification Model]
C --> E[Predict Arrival Delays]
D --> F[Classify Flights as Delayed/On-Time]
E --> G[Evaluate Regression Metrics]
F --> H[Evaluate Accuracy and Confusion Matrix]
G --> I[Visualize Predictions]
H --> I
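
The workflow above can be sketched end to end with scikit-learn. The example below runs on synthetic data, since the dataset itself is not included here; the `Arr_Delay` and `Is_Delayed` columns, the 15-minute threshold, the 80/20 split, and the tree depth are all assumptions, not settings taken from this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight dataset (illustration only).
rng = np.random.default_rng(0)
n = 500
flights = pd.DataFrame({
    "Dep_Delay": rng.exponential(20.0, n),
    "NAS_Delay": rng.exponential(10.0, n),
})
# Hypothetical targets: numeric arrival delay and a binary delayed flag.
flights["Arr_Delay"] = (
    flights["Dep_Delay"] + flights["NAS_Delay"] + rng.normal(0.0, 5.0, n)
)
flights["Is_Delayed"] = (flights["Arr_Delay"] > 15).astype(int)

# One shared split keeps the regression and classification test sets aligned.
X = flights[["Dep_Delay", "NAS_Delay"]]
X_train, X_test, y_reg_train, y_reg_test, y_clf_train, y_clf_test = train_test_split(
    X, flights["Arr_Delay"], flights["Is_Delayed"], test_size=0.2, random_state=42
)

# Branch C -> E: regression on the numeric arrival delay.
reg = LinearRegression().fit(X_train, y_reg_train)
pred_delay = reg.predict(X_test)

# Branch D -> F: classification of delayed vs. on-time flights.
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_clf_train)
pred_label = clf.predict(X_test)

print(pred_delay[:3], pred_label[:3])
```

Both models share the same preprocessed features, which mirrors the single preprocessing node feeding two training branches in the diagram.
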
Metric | Score |
---|---|
R² Score | 0.972 |
MAE | 6.97 |
MSE | 90.12 |
Explained Variance | 0.972 |
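
For reference, the four regression metrics in this table can be computed with `sklearn.metrics` as sketched below. The `y_true` and `y_pred` arrays are small made-up values, not the project's predictions.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Toy arrival delays, for illustration only.
y_true = np.array([10.0, 0.0, 35.0, -5.0, 60.0])
y_pred = np.array([12.0, 3.0, 30.0, -2.0, 55.0])

metrics = {
    "R2": r2_score(y_true, y_pred),
    "MAE": mean_absolute_error(y_true, y_pred),
    "MSE": mean_squared_error(y_true, y_pred),
    "Explained Variance": explained_variance_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Explained variance and R² coincide when the residuals average to zero, which is consistent with both being reported as 0.972 above.
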
Metric | Score |
---|---|
Accuracy | 98% |
Precision | 1.00 (Not Delayed), 0.95 (Delayed) |
Recall | 0.97 (Not Delayed), 1.00 (Delayed) |
F1-Score | 0.98 |
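
The per-class figures in this table can be derived from a confusion matrix, as in the following sketch. The labels are toy values chosen for illustration (0 = not delayed, 1 = delayed), not the project's test set.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels: 0 = not delayed, 1 = delayed (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)

# average=None returns one score per class, ordered [class 0, class 1].
precision = precision_score(y_true, y_pred, average=None)
recall = recall_score(y_true, y_pred, average=None)
f1 = f1_score(y_true, y_pred, average=None)

print(cm)
print(f"accuracy={accuracy:.2f}")
print("precision:", precision, "recall:", recall, "f1:", f1)
```

In this toy data the only error is an on-time flight flagged as delayed. That lowers precision for the delayed class and recall for the not-delayed class while leaving the other two scores at 1.00, the same pattern the table shows.
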
Feature | Importance |
---|---|
NAS_Delay | 0.5882 |
Dep_Delay | 0.4118 |
Others | 0.0000 |
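
Importances like these are typically read from a fitted tree's `feature_importances_` attribute. The sketch below uses synthetic data; the `Distance` column, the label rule, and the tree depth are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic features (illustration only); "Distance" is a hypothetical extra column.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "NAS_Delay": rng.exponential(10.0, n),
    "Dep_Delay": rng.exponential(20.0, n),
    "Distance": rng.uniform(100, 2500, n),
})
# Assumed label: delayed when the two delay components together exceed 15 minutes.
y = ((X["NAS_Delay"] + X["Dep_Delay"]) > 15).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances, one value per input column.
importance = pd.Series(clf.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance.round(4))
```

Impurity-based importances are normalized to sum to 1, which matches the table (0.5882 + 0.4118 = 1.0000). A feature that the tree never splits on receives exactly 0, which would explain the `Others` row.
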
- ✈️ Integrate live flight data via airline APIs
- 📍 Add geographical visualization of delays by airport
- 🧠 Explore ensemble models (Random Forest, XGBoost)
- 🗂️ Summarize delays by day, airline, or region
- 📱 Build a simple UI for user input and results visualization
This project is licensed under the MIT License.
- Scikit-learn for ML algorithms
- Matplotlib and Seaborn for visualizations
- Kaggle for access to flight datasets