This project predicts customer churn using the Telco Customer Churn dataset and a KNIME-based data science workflow. It was built as part of a graduate Business Analytics course at The University of Texas at Dallas.
- Build an end-to-end churn prediction workflow using KNIME
- Perform data preprocessing, feature engineering, and model training
- Evaluate Logistic Regression, Random Forest, and XGBoost models
- Generate insights and actionable business recommendations
- Source: Kaggle - Telco Customer Churn
- Size: 7,043 rows
- Features: Demographics, services, billing, contract info
- Target: `Churn` (Yes/No)
Dataset not included due to licensing. Please download it manually from Kaggle.
- KNIME Analytics Platform
- PowerPoint (for reporting)
- Python (for exploratory comparison)
- Data Ingestion – File Reader, Data Explorer
- Preprocessing – One-hot encoding, null handling, Min-Max normalization
- Feature Engineering – `tenure_to_total_charges_ratio`
- EDA – Bar Charts, Box Plots, Correlation Matrix
- Modeling – Logistic Regression, Random Forest, XGBoost
- Evaluation – Accuracy, F1 Score, ROC Curve
- Insights – Feature importance and churn trends
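The preprocessing and feature engineering steps above were built with KNIME nodes. For readers who prefer code, they can be approximated in pandas roughly as follows. This is a minimal sketch, not the workflow's actual logic: the column names (`tenure`, `MonthlyCharges`, `TotalCharges`, `customerID`) follow the public Kaggle file, while filling missing `TotalCharges` with 0 and defining the ratio as `tenure / TotalCharges` are assumptions, since the exact definitions are not stated here.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Approximate the KNIME preprocessing steps in pandas (illustrative only)."""
    out = df.drop(columns=["customerID"], errors="ignore").copy()

    # Null handling: blank strings in TotalCharges become NaN, then 0 (assumption).
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce").fillna(0.0)

    # Feature engineering (assumed definition): tenure divided by TotalCharges,
    # set to 0 where TotalCharges is 0 to avoid division by zero.
    positive_total = out["TotalCharges"].where(out["TotalCharges"] > 0)
    out["tenure_to_total_charges_ratio"] = (out["tenure"] / positive_total).fillna(0.0)

    # Encode the target as 1 (Yes) / 0 (No).
    out["Churn"] = (out["Churn"] == "Yes").astype(int)

    # Min-Max normalization of the numeric features to the range [0, 1].
    numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges",
                    "tenure_to_total_charges_ratio"]
    for col in numeric_cols:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    # One-hot encoding of every remaining non-numeric column.
    categorical_cols = list(out.select_dtypes(exclude="number").columns)
    return pd.get_dummies(out, columns=categorical_cols)
```

Calling `preprocess(pd.read_csv(path))` on the downloaded CSV would return a fully numeric table suitable for the modeling step; `path` here stands for wherever the file was saved locally.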
| Model | F1 Score | Accuracy |
|---|---|---|
| Logistic Regression | 0.61 | ~74% |
| Random Forest | 0.60 | ~72% |
| XGBoost | 0.59 | ~73% |
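Logistic Regression achieved the highest F1 score (0.61) and the highest accuracy (~74%), although all three models fall within 0.02 F1 of one another. F1 is the harmonic mean of precision and recall, so it rewards a balance between the two.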
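To make the reported metrics concrete, the sketch below shows how accuracy and F1 are derived from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage note are hypothetical and are not taken from this project's results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with made-up counts of 60 true positives, 40 false positives, 40 false negatives, and 260 true negatives, precision and recall are both 0.6, so F1 is 0.6 while accuracy is 0.8. On an imbalanced target such as churn, accuracy can look strong even when many churners are missed, which is why F1 is reported alongside it.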
- Churn is highest among month-to-month contract users
- Fiber optic customers churn more than DSL users
- Higher monthly charges are associated with higher churn
- Users without Tech Support or Online Security are at greater risk
A detailed report is available: Report/Telecom Churn Knime Final.pdf
Nipun Chauhan
📍 Dallas, TX
📧 [email protected]
🔗 LinkedIn
This project is open-sourced under the MIT License.