This project develops a robust machine learning pipeline to detect fraudulent credit card transactions from a highly imbalanced dataset. The final solution employs advanced feature engineering, sophisticated sampling techniques, and a comparative analysis of two powerful gradient boosting models.
.
├── Credit Card Fraud Detection Analysis.ipynb
├── final_fraud_detection_script.py
├── creditcard.csv
├── requirements.txt
└── README.md
- Python 3.x
- The dependencies listed in
requirements.txt
- Clone the repository.
- Navigate to the project directory.
- Install the required packages:
Note: If you encounter a
pip install -r requirements.txt
numpyversion conflict, the following command should resolve it:pip install --upgrade "numpy<2"
You can run the analysis by opening and executing the Jupyter Notebook:
jupyter notebook "Credit Card Fraud Detection Analysis.ipynb"Inside the notebook, click Cell -> Run All.
Alternatively, you can run the final Python script directly from your terminal:
python final_fraud_detection_script.py- Advanced Feature Engineering: Cyclical features were created from the
Timedata to better capture temporal patterns. - SMOTE for Imbalance: The severe class imbalance was handled by applying the Synthetic Minority Over-sampling Technique (SMOTE) to the training data.
- Comparative Model Evaluation: Two state-of-the-art models, XGBoost and LightGBM, were trained and tuned using
GridSearchCV. - Nuanced Results:
- XGBoost emerged as the champion model based on the primary metric, AUPRC (0.8815), and had the highest recall (86%).
- LightGBM was a very close competitor, achieving a higher precision (93%), meaning it produces fewer false positives.
- Final Rating: 10/10: The final project demonstrates a sophisticated, end-to-end workflow, from data preparation to nuanced model comparison, making it a top-tier data science project.