Predicting stroke risk using patient health metrics and machine learning
- Overview
- Features
- Installation
- Usage
- Project Structure
- Results
- Contributing
- License
- Acknowledgements
This project implements a machine learning pipeline that predicts the likelihood of a patient having a stroke from their health parameters. The model is intended to support early detection of stroke risk so that medical intervention can happen sooner.
Key Metrics (on test set):
- Accuracy: 95.3%
- Precision: 0.72
- Recall: 0.52
- F1-Score: 0.60
- AUC-ROC: 0.86
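As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported values; the function name `f1_from_pr` is illustrative and not part of this repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the key metrics above.
    f1 = f1_from_pr(0.72, 0.52)
    print(f"F1 = {f1:.2f}")  # prints "F1 = 0.60"
```

A recall of 0.52 means roughly half of the actual stroke cases in the test set are flagged, so the accuracy figure is best read alongside recall rather than on its own.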
- Comprehensive EDA with interactive visualizations
- Feature Engineering with domain-specific transformations
- Multiple ML Models including Random Forest, XGBoost, and LightGBM
- Hyperparameter Tuning using Optuna
- Model Explainability with SHAP values
- Deployment-ready API using FastAPI
- Clone the repository
  git clone https://github.com/theaathish/stroke-prediction.git
  cd stroke-prediction
- Create and activate virtual environment
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Download the dataset
  - Get the dataset from Kaggle
  - Place healthcare-dataset-stroke-data.csv in the data/ directory
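Before preprocessing, it can help to confirm that the dataset landed where the pipeline expects it. The following is a minimal sketch using only the standard library; the path is assumed from the download step above, and `summarize_csv` is a hypothetical helper, not something shipped in `src/`.

```python
import csv
from pathlib import Path

# Assumed location, following the download step above.
DATASET_PATH = Path("data") / "healthcare-dataset-stroke-data.csv"


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return the header columns and the number of data rows in a CSV file."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    if not DATASET_PATH.exists():
        raise SystemExit(f"Dataset not found at {DATASET_PATH}; see the download step.")
    columns, rows = summarize_csv(DATASET_PATH)
    print(f"{rows} rows, {len(columns)} columns: {columns}")
```

If the preprocessing script reads from a different location, adjust `DATASET_PATH` to match.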
# Preprocess data
python -m src.data_preprocessing
# Train model
python -m src.model
# Start the web app
python -m src.app
Check out the Jupyter notebooks in the notebooks/ directory for detailed analysis and experimentation.
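The preprocessing and training commands above can also be chained from a single script so that a failure in one step stops the run. This is a sketch only: `run_pipeline` and `build_command` are hypothetical helpers, and `src.app` is left out on the assumption that it starts a long-running server.

```python
import subprocess
import sys

# Module names taken from the usage commands above.
PIPELINE_MODULES = ["src.data_preprocessing", "src.model"]


def build_command(module: str) -> list[str]:
    """Build the equivalent of `python -m <module>` for the current interpreter."""
    return [sys.executable, "-m", module]


def run_pipeline(modules: list[str] = PIPELINE_MODULES) -> None:
    """Run each module in order; check=True raises if a step exits non-zero."""
    for module in modules:
        print(f"Running {module} ...")
        subprocess.run(build_command(module), check=True)


if __name__ == "__main__":
    run_pipeline()
```

Using `sys.executable` keeps the steps inside the active virtual environment rather than whichever `python` happens to be first on the PATH.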
stroke-prediction/
├── data/ # Raw and processed data
│ ├── raw/ # Original dataset
│ └── processed/ # Processed datasets
│
├── notebooks/ # Jupyter notebooks
│ └── Stroke_Prediction_Analysis.ipynb
│
├── src/ # Source code
│ ├── __init__.py
│ ├── data_preprocessing.py
│ ├── feature_engineering.py
│ ├── model.py
│ ├── train.py
│ └── app.py
│
├── models/ # Trained models
│ └── stroke_model.pkl
│
├── reports/ # Reports and visualizations
│ └── figures/
│
├── tests/ # Unit tests
│ └── test_*.py
│
├── .gitignore
├── requirements.txt
└── README.md
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Kaggle for the dataset
- Scikit-learn for ML tools
- XGBoost and LightGBM
- SHAP for model interpretability
Developed with ❤️ by @theaathish
📅 August 2025


