🛍️ Customer Churn Prediction in eCommerce with Google Cloud

Predicting when a customer is likely to stop buying is one of the most critical insights for any subscription-based or transactional business. This project uses publicly available eCommerce data to develop a machine learning model capable of identifying churn risk, helping companies take action before losing valuable clients.

🚀 Project Overview

Goal:
Build a machine learning model to predict customer churn and uncover retention insights based on behavioral data from the TheLook eCommerce dataset.

Business Impact:
By identifying customers likely to churn, marketing teams can implement re-engagement strategies and loyalty campaigns, increasing Customer Lifetime Value (CLV) and Revenue Retention.

📦 Dataset: TheLook eCommerce (BigQuery Public Data)

The dataset consists of 7 structured tables related to users, orders, products, inventory, and transactions. Data was extracted using custom SQL queries, merged in Python (Pandas), and cleaned for modeling.

Key features include:

Customer demographics (age, gender, location)
Purchase behavior (order frequency, spend, recency)
Product types and categories
Delivery and return timestamps
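
As a rough illustration of the purchase-behavior features listed above, the sketch below derives order frequency, total spend, and recency per customer from a few order rows. It uses plain Python so it runs without dependencies; the project itself does this step in Pandas. The field names (`user_id`, `created_at`, `sale_price`), the sample rows, and the `build_features` helper are assumptions for illustration, not the notebook's actual code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows; field names are assumed, not taken from the notebooks.
orders = [
    {"user_id": 1, "created_at": date(2023, 1, 10), "sale_price": 40.0},
    {"user_id": 1, "created_at": date(2023, 3, 5), "sale_price": 60.0},
    {"user_id": 2, "created_at": date(2023, 2, 1), "sale_price": 25.0},
]


def build_features(rows, snapshot_date):
    """Aggregate order rows into per-customer frequency, spend, and recency."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)

    features = {}
    for user_id, user_rows in grouped.items():
        last_order = max(r["created_at"] for r in user_rows)
        features[user_id] = {
            "order_frequency": len(user_rows),
            "total_spend": sum(r["sale_price"] for r in user_rows),
            # Days between the customer's last order and the snapshot date.
            "recency_days": (snapshot_date - last_order).days,
        }
    return features


features = build_features(orders, snapshot_date=date(2023, 6, 1))
print(features[1])
```

The snapshot date is passed in explicitly so that recency is reproducible rather than tied to the day the code happens to run.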

🔧 Tools & Techniques

Area	Tools / Methods Used
Data Extraction	SQL on BigQuery
Data Wrangling	Pandas, NumPy
Churn Definition	Recency logic & Kaplan-Meier survival modeling
Exploratory Analysis	Seaborn, Matplotlib, descriptive statistics
Survival Analysis	Kaplan-Meier Estimator (lifelines)
Modeling (optional)	XGBoost
Evaluation	ROC-AUC, Confusion Matrix, Precision-Recall
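
To make the evaluation row concrete, here is a minimal sketch of the three metrics computed by hand on toy data. The labels, scores, and the 0.5 decision threshold are made up for illustration; in practice these numbers would more likely come from scikit-learn's `confusion_matrix` and `roc_auc_score`, and nothing here reflects the project's actual results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = churn."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp


def precision_recall(y_true, y_pred):
    """Precision and recall for the churn class, returning 0.0 when undefined."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random churner outscores a random non-churner."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted churn probabilities (illustrative only).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))
print(precision_recall(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))
```

ROC-AUC is threshold-free, while the confusion matrix and precision/recall depend on the chosen cut-off, which is why the two views are usually reported together.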

📊 Key Analyses & Findings

The Kaplan-Meier survival curve shows that ~50% of customers never return after their first purchase.
Customers who make a second purchase are far more likely to remain active for extended periods (500+ days).
A fixed churn threshold (e.g. 90 days) may underestimate customer lifetime for loyal buyers, which suggests the need for time-aware churn models.
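
For readers unfamiliar with the estimator behind these findings, the sketch below hand-rolls the Kaplan-Meier product-limit calculation on toy data. It is not the project's code, which uses `lifelines`. The interpretation of `durations` (days a customer was observed) and `observed` (1 = churn observed, 0 = still active when the data ends) is an assumption, and the numbers are invented.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an observed event."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers whose churn event happens exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data: two customers are censored (0) because they were still active.
durations = [30, 30, 90, 200, 500, 500]
observed = [1, 1, 1, 0, 1, 0]

for time, prob in kaplan_meier(durations, observed):
    print(time, round(prob, 3))
```

Censored customers stay in the at-risk count until their last observed day, which is what a fixed 90-day cut-off ignores. With `lifelines`, `KaplanMeierFitter().fit(durations, event_observed=observed)` should yield the same step function.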

📁 Folder Structure

├──
│ ├── 01_data_preparation.ipynb     -> Extraction from BigQuery, feature engineering, and dataset consolidation
│ ├── 02_model_development.ipynb    -> XGBoost training, parameter optimization, and validation
│ └── 03_model_interpretation.ipynb -> Interpretation of model results, feature importance, and SHAP
├── processed_data/
│ └── clients_info.csv
├── model/
│ └── churn_model.pkl
├── app.py                       -> Streamlit interface for model deployment
├── README.md
└── requirements.txt

✅ Skills Demonstrated

SQL data extraction from public cloud datasets
End-to-end churn analysis using Python
Kaplan-Meier and survival modeling
Feature engineering for customer behavior
Business-driven data storytelling and interpretation

Local Deployment with Streamlit

This project includes a web application built using Streamlit, allowing you to interact with the churn prediction model directly from your browser.

▶️ How to Run Locally

Install Dependencies
Make sure you have Python installed (version 3.8 or higher). Then, install the required packages with:
```
pip install -r requirements.txt
```
Run the App
In the root directory of the project, run:
```
streamlit run app.py
```
Access the App
After you run the command above, Streamlit starts a local server and prints a URL in your terminal. Open that URL in your browser to use the app. By default it looks like:
```
http://localhost:8501
```

App Features (Technical Overview)

🧠 Model Deployment: Integrates a production-ready XGBoost classification model for churn prediction
🧾 Manual Data Input: Accepts user-defined inputs including age, gender, number_of_orders, and total_spent
🧮 Dynamic Feature Engineering: Automatically computes average_ticket as a derived feature (total_spent / number_of_orders)
📈 Churn Inference: Outputs binary churn prediction (0 = active, 1 = churn) in real time
🧠 Model Explainability: Integrates SHAP (SHapley Additive Explanations) to generate global and local interpretability visualizations
📊 Visual Insights: Includes force plots and summary plots to showcase feature impact on predictions
🚀 End-to-End Pipeline: Demonstrates the full ML lifecycle, from data preprocessing to model inference and explainability, in a single interactive interface
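
The derived feature described above can be sketched as follows. This is a minimal illustration, not the contents of `app.py`: the `build_model_input` helper, the zero-order guard, and the dictionary layout are all assumptions, and the real app may encode `gender` or order its columns differently.

```python
def build_model_input(age, gender, number_of_orders, total_spent):
    """Assemble one row of user inputs and add the derived average_ticket feature."""
    if number_of_orders < 0 or total_spent < 0:
        raise ValueError("number_of_orders and total_spent must be non-negative")
    # Guard against division by zero for a customer with no orders (assumed behavior).
    average_ticket = total_spent / number_of_orders if number_of_orders else 0.0
    return {
        "age": age,
        "gender": gender,
        "number_of_orders": number_of_orders,
        "total_spent": total_spent,
        "average_ticket": average_ticket,
    }


row = build_model_input(age=34, gender="F", number_of_orders=4, total_spent=250.0)
print(row["average_ticket"])  # 62.5
```

Computing the feature inside one function keeps the inference path consistent with training, provided the training notebook defines `average_ticket` the same way.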

adanSiqueira/customer-churn-prediction
