Spam Detector App

A machine learning project that detects spam messages using Natural Language Processing (NLP), TF-IDF vectorization, SMOTE for class-imbalance handling, and a Logistic Regression classifier, all wrapped up in a Streamlit web app.


Model Accuracy: ~99% on test set


Project Structure


project-root/
│
├── data/
│   ├── spam.csv                # Original dataset
│   ├── finalmodel.pkl          # Trained ML model
│   ├── vectorizer.pkl          # Saved TF-IDF vectorizer
│   ├── feature.pkl           
│   └── label.pkl         
│
├── notebook/
│   └── eda.ipynb               # Exploratory Data Analysis
│
├── src/
│   ├── preprocessing.ipynb     # Preprocessing pipeline
│   └── training.ipynb          # Model training, tuning, evaluation
│
├── app.py                      # Streamlit app
└── README.md                   # You are here!
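
The `.pkl` files under `data/` are the saved training artifacts. As a minimal sketch of how such files can be written and read back with joblib: the toy messages and the `classify` helper below are hypothetical, the files go to a temporary directory rather than `data/`, and the real model may expect extra features (such as text length) that this sketch leaves out.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data for illustration only; not taken from spam.csv.
messages = ["free prize call now", "win cash now", "see you at lunch", "call me later"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(messages), labels)

with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    # Same file names as in data/, but written to a temporary directory.
    joblib.dump(model, artifact_dir / "finalmodel.pkl")
    joblib.dump(vectorizer, artifact_dir / "vectorizer.pkl")

    # Reload the artifacts, as an app would do at startup.
    loaded_model = joblib.load(artifact_dir / "finalmodel.pkl")
    loaded_vectorizer = joblib.load(artifact_dir / "vectorizer.pkl")


def classify(message: str) -> str:
    """Return the predicted label for one message using the reloaded artifacts."""
    features = loaded_vectorizer.transform([message])
    return str(loaded_model.predict(features)[0])


result = classify("win a free prize")
```

Saving the fitted vectorizer alongside the model matters because new messages must be transformed with the same vocabulary and IDF weights the model was trained on.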


Features

  • Cleans & lemmatizes text
  • TF-IDF vectorization
  • Text length feature
  • SMOTE for handling imbalanced classes
  • Classifies using Logistic Regression
  • Also tested with Multinomial Naive Bayes
  • Trained artifacts saved with joblib for reuse
  • Streamlit app for user interaction
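
The core of the feature list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `src/`: `clean_text` and `build_features` are hypothetical helpers, the four messages are made up, lemmatization is omitted because nltk needs downloaded corpora, and SMOTE is left out to keep the example short.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_features(messages, vectorizer, fit=False):
    """Stack TF-IDF columns with one extra column holding the raw text length."""
    cleaned = [clean_text(m) for m in messages]
    tfidf = vectorizer.fit_transform(cleaned) if fit else vectorizer.transform(cleaned)
    lengths = csr_matrix(np.array([[len(m)] for m in messages], dtype=float))
    return hstack([tfidf, lengths]).tocsr()


# Toy data for illustration only; the real project trains on data/spam.csv.
messages = [
    "WIN a FREE prize now!!! Call 0800 123 456",
    "Free entry: claim your cash prize today",
    "Are we still meeting for lunch tomorrow?",
    "I'll call you after work",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

vectorizer = TfidfVectorizer()
X = build_features(messages, vectorizer, fit=True)
model = LogisticRegression(max_iter=1000).fit(X, labels)

# New messages must go through the same cleaning and the already-fitted vectorizer.
new_X = build_features(["Claim your free prize now"], vectorizer)
prediction = model.predict(new_X)[0]
```

In a real pipeline the length column would likely be scaled before being combined with TF-IDF values, since raw character counts are much larger than TF-IDF weights.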

Tech Stack

  • Python
  • Pandas, NumPy, Matplotlib, Seaborn
  • scikit-learn, imblearn, nltk
  • Streamlit
  • joblib

Performance

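| Model               | Accuracy | Precision | Recall | F1-Score |
|---------------------|----------|-----------|--------|----------|
| Logistic Regression | 99%      | 0.99      | 0.99   | 0.99     |
| MultinomialNB       | 96%      | 0.93      | 1.00   | 0.96     |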
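
The F1 figures reported above follow from precision and recall as their harmonic mean. The short check below recomputes them; the README does not say whether the metrics refer to the spam class or to an average, so treat this as an arithmetic check only. The `f1_score` helper is a hypothetical name, not scikit-learn's function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Performance section.
reported = {
    "Logistic Regression": (0.99, 0.99),
    "MultinomialNB": (0.93, 1.00),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
# Logistic Regression: F1 = 0.99
# MultinomialNB: F1 = 0.96
```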

How to Run

  1. Clone the repo and move into it

    git clone https://github.com/sarfraspc/spam-detector.git
    cd spam-detector

  2. Install requirements

    pip install -r requirements.txt

  3. Run the Streamlit app

    streamlit run app.py

Dataset

  • Source: spam.csv
  • 5572 messages labeled as ham or spam
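
The project uses SMOTE to handle the ham/spam imbalance. To show the idea, here is a simplified NumPy sketch of what SMOTE does: it creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. This is not imblearn's implementation (in practice `imblearn.over_sampling.SMOTE` would be used); `smote_oversample` is a hypothetical helper and the feature values are made up.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating toward neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))
        neighbor = neighbors[base, rng.integers(k)]
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic[i] = X_min[base] + gap * (X_min[neighbor] - X_min[base])
    return synthetic


# Toy 2-D feature vectors: 6 "ham" rows and 3 "spam" rows (made-up numbers).
X_ham = np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4], [0.1, 0.0]])
X_spam = np.array([[0.9, 0.8], [0.8, 1.0], [1.0, 0.9]])

# Generate enough synthetic spam rows to match the ham count.
new_spam = smote_oversample(X_spam, n_new=len(X_ham) - len(X_spam))
X_spam_balanced = np.vstack([X_spam, new_spam])
```

Oversampling is normally applied to the training split only, so that the test set keeps the original class distribution.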

Author

Sarfras (LinkedIn)


License

This project is open-source and available under the MIT License.

