hasn77/spam-classifier

Spam Classifier

A very basic spam classifier built using a TF-IDF vectorizer and tested on multiple split parameters.
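As a rough illustration of the TF-IDF and split setup, here is a minimal sketch using scikit-learn. The toy messages, the `test_size` values, and the variable names are assumptions made for this example and are not taken from the repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy messages standing in for the real dataset.
messages = [
    "win a free prize now",
    "free cash offer, claim now",
    "urgent: claim your free reward",
    "call now to win cash",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you send me the notes",
    "thanks, talk to you later",
    "running late, be there soon",
    "happy birthday, see you soon",
]
labels = ["spam"] * 4 + ["ham"] * 6

shapes = {}
for test_size in (0.2, 0.5):  # assumed split sizes, chosen only for illustration
    X_train, X_test, y_train, y_test = train_test_split(
        messages, labels, test_size=test_size, stratify=labels, random_state=0
    )
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights from the training messages only.
    train_matrix = vectorizer.fit_transform(X_train)
    # Reuse that vocabulary for the held-out messages to avoid leakage.
    test_matrix = vectorizer.transform(X_test)
    shapes[test_size] = (train_matrix.shape, test_matrix.shape)

for test_size, (train_shape, test_shape) in shapes.items():
    print(f"test_size={test_size}: train {train_shape}, test {test_shape}")
```

Fitting the vectorizer inside the loop keeps each split independent, so no test-set vocabulary influences the training features.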

  • Two models are compared for performance: Logistic Regression and Random Forest Classifier.
  • Both models are evaluated with classification reports that show per-class precision and recall.
  • Classification behavior is also analyzed using feature importances from both models, showing how aggressive or conservative each model is in classifying certain words as spam.
  • Each model has been run twice: once with stop words and once without.

Logistic Regression Results:

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| ham   | 0.97      | 1.00   | 0.99     | 958     |
| spam  | 0.98      | 0.83   | 0.90     | 157     |

TF-IDF + RandomForestClassifier:

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| ham   | 0.98      | 1.00   | 0.99     | 958     |
| spam  | 1.00      | 0.85   | 0.92     | 157     |
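To make the spam-class rows easier to interpret, the short sketch below converts the reported recall and support into approximate message counts and recomputes F1 from precision and recall. Because the reported figures are rounded to two decimals, the counts are estimates and may be off by one. The dictionary layout and the `f1_from` helper are illustrative only.

```python
# Spam-class figures as reported in the two result tables (rounded to two decimals).
reported = {
    "Logistic Regression": {"precision": 0.98, "recall": 0.83, "support": 157},
    "Random Forest": {"precision": 1.00, "recall": 0.85, "support": 157},
}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

summary = {}
for name, row in reported.items():
    # recall = caught spam / total spam, so caught is roughly recall * support.
    caught = round(row["recall"] * row["support"])
    missed = row["support"] - caught
    f1 = round(f1_from(row["precision"], row["recall"]), 2)
    summary[name] = (caught, missed, f1)
    print(f"{name}: ~{caught} spam caught, ~{missed} missed, F1 ~ {f1:.2f}")
```

Read this way, both models flag very few ham messages as spam (high spam precision) but let a noticeable share of spam through, with the Random Forest missing slightly fewer messages in this run.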

About

Basic spam classification model using the Kaggle Spam Classifier dataset.
