A spam classifier built using a TF-IDF vectorizer and tested across multiple train/test split settings.
- Two models are compared for performance: Logistic Regression and Random Forest Classifier.
- Both models are evaluated with classification reports covering precision, recall, and F1-score.
- Classification behavior is also analyzed using feature importances from both models, showing how aggressive or conservative each model is in classifying certain words as spam.
- Each model is run twice: once with stop words and once without.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ham | 0.97 | 1.00 | 0.99 | 958 |
| spam | 0.98 | 0.83 | 0.90 | 157 |

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ham | 0.98 | 1.00 | 0.99 | 958 |
| spam | 1.00 | 0.85 | 0.92 | 157 |