This project trains a binary classification neural network to distinguish between spam and ham (non-spam) SMS messages using TensorFlow and Keras.
The data is the SMS Spam Collection dataset from FreeCodeCamp, consisting of two .tsv files:
- train-data.tsv: training set
- valid-data.tsv: validation/test set
Each line is formatted as:
<label> <TAB> <message>
where <label> is either "ham" or "spam".
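As an illustration of the line format above, the following is a minimal sketch of a parser that uses only the Python standard library. The function name `parse_sms_tsv` and the 0/1 label encoding are assumptions made for this example, not part of the original project code.

```python
import csv
import io

# Assumed encoding for this sketch: ham -> 0, spam -> 1
LABEL_TO_INT = {"ham": 0, "spam": 1}

def parse_sms_tsv(handle):
    """Return (messages, labels) read from an open tab-separated file handle."""
    messages, labels = [], []
    # QUOTE_NONE keeps quote characters inside messages as literal text
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label, message = row[0], "\t".join(row[1:])
        labels.append(LABEL_TO_INT[label])
        messages.append(message)
    return messages, labels

# In-memory sample standing in for train-data.tsv
sample = io.StringIO("ham\thow are you doing today\nspam\tsale today! call now\n")
messages, labels = parse_sms_tsv(sample)
print(messages, labels)
```

With the real files, the same function could be called on `open("train-data.tsv", encoding="utf-8", newline="")`.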
pip install tensorflow
pip install tensorflow-datasets
pip install pandas numpy matplotlib
!wget https://cdn.freecodecamp.org/project-data/sms/train-data.tsv
!wget https://cdn.freecodecamp.org/project-data/sms/valid-data.tsv
- Input: Raw SMS text string
- TextVectorization Layer: tokenizes and pads the input
- Embedding Layer: converts tokens to dense vectors (dim=16)
- GlobalAveragePooling1D: averages the token embeddings across the sequence, producing one fixed-length vector per message
- Dense Layer (24 units, ReLU): learns non-linear combinations
- Dropout (0.5): for regularization
- Output Layer (sigmoid): predicts probability of spam
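The layer stack listed above could be assembled roughly as follows. This is a sketch rather than the project's exact code: the vocabulary size `MAX_TOKENS`, the padded length `SEQUENCE_LENGTH`, and the helper name `build_model` are assumptions, since the README does not state them.

```python
import tensorflow as tf

# Hypothetical hyperparameters: these values are not given in the README
MAX_TOKENS = 10_000
SEQUENCE_LENGTH = 100

def build_model(training_texts):
    """Build the listed layers; training_texts is used to learn the vocabulary."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=MAX_TOKENS,
        output_mode="int",
        output_sequence_length=SEQUENCE_LENGTH,  # pads or truncates each message
    )
    vectorizer.adapt(tf.constant(training_texts))

    return tf.keras.Sequential([
        tf.keras.Input(shape=(1,), dtype=tf.string),   # raw SMS text
        vectorizer,                                    # text -> token ids
        tf.keras.layers.Embedding(input_dim=MAX_TOKENS, output_dim=16),
        tf.keras.layers.GlobalAveragePooling1D(),      # average over tokens
        tf.keras.layers.Dense(24, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of spam
    ])

# Toy corpus for illustration only; the real model adapts on train-data.tsv
model = build_model(["free prize call now", "see you at lunch"])
probability = model(tf.constant([["hello there"]]))
```

Calling `model.summary()` afterwards prints the layers, which is a quick way to compare the result with the list above.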
- Loss: BinaryCrossentropy
- Optimizer: Adam
- Epochs: 20
- Metrics: Accuracy
- Validation Set: valid-data.tsv
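For intuition about the loss named in the list above, here is a small hand-written version of mean binary cross-entropy in plain Python. It is a sketch for illustration, not the Keras implementation; the function name and the clipping constant `eps` are assumptions of this example.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and predicted spam probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# One spam (1) and one ham (0) message, predicted well and predicted badly
confident = binary_crossentropy([1, 0], [0.9, 0.1])
wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident, wrong)
```

The loss is small when the predicted probability agrees with the label and grows sharply for confident mistakes, which is the signal the optimizer minimizes during training.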
The predict_message function accepts a raw string message and returns:
[probability (0–1), "ham" or "spam"]
predict_message("how are you doing today?")
# Output: [0.0123, "ham"]
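import numpy as np

def predict_message(pred_text):
    # Wrap the raw string in a one-element batch; `model` is the trained Keras model
    input_data = np.array([pred_text], dtype=object)
    prediction = model.predict(input_data)
    # The sigmoid output is the predicted probability of spam
    spam_prob = float(prediction[0][0])
    label = 'spam' if spam_prob > 0.5 else 'ham'
    return [spam_prob, label]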
Your model must correctly classify the following test messages:
[
  "how are you doing today",
  "sale today! to stop texts call 98912460324",
  "i dont want to go. can we try it a different day? available sat",
  "our new mobile video service is live. just install on your phone to start watching.",
  "you have won £1000 cash! call to claim your prize.",
  "i'll bring it tomorrow. don't forget the milk.",
  "wow, is your arm alright. that happened to me one time too"
]
All predictions must match expected labels ("ham" or "spam") for success.
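A check of this kind can be sketched as a small harness that compares predicted labels against expected ones. Everything named here is hypothetical: `check_predictions` is not the official test suite, `keyword_stub` is a stand-in for a trained `predict_message`, and the two expected labels are assumed for illustration.

```python
def check_predictions(predict_fn, messages, expected_labels):
    """Return (passed, mismatches) for a predictor that returns [probability, label]."""
    mismatches = []
    for message, expected in zip(messages, expected_labels):
        _probability, label = predict_fn(message)
        if label != expected:
            mismatches.append((message, expected, label))
    return len(mismatches) == 0, mismatches

def keyword_stub(message):
    """Hypothetical stand-in for predict_message; NOT a trained model."""
    is_spam = any(word in message for word in ("won", "sale", "prize"))
    return [0.9 if is_spam else 0.1, "spam" if is_spam else "ham"]

passed, mismatches = check_predictions(
    keyword_stub,
    ["how are you doing today",
     "you have won £1000 cash! call to claim your prize."],
    ["ham", "spam"],  # assumed labels for this illustration
)
print(passed, mismatches)
```

Replacing `keyword_stub` with the real `predict_message` and supplying the full message list shows exactly which messages, if any, are misclassified.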
https://colab.research.google.com/drive/1s7oLDkVU_kBd1biHSb7nmPi6Coq1gkze?authuser=2#scrollTo=lMHwYXHXCar3
Upon passing the test suite, FreeCodeCamp awards a certificate for this project as part of their Machine Learning curriculum.