This project performs an exploratory data analysis (EDA) on the MNIST dataset using TensorFlow and related libraries such as tensorflow-datasets, matplotlib, seaborn, and pandas.
The MNIST dataset contains 70,000 grayscale images of handwritten digits (0-9), divided into:
- Training set: 60,000 images
- Test set: 10,000 images
Each image is 28x28 pixels in size.
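The loading step can be sketched with tensorflow-datasets, together with a check that each split matches the sizes listed above. This is a minimal sketch, not the code in dataset_analysis.py: `load_split`, `split_problems`, and `EXPECTED_SPLITS` are hypothetical names. It assumes the tfds `mnist` builder, which stores images as uint8 with a trailing channel axis, giving shape (N, 28, 28, 1).

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from dataset_analysis.py.
EXPECTED_SPLITS = {"train": 60_000, "test": 10_000}
IMAGE_SHAPE = (28, 28, 1)  # tfds keeps a single grayscale channel axis


def load_split(split: str):
    """Load one MNIST split as NumPy arrays (images, labels).

    The tensorflow_datasets import is deferred so split_problems()
    can be used without it installed.
    """
    import tensorflow_datasets as tfds

    # batch_size=-1 returns the whole split as one batch;
    # as_supervised=True yields an (image, label) tuple.
    ds = tfds.load("mnist", split=split, batch_size=-1, as_supervised=True)
    images, labels = tfds.as_numpy(ds)
    return images, labels


def split_problems(images: np.ndarray, labels: np.ndarray, expected_count: int) -> list:
    """Return a list of mismatches against the documented size and shape (empty if none)."""
    problems = []
    if images.shape != (expected_count, *IMAGE_SHAPE):
        problems.append(f"unexpected image shape {images.shape}")
    if labels.shape != (expected_count,):
        problems.append(f"unexpected label shape {labels.shape}")
    if images.dtype != np.uint8:
        problems.append(f"expected uint8 pixels, got {images.dtype}")
    if labels.size and (labels.min() < 0 or labels.max() > 9):
        problems.append("labels outside the range 0-9")
    return problems
```

For example, `images, labels = load_split("train")` followed by `split_problems(images, labels, EXPECTED_SPLITS["train"])` should return an empty list; the first call downloads the dataset.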
This script analyzes both the training and test datasets by:
- Plotting class distributions
- Displaying example images
- Checking if the dataset is balanced
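Counting samples per class and checking balance could look like the sketch below. The helper names are hypothetical. The balance criterion (largest class count at most `max_ratio` times the smallest, default 1.5) is an assumption chosen for illustration, not a threshold defined by this project.

```python
import numpy as np
import pandas as pd

NUM_CLASSES = 10  # digits 0-9


def class_counts(labels: np.ndarray) -> pd.Series:
    """Number of samples per digit, indexed 0-9 (absent digits count as 0)."""
    counts = np.bincount(np.asarray(labels, dtype=np.int64), minlength=NUM_CLASSES)
    return pd.Series(counts, index=pd.RangeIndex(NUM_CLASSES, name="digit"), name="count")


def imbalance_ratio(counts: pd.Series) -> float:
    """Largest class count divided by the smallest; 1.0 means perfectly balanced."""
    smallest = counts.min()
    return float("inf") if smallest == 0 else float(counts.max() / smallest)


def is_balanced(counts: pd.Series, max_ratio: float = 1.5) -> bool:
    """Hypothetical criterion: no class has more than max_ratio times the samples of another."""
    return imbalance_ratio(counts) <= max_ratio
```

Applied to the training labels, `is_balanced(class_counts(train_labels))` reports whether the split passes this criterion. MNIST classes are not exactly equal in size, so the result depends on the chosen `max_ratio`.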
To run this code, you need the following libraries installed:
pip install tensorflow tensorflow-datasets matplotlib seaborn pandas numpy
The project contains the following files:
- dataset_analysis.py: Main script to load and analyze the MNIST dataset.
- README.md: This file.
The script produces the following outputs:
- Class distribution: a bar plot showing how many samples are present for each digit class.
- Sample images: a 3x3 grid displaying example images from the dataset along with their labels.
- Class count heatmap: a heatmap summarizing the number of images per class in a compact format.
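The three outputs could be produced roughly as follows. This is a sketch rather than the project's plotting code, and the function names are hypothetical. It assumes per-digit counts arrive as a pandas Series indexed 0-9 and images have shape (N, 28, 28) or (N, 28, 28, 1). seaborn is imported only inside the heatmap function, so the other two plots work without it.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove to show windows interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_class_distribution(counts: pd.Series, title: str):
    """Bar plot of the number of samples per digit class."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(counts.index, counts.values)
    ax.set_xticks(list(counts.index))
    ax.set_xlabel("Digit")
    ax.set_ylabel("Number of images")
    ax.set_title(title)
    return fig


def plot_examples(images: np.ndarray, labels: np.ndarray, rows: int = 3, cols: int = 3):
    """Grid of the first rows * cols images, each titled with its label."""
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax, image, label in zip(axes.flat, images, labels):
        ax.imshow(np.squeeze(image), cmap="gray")  # drop a trailing channel axis if present
        ax.set_title(f"Label: {int(label)}")
        ax.axis("off")
    return fig


def plot_count_heatmap(counts_by_split: dict):
    """Heatmap with one row per split (e.g. "train", "test") and one column per digit."""
    import seaborn as sns  # deferred: only this plot needs seaborn

    table = pd.DataFrame(counts_by_split).T  # rows: splits, columns: digits
    fig, ax = plt.subplots(figsize=(10, 2 + 0.5 * len(table)))
    sns.heatmap(table, annot=True, fmt="d", cmap="Blues", ax=ax)
    ax.set_xlabel("Digit")
    return fig
```

Each function returns a matplotlib figure, so the caller can choose between `plt.show()` (with an interactive backend) and `fig.savefig(...)`.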
To run the analysis:
python dataset_analysis.py
The script will display plots and print statistics about the dataset.
In future commits, we will add:
- A deep learning model built with Keras
- Model training and evaluation
- Model inference and saving/loading functionality
Stay tuned!