CNN-based image classification using a ResNet-Inception hybrid model on the TinyImageNet dataset, including extensive hyperparameter tuning and performance comparison
This section focuses on preparing the TinyImageNet-200 dataset for training a CNN classifier. It includes downloading the dataset from Kaggle, parsing and organizing the files, converting them into a standard structure for PyTorch `ImageFolder`, and performing initial visualization for verification.
After running the preprocessing script, the dataset is reorganized into a custom folder structure:
```
custom_split/
├── train/
│   └── class_name/
│       └── images...
├── validation/
│   └── class_name/
│       └── images...
└── test/
    └── unknown/
        └── images...
```
This format allows efficient loading using `torchvision.datasets.ImageFolder`.
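As a minimal sketch of that loading step (the `/content/custom_split` path comes from the preprocessing above; the transform choice is illustrative), the three splits can be read directly with `ImageFolder`:

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# TinyImageNet images are 64x64 RGB; ToTensor alone is enough for a quick sanity check
basic_tf = T.Compose([T.ToTensor()])

root = "/content/custom_split"
train_ds = ImageFolder(f"{root}/train", transform=basic_tf)
val_ds = ImageFolder(f"{root}/validation", transform=basic_tf)
test_ds = ImageFolder(f"{root}/test", transform=basic_tf)  # single "unknown" class

print(len(train_ds.classes), len(train_ds))  # 200 WNID folders, 100,000 images
```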
- ✅ Download the dataset using `kagglehub`
- ✅ Parse `words.txt` and map WordNet IDs (WNIDs) to human-readable labels
- ✅ Rebuild the `val/` directory by splitting images by class using `val_annotations.txt` (see the sketch after this list)
- ✅ Group and copy all images into a new directory structure at `/content/custom_split`
- ✅ Display random sample images for visual verification of each split
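A minimal sketch of the `val/` rebuild step, assuming the standard TinyImageNet-200 layout (`val/images/` plus `val_annotations.txt`, whose first two tab-separated fields are the image filename and its WNID); the source path used here is an assumption:

```python
import os
import shutil

val_dir = "tiny-imagenet-200/val"              # assumed location of the downloaded val split
out_dir = "/content/custom_split/validation"   # target layout expected by ImageFolder

with open(os.path.join(val_dir, "val_annotations.txt")) as f:
    for line in f:
        img_name, wnid = line.split("\t")[:2]  # image filename and its class (WNID)
        os.makedirs(os.path.join(out_dir, wnid), exist_ok=True)
        shutil.copy(os.path.join(val_dir, "images", img_name),
                    os.path.join(out_dir, wnid, img_name))
```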
| Split | Number of Images |
|---|---|
| Train | 100,000 |
| Validation | 10,000 |
| Test | 10,000 |
| Total | 120,000 |
This structured preprocessing step ensures that:
- The data is properly class-separated and labeled
- The format supports `ImageFolder`-based loading
- The dataset is compatible with data augmentation and dataloader batching (see the sketch below)
- Human-readable class names are available for visualization and analysis
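For example, an augmentation pipeline and batched loader could look like the following (the specific transforms are illustrative; only the 64×64 input size and batch size of 64 come from the training setup described later):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Illustrative augmentation; the project's exact transform choices are not specified here
train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(64, padding=4),
    T.ToTensor(),
])

train_ds = ImageFolder("/content/custom_split/train", transform=train_tf)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=2)

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 3, 64, 64])
```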
Figure: ResNet-Inception Hybrid Architecture
In this stage, we aimed to find the optimal learning rate (LR) for training our custom ResNet-Inception CNN on the TinyImageNet dataset. We ran three experiments (summarized in the table below) with the following common setup:
- Model: `ResIncepCNNBase`
- Optimizer: SGD with momentum = 0.9
- Loss: CrossEntropyLoss
- Batch size: 64
- Epochs: 30
- Hardware: GPU (CUDA enabled if available)
| Experiment | Learning Rate | Scheduler | Description |
|---|---|---|---|
| 1 | 0.01 | ❌ None | Baseline training |
| 2 | 0.001 | ❌ None | Lower learning rate |
| 3 | 0.01 | ✅ StepLR (step=5, gamma=0.1) | Learning rate decay applied |
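A sketch of how experiment 3's configuration might be wired up (the model's constructor arguments and the loop body are assumptions; the class name, optimizer, loss, LR, scheduler, and epoch count come from the setup above):

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResIncepCNNBase(num_classes=200).to(device)  # constructor signature assumed

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(30):
    # ... train and validate for one epoch using criterion/optimizer ...
    scheduler.step()  # multiply the LR by 0.1 every 5 epochs
```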
**Experiment 1 (LR = 0.01, no scheduler):**
- Train accuracy improves steadily.
- Validation accuracy plateaus early.
- Indication of possible overfitting.

**Experiment 2 (LR = 0.001, no scheduler):**
- Training is very slow due to the low LR.
- Performance is significantly worse than the other setups.
- The model struggles to converge.

**Experiment 3 (LR = 0.01 with StepLR):**
- Best performance overall.
- LR decay leads to smoother convergence.
- Validation accuracy improves, indicating better generalization.
- The best result was obtained with a learning rate of 0.01 using a StepLR scheduler.
- Using a scheduler effectively controls the training dynamics and prevents overfitting.
- This setup will be used as the baseline for the next steps in model tuning.
In this part, we explore the impact of incorporating Dropout and Batch Normalization into the ResIncepCNN model.
- Model: `ResIncepCNNWithBNDropout`
- Dropout Probability: 0.5
- Batch Normalization: Applied after the first fully connected layer
- Learning Rate: 0.01
- Weight Decay: 0.0
- Scheduler: StepLR (step size = 5, gamma = 0.1)
- Epochs: 30
- Optimizer: SGD with momentum = 0.9
This model extends the original `ResIncepCNNBase` by:
- Integrating BatchNorm1d after the first dense layer (`fc1`)
- Applying Dropout (p=0.5) before the final classification layer (`fc2`)
```python
self.bnFc1 = nn.BatchNorm1d(512)   # normalizes the 512-dim output of fc1
self.dropout = nn.Dropout(p=0.5)   # randomly zeroes activations before fc2 during training
```
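To show where these two layers sit relative to `fc1` and `fc2`, here is a self-contained head module mirroring that ordering (the ReLU placement and the wrapping into a separate module are assumptions; the actual model defines these layers inline):

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Illustrative head mirroring the fc1 -> bnFc1 -> dropout -> fc2 ordering described above."""
    def __init__(self, in_features: int, num_classes: int = 200):
        super().__init__()
        self.fc1 = nn.Linear(in_features, 512)
        self.bnFc1 = nn.BatchNorm1d(512)   # BatchNorm after the first dense layer
        self.dropout = nn.Dropout(p=0.5)   # Dropout before the final classifier
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = torch.relu(self.bnFc1(self.fc1(x)))
        x = self.dropout(x)
        return self.fc2(x)
```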
The following figure shows the loss, Top-1 accuracy, and Top-5 accuracy across the 30 training epochs:
- Generalization improved: Dropout helps prevent overfitting by randomly deactivating neurons during training.
- Stabilized training: BatchNorm improves convergence speed and reduces internal covariate shift.
- Higher validation accuracy: Compared to previous settings without regularization, the model achieved better validation Top-1 and Top-5 accuracies.
Applying Dropout and Batch Normalization to our hybrid ResIncepCNN architecture effectively enhances model performance and generalization on the validation set. This setup can be a strong candidate for further experiments or deployment.
This stage enhances the model training by introducing Early Stopping to the best architecture so far (a combined ResNet-Inception CNN enhanced with Dropout and Batch Normalization) to prevent overfitting and reduce unnecessary training epochs.
The model used in this stage is `ResIncepCNNWithBNDropout_All`, which includes:
- 📚 Residual Connections (ResNet blocks) for better gradient flow
- 🔍 Multi-scale feature extraction via Inception modules (both building blocks are sketched after this list)
- 🧪 Batch Normalization for stabilizing the learning process
- 💧 Dropout (p=0.5) for regularization
- 🛑 Early Stopping triggered after 5 consecutive epochs with no improvement in validation loss
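As a rough illustration of the first two building blocks, here are a generic residual block and a generic Inception-style module; these are sketches under common conventions, not the project's actual `ResIncepCNNWithBNDropout_All` definitions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: two 3x3 convs with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection for better gradient flow

class InceptionModule(nn.Module):
    """Minimal Inception-style module: parallel 1x1, 3x3, and 5x5 branches, concatenated."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)

    def forward(self, x):
        # multi-scale features from different receptive fields, stacked along channels
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

# Quick shape check on a TinyImageNet-sized input
x = torch.randn(2, 64, 64, 64)           # (batch, channels, H, W)
print(ResidualBlock(64)(x).shape)        # torch.Size([2, 64, 64, 64])
print(InceptionModule(64, 32)(x).shape)  # torch.Size([2, 96, 64, 64])
```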
| Parameter | Value |
|---|---|
| Optimizer | SGD |
| Learning Rate | 0.01 |
| Momentum | 0.9 |
| Weight Decay | 0.0 |
| LR Scheduler | StepLR (γ=0.1, step=5) |
| Epochs (max) | 30 |
| Early Stopping Patience | 5 epochs |
| Batch Size | 64 |
| Input Size | 64 × 64 |
The training process was monitored under the early stopping criterion. The following performance curves were plotted to track progress and convergence:
- Loss (Train vs Validation)
- Top-1 Accuracy
- Top-5 Accuracy
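The Top-1/Top-5 metrics tracked in these plots can be computed with a small helper like the following (a sketch, not the project's actual metric code):

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true label appears among the k highest-scoring predictions."""
    topk = logits.topk(k, dim=1).indices               # (batch, k) predicted class indices
    correct = (topk == targets.unsqueeze(1)).any(dim=1)
    return correct.float().mean().item()

# Example: a batch of 4 samples over 200 classes
logits = torch.randn(4, 200)
targets = torch.randint(0, 200, (4,))
print(topk_accuracy(logits, targets, k=1))  # Top-1 accuracy
print(topk_accuracy(logits, targets, k=5))  # Top-5 accuracy
```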
📌 Training automatically stopped early once the model failed to improve for 5 consecutive validation epochs.
Early stopping was implemented inside the `trainModel()` function: training stops once the validation loss fails to improve for 5 consecutive epochs, ensuring better generalization and reduced overfitting.
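A sketch of that early-stopping logic, assuming the model, loaders, and scheduler from the table above are already constructed; `train_one_epoch` and `evaluate` are hypothetical helpers, and the checkpointing line is illustrative rather than taken from `trainModel()`:

```python
import torch

best_val_loss = float("inf")
patience, epochs_without_improvement = 5, 0

for epoch in range(30):
    train_loss = train_one_epoch(model, train_loader)  # hypothetical helper
    val_loss = evaluate(model, val_loader)             # hypothetical helper
    scheduler.step()

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best checkpoint (illustrative)
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break
```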
- The model benefits from better generalization and reduced training time.
- Early Stopping avoided overfitting by monitoring validation performance.
- Final accuracy remained competitive with fewer epochs.
The following figure shows the loss, Top-1 accuracy, and Top-5 accuracy with EarlyStopping: