This repo focuses on the development, training, evaluation, and deployment of lightweight deep learning models optimized for resource-constrained embedded devices.
Key features:
- Model Training: Modified MobileNetV2 architecture for small input sizes (see the sketch after this list)
- Testing & Evaluation: CPU-based evaluation reporting overall accuracy, a confusion matrix, sample classification predictions, and inference latency.
- Post-Training Quantization: Reduces model size and improves inference speed on embedded devices with minimal accuracy loss.
- Dataset: CIFAR-10
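The actual architecture is defined in Scripts/model_network.py. As a rough, non-authoritative sketch of the idea, one common way to adapt torchvision's MobileNetV2 to 32×32 CIFAR-10 inputs is to keep the stem at stride 1 so the tiny images are not downsampled too aggressively (the width multiplier here corresponds to the `alpha` value in config.yaml); the repo's real implementation may differ:

```
# Sketch only (assumes torchvision's MobileNetV2 as the starting point);
# the repo's actual architecture lives in Scripts/model_network.py.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

def small_input_mobilenet_v2(num_classes: int = 10, alpha: float = 1.0) -> nn.Module:
    model = mobilenet_v2(weights=None, width_mult=alpha, num_classes=num_classes)
    # The default stem downsamples with stride 2 (meant for 224x224 inputs);
    # keep full resolution in the first layer for 32x32 CIFAR-10 images.
    stem = model.features[0][0]
    model.features[0][0] = nn.Conv2d(3, stem.out_channels, kernel_size=3,
                                     stride=1, padding=1, bias=False)
    return model

if __name__ == "__main__":
    net = small_input_mobilenet_v2()
    print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])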
Repository structure:

```
Scripts/
│
├── main.py              # Entry point for training, testing, and quantization (stages can be combined via command-line arguments)
├── train.py             # Model training module
├── test.py              # Evaluation and testing
├── quantize_model.py    # Post-training quantization pipeline
├── data_loader.py       # Dataset loading
├── model_network.py     # Model architecture
├── params_init.py       # Model weight initialization
├── plot_log.py          # Plots training/validation curves from the logs
└── helpers/             # Utility functions
    ├── global_import.py     # Shared package imports
    └── quantize_helper.py   # Quantization helper functions
```

Follow the steps below to set up the required Python environment and install the dependencies.
- Clone this project repo:

```
git clone git@github.com:Go-Ab1/Split_DL.git
```

- Create and activate a Python virtual environment (recommended):

```
python3 -m venv pytorch_env
source pytorch_env/bin/activate
```

- Install dependencies:

```
# The listed package versions are the ones used for testing.
# Adjust the PyTorch installation to match your CUDA version.
pip install -r requirements.txt
```

Once the dependencies are installed, run the scripts below to reproduce the experiments.
```
# ==========================================
# PROJECT USAGE
# (Training, Testing, Quantization)
# ==========================================

# ------------------------------------------
# 1. TRAINING
# ------------------------------------------
# Trains the model on CIFAR-10:
# - Downloads the dataset on the first run
# - Logs training/validation loss & accuracy
# - Saves the training log to: media/
# - Saves the trained model to: models/
python3 Scripts/main.py train
```
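The dataset handling is implemented in Scripts/data_loader.py. As an illustrative sketch only, CIFAR-10 loading with a held-out validation split using the defaults from config.yaml (batch_size 64, val_split 0.1, num_workers 4) could look roughly like this; names and details are assumptions, not the repo's exact code:

```
# Illustrative sketch of CIFAR-10 loading with a held-out validation split;
# the repo's actual logic is in Scripts/data_loader.py.
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

def get_loaders(data_dir="Dataset", batch_size=64, val_split=0.1, num_workers=4):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # CIFAR-10 channel means
                             (0.2470, 0.2435, 0.2616)),  # CIFAR-10 channel stds
    ])
    # download=True fetches CIFAR-10 into data_dir on the first run
    full_train = datasets.CIFAR10(data_dir, train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(data_dir, train=False, download=True, transform=transform)

    val_size = int(len(full_train) * val_split)
    train_set, val_set = random_split(full_train, [len(full_train) - val_size, val_size])

    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_loader, val_loader, test_loader
```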
```
# Visualize logs after training:
# - Plots loss curves
# - Plots accuracy curves
python3 Scripts/plot_log.py
```
```
# ------------------------------------------
# 2. TESTING
# ------------------------------------------
# Evaluates the trained model.
# Produces:
# - Overall accuracy
# - Sample classification predictions
# - Confusion matrix
# - Inference latency (per batch & per image)
python3 Scripts/main.py test
```
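How the latency numbers are measured is defined by Scripts/test.py. A minimal sketch of per-batch and per-image CPU latency timing (illustrative only, not the repo's exact procedure; the batch size of 32 and num_runs of 5 mirror the reported table and config.yaml):

```
# Illustrative sketch of CPU inference-latency measurement
# (seconds per batch of 32 and per image); see Scripts/test.py for the real version.
import time
import torch

def measure_latency(model, batch_size=32, num_runs=5, input_size=(3, 32, 32)):
    model.eval()
    batch = torch.randn(batch_size, *input_size)
    with torch.inference_mode():
        model(batch)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(num_runs):
            model(batch)
        per_batch = (time.perf_counter() - start) / num_runs
    return per_batch, per_batch / batch_size
```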
```
# ------------------------------------------
# 3. POST-TRAINING QUANTIZATION (PTQ)
# ------------------------------------------
# Runs quantization on the trained model.
# Produces:
# - Quantized model
# - Model size comparison
# - Latency comparison (FP32 vs INT8)
# - Accuracy comparison
python3 Scripts/main.py quant
```
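The PTQ pipeline itself lives in Scripts/quantize_model.py and helpers/quantize_helper.py. As one hedged illustration of the technique, FX graph mode post-training static quantization with the fbgemm backend (the backend named in config.yaml) looks roughly like this; the repo may use a different quantization workflow:

```
# Illustrative FX-graph-mode PTQ sketch using the fbgemm backend;
# the repo's actual pipeline is in Scripts/quantize_model.py.
import os
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

def quantize_fp32_model(model_fp32, calib_loader, num_calib_batches=10):
    model_fp32.eval()
    qconfig_mapping = get_default_qconfig_mapping("fbgemm")  # x86 CPU backend
    example_inputs = (torch.randn(1, 3, 32, 32),)
    prepared = prepare_fx(model_fp32, qconfig_mapping, example_inputs)

    # Calibrate the observers with a few unlabeled batches
    with torch.inference_mode():
        for i, (images, _) in enumerate(calib_loader):
            prepared(images)
            if i + 1 >= num_calib_batches:
                break

    return convert_fx(prepared)  # INT8 model

def model_size_mb(model, path="tmp_size_check.pth"):
    # Size comparison: serialize the state dict and read the file size
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size
```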
```
# ------------------------------------------
# 4. CHAINED WORKFLOWS
# ------------------------------------------
# Train → Test
# (Complete pipeline for performance evaluation)
python3 Scripts/main.py train test

# Train → Quantize
# (Complete pipeline for optimizing models for deployment)
python3 Scripts/main.py train quant
```
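The chaining above is handled by Scripts/main.py. Purely as an illustration of the idea (the function names below are hypothetical stubs, not the repo's API), a multi-stage entry point can dispatch the requested stages in the order given:

```
# Hypothetical sketch of a multi-stage entry point; Scripts/main.py is the real
# implementation and the functions below are illustrative stubs only.
import argparse

def run_training():
    print("train stage")       # stand-in for the logic in train.py

def run_testing():
    print("test stage")        # stand-in for the logic in test.py

def run_quantization():
    print("quant stage")       # stand-in for the logic in quantize_model.py

STAGES = {"train": run_training, "test": run_testing, "quant": run_quantization}

def main():
    parser = argparse.ArgumentParser(description="Run one or more pipeline stages")
    parser.add_argument("stages", nargs="+", choices=list(STAGES),
                        help="Stages to run, executed in the order given")
    for stage in parser.parse_args().stages:
        STAGES[stage]()

if __name__ == "__main__":
    main()
```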
All parameters are defined in config.yaml and can be adjusted as required:

```
data_dir: Dataset
batch_size: 64
num_epochs: 100
learning_rate: 0.01
alpha: 1.0
media_log_dir: media
model_log: training_log.txt
test_log: test_log.txt
val_split: 0.1
num_workers: 4
models_dir: models
compare_models: true
quant_backend: fbgemm
num_runs_latency: 5
comparison_log_name: comparison_log.txt
trained_model_name: final_model.pth
quantized_model_name: quantized_model.pth
```
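The scripts read these values at startup; a minimal sketch of loading config.yaml with PyYAML (illustrative only; PyYAML and the exact parsing code are assumptions, not confirmed by the repo):

```
# Illustrative sketch of reading config.yaml; the scripts' actual
# config handling may differ.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["batch_size"])     # 64
print(cfg["learning_rate"])  # 0.01
print(cfg["quant_backend"])  # fbgemm
```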
The training and validation performance of the model is shown below.

[Figure: Training/Validation Accuracy] | [Figure: Training/Validation Loss]
Below are example predictions and the confusion matrix for CIFAR-10 test images.

[Figure: Sample Predictions] | [Figure: Confusion Matrix]

Full-precision vs. quantized (PTQ) model comparison:
| Metric | Full-Precision Model | Quantized (PTQ) Model | Change |
|---|---|---|---|
| Accuracy | 0.9139 | 0.9125 | ↓ 0.15% |
| Model Size (MB) | 9.2820 | 2.6434 | ↓ 71.52% |
| Latency / Batch of 32 (s) | 0.5283 | 0.2276 | ↓ 56.92% |
| Latency / Image (s) | 0.0165 | 0.0071 | ↓ 56.92% |
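The Change column is consistent with the relative reduction with respect to the full-precision model; for example, for model size:

```
# How the "Change" column values can be reproduced (relative reduction vs. FP32)
fp32_mb, int8_mb = 9.2820, 2.6434               # values from the table above
print(f"{(1 - int8_mb / fp32_mb) * 100:.2f}%")  # 71.52%
```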
November 2025
@Goitom