Edge-LLM is a lightweight deployment of the Qwen2.5-3B model, quantized with GPTQ for fast, low-memory inference on edge devices. It targets real-time applications that need language understanding in constrained environments such as consumer GPUs or embedded systems.
- ✅ Qwen2.5-3B model integrated and quantized with GPTQ (4-bit precision).
- ✅ Achieved 66.5% size reduction: 5.75GB ➝ 1.93GB.
- ✅ Inference time reduced: 7.29s ➝ 5.99s (~18% lower latency).
- ✅ Optimized to run on consumer-grade edge GPUs (e.g., RTX 3050).
- ✅ Benchmarking and logging scripts integrated.
- ⏳ Hugging Face integration for seamless model download.
- ⏳ ONNX export and TensorRT optimization.
- ⏳ Quantization-aware fine-tuning to reduce response drift.
- ⏳ Comparative analysis: Qwen2.5-3B vs Phi-2 vs LLaMA2-7B on edge.
- 🤖 Edge Agent with vision+text multimodal capability.
- 📲 Integration with Android/iOS for mobile inference.
- ⚡ LoRA fine-tuning pipeline for domain-specific compression.
- 🧪 Evaluation suite with perplexity, latency, and accuracy tracking.
- 🛰️ Federated deployment across IoT medical/industrial nodes.
- Python 3.10+
- PyTorch + Transformers
- GPTQ (AutoGPTQ)
- Hugging Face 🤗 Hub
- Git LFS
- CUDA 11.8+
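
Before installing, it can help to confirm that the environment matches the stack listed above. The following is a minimal sketch, not part of the repository: the helper names `python_ok` and `missing_modules` are hypothetical, and the import names (`auto_gptq`, `huggingface_hub`) are assumptions about how the listed packages are imported.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Minimum interpreter version taken from the list above.
MIN_PYTHON: Tuple[int, int] = (3, 10)

# Assumed import names for the listed packages (not confirmed by the repo).
REQUIRED_MODULES: Tuple[str, ...] = (
    "torch",
    "transformers",
    "auto_gptq",
    "huggingface_hub",
)


def python_ok(version: Sequence[int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version meets MIN_PYTHON."""
    return tuple(version) >= MIN_PYTHON


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(f"Python >= 3.10: {python_ok()}")
print(f"Missing modules: {missing_modules() or 'none'}")
```

This only checks that the packages can be found; it does not verify the CUDA version. With PyTorch installed, `torch.version.cuda` reports the CUDA version PyTorch was built against.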
Edge-LLM/
├── models/ # Quantized model repo (git-lfs tracked)
├── scripts/
│ ├── benchmark.py # Inference benchmarking script
│ └── inference.py # Lightweight inference API
├── results/ # Output logs and generated responses
├── requirements.txt # Dependencies
└── README.md # This file

# Clone repo
git clone https://github.com/STiFLeR7/Edge-LLM.git
cd Edge-LLM
# Create virtual environment
python -m venv env
source env/bin/activate # For Windows: .\env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download quantized model
git lfs install
git clone https://huggingface.co/<your-hf-repo> models/Qwen2.5-3B-GPTQ

| Metric | Pre-Quantization | Post-Quantization |
|---|---|---|
| Model Size | 5.75 GB | 1.93 GB |
| Inference Time | 7.29 s | 5.99 s |
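
The headline percentages can be recomputed from the table. The sketch below does this from the rounded figures shown; the helper name `percent_reduction` is illustrative and not part of the repository.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures copied from the table above (GB and seconds).
size_reduction = percent_reduction(5.75, 1.93)
time_reduction = percent_reduction(7.29, 5.99)

print(f"Model size reduction: {size_reduction:.1f}%")      # 66.4%
print(f"Inference time reduction: {time_reduction:.1f}%")  # 17.8%
```

With the rounded table values, the size reduction comes out at about 66.4%, slightly below the 66.5% quoted earlier; the difference presumably comes from rounding the sizes to two decimals.
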
python scripts/benchmark.py

Expected Output:
🔹 Generated Response:
Black holes are regions of space where gravity is so strong that nothing, not even light, can escape...
⏳ Inference Time: ~5.99s
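
A latency figure like the one above can be measured by timing the generation call over several runs. The following is a minimal sketch of that idea, not the actual contents of `scripts/benchmark.py`: `time_inference` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in for a real model call.

```python
import statistics
import time
from typing import Callable, List


def time_inference(
    generate: Callable[[str], str],
    prompt: str,
    warmup: int = 1,
    runs: int = 3,
) -> List[float]:
    """Call `generate(prompt)` repeatedly and return per-run latencies in seconds."""
    # Warm-up runs are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        generate(prompt)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call; sleeps briefly instead of generating."""
    time.sleep(0.01)
    return f"Echo: {prompt}"


latencies = time_inference(fake_generate, "What is a black hole?")
print(f"Inference Time: ~{statistics.mean(latencies):.2f}s over {len(latencies)} runs")
```

When timing a real model on a GPU, CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock helps keep the measurement accurate.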
- Stifler – Researcher & Developer @ NIMS | AI/ML/DL | CudaBit Tech Lead
- Open to contributions! If you're passionate about model compression, edge deployment, or LLM optimization, feel free to raise issues or submit PRs.
This project is licensed under the MIT License. See LICENSE for more details.