Edge-LLM: Optimized Qwen2.5-3B with GPTQ ⚙️🧠

Edge-LLM is a lightweight and efficient deployment of the Qwen2.5-3B model, quantized using GPTQ to enable fast, low-memory inference on edge devices. This project is designed for real-time applications that require high-performance language understanding in constrained environments such as consumer GPUs or embedded systems.

📌 Current Scope (v0.1)

  • ✅ Qwen2.5-3B model integrated and quantized with GPTQ (4-bit precision).
  • ✅ Achieved 66.5% size reduction: 5.75GB ➝ 1.93GB.
  • ✅ Inference time reduced from 7.29 s ➝ 5.99 s (~18% faster).
  • ✅ Optimized to run on consumer-grade edge GPUs (e.g., RTX 3050).
  • ✅ Benchmarking and logging scripts integrated.
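The repository does not include the quantization script itself, but the 4-bit GPTQ step above can be sketched with AutoGPTQ roughly as follows. The Hub model id, group size, and calibration text here are illustrative assumptions, not the project's exact settings:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "Qwen/Qwen2.5-3B"  # assumed Hub id for the base model

# 4-bit quantization config; group_size=128 is a common default, not confirmed by the repo
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs a small calibration set; these sentences are placeholders
calibration_texts = [
    "Edge devices run language models under tight memory budgets.",
    "Black holes are regions of space where gravity is extremely strong.",
]
examples = [tokenizer(t, return_tensors="pt") for t in calibration_texts]

model.quantize(examples)                              # run GPTQ on the calibration batch
model.save_quantized("models/Qwen2.5-3B-GPTQ")        # writes the ~1.93 GB checkpoint
```

Running this requires a CUDA GPU and downloads the full 5.75 GB base model, so it is a one-time offline step rather than part of inference.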

🚧 Under Development

  • ⏳ Hugging Face integration for seamless model download.
  • ⏳ ONNX export and TensorRT optimization.
  • ⏳ Quantization-aware fine-tuning to reduce response drift.
  • ⏳ Comparative analysis: Qwen2.5-3B vs Phi-2 vs LLaMA2-7B on edge.

🔮 Future Roadmap

  • 🤖 Edge Agent with vision+text multimodal capability.
  • 📲 Integration with Android/iOS for mobile inference.
  • ⚡ LoRA fine-tuning pipeline for domain-specific compression.
  • 🧪 Evaluation suite with perplexity, latency, and accuracy tracking.
  • 🛰️ Federated deployment across IoT medical/industrial nodes.

🧱 Tech Stack

  • Python 3.10+
  • PyTorch + Transformers
  • GPTQ (AutoGPTQ)
  • Hugging Face 🤗 Hub
  • Git LFS
  • CUDA 11.8+

📂 Project Structure

Edge-LLM/
├── models/                      # Quantized model repo (git-lfs tracked)
├── scripts/
│   ├── benchmark.py             # Inference benchmarking script
│   └── inference.py             # Lightweight inference API
├── results/                     # Output logs and generated responses
├── requirements.txt             # Dependencies
└── README.md                    # This file

🚀 Setup & Installation

# Clone repo
git clone https://github.com/STiFLeR7/Edge-LLM.git
cd Edge-LLM

# Create virtual environment
python -m venv env
source env/bin/activate  # For Windows: .\env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download quantized model
git lfs install
git clone https://huggingface.co/<your-hf-repo> models/Qwen2.5-3B-GPTQ

📊 Benchmark Results

| Metric         | Pre-Quantization | Post-Quantization |
|----------------|------------------|-------------------|
| Model Size     | 5.75 GB          | 1.93 GB           |
| Inference Time | 7.29 s           | 5.99 s            |
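The reported numbers can be reproduced with simple helpers like the ones below (a minimal sketch, not the repo's actual `benchmark.py`; `generate_fn` stands in for whatever callable wraps model generation):

```python
import time

def benchmark(generate_fn, prompt, runs=3):
    """Time a generation callable over several runs and return the mean latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate_fn(prompt)
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

def size_reduction_pct(before_gb, after_gb):
    """Percent size reduction, e.g. 5.75 GB -> 1.93 GB gives ~66.4%."""
    return 100.0 * (1.0 - after_gb / before_gb)
```

Averaging over several runs matters on edge GPUs, where the first call typically pays one-off CUDA kernel and cache warm-up costs.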

🏃 Running Inference

python scripts/benchmark.py

Expected Output:

🔹 Generated Response:
Black holes are regions of space where gravity is so strong that nothing, not even light, can escape...
⏳ Inference Time: ~5.99s
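For a single generation outside the benchmark harness, a minimal call consistent with `scripts/inference.py` might look like this (a sketch; the actual script may differ, and loading a GPTQ checkpoint through Transformers additionally requires the `optimum` and `auto-gptq` packages):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "models/Qwen2.5-3B-GPTQ"  # local path created during setup

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# device_map="auto" places the 4-bit weights on the available GPU (needs `accelerate`)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

prompt = "Explain black holes in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```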

🤝 Contributors

  • Stifler – Researcher & Developer @ NIMS | AI/ML/DL | CudaBit Tech Lead
  • Contributions welcome! If you're passionate about model compression, edge deployment, or LLM optimization, feel free to open issues or submit PRs.

📜 License

This project is licensed under the MIT License. See LICENSE for more details.
