Ellora (Enhancing LLMs with LoRA) is a collection of standardized, high-quality LoRA recipes for enhancing Large Language Model capabilities. Instead of building new frameworks, we focus on creating reproducible training methodologies that work with existing infrastructure.
The LLM ecosystem has amazing infrastructure (LoRAX, PEFT, vLLM), but lacks standardized, high-quality capability adapters. Ellora bridges this gap by providing:
- Recipes, not frameworks - Reproducible training methodologies
- Quality-first approach - Rigorous evaluation and benchmarking
- Self-supervised data generation - No dependency on external datasets
- Infrastructure agnostic - Works with existing tools (PEFT, LoRAX, etc.)
- Community-driven - Open recipes for the ecosystem
| Recipe | Purpose | Key Achievement | Jump to |
|---|---|---|---|
| #1: Accuracy Recovery | Restore quantized model performance | <5% degradation from FP16 | Details |
| #2: Reasoning Enhancement | Add structured thinking with `<think>` tags | 60% thinking usage, 75% quality boost | Details |
| #3: Tool Calling | Enable effective development tool usage | 80% success rate on complex tasks | Details |
| #4: Context Extension | Expand from 32K to 2M tokens | 61x context increase for full repos | Details |
Problem: Quantized models (INT4/INT8) lose accuracy compared to FP16 versions
Solution: Self-distillation LoRA adapter using Magpie-generated data
- Goal: <5% performance degradation from FP16 baseline
- Memory: ~75% reduction in model size
- Speed: 2-3x faster inference than FP16
- Method: Teacher (FP16) → Student (INT4 + LoRA) distillation
Key Innovation: Uses Magpie self-data generation, so the distillation set is drawn from the model's own output distribution - no external datasets needed!
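The sketch below is only a rough illustration of the two ideas (Magpie-style prompt self-generation and FP16 → INT4+LoRA self-distillation), not the published training script. The pre-query template string, LoRA hyperparameters, and single-step loop are illustrative assumptions.

```python
# Illustrative sketch only - not the exact Ellora training script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base)

# FP16 teacher
teacher = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map="auto")
teacher.eval()

# Magpie-style self-generation: give the model only the pre-query part of its
# chat template and let it invent a user instruction to complete it.
pre_query = "<|im_start|>user\n"  # Qwen ChatML pre-query template (assumption)
ids = tokenizer(pre_query, return_tensors="pt").to(teacher.device)
gen = teacher.generate(**ids, max_new_tokens=64, do_sample=True, temperature=1.0)
instruction = tokenizer.decode(gen[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

# INT4 student with a trainable LoRA adapter
student = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)
student = get_peft_model(student, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# One self-distillation step: pull the student's token distribution toward the teacher's.
batch = tokenizer(instruction, return_tensors="pt").to(teacher.device)
with torch.no_grad():
    teacher_logits = teacher(**batch).logits
student_logits = student(**batch).logits
loss = torch.nn.functional.kl_div(
    torch.log_softmax(student_logits.float(), dim=-1),
    torch.softmax(teacher_logits.float(), dim=-1),
    reduction="batchmean",
)
loss.backward()  # the real recipe loops over many Magpie prompts with an optimizer
```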
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
)

# Load accuracy recovery adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Use normally - now with recovered accuracy!
```
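Generation then works the same way as with the unquantized model. Continuing from the snippet above (the prompt here is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer("Explain what a LoRA adapter is in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```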
| Model | Perplexity | Memory | Speed | Status |
|---|---|---|---|---|
| FP16 Baseline | 1.97 | 1.0GB | 1.0x | ✅ |
| INT4 Raw | 2.40 (+21.8%) | 0.25GB | 3.2x | |
| INT4 + Ellora | 2.09 (+5.7%) | 0.28GB | 3.0x | ✅ |
Problem: LLMs often lack structured thinking patterns for complex reasoning
Solution: GRPO-trained adapter that teaches chain-of-thought reasoning with `<think></think>` tags
- Goal: Enhance reasoning capabilities through preference learning
- Method: GRPO (Group Relative Policy Optimization) with self-rewarding
- Feature: Teaches structured thinking with clear reasoning steps
- Output: Models that show their reasoning process transparently
Key Innovation: Self-generated preference data with automated quality scoring - no need for human annotations or external preference datasets!
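The published adapter uses a more elaborate scorer, but the flavor of an annotation-free reward can be shown with a small heuristic like the one below: it favors completions that actually use the `<think>` format and finish with an answer. The function shape follows the reward-function convention commonly used with TRL's GRPOTrainer; treat the wiring as an assumption rather than the recipe's exact code.

```python
import re

def thinking_reward(completions, **kwargs):
    """Toy, annotation-free reward that favors reasoning inside <think> tags.

    Illustrative only - not the recipe's actual scorer.
    """
    rewards = []
    for text in completions:
        score = 0.0
        match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
        if match:
            score += 1.0                                             # used the thinking format
            score += min(len(match.group(1).split()) / 100.0, 1.0)   # non-trivial reasoning, capped
            if text.split("</think>", 1)[1].strip():
                score += 1.0                                         # gave a final answer afterwards
        rewards.append(score)
    return rewards
```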
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Load reasoning adapter
model = PeftModel.from_pretrained(model, "codelion/gemma-3-1b-it-reasoning-grpo-lora")

# Use with thinking prompt
prompt = '''Think step by step and use <think></think> tags to show your reasoning process.
Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 mph for the next hour, how many total miles does it travel?
Response:'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
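Because the reasoning is wrapped in tags, separating the thought process from the final answer takes only a little post-processing. Continuing from `response` above:

```python
import re

# Split the adapter's output into its reasoning trace and the final answer.
think = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
reasoning = think.group(1).strip() if think else ""
final_answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
```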
| Model | Thinking Usage | Quality Score | Training Method | Status |
|---|---|---|---|---|
| Gemma-3-1B Base | 0% | 3.2 | - | |
| Gemma-3-1B + Ellora | 60% | 5.6 | GRPO | ✅ |
Problem: LLMs struggle with effective tool usage for code exploration
Solution: Hybrid training with Magpie scenarios + real tool execution results
- Goal: Teach models to use development tools effectively
- Method: Generate scenarios with Magpie, execute on real codebases
- Feature: OpenAI-compatible function calling format
- Tools: File operations, search, code navigation, and more
Key Innovation: Combines synthetic scenario diversity with real execution feedback - ensuring models learn authentic tool usage patterns!
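This recipe does not yet have a published quick-start snippet, so the sketch below shows one plausible way to load the adapter and prompt it with an OpenAI-style tool schema. The base checkpoint id, the example tool definition, and the chat-template call are assumptions; the adapter card on HuggingFace is authoritative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed base checkpoint; check the adapter card for the exact base model.
BASE = "meta-llama/Llama-3.2-1B-Instruct"

model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(model, "codelion/Llama-3.2-1B-Instruct-tool-calling-lora")

# Hypothetical OpenAI-compatible schema for a file-search tool.
tools = [{
    "type": "function",
    "function": {
        "name": "search_files",
        "description": "Search the repository for files matching a pattern",
        "parameters": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}},
            "required": ["pattern"],
        },
    },
}]

messages = [{"role": "user", "content": "Find every test that touches the tokenizer."}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```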
Problem: Base models are limited to 32K context, but analyzing large repositories can require up to 2M tokens
Solution: Progressive curriculum learning with vLLM + Unsloth hybrid approach
- Goal: Extend context from 32K to 2M tokens (61x increase)
- Method: Curriculum learning across 4 stages (32K → 128K → 512K → 2M)
- Innovation: vLLM for fast data generation, Unsloth for memory-efficient training
- Feature: Single LoRA adapter progressively learns longer contexts
Key Innovation: Hybrid optimization combining vLLM's inference speed with Unsloth's training efficiency - achieving 61x context extension with minimal compute!
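As a rough sketch of how the two tools divide the work (data generation versus training), the snippet below uses vLLM to draft long-context training prompts and Unsloth to attach and train the LoRA at one curriculum stage. Model ids, the stage length, and the LoRA settings are illustrative, and in practice the two halves run as separate jobs rather than in one process.

```python
# Illustrative split of responsibilities; not the recipe's actual pipeline.
from vllm import LLM, SamplingParams
from unsloth import FastLanguageModel

# 1) vLLM: fast generation of synthetic long-context training prompts.
generator = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct")
seed_prompts = ["Write a question that can only be answered by reading an entire repository."]
samples = generator.generate(seed_prompts, SamplingParams(max_tokens=128, temperature=0.8))

# 2) Unsloth: memory-efficient LoRA training at the current curriculum stage.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=131072,   # stage 1 of the 32K -> 128K -> 512K -> 2M curriculum
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```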
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load progressive context adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Use with 2M token context - perfect for large repositories!
long_context_prompt = "Analyze this entire repository..."  # Up to 2M tokens
inputs = tokenizer(long_context_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
```
| Model | Context Limit | Max Files | Use Case | Status |
|---|---|---|---|---|
| Qwen2.5-Coder Base | 32K tokens | ~10-20 files | Small projects | |
| + Stage 0 LoRA | 32K tokens | ~10-20 files | Single module analysis | ✅ |
| + Stage 1 LoRA | 128K tokens | ~50-100 files | Medium repositories | ✅ |
| + Stage 2 LoRA | 512K tokens | ~200-500 files | Large codebases | ✅ |
| + Stage 3 LoRA | 2M tokens | ~1000+ files | Entire repositories | ✅ |
All models trained using Ellora recipes are available on HuggingFace:
- codelion/Qwen3-0.6B-accuracy-recovery-lora - Accuracy recovery for Qwen3-0.6B
- codelion/gemma-3-1b-it-reasoning-grpo-lora - Reasoning enhancement for Gemma-3-1B
- codelion/Llama-3.2-1B-Instruct-tool-calling-lora - Tool calling for Llama-3.2-1B
- codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora - 2M context extension for Qwen2.5-Coder-0.5B
- More models coming as we test recipes across different model families!
If you use Ellora recipes in your research, please cite:
```bibtex
@misc{ellora2024,
  title={Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement},
  author={Asankhaya Sharma},
  year={2024},
  url={https://github.com/codelion/ellora}
}
```