🎯 Ellora: Enhancing LLMs with LoRA


Ellora (Enhancing LLMs with LoRA) is a collection of standardized, high-quality LoRA recipes for enhancing Large Language Model capabilities. Instead of building new frameworks, we focus on creating reproducible training methodologies that work with existing infrastructure.

🌟 Philosophy

The LLM ecosystem has amazing infrastructure (LoRAX, PEFT, vLLM), but lacks standardized, high-quality capability adapters. Ellora bridges this gap by providing:

  • 📋 Recipes, not frameworks - Reproducible training methodologies
  • 🎯 Quality-first approach - Rigorous evaluation and benchmarking
  • 🔄 Self-supervised data generation - No dependency on external datasets
  • 🏗️ Infrastructure agnostic - Works with existing tools (PEFT, LoRAX, etc.)
  • 🌍 Community-driven - Open recipes for the ecosystem

📚 Recipe Collection

| Recipe | Purpose | Key Achievement | Jump to |
|--------|---------|-----------------|---------|
| #1: Accuracy Recovery | Restore quantized model performance | <5% degradation from FP16 | Details |
| #2: Reasoning Enhancement | Add structured thinking with <think> tags | 60% thinking usage, 75% quality boost | Details |
| #3: Tool Calling | Enable effective development tool usage | 80% success rate on complex tasks | Details |
| #4: Context Extension | Expand from 32K to 2M tokens | 61x context increase for full repos | Details |

🍳 Available Recipes

Recipe #1: Accuracy Recovery LoRA

Problem: Quantized models (INT4/INT8) lose accuracy compared to FP16 versions
Solution: Self-distillation LoRA adapter using Magpie-generated data

  • 🎯 Goal: <5% performance degradation from FP16 baseline
  • 💾 Memory: ~75% reduction in model size
  • ⚡ Speed: 2-3x faster inference than FP16
  • 📊 Method: Teacher (FP16) → Student (INT4+LoRA) distillation

Open In Colab

Key Innovation: Uses Magpie self-data generation for perfect domain alignment - no external datasets needed!
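
The method is small enough to sketch: the FP16 teacher produces reference token distributions, and the INT4 student's LoRA adapter is trained to match them with a KL loss. The snippet below is an illustrative outline, not the recipe's exact training script - hyperparameters, LoRA target modules, and the prompt source (Magpie-generated in the recipe) are placeholders.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Teacher: frozen full-precision reference model
teacher = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto").eval()

# Student: INT4-quantized base with a trainable LoRA adapter
student = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
)
student = get_peft_model(student, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))
optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, student.parameters()), lr=1e-4)

prompts = ["Explain what a LoRA adapter does."]  # placeholder for Magpie-generated prompts

for text in prompts:
    batch = tokenizer(text, return_tensors="pt").to(teacher.device)
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(**batch).logits, dim=-1)
    student_log_probs = F.log_softmax(student(**batch).logits, dim=-1)
    # KL divergence between teacher and student next-token distributions
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()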

Quick Start

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
)

# Load accuracy recovery adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Use normally - now with recovered accuracy!
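
For completeness, generation with the recovered model is the usual Transformers flow; a minimal sketch (the prompt is arbitrary):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))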

Results

| Model | Perplexity | Memory | Speed | Status |
|-------|------------|--------|-------|--------|
| FP16 Baseline | 1.97 | 1.0GB | 1.0x | ✅ |
| INT4 Raw | 2.40 (+21.8%) | 0.25GB | 3.2x | ⚠️ |
| INT4 + Ellora | 2.09 (+5.7%) | 0.28GB | 3.0x | ✅ |
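
The perplexity comparison above can be reproduced with a standard evaluation loop; a simplified single-text version is sketched below (the evaluation corpus and exact protocol are in the recipe notebook):

import torch

def perplexity(model, tokenizer, text, max_length=2048):
    # Perplexity = exp(mean negative log-likelihood per token)
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()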

Recipe #2: Reasoning LoRA with GRPO

Problem: LLMs often lack structured thinking patterns for complex reasoning
Solution: GRPO-trained adapter that teaches chain-of-thought with <think></think> tags

  • 🧠 Goal: Enhance reasoning capabilities through preference learning
  • 📝 Method: GRPO (Group Relative Policy Optimization) with self-rewarding
  • 🎯 Feature: Teaches structured thinking with clear reasoning steps
  • 💡 Output: Models that show their reasoning process transparently

Open In Colab

Key Innovation: Self-generated preference data with automated quality scoring - no need for human annotations or external preference datasets!
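
The automated quality scoring can be as simple as a rule-based reward over sampled completions, which GRPO then uses to rank responses within each group. The function below is an illustrative heuristic, not necessarily the exact reward used in the recipe:

import re

def thinking_reward(response: str) -> float:
    """Score a completion for structured reasoning (illustrative heuristic)."""
    score = 0.0
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match:
        score += 1.0  # used the thinking tags
        reasoning = match.group(1).strip()
        score += min(len(reasoning.split()) / 100, 1.0)  # reward substantive reasoning
        if response[match.end():].strip():
            score += 1.0  # produced a final answer after the thinking block
    return score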

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Load reasoning adapter
model = PeftModel.from_pretrained(model, "codelion/gemma-3-1b-it-reasoning-grpo-lora")

# Use with thinking prompt
prompt = '''Think step by step and use <think></think> tags to show your reasoning process.

Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 mph for the next hour, how many total miles does it travel?

Response:'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
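
Because the adapter keeps its reasoning inside <think></think> tags, the chain of thought and the final answer are easy to separate after generation; a small helper (hypothetical, not part of the recipe) might look like:

import re

def split_thinking(text: str):
    """Return (reasoning, answer) from a response that uses <think></think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_thinking(response)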

Results

| Model | Thinking Usage | Quality Score | Training Method | Status |
|-------|----------------|---------------|-----------------|--------|
| Gemma-3-1B Base | 0% | 3.2 | - | ⚠️ |
| Gemma-3-1B + Ellora | 60% | 5.6 | GRPO | ✅ |

Recipe #3: Tool Calling LoRA

Problem: LLMs struggle with effective tool usage for code exploration
Solution: Hybrid training with Magpie scenarios + real tool execution results

  • 🛠️ Goal: Teach models to use development tools effectively
  • 🔄 Method: Generate scenarios with Magpie, execute on real codebases
  • 🎯 Feature: OpenAI-compatible function calling format
  • 💻 Tools: File operations, search, code navigation, and more

Open In Colab

Key Innovation: Combines synthetic scenario diversity with real execution feedback - ensuring models learn authentic tool usage patterns!
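
There is no Quick Start snippet for this recipe here, but because the adapter targets the OpenAI-compatible function-calling format, prompting a trained model would look roughly like the sketch below. The base model and adapter path are placeholders, not published checkpoints - see the Colab notebook for the actual model and tool set.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-Coder-0.5B-Instruct"   # placeholder base model
adapter_id = "path/to/tool-calling-lora"       # placeholder adapter path

model = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(base_id), adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# OpenAI-style tool schema the adapter is trained to emit calls against
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Show me the project README."}]
prompt = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))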

Recipe #4: Progressive Context Extension LoRA

Problem: Base models are limited to 32K context, while analyzing a full repository can require up to 2M tokens
Solution: Progressive curriculum learning with vLLM + Unsloth hybrid approach

  • 📈 Goal: Extend context from 32K to 2M tokens (61x increase)
  • 🎓 Method: Curriculum learning across 4 stages (32K → 128K → 512K → 2M)
  • ⚡ Innovation: vLLM for fast data generation, Unsloth for memory-efficient training
  • 🔍 Feature: Single LoRA adapter progressively learns longer contexts

Open In Colab

Key Innovation: Hybrid optimization combining vLLM's inference speed with Unsloth's training efficiency - achieving 61x context extension with minimal compute!

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load progressive context adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Use with 2M token context - perfect for large repositories!
long_context_prompt = "Analyze this entire repository..." # Up to 2M tokens
inputs = tokenizer(long_context_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
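
To feed an entire repository into the prompt, the files just need to be concatenated and the token count checked against the extended window; a simple (hypothetical) helper:

import os

def repo_to_prompt(repo_dir: str, extensions=(".py", ".md")) -> str:
    """Concatenate repository files into a single prompt for whole-repo analysis."""
    parts = []
    for root, _, files in os.walk(repo_dir):
        for name in files:
            if name.endswith(extensions):
                path = os.path.join(root, name)
                with open(path, errors="ignore") as f:
                    parts.append(f"### FILE: {path}\n{f.read()}")
    return "\n\n".join(parts)

prompt = repo_to_prompt(".") + "\n\nSummarize the architecture of this repository."
num_tokens = len(tokenizer(prompt).input_ids)
assert num_tokens <= 2_000_000, f"prompt is {num_tokens} tokens, over the 2M window"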

Results

| Model | Context Limit | Max Files | Use Case | Status |
|-------|---------------|-----------|----------|--------|
| Qwen2.5-Coder Base | 32K tokens | ~10-20 files | Small projects | ⚠️ |
| + Stage 0 LoRA | 32K tokens | ~10-20 files | Single module analysis | ✅ |
| + Stage 1 LoRA | 128K tokens | ~50-100 files | Medium repositories | ✅ |
| + Stage 2 LoRA | 512K tokens | ~200-500 files | Large codebases | ✅ |
| + Stage 3 LoRA | 2M tokens | ~1000+ files | Entire repositories | ✅ |

🏆 Model Zoo

All models trained using Ellora recipes are available on HuggingFace:


Featured Models

🔬 Research & Citations

If you use Ellora recipes in your research, please cite:

@misc{ellora2024,
  title={Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement},
  author={Asankhaya Sharma},
  year={2024},
  url={https://github.com/codelion/ellora}
}

Key Papers & Inspirations