[feat] LoRA #149

Closed
jlamypoirier opened this issue Feb 19, 2025 · 2 comments · Fixed by #182 · May be fixed by #180
Labels
enhancement New feature or request

Comments

jlamypoirier (Collaborator) commented Feb 19, 2025

🎯 Goal (What & Why)

Add LoRA (Low-Rank Adaptation) support to Fast-LLM for flexible and memory-efficient fine-tuning.

Motivations:

🚀 Execution Plan

Step 1: What is the smallest working version?

  1. Minimal Integration: Add optional LoRA layers to Wq and Wv of each transformer layer in Fast-LLM.
  2. Configuration Design: Implement a minimal LoraConfig similar to PEFT's LoraConfig, focusing only on the essential parameters (see the sketch after this list):
    • r (int): LoRA attention dimension (the "rank").
    • lora_alpha (int): The alpha parameter for LoRA scaling.
  3. MVP Approach: Keep the implementation simple.
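
A minimal sketch of what such a layer could look like, assuming plain PyTorch modules; the class and attribute names below are hypothetical and not Fast-LLM's actual interfaces:

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen base projection plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int, lora_alpha: int):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # only the low-rank factors are trained
        self.lora_a = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (lora_alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Wrapping only the Wq and Wv projections with a module like this, with r and lora_alpha read from the minimal LoraConfig, would be enough for the MVP.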

Step 2: What additional optimizations are possible (later, out-of-scope for now)?

  1. Loading HF LoRA Models: Convert LoRA weights from HF hub to Fast-LLM LoRA weights.
  2. Advanced Configurations: Introduce more advanced LoRA configurations from PEFT's LoraConfig, e.g. to define which weights get LoRA adapters (see the example after this list).
  3. Performance Optimization: Improve speed and memory efficiency. We shouldn't over-invest here, because LoRA is fast and memory-efficient by design already.
  4. Support for Complex Architectures: Extend LoRA to support token-switching (Phi-4) and MoEs, supplementing Fast-LLM's existing MoE approach.
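
For reference, the kind of targeting options mentioned in item 2 look roughly like this in PEFT's LoraConfig (the target_modules names are model-dependent and shown only as an example):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters; model-dependent
    lora_dropout=0.05,
)
```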

📌 Acceptance Criteria (Must-Haves for Completion)

  • LoRA layers must be functional and tested in Fast-LLM.
  • The implementation must include clear documentation explaining the minimal viable setup and configurations.
  • The PR must include a tutorial for LoRA-based fine-tuning.
  • The PR must provide a performance/impact summary demonstrating memory savings and fine-tuning flexibility.
  • No refactors unless directly necessary for feature completion.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
jlamypoirier added the enhancement (New feature or request) label on Feb 19, 2025
tscholak (Collaborator) commented Mar 7, 2025

Hey @jlamypoirier, checking in on LoRA progress. This was assigned last Tuesday, but I haven't seen updates yet. LoRA is a blocker for multiple upcoming projects, so we need execution now.
Can you share an update on what's done and when you expect it to be completed? Thanks.

jlamypoirier (Collaborator, Author) commented:

I had to address outstanding bugs and maintenance, so I could only start today. I am working on a prototype for linear layers, following https://pytorch.org/torchtune/0.3/tutorials/lora_finetune.html. With a bit of luck I'll have something this week. But:

  • There are a bunch of linear implementations, e.g. for tensor-parallel and MLP, that need to be adapted for LoRA (even if we don't use tensor parallelism, we still use those implementations). I'll ignore most optional features and performance optimizations for now, but it will still take some work.
  • Implementing LoRA won't be enough, because we create gradient and optimizer buffers for all the weights no matter what. I'll need to find a way to exclude the full weights from the buffers (see the sketch after this list for the general idea), and this means working on the core of Fast-LLM, which won't be easy. I'll have to think about it more before I can provide an estimate, but it could easily take as long as implementing LoRA itself.
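
For illustration only (plain PyTorch, not Fast-LLM's buffer machinery): the effect to aim for is that frozen base weights carry no gradient or optimizer state, which in a standard setup amounts to building optimizer state only over parameters that still require gradients.

```python
import torch


def trainable_parameters(model: torch.nn.Module):
    # Frozen base weights (requires_grad=False) are skipped, so no gradient
    # or optimizer state is allocated for them.
    return [p for p in model.parameters() if p.requires_grad]


# Hypothetical usage: `model` has its base weights frozen by a LoRA wrapper.
# optimizer = torch.optim.AdamW(trainable_parameters(model), lr=1e-4)
```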

tscholak mentioned this issue on Mar 10, 2025.
jlamypoirier linked a pull request on Mar 11, 2025 that will close this issue.