XLoRA: training issues, Gradients will be None #2015

Open
benjamin-marie opened this issue Aug 18, 2024 · 10 comments

benjamin-marie commented Aug 18, 2024

I installed PEFT from source and use the latest versions of Transformers and TRL. I passed the XLoRA model to TRL, but training doesn't seem to work: the training loss doesn't decrease and the validation loss stays constant. I get this warning:
UserWarning: None of the inputs have requires_grad=True. Gradients will be None

I load Llama 3.1 (without quantization) and then run this code:

adapters = dict()
adapters["0"] = './adapter1/'
adapters["1"] = './adapter2/'

peft_config = XLoraConfig(
  task_type=TaskType.CAUSAL_LM,
  peft_type=PeftType.XLORA,
  hidden_size=model.config.hidden_size,
  xlora_depth=8,
  adapters=adapters,
  xlora_size=2048,
  layerwise_scalings=True,
  xlora_dropout_p=0.2
)

xlora_model = get_peft_model(model, peft_config)

training_arguments = SFTConfig(
        output_dir="./output/",
        optim="paged_adamw_8bit",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=16,
        save_strategy="epoch",
        log_level="debug",
        logging_steps=1,
        learning_rate=1e-5,
        bf16=True,
        num_train_epochs=1,
        warmup_ratio=0.1,
        lr_scheduler_type="linear",
        dataset_text_field="text",
        max_seq_length=512,
)

trainer = SFTTrainer(
        model=xlora_model,
        train_dataset=ds,
        tokenizer=tokenizer,
        args=training_arguments,
)

trainer.train()

I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict(), otherwise training won't start and errors out saying that the adapters don't exist (sketched below).
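
For illustration, a minimal sketch of the two naming schemes (paths reused from the snippet above; the failing variant is what I would have expected to work):

# works: numeric adapter names "0", "1", ...
adapters = {"0": "./adapter1/", "1": "./adapter2/"}

# reportedly fails at training time, complaining that the adapters don't exist
# adapters = {"first": "./adapter1/", "second": "./adapter2/"}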

Maybe @EricLBuehler can help with this?

BenjaminBossan (Member) commented:

This sounds like the X-LoRA classifier layers don't have requires_grad=True. Could you please print all parameter names with requires_grad=True on your model? What is your base model?

We're still working on a training example for X-LoRA, so it's possible that there are still some kinks that need to be ironed out.
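
In the meantime, a quick sanity check along these lines should show whether the classifier itself is trainable (a rough sketch; it assumes the classifier is reachable as base_model.internal_xlora_classifier on the PEFT-wrapped model, so adjust the path if yours differs):

# does any X-LoRA classifier parameter require grad?
clf = xlora_model.base_model.internal_xlora_classifier
print(any(p.requires_grad for p in clf.parameters()))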

EricLBuehler (Member) commented:

@benjamin-marie thanks for the example. I'll take a look at this.

I also observed another bug: The adapters must be named "0", "1", etc in the adapters dict() otherwise training won't start and will say that the adapters don't exist.

Hmm ok, thanks for reporting this, I'll see what could be causing it.

benjamin-marie (Author) commented:

Here is my model (Llama 3.1 8B):

PeftModelForCausalLM(
  (base_model): XLoraModel(
    (lora_model): LoraModel(
      (model): LlamaForCausalLM(
        (model): LlamaModel(
          (embed_tokens): Embedding(128256, 4096)
          (layers): ModuleList(
            (0-31): 32 x LlamaDecoderLayer(
              (self_attn): LlamaFlashAttention2(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (k_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (v_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (o_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (rotary_emb): LlamaRotaryEmbedding()
              )
              (mlp): LlamaMLP(
                (gate_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (up_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (down_proj): lora.Linear(
                  (base_layer): Linear(in_features=14336, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=14336, out_features=16, bias=False)
                    (1): Linear(in_features=14336, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (act_fn): SiLU()
              )
              (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
              (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            )
          )
          (norm): LlamaRMSNorm((4096,), eps=1e-05)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
      )
    )
    (internal_xlora_classifier): XLoraClassifier(
      (softmax): TemperatureScaledSoftmax(
        (softmax): Softmax(dim=-1)
      )
      (layers): Sequential(
        (0): Linear(in_features=4096, out_features=2048, bias=True)
        (1): ReLU()
        (2): Dropout(p=0.2, inplace=False)
        (3): Linear(in_features=2048, out_features=2048, bias=True)
        (4): ReLU()
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=2048, out_features=2048, bias=True)
        (7): ReLU()
        (8): Dropout(p=0.2, inplace=False)
        (9): Linear(in_features=2048, out_features=2048, bias=True)
        (10): ReLU()
        (11): Dropout(p=0.2, inplace=False)
        (12): Linear(in_features=2048, out_features=2048, bias=True)
        (13): ReLU()
        (14): Dropout(p=0.2, inplace=False)
        (15): Linear(in_features=2048, out_features=2048, bias=True)
        (16): ReLU()
        (17): Dropout(p=0.2, inplace=False)
        (18): Linear(in_features=2048, out_features=2048, bias=True)
        (19): ReLU()
        (20): Dropout(p=0.2, inplace=False)
        (21): Linear(in_features=2048, out_features=448, bias=True)
      )
    )
  )
)

Could you please print all parameter names with requires_grad=True on your model?

Sure, how do you do this? None of the params seem to have requires_grad, but I'm not sure whether I did it right.

BenjaminBossan (Member) commented:

how do you do this

First of all, you can run model.print_trainable_parameters() for a global overview. Then something like this should do:

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)

benjamin-marie (Author) commented:

I added this code:

print(xlora_model.print_trainable_parameters())
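# note: print_trainable_parameters() prints its summary and returns None,
# which is where the extra "None" line in the output below comes from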
print("--- Require grad? ----")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
print("----------------------")

It prints:

trainable params: 118,372,800 || all params: 8,148,634,048 || trainable%: 1.4527
None
--- Require grad? ----
model.layers.0.self_attn.q_proj.lora_A.0.weight
model.layers.0.self_attn.q_proj.lora_A.1.weight
model.layers.0.self_attn.q_proj.lora_B.0.weight
model.layers.0.self_attn.q_proj.lora_B.1.weight
model.layers.0.self_attn.k_proj.lora_A.0.weight
model.layers.0.self_attn.k_proj.lora_A.1.weight
model.layers.0.self_attn.k_proj.lora_B.0.weight
model.layers.0.self_attn.k_proj.lora_B.1.weight
model.layers.0.self_attn.v_proj.lora_A.0.weight
model.layers.0.self_attn.v_proj.lora_A.1.weight
model.layers.0.self_attn.v_proj.lora_B.0.weight
model.layers.0.self_attn.v_proj.lora_B.1.weight
model.layers.0.self_attn.o_proj.lora_A.0.weight
model.layers.0.self_attn.o_proj.lora_A.1.weight
model.layers.0.self_attn.o_proj.lora_B.0.weight
model.layers.0.self_attn.o_proj.lora_B.1.weight
model.layers.0.mlp.gate_proj.lora_A.0.weight
model.layers.0.mlp.gate_proj.lora_A.1.weight
model.layers.0.mlp.gate_proj.lora_B.0.weight
model.layers.0.mlp.gate_proj.lora_B.1.weight
model.layers.0.mlp.up_proj.lora_A.0.weight
model.layers.0.mlp.up_proj.lora_A.1.weight
model.layers.0.mlp.up_proj.lora_B.0.weight
model.layers.0.mlp.up_proj.lora_B.1.weight
model.layers.0.mlp.down_proj.lora_A.0.weight
model.layers.0.mlp.down_proj.lora_A.1.weight
model.layers.0.mlp.down_proj.lora_B.0.weight
model.layers.0.mlp.down_proj.lora_B.1.weight
model.layers.1.self_attn.q_proj.lora_A.0.weight
model.layers.1.self_attn.q_proj.lora_A.1.weight
model.layers.1.self_attn.q_proj.lora_B.0.weight
model.layers.1.self_attn.q_proj.lora_B.1.weight
model.layers.1.self_attn.k_proj.lora_A.0.weight
model.layers.1.self_attn.k_proj.lora_A.1.weight
model.layers.1.self_attn.k_proj.lora_B.0.weight
model.layers.1.self_attn.k_proj.lora_B.1.weight
model.layers.1.self_attn.v_proj.lora_A.0.weight
model.layers.1.self_attn.v_proj.lora_A.1.weight
model.layers.1.self_attn.v_proj.lora_B.0.weight
model.layers.1.self_attn.v_proj.lora_B.1.weight
model.layers.1.self_attn.o_proj.lora_A.0.weight
model.layers.1.self_attn.o_proj.lora_A.1.weight
model.layers.1.self_attn.o_proj.lora_B.0.weight
model.layers.1.self_attn.o_proj.lora_B.1.weight
model.layers.1.mlp.gate_proj.lora_A.0.weight
model.layers.1.mlp.gate_proj.lora_A.1.weight
model.layers.1.mlp.gate_proj.lora_B.0.weight
model.layers.1.mlp.gate_proj.lora_B.1.weight
model.layers.1.mlp.up_proj.lora_A.0.weight
model.layers.1.mlp.up_proj.lora_A.1.weight
model.layers.1.mlp.up_proj.lora_B.0.weight
model.layers.1.mlp.up_proj.lora_B.1.weight
model.layers.1.mlp.down_proj.lora_A.0.weight
model.layers.1.mlp.down_proj.lora_A.1.weight
model.layers.1.mlp.down_proj.lora_B.0.weight
model.layers.1.mlp.down_proj.lora_B.1.weight
model.layers.2.self_attn.q_proj.lora_A.0.weight
model.layers.2.self_attn.q_proj.lora_A.1.weight
model.layers.2.self_attn.q_proj.lora_B.0.weight
model.layers.2.self_attn.q_proj.lora_B.1.weight
model.layers.2.self_attn.k_proj.lora_A.0.weight
model.layers.2.self_attn.k_proj.lora_A.1.weight
model.layers.2.self_attn.k_proj.lora_B.0.weight
model.layers.2.self_attn.k_proj.lora_B.1.weight
model.layers.2.self_attn.v_proj.lora_A.0.weight
model.layers.2.self_attn.v_proj.lora_A.1.weight
model.layers.2.self_attn.v_proj.lora_B.0.weight
model.layers.2.self_attn.v_proj.lora_B.1.weight
model.layers.2.self_attn.o_proj.lora_A.0.weight
model.layers.2.self_attn.o_proj.lora_A.1.weight
model.layers.2.self_attn.o_proj.lora_B.0.weight
model.layers.2.self_attn.o_proj.lora_B.1.weight
model.layers.2.mlp.gate_proj.lora_A.0.weight
model.layers.2.mlp.gate_proj.lora_A.1.weight
model.layers.2.mlp.gate_proj.lora_B.0.weight
model.layers.2.mlp.gate_proj.lora_B.1.weight
model.layers.2.mlp.up_proj.lora_A.0.weight
model.layers.2.mlp.up_proj.lora_A.1.weight
model.layers.2.mlp.up_proj.lora_B.0.weight
model.layers.2.mlp.up_proj.lora_B.1.weight
model.layers.2.mlp.down_proj.lora_A.0.weight
model.layers.2.mlp.down_proj.lora_A.1.weight
model.layers.2.mlp.down_proj.lora_B.0.weight
model.layers.2.mlp.down_proj.lora_B.1.weight
model.layers.3.self_attn.q_proj.lora_A.0.weight
model.layers.3.self_attn.q_proj.lora_A.1.weight
model.layers.3.self_attn.q_proj.lora_B.0.weight
model.layers.3.self_attn.q_proj.lora_B.1.weight
model.layers.3.self_attn.k_proj.lora_A.0.weight
model.layers.3.self_attn.k_proj.lora_A.1.weight
model.layers.3.self_attn.k_proj.lora_B.0.weight
model.layers.3.self_attn.k_proj.lora_B.1.weight
model.layers.3.self_attn.v_proj.lora_A.0.weight
model.layers.3.self_attn.v_proj.lora_A.1.weight
model.layers.3.self_attn.v_proj.lora_B.0.weight
model.layers.3.self_attn.v_proj.lora_B.1.weight
model.layers.3.self_attn.o_proj.lora_A.0.weight
model.layers.3.self_attn.o_proj.lora_A.1.weight
model.layers.3.self_attn.o_proj.lora_B.0.weight
model.layers.3.self_attn.o_proj.lora_B.1.weight
model.layers.3.mlp.gate_proj.lora_A.0.weight
model.layers.3.mlp.gate_proj.lora_A.1.weight
model.layers.3.mlp.gate_proj.lora_B.0.weight
model.layers.3.mlp.gate_proj.lora_B.1.weight
model.layers.3.mlp.up_proj.lora_A.0.weight
model.layers.3.mlp.up_proj.lora_A.1.weight
model.layers.3.mlp.up_proj.lora_B.0.weight
model.layers.3.mlp.up_proj.lora_B.1.weight
model.layers.3.mlp.down_proj.lora_A.0.weight
model.layers.3.mlp.down_proj.lora_A.1.weight
model.layers.3.mlp.down_proj.lora_B.0.weight
model.layers.3.mlp.down_proj.lora_B.1.weight
model.layers.4.self_attn.q_proj.lora_A.0.weight
model.layers.4.self_attn.q_proj.lora_A.1.weight
model.layers.4.self_attn.q_proj.lora_B.0.weight
model.layers.4.self_attn.q_proj.lora_B.1.weight
model.layers.4.self_attn.k_proj.lora_A.0.weight
model.layers.4.self_attn.k_proj.lora_A.1.weight
model.layers.4.self_attn.k_proj.lora_B.0.weight
model.layers.4.self_attn.k_proj.lora_B.1.weight
model.layers.4.self_attn.v_proj.lora_A.0.weight
model.layers.4.self_attn.v_proj.lora_A.1.weight
model.layers.4.self_attn.v_proj.lora_B.0.weight
model.layers.4.self_attn.v_proj.lora_B.1.weight
model.layers.4.self_attn.o_proj.lora_A.0.weight
model.layers.4.self_attn.o_proj.lora_A.1.weight
model.layers.4.self_attn.o_proj.lora_B.0.weight
model.layers.4.self_attn.o_proj.lora_B.1.weight
model.layers.4.mlp.gate_proj.lora_A.0.weight
model.layers.4.mlp.gate_proj.lora_A.1.weight
model.layers.4.mlp.gate_proj.lora_B.0.weight
model.layers.4.mlp.gate_proj.lora_B.1.weight
model.layers.4.mlp.up_proj.lora_A.0.weight
model.layers.4.mlp.up_proj.lora_A.1.weight
model.layers.4.mlp.up_proj.lora_B.0.weight
model.layers.4.mlp.up_proj.lora_B.1.weight
model.layers.4.mlp.down_proj.lora_A.0.weight
model.layers.4.mlp.down_proj.lora_A.1.weight
model.layers.4.mlp.down_proj.lora_B.0.weight
model.layers.4.mlp.down_proj.lora_B.1.weight
model.layers.5.self_attn.q_proj.lora_A.0.weight
model.layers.5.self_attn.q_proj.lora_A.1.weight
model.layers.5.self_attn.q_proj.lora_B.0.weight
model.layers.5.self_attn.q_proj.lora_B.1.weight
model.layers.5.self_attn.k_proj.lora_A.0.weight
model.layers.5.self_attn.k_proj.lora_A.1.weight
model.layers.5.self_attn.k_proj.lora_B.0.weight
model.layers.5.self_attn.k_proj.lora_B.1.weight
model.layers.5.self_attn.v_proj.lora_A.0.weight
model.layers.5.self_attn.v_proj.lora_A.1.weight
model.layers.5.self_attn.v_proj.lora_B.0.weight
model.layers.5.self_attn.v_proj.lora_B.1.weight
model.layers.5.self_attn.o_proj.lora_A.0.weight
model.layers.5.self_attn.o_proj.lora_A.1.weight
model.layers.5.self_attn.o_proj.lora_B.0.weight
model.layers.5.self_attn.o_proj.lora_B.1.weight
model.layers.5.mlp.gate_proj.lora_A.0.weight
model.layers.5.mlp.gate_proj.lora_A.1.weight
model.layers.5.mlp.gate_proj.lora_B.0.weight
model.layers.5.mlp.gate_proj.lora_B.1.weight
model.layers.5.mlp.up_proj.lora_A.0.weight
model.layers.5.mlp.up_proj.lora_A.1.weight
model.layers.5.mlp.up_proj.lora_B.0.weight
model.layers.5.mlp.up_proj.lora_B.1.weight
model.layers.5.mlp.down_proj.lora_A.0.weight
model.layers.5.mlp.down_proj.lora_A.1.weight
model.layers.5.mlp.down_proj.lora_B.0.weight
model.layers.5.mlp.down_proj.lora_B.1.weight
model.layers.6.self_attn.q_proj.lora_A.0.weight
model.layers.6.self_attn.q_proj.lora_A.1.weight
model.layers.6.self_attn.q_proj.lora_B.0.weight
model.layers.6.self_attn.q_proj.lora_B.1.weight
model.layers.6.self_attn.k_proj.lora_A.0.weight
model.layers.6.self_attn.k_proj.lora_A.1.weight
model.layers.6.self_attn.k_proj.lora_B.0.weight
model.layers.6.self_attn.k_proj.lora_B.1.weight
model.layers.6.self_attn.v_proj.lora_A.0.weight
model.layers.6.self_attn.v_proj.lora_A.1.weight
model.layers.6.self_attn.v_proj.lora_B.0.weight
model.layers.6.self_attn.v_proj.lora_B.1.weight
model.layers.6.self_attn.o_proj.lora_A.0.weight
model.layers.6.self_attn.o_proj.lora_A.1.weight
model.layers.6.self_attn.o_proj.lora_B.0.weight
model.layers.6.self_attn.o_proj.lora_B.1.weight
model.layers.6.mlp.gate_proj.lora_A.0.weight
model.layers.6.mlp.gate_proj.lora_A.1.weight
model.layers.6.mlp.gate_proj.lora_B.0.weight
model.layers.6.mlp.gate_proj.lora_B.1.weight
model.layers.6.mlp.up_proj.lora_A.0.weight
model.layers.6.mlp.up_proj.lora_A.1.weight
model.layers.6.mlp.up_proj.lora_B.0.weight
model.layers.6.mlp.up_proj.lora_B.1.weight
model.layers.6.mlp.down_proj.lora_A.0.weight
model.layers.6.mlp.down_proj.lora_A.1.weight
model.layers.6.mlp.down_proj.lora_B.0.weight
model.layers.6.mlp.down_proj.lora_B.1.weight
model.layers.7.self_attn.q_proj.lora_A.0.weight
model.layers.7.self_attn.q_proj.lora_A.1.weight
model.layers.7.self_attn.q_proj.lora_B.0.weight
model.layers.7.self_attn.q_proj.lora_B.1.weight
model.layers.7.self_attn.k_proj.lora_A.0.weight
model.layers.7.self_attn.k_proj.lora_A.1.weight
model.layers.7.self_attn.k_proj.lora_B.0.weight
model.layers.7.self_attn.k_proj.lora_B.1.weight
model.layers.7.self_attn.v_proj.lora_A.0.weight
model.layers.7.self_attn.v_proj.lora_A.1.weight
model.layers.7.self_attn.v_proj.lora_B.0.weight
model.layers.7.self_attn.v_proj.lora_B.1.weight
model.layers.7.self_attn.o_proj.lora_A.0.weight
model.layers.7.self_attn.o_proj.lora_A.1.weight
model.layers.7.self_attn.o_proj.lora_B.0.weight
model.layers.7.self_attn.o_proj.lora_B.1.weight
model.layers.7.mlp.gate_proj.lora_A.0.weight
model.layers.7.mlp.gate_proj.lora_A.1.weight
model.layers.7.mlp.gate_proj.lora_B.0.weight
model.layers.7.mlp.gate_proj.lora_B.1.weight
model.layers.7.mlp.up_proj.lora_A.0.weight
model.layers.7.mlp.up_proj.lora_A.1.weight
model.layers.7.mlp.up_proj.lora_B.0.weight
model.layers.7.mlp.up_proj.lora_B.1.weight
model.layers.7.mlp.down_proj.lora_A.0.weight
model.layers.7.mlp.down_proj.lora_A.1.weight
model.layers.7.mlp.down_proj.lora_B.0.weight
model.layers.7.mlp.down_proj.lora_B.1.weight
model.layers.8.self_attn.q_proj.lora_A.0.weight
model.layers.8.self_attn.q_proj.lora_A.1.weight
model.layers.8.self_attn.q_proj.lora_B.0.weight
model.layers.8.self_attn.q_proj.lora_B.1.weight
model.layers.8.self_attn.k_proj.lora_A.0.weight
model.layers.8.self_attn.k_proj.lora_A.1.weight
model.layers.8.self_attn.k_proj.lora_B.0.weight
model.layers.8.self_attn.k_proj.lora_B.1.weight
model.layers.8.self_attn.v_proj.lora_A.0.weight
model.layers.8.self_attn.v_proj.lora_A.1.weight
model.layers.8.self_attn.v_proj.lora_B.0.weight
model.layers.8.self_attn.v_proj.lora_B.1.weight
model.layers.8.self_attn.o_proj.lora_A.0.weight
model.layers.8.self_attn.o_proj.lora_A.1.weight
model.layers.8.self_attn.o_proj.lora_B.0.weight
model.layers.8.self_attn.o_proj.lora_B.1.weight
model.layers.8.mlp.gate_proj.lora_A.0.weight
model.layers.8.mlp.gate_proj.lora_A.1.weight
model.layers.8.mlp.gate_proj.lora_B.0.weight
model.layers.8.mlp.gate_proj.lora_B.1.weight
model.layers.8.mlp.up_proj.lora_A.0.weight
model.layers.8.mlp.up_proj.lora_A.1.weight
model.layers.8.mlp.up_proj.lora_B.0.weight
model.layers.8.mlp.up_proj.lora_B.1.weight
model.layers.8.mlp.down_proj.lora_A.0.weight
model.layers.8.mlp.down_proj.lora_A.1.weight
model.layers.8.mlp.down_proj.lora_B.0.weight
model.layers.8.mlp.down_proj.lora_B.1.weight
model.layers.9.self_attn.q_proj.lora_A.0.weight
model.layers.9.self_attn.q_proj.lora_A.1.weight
model.layers.9.self_attn.q_proj.lora_B.0.weight
model.layers.9.self_attn.q_proj.lora_B.1.weight
model.layers.9.self_attn.k_proj.lora_A.0.weight
model.layers.9.self_attn.k_proj.lora_A.1.weight
model.layers.9.self_attn.k_proj.lora_B.0.weight
model.layers.9.self_attn.k_proj.lora_B.1.weight
model.layers.9.self_attn.v_proj.lora_A.0.weight
model.layers.9.self_attn.v_proj.lora_A.1.weight
model.layers.9.self_attn.v_proj.lora_B.0.weight
model.layers.9.self_attn.v_proj.lora_B.1.weight
model.layers.9.self_attn.o_proj.lora_A.0.weight
model.layers.9.self_attn.o_proj.lora_A.1.weight
model.layers.9.self_attn.o_proj.lora_B.0.weight
model.layers.9.self_attn.o_proj.lora_B.1.weight
model.layers.9.mlp.gate_proj.lora_A.0.weight
model.layers.9.mlp.gate_proj.lora_A.1.weight
model.layers.9.mlp.gate_proj.lora_B.0.weight
model.layers.9.mlp.gate_proj.lora_B.1.weight
model.layers.9.mlp.up_proj.lora_A.0.weight
model.layers.9.mlp.up_proj.lora_A.1.weight
model.layers.9.mlp.up_proj.lora_B.0.weight
model.layers.9.mlp.up_proj.lora_B.1.weight
model.layers.9.mlp.down_proj.lora_A.0.weight
model.layers.9.mlp.down_proj.lora_A.1.weight
model.layers.9.mlp.down_proj.lora_B.0.weight
model.layers.9.mlp.down_proj.lora_B.1.weight
model.layers.10.self_attn.q_proj.lora_A.0.weight
model.layers.10.self_attn.q_proj.lora_A.1.weight
model.layers.10.self_attn.q_proj.lora_B.0.weight
model.layers.10.self_attn.q_proj.lora_B.1.weight
model.layers.10.self_attn.k_proj.lora_A.0.weight
model.layers.10.self_attn.k_proj.lora_A.1.weight
model.layers.10.self_attn.k_proj.lora_B.0.weight
model.layers.10.self_attn.k_proj.lora_B.1.weight
model.layers.10.self_attn.v_proj.lora_A.0.weight
model.layers.10.self_attn.v_proj.lora_A.1.weight
model.layers.10.self_attn.v_proj.lora_B.0.weight
model.layers.10.self_attn.v_proj.lora_B.1.weight
model.layers.10.self_attn.o_proj.lora_A.0.weight
model.layers.10.self_attn.o_proj.lora_A.1.weight
model.layers.10.self_attn.o_proj.lora_B.0.weight
model.layers.10.self_attn.o_proj.lora_B.1.weight
model.layers.10.mlp.gate_proj.lora_A.0.weight
model.layers.10.mlp.gate_proj.lora_A.1.weight
model.layers.10.mlp.gate_proj.lora_B.0.weight
model.layers.10.mlp.gate_proj.lora_B.1.weight
model.layers.10.mlp.up_proj.lora_A.0.weight
model.layers.10.mlp.up_proj.lora_A.1.weight
model.layers.10.mlp.up_proj.lora_B.0.weight
model.layers.10.mlp.up_proj.lora_B.1.weight
model.layers.10.mlp.down_proj.lora_A.0.weight
model.layers.10.mlp.down_proj.lora_A.1.weight
model.layers.10.mlp.down_proj.lora_B.0.weight
model.layers.10.mlp.down_proj.lora_B.1.weight
model.layers.11.self_attn.q_proj.lora_A.0.weight
model.layers.11.self_attn.q_proj.lora_A.1.weight
model.layers.11.self_attn.q_proj.lora_B.0.weight
model.layers.11.self_attn.q_proj.lora_B.1.weight
model.layers.11.self_attn.k_proj.lora_A.0.weight
model.layers.11.self_attn.k_proj.lora_A.1.weight
model.layers.11.self_attn.k_proj.lora_B.0.weight
model.layers.11.self_attn.k_proj.lora_B.1.weight
model.layers.11.self_attn.v_proj.lora_A.0.weight
model.layers.11.self_attn.v_proj.lora_A.1.weight
model.layers.11.self_attn.v_proj.lora_B.0.weight
model.layers.11.self_attn.v_proj.lora_B.1.weight
model.layers.11.self_attn.o_proj.lora_A.0.weight
model.layers.11.self_attn.o_proj.lora_A.1.weight
model.layers.11.self_attn.o_proj.lora_B.0.weight
model.layers.11.self_attn.o_proj.lora_B.1.weight
model.layers.11.mlp.gate_proj.lora_A.0.weight
model.layers.11.mlp.gate_proj.lora_A.1.weight
model.layers.11.mlp.gate_proj.lora_B.0.weight
model.layers.11.mlp.gate_proj.lora_B.1.weight
model.layers.11.mlp.up_proj.lora_A.0.weight
model.layers.11.mlp.up_proj.lora_A.1.weight
model.layers.11.mlp.up_proj.lora_B.0.weight
model.layers.11.mlp.up_proj.lora_B.1.weight
model.layers.11.mlp.down_proj.lora_A.0.weight
model.layers.11.mlp.down_proj.lora_A.1.weight
model.layers.11.mlp.down_proj.lora_B.0.weight
model.layers.11.mlp.down_proj.lora_B.1.weight
model.layers.12.self_attn.q_proj.lora_A.0.weight
model.layers.12.self_attn.q_proj.lora_A.1.weight
model.layers.12.self_attn.q_proj.lora_B.0.weight
model.layers.12.self_attn.q_proj.lora_B.1.weight
model.layers.12.self_attn.k_proj.lora_A.0.weight
model.layers.12.self_attn.k_proj.lora_A.1.weight
model.layers.12.self_attn.k_proj.lora_B.0.weight
model.layers.12.self_attn.k_proj.lora_B.1.weight
model.layers.12.self_attn.v_proj.lora_A.0.weight
model.layers.12.self_attn.v_proj.lora_A.1.weight
model.layers.12.self_attn.v_proj.lora_B.0.weight
model.layers.12.self_attn.v_proj.lora_B.1.weight
model.layers.12.self_attn.o_proj.lora_A.0.weight
model.layers.12.self_attn.o_proj.lora_A.1.weight
model.layers.12.self_attn.o_proj.lora_B.0.weight
model.layers.12.self_attn.o_proj.lora_B.1.weight
model.layers.12.mlp.gate_proj.lora_A.0.weight
model.layers.12.mlp.gate_proj.lora_A.1.weight
model.layers.12.mlp.gate_proj.lora_B.0.weight
model.layers.12.mlp.gate_proj.lora_B.1.weight
model.layers.12.mlp.up_proj.lora_A.0.weight
model.layers.12.mlp.up_proj.lora_A.1.weight
model.layers.12.mlp.up_proj.lora_B.0.weight
model.layers.12.mlp.up_proj.lora_B.1.weight
model.layers.12.mlp.down_proj.lora_A.0.weight
model.layers.12.mlp.down_proj.lora_A.1.weight
model.layers.12.mlp.down_proj.lora_B.0.weight
model.layers.12.mlp.down_proj.lora_B.1.weight
model.layers.13.self_attn.q_proj.lora_A.0.weight
model.layers.13.self_attn.q_proj.lora_A.1.weight
model.layers.13.self_attn.q_proj.lora_B.0.weight
model.layers.13.self_attn.q_proj.lora_B.1.weight
model.layers.13.self_attn.k_proj.lora_A.0.weight
model.layers.13.self_attn.k_proj.lora_A.1.weight
model.layers.13.self_attn.k_proj.lora_B.0.weight
model.layers.13.self_attn.k_proj.lora_B.1.weight
model.layers.13.self_attn.v_proj.lora_A.0.weight
model.layers.13.self_attn.v_proj.lora_A.1.weight
model.layers.13.self_attn.v_proj.lora_B.0.weight
model.layers.13.self_attn.v_proj.lora_B.1.weight
model.layers.13.self_attn.o_proj.lora_A.0.weight
model.layers.13.self_attn.o_proj.lora_A.1.weight
model.layers.13.self_attn.o_proj.lora_B.0.weight
model.layers.13.self_attn.o_proj.lora_B.1.weight
model.layers.13.mlp.gate_proj.lora_A.0.weight
model.layers.13.mlp.gate_proj.lora_A.1.weight
model.layers.13.mlp.gate_proj.lora_B.0.weight
model.layers.13.mlp.gate_proj.lora_B.1.weight
model.layers.13.mlp.up_proj.lora_A.0.weight
model.layers.13.mlp.up_proj.lora_A.1.weight
model.layers.13.mlp.up_proj.lora_B.0.weight
model.layers.13.mlp.up_proj.lora_B.1.weight
model.layers.13.mlp.down_proj.lora_A.0.weight
model.layers.13.mlp.down_proj.lora_A.1.weight
model.layers.13.mlp.down_proj.lora_B.0.weight
model.layers.13.mlp.down_proj.lora_B.1.weight
model.layers.14.self_attn.q_proj.lora_A.0.weight
model.layers.14.self_attn.q_proj.lora_A.1.weight
model.layers.14.self_attn.q_proj.lora_B.0.weight
model.layers.14.self_attn.q_proj.lora_B.1.weight
model.layers.14.self_attn.k_proj.lora_A.0.weight
model.layers.14.self_attn.k_proj.lora_A.1.weight
model.layers.14.self_attn.k_proj.lora_B.0.weight
model.layers.14.self_attn.k_proj.lora_B.1.weight
model.layers.14.self_attn.v_proj.lora_A.0.weight
model.layers.14.self_attn.v_proj.lora_A.1.weight
model.layers.14.self_attn.v_proj.lora_B.0.weight
model.layers.14.self_attn.v_proj.lora_B.1.weight
model.layers.14.self_attn.o_proj.lora_A.0.weight
model.layers.14.self_attn.o_proj.lora_A.1.weight
model.layers.14.self_attn.o_proj.lora_B.0.weight
model.layers.14.self_attn.o_proj.lora_B.1.weight
model.layers.14.mlp.gate_proj.lora_A.0.weight
model.layers.14.mlp.gate_proj.lora_A.1.weight
model.layers.14.mlp.gate_proj.lora_B.0.weight
model.layers.14.mlp.gate_proj.lora_B.1.weight
model.layers.14.mlp.up_proj.lora_A.0.weight
model.layers.14.mlp.up_proj.lora_A.1.weight
model.layers.14.mlp.up_proj.lora_B.0.weight
model.layers.14.mlp.up_proj.lora_B.1.weight
model.layers.14.mlp.down_proj.lora_A.0.weight
model.layers.14.mlp.down_proj.lora_A.1.weight
model.layers.14.mlp.down_proj.lora_B.0.weight
model.layers.14.mlp.down_proj.lora_B.1.weight
model.layers.15.self_attn.q_proj.lora_A.0.weight
model.layers.15.self_attn.q_proj.lora_A.1.weight
model.layers.15.self_attn.q_proj.lora_B.0.weight
model.layers.15.self_attn.q_proj.lora_B.1.weight
model.layers.15.self_attn.k_proj.lora_A.0.weight
model.layers.15.self_attn.k_proj.lora_A.1.weight
model.layers.15.self_attn.k_proj.lora_B.0.weight
model.layers.15.self_attn.k_proj.lora_B.1.weight
model.layers.15.self_attn.v_proj.lora_A.0.weight
model.layers.15.self_attn.v_proj.lora_A.1.weight
model.layers.15.self_attn.v_proj.lora_B.0.weight
model.layers.15.self_attn.v_proj.lora_B.1.weight
model.layers.15.self_attn.o_proj.lora_A.0.weight
model.layers.15.self_attn.o_proj.lora_A.1.weight
model.layers.15.self_attn.o_proj.lora_B.0.weight
model.layers.15.self_attn.o_proj.lora_B.1.weight
model.layers.15.mlp.gate_proj.lora_A.0.weight
model.layers.15.mlp.gate_proj.lora_A.1.weight
model.layers.15.mlp.gate_proj.lora_B.0.weight
model.layers.15.mlp.gate_proj.lora_B.1.weight
model.layers.15.mlp.up_proj.lora_A.0.weight
model.layers.15.mlp.up_proj.lora_A.1.weight
model.layers.15.mlp.up_proj.lora_B.0.weight
model.layers.15.mlp.up_proj.lora_B.1.weight
model.layers.15.mlp.down_proj.lora_A.0.weight
model.layers.15.mlp.down_proj.lora_A.1.weight
model.layers.15.mlp.down_proj.lora_B.0.weight
model.layers.15.mlp.down_proj.lora_B.1.weight
model.layers.16.self_attn.q_proj.lora_A.0.weight
model.layers.16.self_attn.q_proj.lora_A.1.weight
model.layers.16.self_attn.q_proj.lora_B.0.weight
model.layers.16.self_attn.q_proj.lora_B.1.weight
model.layers.16.self_attn.k_proj.lora_A.0.weight
model.layers.16.self_attn.k_proj.lora_A.1.weight
model.layers.16.self_attn.k_proj.lora_B.0.weight
model.layers.16.self_attn.k_proj.lora_B.1.weight
model.layers.16.self_attn.v_proj.lora_A.0.weight
model.layers.16.self_attn.v_proj.lora_A.1.weight
model.layers.16.self_attn.v_proj.lora_B.0.weight
model.layers.16.self_attn.v_proj.lora_B.1.weight
model.layers.16.self_attn.o_proj.lora_A.0.weight
model.layers.16.self_attn.o_proj.lora_A.1.weight
model.layers.16.self_attn.o_proj.lora_B.0.weight
model.layers.16.self_attn.o_proj.lora_B.1.weight
model.layers.16.mlp.gate_proj.lora_A.0.weight
model.layers.16.mlp.gate_proj.lora_A.1.weight
model.layers.16.mlp.gate_proj.lora_B.0.weight
model.layers.16.mlp.gate_proj.lora_B.1.weight
model.layers.16.mlp.up_proj.lora_A.0.weight
model.layers.16.mlp.up_proj.lora_A.1.weight
model.layers.16.mlp.up_proj.lora_B.0.weight
model.layers.16.mlp.up_proj.lora_B.1.weight
model.layers.16.mlp.down_proj.lora_A.0.weight
model.layers.16.mlp.down_proj.lora_A.1.weight
model.layers.16.mlp.down_proj.lora_B.0.weight
model.layers.16.mlp.down_proj.lora_B.1.weight
model.layers.17.self_attn.q_proj.lora_A.0.weight
model.layers.17.self_attn.q_proj.lora_A.1.weight
model.layers.17.self_attn.q_proj.lora_B.0.weight
model.layers.17.self_attn.q_proj.lora_B.1.weight
model.layers.17.self_attn.k_proj.lora_A.0.weight
model.layers.17.self_attn.k_proj.lora_A.1.weight
model.layers.17.self_attn.k_proj.lora_B.0.weight
model.layers.17.self_attn.k_proj.lora_B.1.weight
model.layers.17.self_attn.v_proj.lora_A.0.weight
model.layers.17.self_attn.v_proj.lora_A.1.weight
model.layers.17.self_attn.v_proj.lora_B.0.weight
model.layers.17.self_attn.v_proj.lora_B.1.weight
model.layers.17.self_attn.o_proj.lora_A.0.weight
model.layers.17.self_attn.o_proj.lora_A.1.weight
model.layers.17.self_attn.o_proj.lora_B.0.weight
model.layers.17.self_attn.o_proj.lora_B.1.weight
model.layers.17.mlp.gate_proj.lora_A.0.weight
model.layers.17.mlp.gate_proj.lora_A.1.weight
model.layers.17.mlp.gate_proj.lora_B.0.weight
model.layers.17.mlp.gate_proj.lora_B.1.weight
model.layers.17.mlp.up_proj.lora_A.0.weight
model.layers.17.mlp.up_proj.lora_A.1.weight
model.layers.17.mlp.up_proj.lora_B.0.weight
model.layers.17.mlp.up_proj.lora_B.1.weight
model.layers.17.mlp.down_proj.lora_A.0.weight
model.layers.17.mlp.down_proj.lora_A.1.weight
model.layers.17.mlp.down_proj.lora_B.0.weight
model.layers.17.mlp.down_proj.lora_B.1.weight
model.layers.18.self_attn.q_proj.lora_A.0.weight
model.layers.18.self_attn.q_proj.lora_A.1.weight
model.layers.18.self_attn.q_proj.lora_B.0.weight
model.layers.18.self_attn.q_proj.lora_B.1.weight
model.layers.18.self_attn.k_proj.lora_A.0.weight
model.layers.18.self_attn.k_proj.lora_A.1.weight
model.layers.18.self_attn.k_proj.lora_B.0.weight
model.layers.18.self_attn.k_proj.lora_B.1.weight
model.layers.18.self_attn.v_proj.lora_A.0.weight
model.layers.18.self_attn.v_proj.lora_A.1.weight
model.layers.18.self_attn.v_proj.lora_B.0.weight
model.layers.18.self_attn.v_proj.lora_B.1.weight
model.layers.18.self_attn.o_proj.lora_A.0.weight
model.layers.18.self_attn.o_proj.lora_A.1.weight
model.layers.18.self_attn.o_proj.lora_B.0.weight
model.layers.18.self_attn.o_proj.lora_B.1.weight
model.layers.18.mlp.gate_proj.lora_A.0.weight
model.layers.18.mlp.gate_proj.lora_A.1.weight
model.layers.18.mlp.gate_proj.lora_B.0.weight
model.layers.18.mlp.gate_proj.lora_B.1.weight
model.layers.18.mlp.up_proj.lora_A.0.weight
model.layers.18.mlp.up_proj.lora_A.1.weight
model.layers.18.mlp.up_proj.lora_B.0.weight
model.layers.18.mlp.up_proj.lora_B.1.weight
model.layers.18.mlp.down_proj.lora_A.0.weight
model.layers.18.mlp.down_proj.lora_A.1.weight
model.layers.18.mlp.down_proj.lora_B.0.weight
model.layers.18.mlp.down_proj.lora_B.1.weight
model.layers.19.self_attn.q_proj.lora_A.0.weight
model.layers.19.self_attn.q_proj.lora_A.1.weight
model.layers.19.self_attn.q_proj.lora_B.0.weight
model.layers.19.self_attn.q_proj.lora_B.1.weight
model.layers.19.self_attn.k_proj.lora_A.0.weight
model.layers.19.self_attn.k_proj.lora_A.1.weight
model.layers.19.self_attn.k_proj.lora_B.0.weight
model.layers.19.self_attn.k_proj.lora_B.1.weight
model.layers.19.self_attn.v_proj.lora_A.0.weight
model.layers.19.self_attn.v_proj.lora_A.1.weight
model.layers.19.self_attn.v_proj.lora_B.0.weight
model.layers.19.self_attn.v_proj.lora_B.1.weight
model.layers.19.self_attn.o_proj.lora_A.0.weight
model.layers.19.self_attn.o_proj.lora_A.1.weight
model.layers.19.self_attn.o_proj.lora_B.0.weight
model.layers.19.self_attn.o_proj.lora_B.1.weight
model.layers.19.mlp.gate_proj.lora_A.0.weight
model.layers.19.mlp.gate_proj.lora_A.1.weight
model.layers.19.mlp.gate_proj.lora_B.0.weight
model.layers.19.mlp.gate_proj.lora_B.1.weight
model.layers.19.mlp.up_proj.lora_A.0.weight
model.layers.19.mlp.up_proj.lora_A.1.weight
model.layers.19.mlp.up_proj.lora_B.0.weight
model.layers.19.mlp.up_proj.lora_B.1.weight
model.layers.19.mlp.down_proj.lora_A.0.weight
model.layers.19.mlp.down_proj.lora_A.1.weight
model.layers.19.mlp.down_proj.lora_B.0.weight
model.layers.19.mlp.down_proj.lora_B.1.weight
model.layers.20.self_attn.q_proj.lora_A.0.weight
model.layers.20.self_attn.q_proj.lora_A.1.weight
model.layers.20.self_attn.q_proj.lora_B.0.weight
model.layers.20.self_attn.q_proj.lora_B.1.weight
model.layers.20.self_attn.k_proj.lora_A.0.weight
model.layers.20.self_attn.k_proj.lora_A.1.weight
model.layers.20.self_attn.k_proj.lora_B.0.weight
model.layers.20.self_attn.k_proj.lora_B.1.weight
model.layers.20.self_attn.v_proj.lora_A.0.weight
model.layers.20.self_attn.v_proj.lora_A.1.weight
model.layers.20.self_attn.v_proj.lora_B.0.weight
model.layers.20.self_attn.v_proj.lora_B.1.weight
model.layers.20.self_attn.o_proj.lora_A.0.weight
model.layers.20.self_attn.o_proj.lora_A.1.weight
model.layers.20.self_attn.o_proj.lora_B.0.weight
model.layers.20.self_attn.o_proj.lora_B.1.weight
model.layers.20.mlp.gate_proj.lora_A.0.weight
model.layers.20.mlp.gate_proj.lora_A.1.weight
model.layers.20.mlp.gate_proj.lora_B.0.weight
model.layers.20.mlp.gate_proj.lora_B.1.weight
model.layers.20.mlp.up_proj.lora_A.0.weight
model.layers.20.mlp.up_proj.lora_A.1.weight
model.layers.20.mlp.up_proj.lora_B.0.weight
model.layers.20.mlp.up_proj.lora_B.1.weight
model.layers.20.mlp.down_proj.lora_A.0.weight
model.layers.20.mlp.down_proj.lora_A.1.weight
model.layers.20.mlp.down_proj.lora_B.0.weight
model.layers.20.mlp.down_proj.lora_B.1.weight
model.layers.21.self_attn.q_proj.lora_A.0.weight
model.layers.21.self_attn.q_proj.lora_A.1.weight
model.layers.21.self_attn.q_proj.lora_B.0.weight
model.layers.21.self_attn.q_proj.lora_B.1.weight
model.layers.21.self_attn.k_proj.lora_A.0.weight
model.layers.21.self_attn.k_proj.lora_A.1.weight
model.layers.21.self_attn.k_proj.lora_B.0.weight
model.layers.21.self_attn.k_proj.lora_B.1.weight
model.layers.21.self_attn.v_proj.lora_A.0.weight
model.layers.21.self_attn.v_proj.lora_A.1.weight
model.layers.21.self_attn.v_proj.lora_B.0.weight
model.layers.21.self_attn.v_proj.lora_B.1.weight
model.layers.21.self_attn.o_proj.lora_A.0.weight
model.layers.21.self_attn.o_proj.lora_A.1.weight
model.layers.21.self_attn.o_proj.lora_B.0.weight
model.layers.21.self_attn.o_proj.lora_B.1.weight
model.layers.21.mlp.gate_proj.lora_A.0.weight
model.layers.21.mlp.gate_proj.lora_A.1.weight
model.layers.21.mlp.gate_proj.lora_B.0.weight
model.layers.21.mlp.gate_proj.lora_B.1.weight
model.layers.21.mlp.up_proj.lora_A.0.weight
model.layers.21.mlp.up_proj.lora_A.1.weight
model.layers.21.mlp.up_proj.lora_B.0.weight
model.layers.21.mlp.up_proj.lora_B.1.weight
model.layers.21.mlp.down_proj.lora_A.0.weight
model.layers.21.mlp.down_proj.lora_A.1.weight
model.layers.21.mlp.down_proj.lora_B.0.weight
model.layers.21.mlp.down_proj.lora_B.1.weight
model.layers.22.self_attn.q_proj.lora_A.0.weight
model.layers.22.self_attn.q_proj.lora_A.1.weight
model.layers.22.self_attn.q_proj.lora_B.0.weight
model.layers.22.self_attn.q_proj.lora_B.1.weight
model.layers.22.self_attn.k_proj.lora_A.0.weight
model.layers.22.self_attn.k_proj.lora_A.1.weight
model.layers.22.self_attn.k_proj.lora_B.0.weight
model.layers.22.self_attn.k_proj.lora_B.1.weight
model.layers.22.self_attn.v_proj.lora_A.0.weight
model.layers.22.self_attn.v_proj.lora_A.1.weight
model.layers.22.self_attn.v_proj.lora_B.0.weight
model.layers.22.self_attn.v_proj.lora_B.1.weight
model.layers.22.self_attn.o_proj.lora_A.0.weight
model.layers.22.self_attn.o_proj.lora_A.1.weight
model.layers.22.self_attn.o_proj.lora_B.0.weight
model.layers.22.self_attn.o_proj.lora_B.1.weight
model.layers.22.mlp.gate_proj.lora_A.0.weight
model.layers.22.mlp.gate_proj.lora_A.1.weight
model.layers.22.mlp.gate_proj.lora_B.0.weight
model.layers.22.mlp.gate_proj.lora_B.1.weight
model.layers.22.mlp.up_proj.lora_A.0.weight
model.layers.22.mlp.up_proj.lora_A.1.weight
model.layers.22.mlp.up_proj.lora_B.0.weight
model.layers.22.mlp.up_proj.lora_B.1.weight
model.layers.22.mlp.down_proj.lora_A.0.weight
model.layers.22.mlp.down_proj.lora_A.1.weight
model.layers.22.mlp.down_proj.lora_B.0.weight
model.layers.22.mlp.down_proj.lora_B.1.weight
model.layers.23.self_attn.q_proj.lora_A.0.weight
model.layers.23.self_attn.q_proj.lora_A.1.weight
model.layers.23.self_attn.q_proj.lora_B.0.weight
model.layers.23.self_attn.q_proj.lora_B.1.weight
model.layers.23.self_attn.k_proj.lora_A.0.weight
model.layers.23.self_attn.k_proj.lora_A.1.weight
model.layers.23.self_attn.k_proj.lora_B.0.weight
model.layers.23.self_attn.k_proj.lora_B.1.weight
model.layers.23.self_attn.v_proj.lora_A.0.weight
model.layers.23.self_attn.v_proj.lora_A.1.weight
model.layers.23.self_attn.v_proj.lora_B.0.weight
model.layers.23.self_attn.v_proj.lora_B.1.weight
model.layers.23.self_attn.o_proj.lora_A.0.weight
model.layers.23.self_attn.o_proj.lora_A.1.weight
model.layers.23.self_attn.o_proj.lora_B.0.weight
model.layers.23.self_attn.o_proj.lora_B.1.weight
model.layers.23.mlp.gate_proj.lora_A.0.weight
model.layers.23.mlp.gate_proj.lora_A.1.weight
model.layers.23.mlp.gate_proj.lora_B.0.weight
model.layers.23.mlp.gate_proj.lora_B.1.weight
model.layers.23.mlp.up_proj.lora_A.0.weight
model.layers.23.mlp.up_proj.lora_A.1.weight
model.layers.23.mlp.up_proj.lora_B.0.weight
model.layers.23.mlp.up_proj.lora_B.1.weight
model.layers.23.mlp.down_proj.lora_A.0.weight
model.layers.23.mlp.down_proj.lora_A.1.weight
model.layers.23.mlp.down_proj.lora_B.0.weight
model.layers.23.mlp.down_proj.lora_B.1.weight
model.layers.24.self_attn.q_proj.lora_A.0.weight
model.layers.24.self_attn.q_proj.lora_A.1.weight
model.layers.24.self_attn.q_proj.lora_B.0.weight
model.layers.24.self_attn.q_proj.lora_B.1.weight
model.layers.24.self_attn.k_proj.lora_A.0.weight
model.layers.24.self_attn.k_proj.lora_A.1.weight
model.layers.24.self_attn.k_proj.lora_B.0.weight
model.layers.24.self_attn.k_proj.lora_B.1.weight
model.layers.24.self_attn.v_proj.lora_A.0.weight
model.layers.24.self_attn.v_proj.lora_A.1.weight
model.layers.24.self_attn.v_proj.lora_B.0.weight
model.layers.24.self_attn.v_proj.lora_B.1.weight
model.layers.24.self_attn.o_proj.lora_A.0.weight
model.layers.24.self_attn.o_proj.lora_A.1.weight
model.layers.24.self_attn.o_proj.lora_B.0.weight
model.layers.24.self_attn.o_proj.lora_B.1.weight
model.layers.24.mlp.gate_proj.lora_A.0.weight
model.layers.24.mlp.gate_proj.lora_A.1.weight
model.layers.24.mlp.gate_proj.lora_B.0.weight
model.layers.24.mlp.gate_proj.lora_B.1.weight
model.layers.24.mlp.up_proj.lora_A.0.weight
model.layers.24.mlp.up_proj.lora_A.1.weight
model.layers.24.mlp.up_proj.lora_B.0.weight
model.layers.24.mlp.up_proj.lora_B.1.weight
model.layers.24.mlp.down_proj.lora_A.0.weight
model.layers.24.mlp.down_proj.lora_A.1.weight
model.layers.24.mlp.down_proj.lora_B.0.weight
model.layers.24.mlp.down_proj.lora_B.1.weight
model.layers.25.self_attn.q_proj.lora_A.0.weight
model.layers.25.self_attn.q_proj.lora_A.1.weight
model.layers.25.self_attn.q_proj.lora_B.0.weight
model.layers.25.self_attn.q_proj.lora_B.1.weight
model.layers.25.self_attn.k_proj.lora_A.0.weight
model.layers.25.self_attn.k_proj.lora_A.1.weight
model.layers.25.self_attn.k_proj.lora_B.0.weight
model.layers.25.self_attn.k_proj.lora_B.1.weight
model.layers.25.self_attn.v_proj.lora_A.0.weight
model.layers.25.self_attn.v_proj.lora_A.1.weight
model.layers.25.self_attn.v_proj.lora_B.0.weight
model.layers.25.self_attn.v_proj.lora_B.1.weight
model.layers.25.self_attn.o_proj.lora_A.0.weight
model.layers.25.self_attn.o_proj.lora_A.1.weight
model.layers.25.self_attn.o_proj.lora_B.0.weight
model.layers.25.self_attn.o_proj.lora_B.1.weight
model.layers.25.mlp.gate_proj.lora_A.0.weight
model.layers.25.mlp.gate_proj.lora_A.1.weight
model.layers.25.mlp.gate_proj.lora_B.0.weight
model.layers.25.mlp.gate_proj.lora_B.1.weight
model.layers.25.mlp.up_proj.lora_A.0.weight
model.layers.25.mlp.up_proj.lora_A.1.weight
model.layers.25.mlp.up_proj.lora_B.0.weight
model.layers.25.mlp.up_proj.lora_B.1.weight
model.layers.25.mlp.down_proj.lora_A.0.weight
model.layers.25.mlp.down_proj.lora_A.1.weight
model.layers.25.mlp.down_proj.lora_B.0.weight
model.layers.25.mlp.down_proj.lora_B.1.weight
model.layers.26.self_attn.q_proj.lora_A.0.weight
model.layers.26.self_attn.q_proj.lora_A.1.weight
model.layers.26.self_attn.q_proj.lora_B.0.weight
model.layers.26.self_attn.q_proj.lora_B.1.weight
model.layers.26.self_attn.k_proj.lora_A.0.weight
model.layers.26.self_attn.k_proj.lora_A.1.weight
model.layers.26.self_attn.k_proj.lora_B.0.weight
model.layers.26.self_attn.k_proj.lora_B.1.weight
model.layers.26.self_attn.v_proj.lora_A.0.weight
model.layers.26.self_attn.v_proj.lora_A.1.weight
model.layers.26.self_attn.v_proj.lora_B.0.weight
model.layers.26.self_attn.v_proj.lora_B.1.weight
model.layers.26.self_attn.o_proj.lora_A.0.weight
model.layers.26.self_attn.o_proj.lora_A.1.weight
model.layers.26.self_attn.o_proj.lora_B.0.weight
model.layers.26.self_attn.o_proj.lora_B.1.weight
model.layers.26.mlp.gate_proj.lora_A.0.weight
model.layers.26.mlp.gate_proj.lora_A.1.weight
model.layers.26.mlp.gate_proj.lora_B.0.weight
model.layers.26.mlp.gate_proj.lora_B.1.weight
model.layers.26.mlp.up_proj.lora_A.0.weight
model.layers.26.mlp.up_proj.lora_A.1.weight
model.layers.26.mlp.up_proj.lora_B.0.weight
model.layers.26.mlp.up_proj.lora_B.1.weight
model.layers.26.mlp.down_proj.lora_A.0.weight
model.layers.26.mlp.down_proj.lora_A.1.weight
model.layers.26.mlp.down_proj.lora_B.0.weight
model.layers.26.mlp.down_proj.lora_B.1.weight
model.layers.27.self_attn.q_proj.lora_A.0.weight
model.layers.27.self_attn.q_proj.lora_A.1.weight
model.layers.27.self_attn.q_proj.lora_B.0.weight
model.layers.27.self_attn.q_proj.lora_B.1.weight
model.layers.27.self_attn.k_proj.lora_A.0.weight
model.layers.27.self_attn.k_proj.lora_A.1.weight
model.layers.27.self_attn.k_proj.lora_B.0.weight
model.layers.27.self_attn.k_proj.lora_B.1.weight
model.layers.27.self_attn.v_proj.lora_A.0.weight
model.layers.27.self_attn.v_proj.lora_A.1.weight
model.layers.27.self_attn.v_proj.lora_B.0.weight
model.layers.27.self_attn.v_proj.lora_B.1.weight
model.layers.27.self_attn.o_proj.lora_A.0.weight
model.layers.27.self_attn.o_proj.lora_A.1.weight
model.layers.27.self_attn.o_proj.lora_B.0.weight
model.layers.27.self_attn.o_proj.lora_B.1.weight
model.layers.27.mlp.gate_proj.lora_A.0.weight
model.layers.27.mlp.gate_proj.lora_A.1.weight
model.layers.27.mlp.gate_proj.lora_B.0.weight
model.layers.27.mlp.gate_proj.lora_B.1.weight
model.layers.27.mlp.up_proj.lora_A.0.weight
model.layers.27.mlp.up_proj.lora_A.1.weight
model.layers.27.mlp.up_proj.lora_B.0.weight
model.layers.27.mlp.up_proj.lora_B.1.weight
model.layers.27.mlp.down_proj.lora_A.0.weight
model.layers.27.mlp.down_proj.lora_A.1.weight
model.layers.27.mlp.down_proj.lora_B.0.weight
model.layers.27.mlp.down_proj.lora_B.1.weight
model.layers.28.self_attn.q_proj.lora_A.0.weight
model.layers.28.self_attn.q_proj.lora_A.1.weight
model.layers.28.self_attn.q_proj.lora_B.0.weight
model.layers.28.self_attn.q_proj.lora_B.1.weight
model.layers.28.self_attn.k_proj.lora_A.0.weight
model.layers.28.self_attn.k_proj.lora_A.1.weight
model.layers.28.self_attn.k_proj.lora_B.0.weight
model.layers.28.self_attn.k_proj.lora_B.1.weight
model.layers.28.self_attn.v_proj.lora_A.0.weight
model.layers.28.self_attn.v_proj.lora_A.1.weight
model.layers.28.self_attn.v_proj.lora_B.0.weight
model.layers.28.self_attn.v_proj.lora_B.1.weight
model.layers.28.self_attn.o_proj.lora_A.0.weight
model.layers.28.self_attn.o_proj.lora_A.1.weight
model.layers.28.self_attn.o_proj.lora_B.0.weight
model.layers.28.self_attn.o_proj.lora_B.1.weight
model.layers.28.mlp.gate_proj.lora_A.0.weight
model.layers.28.mlp.gate_proj.lora_A.1.weight
model.layers.28.mlp.gate_proj.lora_B.0.weight
model.layers.28.mlp.gate_proj.lora_B.1.weight
model.layers.28.mlp.up_proj.lora_A.0.weight
model.layers.28.mlp.up_proj.lora_A.1.weight
model.layers.28.mlp.up_proj.lora_B.0.weight
model.layers.28.mlp.up_proj.lora_B.1.weight
model.layers.28.mlp.down_proj.lora_A.0.weight
model.layers.28.mlp.down_proj.lora_A.1.weight
model.layers.28.mlp.down_proj.lora_B.0.weight
model.layers.28.mlp.down_proj.lora_B.1.weight
model.layers.29.self_attn.q_proj.lora_A.0.weight
model.layers.29.self_attn.q_proj.lora_A.1.weight
model.layers.29.self_attn.q_proj.lora_B.0.weight
model.layers.29.self_attn.q_proj.lora_B.1.weight
model.layers.29.self_attn.k_proj.lora_A.0.weight
model.layers.29.self_attn.k_proj.lora_A.1.weight
model.layers.29.self_attn.k_proj.lora_B.0.weight
model.layers.29.self_attn.k_proj.lora_B.1.weight
model.layers.29.self_attn.v_proj.lora_A.0.weight
model.layers.29.self_attn.v_proj.lora_A.1.weight
model.layers.29.self_attn.v_proj.lora_B.0.weight
model.layers.29.self_attn.v_proj.lora_B.1.weight
model.layers.29.self_attn.o_proj.lora_A.0.weight
model.layers.29.self_attn.o_proj.lora_A.1.weight
model.layers.29.self_attn.o_proj.lora_B.0.weight
model.layers.29.self_attn.o_proj.lora_B.1.weight
model.layers.29.mlp.gate_proj.lora_A.0.weight
model.layers.29.mlp.gate_proj.lora_A.1.weight
model.layers.29.mlp.gate_proj.lora_B.0.weight
model.layers.29.mlp.gate_proj.lora_B.1.weight
model.layers.29.mlp.up_proj.lora_A.0.weight
model.layers.29.mlp.up_proj.lora_A.1.weight
model.layers.29.mlp.up_proj.lora_B.0.weight
model.layers.29.mlp.up_proj.lora_B.1.weight
model.layers.29.mlp.down_proj.lora_A.0.weight
model.layers.29.mlp.down_proj.lora_A.1.weight
model.layers.29.mlp.down_proj.lora_B.0.weight
model.layers.29.mlp.down_proj.lora_B.1.weight
model.layers.30.self_attn.q_proj.lora_A.0.weight
model.layers.30.self_attn.q_proj.lora_A.1.weight
model.layers.30.self_attn.q_proj.lora_B.0.weight
model.layers.30.self_attn.q_proj.lora_B.1.weight
model.layers.30.self_attn.k_proj.lora_A.0.weight
model.layers.30.self_attn.k_proj.lora_A.1.weight
model.layers.30.self_attn.k_proj.lora_B.0.weight
model.layers.30.self_attn.k_proj.lora_B.1.weight
model.layers.30.self_attn.v_proj.lora_A.0.weight
model.layers.30.self_attn.v_proj.lora_A.1.weight
model.layers.30.self_attn.v_proj.lora_B.0.weight
model.layers.30.self_attn.v_proj.lora_B.1.weight
model.layers.30.self_attn.o_proj.lora_A.0.weight
model.layers.30.self_attn.o_proj.lora_A.1.weight
model.layers.30.self_attn.o_proj.lora_B.0.weight
model.layers.30.self_attn.o_proj.lora_B.1.weight
model.layers.30.mlp.gate_proj.lora_A.0.weight
model.layers.30.mlp.gate_proj.lora_A.1.weight
model.layers.30.mlp.gate_proj.lora_B.0.weight
model.layers.30.mlp.gate_proj.lora_B.1.weight
model.layers.30.mlp.up_proj.lora_A.0.weight
model.layers.30.mlp.up_proj.lora_A.1.weight
model.layers.30.mlp.up_proj.lora_B.0.weight
model.layers.30.mlp.up_proj.lora_B.1.weight
model.layers.30.mlp.down_proj.lora_A.0.weight
model.layers.30.mlp.down_proj.lora_A.1.weight
model.layers.30.mlp.down_proj.lora_B.0.weight
model.layers.30.mlp.down_proj.lora_B.1.weight
model.layers.31.self_attn.q_proj.lora_A.0.weight
model.layers.31.self_attn.q_proj.lora_A.1.weight
model.layers.31.self_attn.q_proj.lora_B.0.weight
model.layers.31.self_attn.q_proj.lora_B.1.weight
model.layers.31.self_attn.k_proj.lora_A.0.weight
model.layers.31.self_attn.k_proj.lora_A.1.weight
model.layers.31.self_attn.k_proj.lora_B.0.weight
model.layers.31.self_attn.k_proj.lora_B.1.weight
model.layers.31.self_attn.v_proj.lora_A.0.weight
model.layers.31.self_attn.v_proj.lora_A.1.weight
model.layers.31.self_attn.v_proj.lora_B.0.weight
model.layers.31.self_attn.v_proj.lora_B.1.weight
model.layers.31.self_attn.o_proj.lora_A.0.weight
model.layers.31.self_attn.o_proj.lora_A.1.weight
model.layers.31.self_attn.o_proj.lora_B.0.weight
model.layers.31.self_attn.o_proj.lora_B.1.weight
model.layers.31.mlp.gate_proj.lora_A.0.weight
model.layers.31.mlp.gate_proj.lora_A.1.weight
model.layers.31.mlp.gate_proj.lora_B.0.weight
model.layers.31.mlp.gate_proj.lora_B.1.weight
model.layers.31.mlp.up_proj.lora_A.0.weight
model.layers.31.mlp.up_proj.lora_A.1.weight
model.layers.31.mlp.up_proj.lora_B.0.weight
model.layers.31.mlp.up_proj.lora_B.1.weight
model.layers.31.mlp.down_proj.lora_A.0.weight
model.layers.31.mlp.down_proj.lora_A.1.weight
model.layers.31.mlp.down_proj.lora_B.0.weight
model.layers.31.mlp.down_proj.lora_B.1.weight
----------------------

And then, right after that, I run the SFTTrainer, which prints exactly this:

Using auto half precision backend
Currently training with a batch size of: 2
***** Running training *****
  Num examples = 1,053
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 32
  Number of trainable parameters = 118,372,800
Detected flash_attn version: 2.6.3
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

BenjaminBossan (Member) commented:

Thanks @benjamin-marie. The internal_xlora_classifier does not appear among the trainable parameters, whereas it's the LoRAs that should be frozen, right @EricLBuehler?
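
Until that's sorted out, a rough sketch of forcing the intended state by hand (a hypothetical workaround, not an official fix; it assumes the classifier parameters carry "internal_xlora_classifier" in their names, as in the printout above):

# hypothetical workaround sketch: train only the X-LoRA classifier,
# keep the LoRA adapter weights themselves frozen
for name, param in xlora_model.named_parameters():
    param.requires_grad = "internal_xlora_classifier" in name

xlora_model.print_trainable_parameters()  # trainable count should now cover only the classifier

Whether flipping requires_grad alone is enough to make training work is exactly what this issue is tracking.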

EricLBuehler (Member) commented:

Yes, exactly. I'll try to reproduce and fix this!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

EricLBuehler (Member) commented:

Not stale


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
