System Info

transformers version: 4.47.0.dev0

Who can help?

@amyeroberts, @qubvel

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
When trying to convert BLIP-2 checkpoints using the transformers conversion script (convert_blip_2_original_to_pytorch.py), the following error occurs:
Traceback (most recent call last):
File "/workspace/image_captioning/eon/transformers/src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py", line 388, in <module>
convert_blip2_checkpoint(
File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/workspace/image_captioning/eon/transformers/src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py", line 298, in convert_blip2_checkpoint
original_logits = original_model({"image": original_pixel_values, "text_input": [""]}).logits
AttributeError: 'dict' object has no attribute 'logits'
The error occurs because the LAVIS BLIP-2 implementation's forward method returns a plain dictionary containing only the loss, so the .logits attribute access in the conversion script fails.
...
def forward(self, samples):
    image = samples["image"]
    with self.maybe_autocast():
        image_embeds = self.ln_vision(self.visual_encoder(image))
    image_atts = torch.ones(image_embeds.size()[:-1], dtype=torch.long).to(
        image.device
    )

    query_tokens = self.query_tokens.expand(image_embeds.shape[0], -1, -1)
    query_output = self.Qformer.bert(
        query_embeds=query_tokens,
        encoder_hidden_states=image_embeds,
        encoder_attention_mask=image_atts,
        return_dict=True,
    )

    inputs_opt = self.opt_proj(query_output.last_hidden_state)
    atts_opt = torch.ones(inputs_opt.size()[:-1], dtype=torch.long).to(image.device)

    self.opt_tokenizer.padding_side = "right"

    text = [t + "\n" for t in samples["text_input"]]

    opt_tokens = self.opt_tokenizer(
        text,
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=self.max_txt_len,
    ).to(image.device)

    targets = opt_tokens.input_ids.masked_fill(
        opt_tokens.input_ids == self.opt_tokenizer.pad_token_id, -100
    )
    if self.prompt:
        targets[:, : self.prompt_length] = -100  # do not apply loss to the prompt

    empty_targets = (
        torch.ones(atts_opt.size(), dtype=torch.long).to(image.device).fill_(-100)
    )
    targets = torch.cat([empty_targets, targets], dim=1)

    inputs_embeds = self.opt_model.model.decoder.embed_tokens(opt_tokens.input_ids)
    inputs_embeds = torch.cat([inputs_opt, inputs_embeds], dim=1)
    attention_mask = torch.cat([atts_opt, opt_tokens.attention_mask], dim=1)

    with self.maybe_autocast():
        outputs = self.opt_model(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            return_dict=True,
            labels=targets,
        )
    loss = outputs.loss
    return {"loss": loss}
Expected behavior
The LAVIS BLIP-2 forward method should return both the loss and the logits in its output so that it is compatible with the transformers conversion script, for example:
return {
    "loss": loss,
    "logits": outputs.logits,  # currently missing in the LAVIS implementation
}
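As a stopgap until either LAVIS or the conversion script changes, one local workaround (a sketch only, assuming you can edit your LAVIS checkout) is to make the forward return an object that supports attribute access, since the conversion script reads .logits off the result rather than indexing a dict:

from types import SimpleNamespace  # add at the top of the LAVIS blip2_opt module

    # ... forward body unchanged up to the OPT call ...
    loss = outputs.loss
    # Attribute-style container: `original_model(...).loss` and `.logits`
    # both resolve, which a plain dict does not allow.
    return SimpleNamespace(loss=loss, logits=outputs.logits)

Alternatively, the conversion script could index the returned dictionary once LAVIS includes the logits, but the attribute-based return keeps line 298 of the script unchanged.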