Padding free feature #4439

Open
@VietDunghacker

Description

Padding-free batching seems like an excellent idea, but how do you prevent information leakage? Since the batch is flattened, later items in the batch may attend to information from earlier items, which I don't think should happen in any case, even during pretraining.
I have not thoroughly tested the feature, but I fine-tuned Qwen2.5VL and found that padding-free negatively affects performance, causing the model to hallucinate information that is not even in the image.
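
To make the concern concrete, here is a minimal pure-Python sketch of one common way a flattened batch can avoid cross-item attention: position ids restart at 0 for each item, sequence boundaries are recovered from those restarts, and attention is limited to a block-diagonal causal mask. The function names (`flatten_batch`, `boundaries_from_position_ids`, `block_causal_mask`) are hypothetical and only for illustration; this is not taken from the library's code, and whether the actual attention backend honors such boundaries for a given model is exactly what would need checking.

```python
def flatten_batch(sequences):
    """Concatenate sequences into one row; position ids restart at 0 per sequence."""
    input_ids = [tok for seq in sequences for tok in seq]
    position_ids = [i for seq in sequences for i in range(len(seq))]
    return input_ids, position_ids


def boundaries_from_position_ids(position_ids):
    """Cumulative boundaries: a new sequence starts wherever the position id is 0."""
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    return starts + [len(position_ids)]


def block_causal_mask(boundaries):
    """mask[q][k] is True when query token q is allowed to attend to key token k."""
    total = boundaries[-1]
    seq_of = [0] * total
    for s, (lo, hi) in enumerate(zip(boundaries, boundaries[1:])):
        for i in range(lo, hi):
            seq_of[i] = s
    # Allowed only within the same sequence, and only to current or earlier tokens.
    return [[seq_of[q] == seq_of[k] and k <= q for k in range(total)]
            for q in range(total)]


# Two items of length 3 and 2, flattened into a single row of 5 tokens.
input_ids, position_ids = flatten_batch([[11, 12, 13], [21, 22]])
boundaries = boundaries_from_position_ids(position_ids)
mask = block_causal_mask(boundaries)
```

With this mask, the first token of the second item (index 3) cannot see any token of the first item. If the attention implementation instead applied a plain causal mask over the whole flattened row, the leakage described above would occur.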
