Padding free feature

Padding-free training seems like an excellent idea, but how does it prevent information leakage? Since the batch is flattened into a single sequence, later items in the batch may attend to tokens from earlier items, which I don't think should ever happen, even during pretraining.
I have not tested the feature thoroughly, but I fine-tuned Qwen2.5VL and found that padding-free negatively affects performance, causing the model to hallucinate information that is not in the image.
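
To make the concern concrete, here is a minimal sketch of the isolation I would expect a flattened batch to need. It is not taken from the library's code. The helper names (`cu_seqlens`, `position_ids`, `block_causal_mask`) are hypothetical and only illustrate the idea: cumulative sequence boundaries plus position IDs that restart at 0, which together should amount to a block-diagonal causal mask. If the attention backend ignores this information, the leakage described above would occur.

```python
from itertools import accumulate


def cu_seqlens(lengths):
    """Cumulative sequence boundaries of the flattened batch, e.g. [3, 2] -> [0, 3, 5]."""
    return [0] + list(accumulate(lengths))


def position_ids(lengths):
    """Position IDs that restart at 0 for every packed sequence."""
    return [p for n in lengths for p in range(n)]


def block_causal_mask(lengths):
    """Return mask[i][j], True when token i is allowed to attend to token j.

    A token may only attend to earlier-or-equal tokens of its OWN sequence,
    so the mask is block-diagonal and lower-triangular within each block.
    """
    # Label every token in the flattened batch with the index of its sequence.
    seq_id = [s for s, n in enumerate(lengths) for _ in range(n)]
    total = len(seq_id)
    return [
        [seq_id[i] == seq_id[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


# Two samples of length 3 and 2 flattened into one row of 5 tokens.
lengths = [3, 2]
mask = block_causal_mask(lengths)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed mask, the rows for the second sample (tokens 3 and 4) have no `x` in the columns of the first sample (tokens 0 to 2). A plain causal mask over the flattened row would instead let those tokens see the first sample, which is the leakage I am worried about.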

Padding free feature #4439

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Padding free feature #4439

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions