[BUG] Loading weights fails when training a vision-language-action model (OpenVLA) with DeepSpeed ZeRO-3

**Describe the bug**
I want to train the vision-language-action model OpenVLA with DeepSpeed ZeRO-3, but the weights cannot be loaded:
Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00,  7.72s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00,  8.99s/it]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/fs1/private/user/chenzengjue/hdp/openvla/vla-scripts/grpo.py", line 136, in <module>
[rank0]:     main(script_args, training_args, model_args)
[rank0]:   File "/fs1/private/user/chenzengjue/hdp/openvla/vla-scripts/grpo.py", line 112, in main
[rank0]:     trainer = trainer_cls(
[rank0]:               ^^^^^^^^^^^^
[rank0]:   File "/fs1/private/user/chenzengjue/hdp/openvla/vla-scripts/grpo_trainer.py", line 186, in __init__
[rank0]:     vla = AutoModelForVision2Seq.from_pretrained(
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/chenzengjue/anaconda3/envs/r1-v/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/chenzengjue/anaconda3/envs/r1-v/lib/python3.11/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/chenzengjue/anaconda3/envs/r1-v/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4313, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/chenzengjue/anaconda3/envs/r1-v/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4949, in _load_pretrained_model
[rank0]:     raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
[rank0]: RuntimeError: Error(s) in loading state_dict for OpenVLAForActionPrediction:
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.0.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.0.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.1.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.1.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.2.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.2.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.3.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.3.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.4.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.4.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.5.ls1.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.5.ls2.scale_factor: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: 	size mismatch for vision_backbone.featurizer.blocks.6.ls1.scale_factor:
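
Because the log is long and truncated, it may help to summarize which parameters are affected. The following is a minimal sketch, not part of the original script: the helper names `summarize_mismatches` and `count_by_suffix` are hypothetical, and the regular expression assumes the message format shown in the traceback above.

```python
import re
from collections import Counter

# Matches one complete "size mismatch" line of the RuntimeError message.
# The pattern assumes the exact wording shown in the traceback above.
MISMATCH_RE = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def summarize_mismatches(log_text):
    """Return (parameter name, checkpoint shape, model shape) per mismatch line."""
    return [
        (m["name"], m["ckpt"], m["model"])
        for m in MISMATCH_RE.finditer(log_text)
    ]


def count_by_suffix(rows):
    """Count mismatches by the last component of the parameter name."""
    return Counter(name.rsplit(".", 1)[-1] for name, _, _ in rows)


if __name__ == "__main__":
    # Two lines copied from the traceback above.
    sample = (
        "[rank0]: \tsize mismatch for vision_backbone.featurizer.blocks.0.ls1.scale_factor: "
        "copying a param with shape torch.Size([1024]) from checkpoint, "
        "the shape in current model is torch.Size([0]).\n"
        "[rank0]: \tsize mismatch for vision_backbone.featurizer.blocks.0.ls2.scale_factor: "
        "copying a param with shape torch.Size([1024]) from checkpoint, "
        "the shape in current model is torch.Size([0]).\n"
    )
    rows = summarize_mismatches(sample)
    for name, ckpt_shape, model_shape in rows:
        print(f"{name}: checkpoint=[{ckpt_shape}] model=[{model_shape}]")
    print(count_by_suffix(rows))
```

In the visible part of the log, every mismatch is an `ls1.scale_factor` or `ls2.scale_factor` parameter of the vision backbone, with shape `[1024]` in the checkpoint and `[0]` in the model. Truncated lines (such as the last one above) do not match the pattern and are skipped.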

**To Reproduce**
I rewrote the GRPO Trainer code and use DeepSpeed ZeRO-3 to train the OpenVLA model. The model is loaded the same way as in the original OpenVLA code:

        AutoConfig.register("openvla", OpenVLAConfig)
        AutoImageProcessor.register(OpenVLAConfig, PrismaticImageProcessor)
        AutoProcessor.register(OpenVLAConfig, PrismaticProcessor)
        AutoModelForVision2Seq.register(OpenVLAConfig, OpenVLAForActionPrediction)

        # Load OpenVLA Processor and Model using HF AutoClasses
        quantization_config = None
        print(f"Loading model {model}...")
        processing_class = AutoProcessor.from_pretrained('/home/chenzengjue/hdp/openvla/openvla-7b', trust_remote_code=True)
        vla = AutoModelForVision2Seq.from_pretrained(
            '/home/chenzengjue/hdp/openvla/openvla-7b',
            torch_dtype=torch.bfloat16,
            quantization_config=quantization_config,
            # low_cpu_mem_usage=True,
            trust_remote_code=True,
        )
        vla = vla.to(device_id) 

The error is then raised inside `AutoModelForVision2Seq.from_pretrained()`.

If we don't use DeepSpeed, the same code works and the error disappears.
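
Since the failure only appears with DeepSpeed, one diagnostic step is to check whether transformers considers ZeRO-3 active at the moment `from_pretrained()` is called. A model shape of `torch.Size([0])` would be consistent with ZeRO-3 having already partitioned the parameters into empty placeholders during model construction, though this is an assumption and not confirmed here. The sketch below uses `is_deepspeed_zero3_enabled` from `transformers.integrations.deepspeed`; the wrapper name `zero3_active` is hypothetical.

```python
def zero3_active():
    """Report whether transformers sees an active DeepSpeed ZeRO-3 config.

    Returns True or False when transformers can be imported,
    and None when it is not installed in this environment.
    """
    try:
        from transformers.integrations.deepspeed import is_deepspeed_zero3_enabled
    except ImportError:
        return None
    return bool(is_deepspeed_zero3_enabled())


if __name__ == "__main__":
    # Call this right before AutoModelForVision2Seq.from_pretrained(...)
    # in both the failing (DeepSpeed) and the working (non-DeepSpeed) run.
    print(f"ZeRO-3 active at load time: {zero3_active()}")
```

If this prints `True` in the failing run and `False` in the working run, the difference between the two runs is likely in the ZeRO-3 loading path rather than in the checkpoint itself.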


Issue #7136
