Changes required to `save_model` for certain models (e.g., Phi 3.5 Vision) #34690
Labels: Feature request
Feature request
This request proposes one of three changes (see Motivation for background, and Your contribution for more thoughts on possible solutions) in order to allow saving of a certain class of models, including but not limited to Phi 3.5 Vision:

1. Add a `state_dict` argument to the `Trainer` class's `save_model()` method (https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3719-L3768). This `state_dict` parameter should then be passed down to the call to the private `_save()` method (https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3842), which does accept a `state_dict` argument. (See the sketch below this list.)
2. Instead of accepting `state_dict` as an argument to `save_model()`, determine the appropriate heuristic such that we can successfully save Phi 3.5 Vision and other architecturally similar models.
3. Change the way `transformers` handles shared tensors...?
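To make option 1 a bit more concrete, here is a minimal sketch of the plumbing I have in mind, written as a `Trainer` subclass rather than a patch to `trainer.py`. The parameter name and default are just my assumptions, and this only covers the single-process path; a real patch would also need to handle the DeepSpeed/FSDP branches that `save_model()` normally dispatches to:

```python
from typing import Optional

from transformers import Trainer


class StateDictTrainer(Trainer):
    """Sketch of option 1: let callers hand save_model() an explicit state_dict."""

    def save_model(
        self,
        output_dir: Optional[str] = None,
        _internal_call: bool = False,
        state_dict: Optional[dict] = None,  # proposed new argument
    ):
        if state_dict is None:
            # No override supplied: keep today's behavior.
            return super().save_model(output_dir, _internal_call)

        output_dir = output_dir if output_dir is not None else self.args.output_dir
        # Trainer._save() already accepts a state_dict argument, so the change is
        # mostly about forwarding it down from the public method.
        self._save(output_dir, state_dict=state_dict)
```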
Motivation

I encountered an issue while trying to fine-tune Phi 3.5 Vision using the `Trainer` class from `transformers`. In particular, when trying to call `save()` or `save_pretrained()`, `transformers` throws the following error:

Below are two minimal reproducible examples:
Example #1
Example #2
It looks like others have also encountered this issue. See the list of reference issues below in "Issues".
A contributor to the Phi 3 Vision cookbook suggested the following solution, stating "You need to remove the wte weight. It's okay because when the model is loaded from the checkpoint, it will automatically copy the weight from the embedding weight."
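For reference, outside of `Trainer` that workaround looks roughly like the following. The `"wte"` key filter reflects my reading of Phi 3.5 Vision's module naming and the output path is a placeholder, so treat the details as assumptions:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct", trust_remote_code=True
)

# ... fine-tuning happens here ...

# Drop the tied wte weight so the serializer doesn't see two names pointing at
# the same storage; on load it is re-tied from the embedding weight.
filtered = {k: v for k, v in model.state_dict().items() if "wte" not in k}
model.save_pretrained("./phi35-vision-finetuned", state_dict=filtered)
```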
This does indeed seem to work. However, it doesn't exactly fit into a use case that relies on the `Trainer` abstraction. The call to the `Trainer` class's `save_model()` method doesn't accommodate a `state_dict` argument (see https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L3719-L3768).

Issues
Your contribution
I'd be glad to submit a PR, but I think some discussion is needed from the appropriate `transformers` stakeholders. It's not clear to me whether the most appropriate change here is to modify the function signature.
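For what it's worth, if the signature change landed, the caller side for the Phi 3.5 Vision case would be as simple as something like this (the `state_dict` kwarg is the proposed addition, not something `save_model()` accepts today):

```python
# `trainer` is the Trainer instance used for fine-tuning.
# Hypothetical usage once save_model() forwards state_dict down to _save():
filtered = {k: v for k, v in trainer.model.state_dict().items() if "wte" not in k}
trainer.save_model("./phi35-vision-finetuned", state_dict=filtered)
```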
Alternatively, maybe there's a heuristic by which we could determine whether the architecture is such that one needs to save everything but the `wte` weights. I don't know the answer to that off-hand. It may require a deep dive from Phi 3/3.5 Vision SMEs.

Or more broadly, perhaps there's some change to the way `transformers` handles shared tensors in the base configuration that would be most appropriate.
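As a strawman for that heuristic, the check could be as blunt as deduplicating state-dict entries that alias the same underlying storage. This is purely illustrative (not how `transformers` currently resolves tied weights), and deciding which alias to keep is exactly the part that probably needs model-specific knowledge:

```python
import torch


def dedupe_shared_tensors(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Keep only one name per underlying storage (illustrative heuristic only)."""
    seen: set[int] = set()
    deduped: dict[str, torch.Tensor] = {}
    for name, tensor in state_dict.items():
        ptr = tensor.data_ptr()
        if ptr in seen:
            # Another key already refers to this storage (e.g. a tied wte weight),
            # so skip the alias rather than serializing the tensor twice.
            continue
        seen.add(ptr)
        deduped[name] = tensor
    return deduped
```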