Description
Feature request
Hi all, first of all, if this feature already exists I apologise!

With the rise of multimodal LLMs, it would be great if we could add extra outputs to `GenerationMixin.generate` results. For instance, if we implement a model like Janus from DeepSeek, there are two output heads: one `lm_head` and one `image_head`. The outputs of the `forward` method have extra attributes that can't be passed through to the `generate` results.

I know these multimodal models are not common within this repo, so this is pretty bleeding edge, but I'm working on research in this domain and it would be great if we could forward all model outputs to the `generate` result. Maybe through an attribute like `kwarg_outputs` in classes like `GenerateDecoderOnlyOutput`?
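To make the proposal concrete, here is a minimal sketch of what such an output class could look like. The stand-in dataclasses below are simplified stand-ins for the real `transformers` types, and `kwarg_outputs` is the hypothetical attribute this issue proposes, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Stand-in for torch.Tensor, just for illustration.
Tensor = Any

@dataclass
class GenerateDecoderOnlyOutput:
    """Simplified stand-in for the existing transformers output class."""
    sequences: Tuple[int, ...] = ()
    scores: Optional[Tuple[Tensor, ...]] = None

@dataclass
class GenerateDecoderOnlyOutputWithExtras(GenerateDecoderOnlyOutput):
    # Hypothetical attribute proposed in this issue: any extra head
    # outputs (e.g. image_head logits), collected per generation step.
    kwarg_outputs: Dict[str, Tuple[Tensor, ...]] = field(default_factory=dict)

# A Janus-style model could then stash its image head outputs here:
out = GenerateDecoderOnlyOutputWithExtras(
    sequences=(1, 2, 3),
    kwarg_outputs={"image_head": ("step0_logits", "step1_logits")},
)
```

Existing users would be unaffected since `kwarg_outputs` defaults to an empty dict.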
Motivation
As far as I understand, it's possible to feed extra outputs through the autoregressive loop via `prepare_inputs_for_generation` and `_update_model_kwargs_for_generation`, where we can forward model outputs to the next `forward` call. But when it comes to forwarding these outputs to the result of `generate`, it doesn't seem possible. I know the generation mixin is geared towards text generation, but it would be great to be able to forward extra model outputs.
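To illustrate the gap, here is a toy generation loop (no `transformers` dependency; `toy_forward`, `toy_generate`, and the `image_head` key are all made up for this sketch). The accumulation step plays the role that `_update_model_kwargs_for_generation` could play if, in addition to feeding extras into the next step, it also collected them for the final result:

```python
from typing import Any, Dict, List

def toy_forward(token: int) -> Dict[str, Any]:
    # Stand-in for model.forward: returns the next token plus an extra
    # output from a hypothetical second head.
    return {"next_token": token + 1, "image_head": f"img_feat_{token}"}

def toy_generate(start: int, steps: int) -> Dict[str, Any]:
    sequence: List[int] = [start]
    extra_outputs: Dict[str, List[Any]] = {}  # what kwarg_outputs could hold
    token = start
    for _ in range(steps):
        outputs = toy_forward(token)
        token = outputs.pop("next_token")
        sequence.append(token)
        # Analogue of _update_model_kwargs_for_generation: besides passing
        # extras to the next step, also accumulate them for the result.
        for name, value in outputs.items():
            extra_outputs.setdefault(name, []).append(value)
    return {"sequences": sequence, "kwarg_outputs": extra_outputs}

result = toy_generate(start=0, steps=3)
# result["sequences"]                 -> [0, 1, 2, 3]
# result["kwarg_outputs"]["image_head"] -> one entry per generation step
```

Today the real loop effectively drops everything outside the known output fields when it builds the `Generate*Output` object, which is the part this request would change.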
Your contribution
Happy to have a try, but I'm not sure how big of a PR it would be, especially if it touches the PyTorch / TF / Flax implementations.