Expected behavior of compute_result is hard to expect and inconsistent #39842

@MilkClouds

In Trainer, the compute_metrics function is passed a compute_result argument when batch_eval_metrics is set to True. The docstring reads:

compute_metrics (`Callable[[EvalPrediction], Dict]`, *optional*):
The function that will be used to compute metrics at evaluation. Must take a [`EvalPrediction`] and return
a dictionary string to metric values. *Note* When passing TrainingArgs with `batch_eval_metrics` set to
`True`, your compute_metrics function must take a boolean `compute_result` argument. This will be triggered
after the last eval batch to signal that the function needs to calculate and return the global summary
statistics rather than accumulating the batch-level statistics
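
For reference, here is a minimal sketch of how I read that docstring: accumulate statistics on every batch and only return the summary when compute_result is True. The names FakeEvalPrediction, BatchedAccuracy and run_eval are hypothetical, made up for illustration. FakeEvalPrediction is a plain-Python stand-in for EvalPrediction, and run_eval is a simplified driver that assumes compute_result is True only on the last batch; it is not the Trainer's actual evaluation loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class FakeEvalPrediction:
    """Hypothetical stand-in for transformers.EvalPrediction (plain lists, not tensors)."""
    predictions: Sequence[int]
    label_ids: Sequence[int]


class BatchedAccuracy:
    """Accumulates counts on every call; returns the global accuracy when asked."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result: bool = False) -> Dict[str, float]:
        # Always fold the current batch into the running totals.
        for pred, label in zip(eval_pred.predictions, eval_pred.label_ids):
            self.correct += int(pred == label)
            self.total += 1
        if not compute_result:
            # Intermediate batch: nothing to report yet.
            return {}
        # Last batch: report the summary and reset state for the next evaluation.
        accuracy = self.correct / self.total if self.total else 0.0
        self.correct = 0
        self.total = 0
        return {"accuracy": accuracy}


def run_eval(compute_metrics, batches: List[Tuple[Sequence[int], Sequence[int]]]) -> Dict[str, float]:
    """Simplified driver (assumption): compute_result is True only on the last batch."""
    metrics: Dict[str, float] = {}
    for i, (preds, labels) in enumerate(batches):
        is_last = i == len(batches) - 1
        metrics = compute_metrics(FakeEvalPrediction(preds, labels), compute_result=is_last)
    return metrics
```

Because the state lives on the callable, it has to be reset after the final batch, otherwise a second evaluation would mix in counts from the first one. Whether this matches what Trainer really does in every code path is exactly what the issue is asking about.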

I think there are several problems with compute_result:

  1. Users cannot tell (1) what happens when batch_eval_metrics is set, (2) what value is passed as compute_result and when it switches between True and False, or (3) what HF intended by having compute_metrics take compute_result, because the documentation on this is only about three lines long.
  2. compute_metrics is sometimes called with compute_result and sometimes without it, EVEN WHEN batch_eval_metrics is set. See the lines below.

        # Metrics!
        if (
            self.compute_metrics is not None
            and all_preds is not None
            and all_labels is not None
            and not self.args.batch_eval_metrics
        ):
            eval_set_kwargs["losses"] = all_losses if "loss" in args.include_for_metrics else None
            eval_set_kwargs["inputs"] = all_inputs if "inputs" in args.include_for_metrics else None
            metrics = self.compute_metrics(
                EvalPrediction(predictions=all_preds, label_ids=all_labels, **eval_set_kwargs)
            )
        elif metrics is None:
            metrics = {}

I am creating this issue because I spent a long time figuring this out.
