Behavior of `compute_result` is hard to predict and inconsistent

The Trainer passes a `compute_result` argument to `compute_metrics` when `batch_eval_metrics` is set to `True`.

https://github.com/huggingface/transformers/blob/1e0665a191f73f6b002209c3dfcda478baac6bac/src/transformers/trainer.py#L370-L375

I think there are several problems with `compute_result`:
1. Users can't tell (1) what happens when `batch_eval_metrics` is set, (2) what value is passed as `compute_result` and when it switches between `True` and `False`, or (3) what HF intends by having `compute_metrics` take `compute_result`, since the documentation on this is very short (only 3 lines).
2. `compute_metrics` is sometimes called with `compute_result` and sometimes without, EVEN WHEN `batch_eval_metrics` is present. See the lines below.

https://github.com/huggingface/transformers/blob/1e0665a191f73f6b002209c3dfcda478baac6bac/src/transformers/trainer.py#L4534-L4547

I'm creating this issue because I spent a long time figuring this out.

	compute_metrics (`Callable[[EvalPrediction], Dict]`, optional):
	The function that will be used to compute metrics at evaluation. Must take a [`EvalPrediction`] and return
	a dictionary string to metric values. Note When passing TrainingArgs with `batch_eval_metrics` set to
	`True`, your compute_metrics function must take a boolean `compute_result` argument. This will be triggered
	after the last eval batch to signal that the function needs to calculate and return the global summary
	statistics rather than accumulating the batch-level statistics

	# Metrics!
	if (
	    self.compute_metrics is not None
	    and all_preds is not None
	    and all_labels is not None
	    and not self.args.batch_eval_metrics
	):
	    eval_set_kwargs["losses"] = all_losses if "loss" in args.include_for_metrics else None
	    eval_set_kwargs["inputs"] = all_inputs if "inputs" in args.include_for_metrics else None
	    metrics = self.compute_metrics(
	        EvalPrediction(predictions=all_preds, label_ids=all_labels, **eval_set_kwargs)
	    )
	elif metrics is None:
	    metrics = {}
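For anyone hitting the same confusion, here is a minimal sketch of a `compute_metrics` callable that follows the contract described in the docstring above: it accumulates per-batch counts and only returns the summary when `compute_result` is `True`. The class name `AccuracyAccumulator` is hypothetical, and the sketch assumes `predictions` holds logits and that both fields convert cleanly with `np.asarray`. Defaulting `compute_result` to `True` is my own choice, so the same callable also works on the code path above that calls `compute_metrics` without the keyword. I have not verified this against every Trainer version.

```python
import numpy as np


class AccuracyAccumulator:
    """Hypothetical helper: keeps running counts across eval batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result: bool = True):
        # eval_pred is assumed to expose .predictions (logits) and .label_ids,
        # the same attribute names used by EvalPrediction.
        preds = np.asarray(eval_pred.predictions).argmax(axis=-1)
        labels = np.asarray(eval_pred.label_ids)
        self.correct += int((preds == labels).sum())
        self.total += int(labels.size)

        if not compute_result:
            # Intermediate batch: only accumulate, nothing to report yet.
            return {}

        accuracy = self.correct / self.total if self.total else 0.0
        # Reset so the next evaluation run starts from zero.
        self.correct = 0
        self.total = 0
        return {"accuracy": accuracy}
```

An instance would be passed as `compute_metrics=AccuracyAccumulator()`. With `batch_eval_metrics=True` it is expected to be called once per batch; without it, a single call on the full predictions returns the same metric.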

Expected behavior of `compute_result` is hard to expect and inconsistent #39842
