When num_classes=2 is used with task='multiclass', metrics such as accuracy, F1, and recall are computed the multiclass way: each class is scored separately and the per-class results are macro-averaged. On a perfectly balanced binary dataset with symmetric performance (equal error rates on both classes), these macro-averaged values can come out identical across the different metrics.

When task='binary', the metrics use positive-class definitions (precision, recall, and F1 are computed with respect to the positive label only), which generally produces different numbers. This is not a bug; it is the expected difference between the binary and macro-averaged multiclass computation strategies.
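
A minimal sketch of the difference, assuming TorchMetrics >= 0.11 (the task-based functional API); the tensors are made-up illustrative data, not from the original report:

```python
import torch
from torchmetrics.functional import accuracy, f1_score, recall

# Illustrative labels: 3 negatives, 5 positives,
# with one error on each class (symmetric per-class performance).
target = torch.tensor([0, 0, 0, 1, 1, 1, 1, 1])
preds  = torch.tensor([0, 0, 1, 1, 1, 1, 0, 1])

# task='multiclass' with num_classes=2: each class is scored separately,
# then the per-class results are averaged (average='macro' made explicit here).
acc_mc = accuracy(preds, target, task="multiclass", num_classes=2, average="macro")
f1_mc  = f1_score(preds, target, task="multiclass", num_classes=2, average="macro")
rec_mc = recall(preds, target, task="multiclass", num_classes=2, average="macro")
print(acc_mc, f1_mc, rec_mc)     # all three coincide (~0.7333) because
                                 # per-class precision equals per-class recall

# task='binary': metrics are defined w.r.t. the positive class (label 1) only.
acc_bin = accuracy(preds, target, task="binary")
f1_bin  = f1_score(preds, target, task="binary")
rec_bin = recall(preds, target, task="binary")
print(acc_bin, f1_bin, rec_bin)  # 0.75, 0.80, 0.80 -> no longer identical
```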
