Overview
Datasets get a new version every time the dataset changes in any way; most commonly this is adding a new test case. Right now, the evals comparison feature only allows comparing runs made against the same dataset version, but this is too restrictive: a single test-case change makes eval runs un-comparable. Can we allow comparison across different versions?
To support this, we could do an outer join on the eval results from multiple runs, but we need to figure out how to display changes in the dataset's input and reference fields:

Input and reference are currently taken from the baseline results. If the comparison runs have modified inputs or additional test cases, how do we surface them?
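A minimal sketch of the outer-join idea, using pandas. The column names (`example_id`, `input`, `reference`, `score`) and the two-run shape are assumptions for illustration, not the actual schema; the point is that the merge indicator flags test cases present in only one run, and a simple comparison flags inputs edited between versions:

```python
import pandas as pd

# Hypothetical eval results from two runs over different dataset versions.
# Column names are assumptions, not the real schema.
baseline = pd.DataFrame({
    "example_id": [1, 2, 3],
    "input": ["a", "b", "c"],
    "reference": ["A", "B", "C"],
    "score": [0.9, 0.7, 0.8],
})
comparison = pd.DataFrame({
    "example_id": [1, 2, 4],    # id 3 was removed, id 4 was added
    "input": ["a", "b*", "d"],  # input for id 2 was edited
    "reference": ["A", "B", "D"],
    "score": [0.95, 0.6, 0.5],
})

# Outer join on the test-case id; the _merge indicator column marks rows
# that exist in only one run, so the UI could label added/removed cases.
joined = baseline.merge(
    comparison,
    on="example_id",
    how="outer",
    suffixes=("_baseline", "_comparison"),
    indicator=True,
)

# Flag test cases whose input changed between dataset versions.
joined["input_changed"] = (
    (joined["_merge"] == "both")
    & (joined["input_baseline"] != joined["input_comparison"])
)
```

With this shape, rows tagged `left_only` are test cases dropped since the baseline version, `right_only` rows are newly added ones, and `input_changed` marks edited inputs, which is roughly the information the comparison view would need to surface.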
Designs
TBD