Fix benchmark status reporting to accurately reflect trial completion #847

Open · wants to merge 1 commit into main
Conversation

@ay-bh commented on Mar 10, 2025

Changes

Modified the benchmark status determination logic in common.py to:

  1. Add a new helper method _get_expected_trials_count() that scans benchmark directories to determine the total number of expected trials (sketched below)
  2. Update match_benchmark() to mark benchmarks as "Done" only when all expected trials are complete
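
For illustration, a minimal sketch of that logic, using plain os calls rather than the project's FileSystem wrapper; the per-trial layout under fuzz_targets/ and the exact signatures are assumptions, not the code in this PR:

import os

def _get_expected_trials_count(results_dir, benchmark_id):
  # Assumed layout: one entry per trial under <benchmark>/fuzz_targets/.
  fuzz_targets_dir = os.path.join(results_dir, benchmark_id, 'fuzz_targets')
  if not os.path.isdir(fuzz_targets_dir):
    return 0
  return len(os.listdir(fuzz_targets_dir))

def benchmark_status(expected_trials, completed_trials):
  # Report "Done" only when every expected trial has finished.
  if expected_trials and completed_trials >= expected_trials:
    return 'Done'
  return 'Running'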

Testing

Tested the solution by creating benchmarks in various states using a local script:

  • Complete benchmark with all trials present and successful
  • Partial benchmark with only some trials having results
  • Failed benchmark with all trials present but some failing

Verified that the status was correctly reported as "Running" for the partial and failed benchmarks, and as "Done" only for the complete benchmark.
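
For reference, a script along these lines can reproduce the three states; the directory and file names are placeholders, not the project's real result layout:

import os
import tempfile

def make_fake_benchmark(root, benchmark_id, total_trials, completed_trials):
  # Create placeholder fuzz targets for |total_trials| trials, but result
  # files for only |completed_trials| of them.
  fuzz_targets = os.path.join(root, benchmark_id, 'fuzz_targets')
  os.makedirs(fuzz_targets, exist_ok=True)
  for trial in range(1, total_trials + 1):
    open(os.path.join(fuzz_targets, f'{trial:02d}.fuzz_target'), 'w').close()
  for trial in range(1, completed_trials + 1):
    trial_dir = os.path.join(root, benchmark_id, f'{trial:02d}')
    os.makedirs(trial_dir, exist_ok=True)
    open(os.path.join(trial_dir, 'result.json'), 'w').close()

root = tempfile.mkdtemp()
make_fake_benchmark(root, 'benchmark-complete', 5, 5)  # all trials have results
make_fake_benchmark(root, 'benchmark-partial', 5, 2)   # only some trials have results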

Fixes #721

fuzz_targets_dir = os.path.join(self._results_dir, benchmark_id,
                                'fuzz_targets')
if not FileSystem(fuzz_targets_dir).exists():
  # Counting files in raw_targets directory for older experiments

Collaborator

It's great to see you've noticed the different ways fuzz targets are stored between new and old experiments.

How about organizing the logic as:

if FileSystem(fuzz_targets_dir).exists():
  # Check and return the trial count in new experiments.

# Check and return the trial count in old experiments.

There is a catch in counting the number of trials in new experiments: not all fuzz targets will be present at once. For example, we may have fuzz targets from trials 8 and 10 first, then from trials 2 and 5, and the rest later.
There are two solutions:

  1. In most cases, we can assume run_all_experiments.py (which takes the trial count from --num-samples) is run in the same fresh environment as report generation. Hence we can add a line in run_all_experiments.py to record num-samples in an ENV VAR and reuse it in report generation.
  2. In rare cases where the ENV VAR is not set, we can assume the maximum trial ID seen so far is the trial count as a temporary solution. E.g., if we have trials 01, 05, and 10, we assume 10 trials in total.

The workflow would be:

  1. Check and use ENV VAR.
  2. If not available, check fuzz_targets dir and count trials in new experiments.
  3. If not available, check raw_targets dir and count trials in old experiments.
  4. Return a default value (0 or 1).

I reckon this is the minimal change required to solve this, but please feel free to let me know if you can think of a more elegant/complete/sound solution : )
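
To make that concrete, a rough sketch of the fallback chain; the environment variable name, the file-naming pattern, and the use of plain os calls instead of FileSystem are all assumptions for illustration:

import os
import re

def _get_expected_trials_count(results_dir, benchmark_id):
  # 1. Prefer the trial count recorded by run_all_experiments.py, if set.
  num_samples = os.environ.get('NUM_SAMPLES')  # Hypothetical variable name.
  if num_samples and num_samples.isdigit():
    return int(num_samples)

  # 2. New experiments: fuzz targets may arrive out of order, so take the
  #    max trial ID seen so far under fuzz_targets/.
  fuzz_targets_dir = os.path.join(results_dir, benchmark_id, 'fuzz_targets')
  if os.path.isdir(fuzz_targets_dir):
    trial_ids = [int(m.group(1))
                 for name in os.listdir(fuzz_targets_dir)
                 if (m := re.match(r'(\d+)', name))]  # Assumed "<trial>..." names.
    if trial_ids:
      return max(trial_ids)

  # 3. Old experiments: count files in the raw_targets directory.
  raw_targets_dir = os.path.join(results_dir, benchmark_id, 'raw_targets')
  if os.path.isdir(raw_targets_dir):
    return len(os.listdir(raw_targets_dir))

  # 4. Default when nothing else is available.
  return 1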

Collaborator

BTW, thanks for addressing this so quickly!
