Fix benchmark status reporting to accurately reflect trial completion #847

Open · wants to merge 1 commit into main
Conversation

@ay-bh commented on Mar 10, 2025

Changes

Modified the benchmark status determination logic in common.py to:

  1. Add a new helper method _get_expected_trials_count() that scans benchmark directories to determine the total number of expected trials (sketched below)
  2. Update match_benchmark() to mark benchmarks as "Done" only when all expected trials are complete
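
For illustration, a minimal sketch of that logic, using plain os calls rather than the project's FileSystem wrapper; the per-trial layout under fuzz_targets/ and the exact signatures are assumptions, not the code in this PR:

import os

def _get_expected_trials_count(results_dir, benchmark_id):
  # Assumed layout: one entry per trial under <benchmark>/fuzz_targets/.
  fuzz_targets_dir = os.path.join(results_dir, benchmark_id, 'fuzz_targets')
  if not os.path.isdir(fuzz_targets_dir):
    return 0
  return len(os.listdir(fuzz_targets_dir))

def benchmark_status(expected_trials, completed_trials):
  # Report "Done" only when every expected trial has finished.
  if expected_trials and completed_trials >= expected_trials:
    return 'Done'
  return 'Running'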

Testing

Tested the solution by creating benchmarks in various states using a local script:

  • Complete benchmark with all trials present and successful
  • Partial benchmark with only some trials having results
  • Failed benchmark with all trials present but some failing

Verified that the status was correctly reported as "Running" for the partial and failed benchmarks, and as "Done" only for the complete benchmark.
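
For reference, a script along these lines can reproduce the three states; the directory and file names are placeholders, not the project's real result layout:

import os
import tempfile

def make_fake_benchmark(root, benchmark_id, total_trials, completed_trials):
  # Create placeholder fuzz targets for |total_trials| trials, but result
  # files for only |completed_trials| of them.
  fuzz_targets = os.path.join(root, benchmark_id, 'fuzz_targets')
  os.makedirs(fuzz_targets, exist_ok=True)
  for trial in range(1, total_trials + 1):
    open(os.path.join(fuzz_targets, f'{trial:02d}.fuzz_target'), 'w').close()
  for trial in range(1, completed_trials + 1):
    trial_dir = os.path.join(root, benchmark_id, f'{trial:02d}')
    os.makedirs(trial_dir, exist_ok=True)
    open(os.path.join(trial_dir, 'result.json'), 'w').close()

root = tempfile.mkdtemp()
make_fake_benchmark(root, 'benchmark-complete', 5, 5)  # all trials have results
make_fake_benchmark(root, 'benchmark-partial', 5, 2)   # only some trials have results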

Fixes #721

fuzz_targets_dir = os.path.join(self._results_dir, benchmark_id,
                                'fuzz_targets')
if not FileSystem(fuzz_targets_dir).exists():
  # Counting files in raw_targets directory for older experiments

Collaborator

It's great to see you've noticed the different ways fuzz targets are stored between new and old experiments.

How about organizing the logic as:

if FileSystem(fuzz_targets_dir).exists():
  # Check and return the trial count in new experiments.

# Check and return the trial count in old experiments.

There is a catch in counting the number of trials in new experiments: not all fuzz targets will be present at once. For example, we may have fuzz targets from trials 8 and 10 first, then from trials 2 and 5, and the rest later.
There are two solutions:

  1. In most cases, we can assume run_all_experiments.py (which takes the trial count from --num-samples) is run in the same fresh environment as report generation. Hence we can add a line in run_all_experiments.py to record num-samples in an ENV VAR and reuse it in report generation.
  2. In rare cases where the ENV VAR is not set, we can assume the maximum trial ID seen so far is the trial count as a temporary solution. E.g., if we have trials 01, 05, and 10, we assume 10 trials in total.

The workflow would be:

  1. Check and use ENV VAR.
  2. If not available, check fuzz_targets dir and count trials in new experiments.
  3. If not available, check raw_targets dir and count trials in old experiments.
  4. Return a default value (0 or 1).

I reckon this is the minimal change required to solve this, but please feel free to let me know if you can think of a more elegant/complete/sound solution : )
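
To make that concrete, a rough sketch of the fallback chain; the environment variable name, the file-naming pattern, and the use of plain os calls instead of FileSystem are all assumptions for illustration:

import os
import re

def _get_expected_trials_count(results_dir, benchmark_id):
  # 1. Prefer the trial count recorded by run_all_experiments.py, if set.
  num_samples = os.environ.get('NUM_SAMPLES')  # Hypothetical variable name.
  if num_samples and num_samples.isdigit():
    return int(num_samples)

  # 2. New experiments: fuzz targets may arrive out of order, so take the
  #    max trial ID seen so far under fuzz_targets/.
  fuzz_targets_dir = os.path.join(results_dir, benchmark_id, 'fuzz_targets')
  if os.path.isdir(fuzz_targets_dir):
    trial_ids = [int(m.group(1))
                 for name in os.listdir(fuzz_targets_dir)
                 if (m := re.match(r'(\d+)', name))]  # Assumed "<trial>..." names.
    if trial_ids:
      return max(trial_ids)

  # 3. Old experiments: count files in the raw_targets directory.
  raw_targets_dir = os.path.join(results_dir, benchmark_id, 'raw_targets')
  if os.path.isdir(raw_targets_dir):
    return len(os.listdir(raw_targets_dir))

  # 4. Default when nothing else is available.
  return 1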

Collaborator

BTW, thanks for addressing this so quickly!
