Fix benchmark status reporting to accurately reflect trial completion #847

Open · wants to merge 1 commit into main
34 changes: 33 additions & 1 deletion report/common.py
@@ -269,7 +269,10 @@ def list_benchmark_ids(self) -> List[str]:
def match_benchmark(self, benchmark_id: str, results: list[evaluator.Result],
                    targets: list[str]) -> Benchmark:
  """Returns a benchmark class based on |benchmark_id|."""
-  status = 'Done' if results and all(results) else 'Running'
+  expected_trials = self._get_expected_trials_count(benchmark_id)
+  status = 'Done' if (results and len(results) == expected_trials and
+                      all(results)) else 'Running'

  filtered_results = [(i, stat) for i, stat in enumerate(results) if stat]

  if filtered_results:
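
For context, a quick illustration of the reporting bug this hunk fixes: under the old logic, a benchmark whose finished trials all succeeded was reported as 'Done' even when most trials had not run yet. The values below are made up for illustration; in the real code the entries are evaluator.Result objects rather than booleans.

```python
expected_trials = 10
results = [True, True, True, True, True]  # only 5 of 10 trials have reported

# Old logic: ignores how many trials are expected.
old_status = 'Done' if results and all(results) else 'Running'
print(old_status)  # 'Done' -- wrong, 5 trials are still pending

# New logic: also requires every expected trial to be present.
new_status = ('Done' if results and len(results) == expected_trials and
              all(results) else 'Running')
print(new_status)  # 'Running'
```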
@@ -279,6 +282,35 @@ def match_benchmark(self, benchmark_id: str, results: list[evaluator.Result],

  return self._create_benchmark(benchmark_id, status, result)

+def _get_expected_trials_count(self, benchmark_id: str) -> int:
+  """Returns the expected number of trials for a benchmark."""
+  fuzz_targets_dir = os.path.join(self._results_dir, benchmark_id,
+                                  'fuzz_targets')
+  if not FileSystem(fuzz_targets_dir).exists():
+    # Count files in the raw_targets directory for older experiments.
Collaborator:
It's great to see you've noticed the different ways fuzz targets are stored between new and old experiments.

How about organizing the logic like this:

if FileSystem(fuzz_targets_dir).exists():
  # Check and return the trial count in new experiments.

# Check and return the trial count in old experiments.

There is a catch in counting the number of trials in a new experiment: not all fuzz targets are returned at once. For example, we may have fuzz targets from trials 8 and 10 first, then from trials 2 and 5, and the rest later.
There are two solutions:

  1. In most cases, we can assume run_all_experiments.py (which takes the trial count from --num-samples) runs in the same fresh environment as report generation. Hence we can add a line to run_all_experiments.py that records num-samples in an ENV VAR, and reuse it in report generation.
  2. In rare cases where the ENV VAR is not set, we can assume the max trial ID seen so far is the trial count, as a temporary solution. E.g., if we have trials 01, 05, and 10, then we assume 10 trials in total.

The workflow would be (a rough sketch follows at the end of this comment):

  1. Check and use the ENV VAR.
  2. If not available, check the fuzz_targets dir and count trials in new experiments.
  3. If not available, check the raw_targets dir and count trials in old experiments.
  4. Return a default value (0 or 1).

I reckon this is the minimum change required to solve this, but please feel free to let me know if you can think of a more elegant/complete/sound solution : )
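
A rough standalone sketch of that workflow, under stated assumptions: the NUM_SAMPLES variable name is made up for illustration, TARGET_EXTS here stands in for the real constant in the codebase, and plain os calls replace the repo's FileSystem helper.

```python
import os

TARGET_EXTS = ('.c', '.cc', '.cpp')  # assumption: stands in for the real constant


def get_expected_trials_count(results_dir: str, benchmark_id: str) -> int:
  """Resolves the expected trial count, most reliable source first."""
  # 1. ENV VAR recorded by run_all_experiments.py (name is an assumption).
  env_count = os.environ.get('NUM_SAMPLES', '')
  if env_count.isdigit():
    return int(env_count)

  # 2. New experiments: count unique trial IDs under fuzz_targets/.
  fuzz_targets_dir = os.path.join(results_dir, benchmark_id, 'fuzz_targets')
  if os.path.isdir(fuzz_targets_dir):
    trial_ids = {
        os.path.splitext(f)[0]
        for f in os.listdir(fuzz_targets_dir)
        if os.path.splitext(f)[1] in TARGET_EXTS
    }
    return len(trial_ids)

  # 3. Old experiments: count target files under raw_targets/.
  raw_targets_dir = os.path.join(results_dir, benchmark_id, 'raw_targets')
  if os.path.isdir(raw_targets_dir):
    return sum(
        os.path.splitext(f)[1] in TARGET_EXTS
        for f in os.listdir(raw_targets_dir))

  # 4. Default when nothing is available.
  return 0
```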

Collaborator:

BTW, thanks for addressing this so quickly!

Author:

@DonggeLiu Thanks! I find the work you are doing really interesting.

We can also use a metadata approach: write experiment parameters (like the expected trial count) to a dedicated file during initialization, giving reporting a persistent source of truth. A file-based record decouples reporting from runtime quirks such as environment variables. While the metadata method feels more sophisticated, I believe the two approaches are equally effective overall. What do you think?
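
A minimal sketch of that metadata approach, for illustration only: the metadata.json file name, its keys, and both helper functions are hypothetical, not existing APIs in this repo.

```python
import json
import os


def write_experiment_metadata(results_dir: str, num_samples: int) -> None:
  """Records experiment parameters once, at experiment start-up."""
  with open(os.path.join(results_dir, 'metadata.json'), 'w') as f:
    json.dump({'num_samples': num_samples}, f)


def read_expected_trials(results_dir: str) -> int:
  """Reads the recorded trial count back; falls back to 0 if missing."""
  path = os.path.join(results_dir, 'metadata.json')
  if not os.path.exists(path):
    return 0
  with open(path) as f:
    return json.load(f).get('num_samples', 0)
```

Writing the file once during initialization keeps the count available to any later report run, independent of the shell environment.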

Author:

@DonggeLiu I have a doubt: can a trial ID be '00', or do they always start at '01'? If '00' is possible, then in a case with trials '01', '05', and '10', the total trial count would be 11 rather than 10.
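
To make the potential off-by-one concrete, here is the arithmetic of the max-ID heuristic under both conventions (purely illustrative):

```python
trial_ids = ['01', '05', '10']
max_id = max(int(t) for t in trial_ids)  # 10

# If IDs always start at '01': trials are 1..10, so count == max_id == 10.
# If '00' is possible:         trials are 0..10, so count == max_id + 1 == 11.
```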

+    raw_targets_dir = os.path.join(self._results_dir, benchmark_id,
+                                   'raw_targets')
+    if not FileSystem(raw_targets_dir).exists():
+      return 0
+    targets = [
+        f for f in FileSystem(raw_targets_dir).listdir()
+        if os.path.splitext(f)[1] in TARGET_EXTS
+    ]
+    return len(targets)
+
+  # Count unique trial IDs in the fuzz_targets directory.
+  trial_ids = set()
+  for filename in FileSystem(fuzz_targets_dir).listdir():
+    if os.path.splitext(filename)[1] in TARGET_EXTS:
+      # Extract the trial ID from filenames like "01.fuzz_target".
+      trial_ids.add(os.path.splitext(filename)[0])
+
+  return len(trial_ids)

def get_final_target_code(self, benchmark: str, sample: str) -> str:
  """Gets the targets of benchmark |benchmark| with sample ID |sample|."""
  targets_dir = os.path.join(self._results_dir, benchmark, 'fixed_targets')