Generated target antipattern: misuse of ConsumeData #575

Open
oliverchang opened this issue Sep 5, 2024 · 5 comments

Comments

oliverchang (Collaborator) commented Sep 5, 2024

From https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-08-31-weekly-all/sample/output-libtheora-th_comment_query/02.html

  // Generate fuzzed comments.
  for (int i = 0; i < tc.comments; ++i) {
    tc.comment_lengths[i] = fuzzed_data.ConsumeIntegralInRange<int>(1, 128);
    tc.user_comments[i] = (char *)malloc(tc.comment_lengths[i]);
    if (!tc.user_comments[i]) {
      continue;
    }
    fuzzed_data.ConsumeData(tc.user_comments[i], tc.comment_lengths[i]);
  }

is wrong, because ConsumeData doesn't guarantee that it writes tc.comment_lengths[i] bytes: it writes at most that many, and fewer when the remaining fuzz input is shorter. Code that later treats the buffer as fully initialized can then produce false-positive OOB-read reports.

This should instead be:

    tc.comment_lengths[i] = fuzzed_data.ConsumeData(tc.user_comments[i], tc.comment_lengths[i]);

Is there some instruction we can give to prevent this misuse of FDP? @DavidKorczynski @DonggeLiu

oliverchang changed the title from "Generated target antipattern:" to "Generated target antipattern: misuse of ConsumeData" on Sep 5, 2024
oliverchang (Collaborator, Author) commented:

Similar issues: #301 and #164. We need to see if we can fix them now with our new models, otherwise we'll have way too many false positive crash reports.

DonggeLiu (Collaborator) commented:

Could we document this somewhere (e.g., a file in the repo/bucket)?
Then I can ask LLM agents to read the file.

oliverchang (Collaborator, Author) commented:

We can probably base this on the existing FuzzedDataProvider guidance, e.g.:

1. `ConsumeBytes` and `ConsumeBytesWithTerminator` methods return a `std::vector` of AT MOST the requested size. These methods are helpful when you know how long a certain part of the fuzz input should be. Use the `.data()` and `.size()` methods of the resulting object if your API works with raw memory arguments.

and add some more emphasis on avoiding these bad patterns.

oliverchang (Collaborator, Author) commented Sep 5, 2024

It might also be worth considering having a separate "validator" agent that focuses on finding these documented bad patterns and fixing them.

DavidKorczynski (Collaborator) commented Sep 26, 2024

> It might also be worth considering having a separate "validator" agent that focuses on finding these documented bad patterns and fixing them.

An important point here is that the triage correctly labels this a "Bug" (bug here means a bug in the OFG-generated harness, and specifically no bug in the target code), so our existing agent already gets it correct in that sense. We could try to get triage_crash into our build/run loop in a similar fashion to fix_generated_fuzz_target, and if triage_crash determines the harness to be erroneous, then we ask it to fix it:


        self._fix_generated_fuzz_target(ai_binary, generated_oss_fuzz_project,
                                        target_path, llm_fix_count,
                                        build_result, run_result, dual_logger)
      except Exception as e:
        dual_logger.log('Exception occurred when fixing fuzz target in attempt '
                        f'{llm_fix_count}: {e}')
        break

    # Logs and returns the result.
    if not build_result.succeeded:
      dual_logger.log(f'Failed to build {target_path} with '
                      f'{self.builder_runner.fixer_model_name} in '
                      f'{llm_fix_count} iterations of fixing.')
      return dual_logger.return_result(
          Result(False, False, 0.0, 0.0, '', '', False,
                 SemanticCheckResult.NOT_APPLICABLE,
                 TriageResult.NOT_APPLICABLE))

    dual_logger.log(f'Successfully built {target_path} with '
                    f'{self.builder_runner.fixer_model_name} in '
                    f'{llm_fix_count} iterations of fixing.')
    if not run_result:
      dual_logger.log(
          f'Warning: no run result in {generated_oss_fuzz_project}.')
      return dual_logger.return_result(
          Result(True, False, 0.0, 0.0, '', '', False,
                 SemanticCheckResult.NOT_APPLICABLE,
                 TriageResult.NOT_APPLICABLE))

    # Triage the crash with LLM
    dual_logger.log(f'Triaging the crash related to {target_path} with '
                    f'{self.builder_runner.fixer_model_name}.')
    run_result.triage = self.triage_crash(
        ai_binary,
        generated_oss_fuzz_project,
        target_path,
        run_result,
        dual_logger,
    )

A caveat here is that I'm not sure how often the triaging gets it wrong, which we should watch out for if adding it into the loop.

AdamKorcz pushed a commit that referenced this issue Oct 2, 2024
Ref: #575
Ref: #301

---------

Signed-off-by: David Korczynski <[email protected]>

3 participants