Generated target antipattern: misuse of ConsumeData #575

Open
oliverchang opened this issue Sep 5, 2024 · 5 comments

Comments

oliverchang (Collaborator) commented Sep 5, 2024

From https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-08-31-weekly-all/sample/output-libtheora-th_comment_query/02.html

  // Generate fuzzed comments.
  for (int i = 0; i < tc.comments; ++i) {
    tc.comment_lengths[i] = fuzzed_data.ConsumeIntegralInRange<int>(1, 128);
    tc.user_comments[i] = (char *)malloc(tc.comment_lengths[i]);
    if (!tc.user_comments[i]) {
      continue;
    }
    fuzzed_data.ConsumeData(tc.user_comments[i], tc.comment_lengths[i]);
  }

is wrong, because ConsumeData doesn't guarantee that it writes tc.comment_lengths[i] bytes: it writes at most that many, and fewer when the remaining fuzz input is shorter. Code that later treats the buffer as fully initialized can then produce false-positive OOB-read reports.

This should instead be:

    tc.comment_lengths[i] = fuzzed_data.ConsumeData(tc.user_comments[i], tc.comment_lengths[i]);

Is there some instruction we can give to prevent this misuse of FDP? @DavidKorczynski @DonggeLiu

oliverchang changed the title from "Generated target antipattern:" to "Generated target antipattern: misuse of ConsumeData" on Sep 5, 2024
oliverchang (Collaborator, Author) commented:

Similar issues: #301 and #164. We need to see if we can fix them now with our new models, otherwise we'll have way too many false positive crash reports.

DonggeLiu (Collaborator) commented:

Could we document this somewhere (e.g., a file in the repo/bucket)?
Then I can ask LLM agents to read the file.

oliverchang (Collaborator, Author) commented:

We can probably base this on the existing FuzzedDataProvider guidance, e.g.:

1. `ConsumeBytes` and `ConsumeBytesWithTerminator` methods return a `std::vector` of AT MOST the requested size. These methods are helpful when you know how long a certain part of the fuzz input should be. Use the `.data()` and `.size()` methods of the resulting object if your API works with raw memory arguments.

and add some more emphasis on avoiding these bad patterns.

oliverchang (Collaborator, Author) commented Sep 5, 2024

It might also be worth considering having a separate "validator" agent that focuses on finding these documented bad patterns and fixing them.

DavidKorczynski (Collaborator) commented Sep 26, 2024

> It might also be worth considering having a separate "validator" agent that focuses on finding these documented bad patterns and fixing them.

An important point here is that the triage correctly labels this a "Bug" (bug here means a bug in the OFG-generated harness, and specifically no bug in the target code), so our existing agent already gets it correct in that sense. We could try to get triage_crash into our build/run loop in a similar fashion to fix_generated_fuzz_target, and if triage_crash determines the harness to be erroneous, then we ask it to fix it:


        self._fix_generated_fuzz_target(ai_binary, generated_oss_fuzz_project,
                                        target_path, llm_fix_count,
                                        build_result, run_result, dual_logger)
      except Exception as e:
        dual_logger.log('Exception occurred when fixing fuzz target in attempt '
                        f'{llm_fix_count}: {e}')
        break

    # Logs and returns the result.
    if not build_result.succeeded:
      dual_logger.log(f'Failed to build {target_path} with '
                      f'{self.builder_runner.fixer_model_name} in '
                      f'{llm_fix_count} iterations of fixing.')
      return dual_logger.return_result(
          Result(False, False, 0.0, 0.0, '', '', False,
                 SemanticCheckResult.NOT_APPLICABLE,
                 TriageResult.NOT_APPLICABLE))

    dual_logger.log(f'Successfully built {target_path} with '
                    f'{self.builder_runner.fixer_model_name} in '
                    f'{llm_fix_count} iterations of fixing.')
    if not run_result:
      dual_logger.log(
          f'Warning: no run result in {generated_oss_fuzz_project}.')
      return dual_logger.return_result(
          Result(True, False, 0.0, 0.0, '', '', False,
                 SemanticCheckResult.NOT_APPLICABLE,
                 TriageResult.NOT_APPLICABLE))

    # Triage the crash with LLM
    dual_logger.log(f'Triaging the crash related to {target_path} with '
                    f'{self.builder_runner.fixer_model_name}.')
    run_result.triage = self.triage_crash(
        ai_binary,
        generated_oss_fuzz_project,
        target_path,
        run_result,
        dual_logger,
    )

A caveat here is that I'm not sure how often the triaging gets it wrong, which we should watch out for if adding it into the loop.

AdamKorcz pushed a commit that referenced this issue Oct 2, 2024
Ref: #575
Ref: #301

---------

Signed-off-by: David Korczynski <[email protected]>

3 participants