fix bugs && prepare 50 cases with headerfiles #643
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
This is interesting, thanks for the PR!
A couple of comments for now:
Changed the language setting in the YAML of C projects such as picotls and libvnc from c++ to c. Otherwise, the prompt provides a C++ example, and the LLM imitates it and includes FuzzedDataProvider.h (a C++ header).
How about just changing this directly in OSS-Fuzz? Alternatively, we could add an API to Fuzz Introspector that determines the language based on the source files of the module rather than the project.yaml in OSS-Fuzz.
Added the headerfiles project as a module in the oss-fuzz-gen project. Therefore, the import statement changed from import headerfiles.api as headerfiles to from headerfiles.headerfiles import api as headerfiles. (This lets us adjust the code in headerfiles at any time; eventually, we will package it as an external library.)
Interesting! Could you write a bit about how you generated these? I'm curious whether it's automated and, if so, which heuristics it uses. For projects such as libsodium, you have included only a single file despite the project having many header files. I can see the harnesses in OSS-Fuzz use the same header file as yours, so I guess that's one way of extracting this info.
When I look at the stats I see:
Do you have an intuition about which of your changes is most responsible (how often did the language setting have an impact, and how often were the header files the most important factor)? For example, interestingly the
This is based on file extension counting. Ref: google/oss-fuzz-gen#643 Signed-off-by: David Korczynski <[email protected]>
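The file-extension-counting idea referenced in this commit could be sketched roughly as follows. This is only an illustration, not the code that landed in Fuzz Introspector: the function name guess_language, the extension sets, and the tie-breaking rule are all assumptions made for the example. Headers ending in .h are ignored because they are ambiguous between C and C++.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension sets; the real implementation may use different ones.
C_EXTENSIONS = {".c"}
CPP_EXTENSIONS = {".cc", ".cpp", ".cxx", ".c++", ".hpp", ".hh", ".hxx"}


def guess_language(source_files):
    """Guess 'c' or 'c++' for a module by counting source file extensions.

    Returns None when no C or C++ source file is found.
    """
    counts = Counter()
    for name in source_files:
        ext = Path(name).suffix.lower()
        if ext in C_EXTENSIONS:
            counts["c"] += 1
        elif ext in CPP_EXTENSIONS:
            counts["c++"] += 1
        # Anything else (including ambiguous .h headers) is not counted.
    if not counts:
        return None
    # Ties go to C++ here; this is an arbitrary choice for the sketch.
    return "c" if counts["c"] > counts["c++"] else "c++"
```

With a heuristic like this, a project such as picotls, whose sources are mostly .c files, would be classified as c even if its project.yaml says c++.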
Thanks for your quick response. Our general idea here is that:
Therefore, we PLAN to automatically figure out the
For this PR, we want to demonstrate the effectiveness of this idea (general build script inference). Therefore, we randomly picked 50 projects and manually prepared their
Since the main goal of this PR is to understand its effectiveness on gpt4o and gemini, much of the code is far from ready for the final merge (for example, we skip the fuzzing process by adding the option
Another thing we would like to mention is that the build success rate in the table may not be a perfect metric for demonstrating the effectiveness. This is because the generated fuzz target can still raise a build error when
Thanks for the suggestion, we will see how to figure this out.
Please see the above reply for high-level explanation.
Please see the above reply. We want to provide a correct and general
Thanks sooo much, @Once2gain and @occia! A bit of background: @DavidKorczynski helps OFG support arbitrary C/C++ projects.
Thanks for the link, we hadn't noticed that code before. We'll study David's code!
+1, see https://github.com/google/oss-fuzz-gen/blob/main/experimental/c-cpp/build_generator.py in particular and https://blog.oss-fuzz.com/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects/ There's likely a lot that can be reused?
Statement: Most modifications to the original code of oss-fuzz-gen (including items 2 and 3 below) are for the convenience of current testing and performance comparison based on Gemini. The current changes will not be the final merged changes.
Modifications to be noted:
1. Changed the language setting in the YAML of C projects such as picotls and libvnc from c++ to c. Otherwise, the prompt provides a C++ example, and the LLM imitates it and includes FuzzedDataProvider.h (a C++ header).
2. Added the headerfiles project as a module in the oss-fuzz-gen project. Therefore, the import statement changed from import headerfiles.api as headerfiles to from headerfiles.headerfiles import api as headerfiles. (This lets us adjust the code in headerfiles at any time; eventually, we will package it as an external library.)
3. Changed the function https://github.com/occia/oss-fuzz-gen/blob/e71091bab8b4ac20a2e575ee9f7cbce91a987fdd/data_prep/project_src.py#L238 to avoid the bug: "docker: Error response from daemon: Conflict."
Project bind9: Executing make "-j$(nproc)" in the original build.sh sometimes causes link errors, related to the project's multithreading settings. Executing plain make produces no errors (via headersfile_updated_script).
Project openexr: The header files introduced by headerfiles become part of the prompt and occasionally affect the LLM's generation. We haven't found a solution yet. "We have prepared the following list of headers which covers all target project APIs and will prepend them as #include statements at the beginning of your generated fuzz target. Therefore, you only need to include the headers of non-target-project APIs used in your fuzz target. <code> dns/acl.h...".
Overall Results:
(Based on GPT-4o)