fix bugs && prepare 50 cases with headerfiles #643

Merged: 5 commits, Sep 23, 2024


Once2gain

Statement: Most modifications to the original oss-fuzz-gen code (including items 2 and 3 below) are for the convenience of the current testing and performance comparison based on Gemini. These changes are not the final changes intended for merge.

Modifications to be noted:

  1. Changed the language setting in the yaml of C projects such as picotls and libvnc from c++ to c. Otherwise, the prompt provides a C++ example, and the large model imitates it by including FuzzedDataProvider.h (C++).

  2. Added the headerfiles project as a module in the oss-fuzz-gen project. Therefore, the import statement changed from import headerfiles.api as headerfiles to from headerfiles.headerfiles import api as headerfiles. (This lets us adjust the headerfiles code at any time; eventually, we will package it as an external library.)

  3. Changed the function: https://github.com/occia/oss-fuzz-gen/blob/e71091bab8b4ac20a2e575ee9f7cbce91a987fdd/data_prep/project_src.py#L238 to avoid the bug: "docker: Error response from daemon: Conflict."

  4. Project bind9: Running make "-j$(nproc)" in the original build.sh sometimes causes link errors, related to the project's multithreading settings. Running plain make produces no errors (by headersfile_updated_script).

  5. Project openexr: The header files introduced by headerfiles become part of the prompt and occasionally affect the LLM's generation. We haven't found a solution yet. The relevant prompt text is: "We have prepared the following list of headers which covers all target project APIs and will prepend them as #include statments at the beginning of your generated fuzz target. Therefore, you only need to include the headers of non-target-project APIs used in your fuzz target. <code> dns/acl.h...".
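Item 3 above does not say how the Docker conflict was avoided. Docker typically reports "Conflict" when a requested container name is already in use, so one common workaround is to give every run a unique name. The following is only a minimal sketch under that assumption; `unique_container_name` and `docker_run_command` are hypothetical helpers, not the code in project_src.py.

```python
import uuid


def unique_container_name(project: str) -> str:
    """Return a container name that is very unlikely to collide across runs."""
    # A short random suffix keeps the name readable while avoiding reuse.
    return f"{project}-{uuid.uuid4().hex[:8]}"


def docker_run_command(project: str, image: str) -> list[str]:
    """Build (but do not execute) a `docker run` command line."""
    return [
        "docker", "run",
        "--rm",  # remove the container on exit so the name does not linger
        "--name", unique_container_name(project),
        image,
    ]


if __name__ == "__main__":
    # The image name here is a placeholder for illustration only.
    print(" ".join(docker_run_command("bind9", "example/bind9-image")))
```

The command is only assembled here, so the sketch can be inspected without a Docker daemon.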

Overall Results:
(Based on GPT-4o)

PROJECT ORIGINAL FIX
avahi 16 18
bind9 21 14
bluez 0 0
brotli 0 1
capstone 35 50
coturn 16 18
croaring 42 50
igraph 0 0
kamailio 6 28
krb5 0 0
lcms 0 37
libbpf 0 39
libcoap 0 0
libevent 4 28
libfido2 0 50
libical 6 16
libjpeg-turbo 39 50
libpcap 50 50
librdkafka 0 0
libsndfile 39 45
libsodium 0 0
libssh2 26 25
libssh 22 32
libtpms 40 40
libusb 1 27
libvnc 0 24
libxls 0 33
libyang 1 0
lwan 0 8
mbedtls 0 16
mdbtools 0 0
minizip 50 50
ndpi 1 7
njs 2 0
oniguruma 20 20
openexr 17 0
opusfile 23 36
picotls 43 41
pjsip 14 19
proftpd 23 44
pupnp 29 29
sleuthkit 0 0
tidy-html5 35 39
unicorn 0 0
unit 15 13
utf8proc 20 20
vlc 6 9
w3m 34 30
wasm3 5 15
zydis 0 0


google-cla bot commented Sep 23, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Collaborator

@DavidKorczynski DavidKorczynski left a comment

This is interesting, thanks for the PR!

A couple of comments for now:

> Changed the language setting in the yaml of C projects such as picotls and libvnc from c++ to c. Otherwise, the prompt provides a C++ example, and the large model imitates it by including FuzzedDataProvider.h (C++).

How about just changing this directly in OSS-Fuzz? Alternatively, we could add an API to Fuzz Introspector that determines the language based on the source files of the module rather than the project.yaml in OSS-Fuzz.
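One possible heuristic for determining the language from source files is to count file extensions. The sketch below is an assumption about how such a check could look, not the Fuzz Introspector API; `guess_language` and the extension sets are hypothetical, and header files are ignored because `.h` is shared by C and C++.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension sets; a real tool may use a longer list.
C_EXTENSIONS = {".c"}
CPP_EXTENSIONS = {".cc", ".cpp", ".cxx"}


def guess_language(paths):
    """Guess "c" or "c++" from source file extensions, or None if undecidable."""
    counts = Counter()
    for path in paths:
        ext = Path(path).suffix.lower()
        if ext in C_EXTENSIONS:
            counts["c"] += 1
        elif ext in CPP_EXTENSIONS:
            counts["c++"] += 1
    if not counts:
        return None
    # max() returns the first maximal item, so a tie resolves to "c".
    return max(("c", "c++"), key=lambda lang: counts[lang])


if __name__ == "__main__":
    print(guess_language(["lib/picotls.c", "lib/openssl.c", "fuzz/fuzz-asn1.cc"]))
```

A majority vote like this would still classify a C library with a few C++ fuzz harnesses as C, which is the case that motivated the yaml change.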

> Added the headerfiles project as a module in the oss-fuzz-gen project. Therefore, the import statement changed from import headerfiles.api as headerfiles to from headerfiles.headerfiles import api as headerfiles. (This lets us adjust the headerfiles code at any time; eventually, we will package it as an external library.)

Interesting! Could you write a bit about how you generated these? I'm curious if it's automated and, if so, which heuristics you used -- for projects such as libsodium you have included only a single header file despite the project having many header files. I can see the harnesses in OSS-Fuzz use the same header file as yours, so I guess that's one way of extracting this info.

@DavidKorczynski
Collaborator

When I look at the stats I see:

  • The original had 701 successes in total, whereas with headerfiles the total is 1071.
  • In 26 projects headerfiles performed better
  • In 8 projects the original performed better
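These tallies can be recomputed directly from the results table in the PR description. The sketch below copies the table rows as a string and counts totals, wins, and ties; the `summarize` helper and its key names are illustrative only.

```python
RESULTS = """\
avahi 16 18
bind9 21 14
bluez 0 0
brotli 0 1
capstone 35 50
coturn 16 18
croaring 42 50
igraph 0 0
kamailio 6 28
krb5 0 0
lcms 0 37
libbpf 0 39
libcoap 0 0
libevent 4 28
libfido2 0 50
libical 6 16
libjpeg-turbo 39 50
libpcap 50 50
librdkafka 0 0
libsndfile 39 45
libsodium 0 0
libssh2 26 25
libssh 22 32
libtpms 40 40
libusb 1 27
libvnc 0 24
libxls 0 33
libyang 1 0
lwan 0 8
mbedtls 0 16
mdbtools 0 0
minizip 50 50
ndpi 1 7
njs 2 0
oniguruma 20 20
openexr 17 0
opusfile 23 36
picotls 43 41
pjsip 14 19
proftpd 23 44
pupnp 29 29
sleuthkit 0 0
tidy-html5 35 39
unicorn 0 0
unit 15 13
utf8proc 20 20
vlc 6 9
w3m 34 30
wasm3 5 15
zydis 0 0
"""


def summarize(table: str) -> dict:
    """Total both columns and count which setup wins per project."""
    rows = []
    for line in table.splitlines():
        if line.strip():
            name, ori, fix = line.split()
            rows.append((name, int(ori), int(fix)))
    return {
        "original_total": sum(ori for _, ori, _ in rows),
        "fix_total": sum(fix for _, _, fix in rows),
        "fix_better": sum(fix > ori for _, ori, fix in rows),
        "original_better": sum(ori > fix for _, ori, fix in rows),
        "tied": sum(ori == fix for _, ori, fix in rows),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```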

Do you have an intuition about which of your changes is most responsible (how often did the language change matter, and how often were the header files the most important factor)? For example, interestingly, the picotls project, where you changed the language, performs worse in your setup.

DavidKorczynski added a commit to ossf/fuzz-introspector that referenced this pull request Sep 23, 2024
This is based on file extension counting.

Ref: google/oss-fuzz-gen#643

Signed-off-by: David Korczynski <[email protected]>
@DonggeLiu DonggeLiu merged commit b36f237 into google:Test-build Sep 23, 2024
1 of 4 checks passed
@occia

occia commented Sep 23, 2024

Thanks for your quick response.

Our general idea here is that:

  • The existing build.sh scripts are too hacky and were written specifically for the API targets being tested at that time. Therefore, they are usually not a general way to build fuzz targets for new APIs, and they even miss basic operations such as make install.
  • Our basic intuition is that, for most APIs/projects, a general build script should exist, especially considering that the target API under test is part of the interface the developer designed for outside use. The developer very likely had in mind a way to build and use the API from outside.

Therefore, we PLAN to automatically figure out the general build script. The current technical plan is to use LLM agents to infer the general build script by modifying the existing build.sh; the inference process abstracts and mimics a human expert's behaviour. Our goal here is to minimize or eliminate the build errors caused by the oss-fuzz-gen build.sh.

For this PR, we want to demonstrate the effectiveness of this idea (general build script inference). Therefore, we randomly picked 50 projects and manually prepared a general build script for each to show its best-case usefulness.

Since the main goal of this PR is to understand its effectiveness on GPT-4o and Gemini, much of the code is far from ready for final merge (for example, we skip the fuzzing process by adding the option -runs=0, and we put all the code of our general build script inference tool headerfile into oss-fuzz-gen).

Another thing we would like to mention is that the build success rate in the table may not be a perfect metric for demonstrating the effectiveness. This is because the generated fuzz target can still raise build errors when build.sh is correct, e.g., when the LLM imagined non-existent headers or made grammatical errors. Therefore, some projects may show a higher build error rate, but those are fuzz target build errors; the build errors caused by the build script are eliminated.

@occia

occia commented Sep 23, 2024

> How about just changing this directly in OSS-Fuzz? Alternatively, we could add an API to Fuzz Introspector that determines the language based on the source files of the module rather than the project.yaml in OSS-Fuzz.

Thanks for the suggestion, we will see how to figure this out.

> Interesting! Could you write a bit about how you generated these? I'm curious if it's automated and, if so, which heuristics you used -- for projects such as libsodium you have included only a single header file despite the project having many header files. I can see the harnesses in OSS-Fuzz use the same header file as yours, so I guess that's one way of extracting this info.

Please see the above reply for a high-level explanation.

> Do you have an intuition about which of your changes is most responsible (how often did the language change matter, and how often were the header files the most important factor)? For example, interestingly, the picotls project, where you changed the language, performs worse in your setup.

Please see the above reply. We want to provide a correct and general build.sh that can compile the generated fuzz targets for any API inside a given project. The general build script can minimize the build errors caused by build.sh.

@DonggeLiu
Collaborator

Thanks sooo much, @Once2gain and @occia !
I've merged the PR into #641 and started an experiment there.

A bit of background: @DavidKorczynski helps OFG support arbitrary C/C++ projects.
I think your work and his can benefit each other at some point, and I look forward to seeing if we can combine your efforts in generating build scripts. : )

@occia

occia commented Sep 23, 2024

Thanks for the link. We had not noticed that code before; we will study David's code!

@oliverchang
Collaborator

AdamKorcz pushed a commit to ossf/fuzz-introspector that referenced this pull request Sep 24, 2024
This is based on file extension counting.

Ref: google/oss-fuzz-gen#643

Signed-off-by: David Korczynski <[email protected]>