Usage guide

Running an experiment requires three steps:

Install dependencies.
Set up LLM access.
Launch the experiment.

Prerequisites

Dependencies

You must install:

Python 3.11
pip
python3.11-venv
Git
Docker
Google Cloud SDK
c++filt must be available in PATH.
(optional for project_src.py) clang-format
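
Before continuing, it can help to confirm that the tools above are reachable from your shell. The following is a minimal sketch, not part of the framework; the command names in the two lists are assumptions about how each tool is exposed on PATH and may differ on your platform.

```python
import shutil

# Command names are assumptions about how each tool appears on PATH;
# adjust them for your platform (for example, "python3" instead of "python3.11").
REQUIRED_TOOLS = ["python3.11", "pip", "git", "docker", "gcloud", "c++filt"]
OPTIONAL_TOOLS = ["clang-format"]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for tool in find_missing(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in find_missing(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```

The script prints nothing when every tool is found.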

Python Dependencies

Install required dependencies in a Python virtual environment:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

LLM Access

Set up Vertex AI or OpenAI with the following steps.

Vertex AI

Accessing Vertex AI models requires a Google Cloud Platform (GCP) project with Vertex AI enabled.

Then authenticate to GCP:

gcloud auth login
gcloud auth application-default login
gcloud auth application-default set-quota-project <your-project>

You'll also need to specify the GCP projects and locations where you have Vertex AI quota (comma delimited):

export CLOUD_ML_PROJECT_ID=<gcp-project-id>
export VERTEX_AI_LOCATIONS=us-west1,us-west4,us-east4,us-central1
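
As an illustration of the comma-delimited format, here is a small sketch of how these two variables could be read and validated. The function name `vertex_ai_config` is hypothetical, and treating `CLOUD_ML_PROJECT_ID` as a single value is an assumption; this is not how the framework itself necessarily parses them.

```python
import os


def vertex_ai_config(env=None):
    """Read the Vertex AI settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the project ID is a single value, not a list.
    project = env.get("CLOUD_ML_PROJECT_ID", "").strip()
    # Split on commas and drop blanks caused by stray commas or spaces.
    raw_locations = env.get("VERTEX_AI_LOCATIONS", "")
    locations = [loc.strip() for loc in raw_locations.split(",") if loc.strip()]
    if not project:
        raise ValueError("CLOUD_ML_PROJECT_ID is not set")
    if not locations:
        raise ValueError("VERTEX_AI_LOCATIONS is not set")
    return project, locations
```

Calling `vertex_ai_config()` with no argument reads from the current process environment.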

OpenAI

There are two ways to access OpenAI models.

OpenAI API Key on OpenAI: This is the default way for using OpenAI models.
OpenAI API Key on Azure: Please refer to this section if you are using OpenAI models on Azure.

OpenAI API Key on OpenAI

Accessing OpenAI models requires an API key.

Once you have one, set it as an environment variable:

export OPENAI_API_KEY='<your-api-key>'

OpenAI API Key on Azure

If you access OpenAI models through Azure, you need the Azure endpoint, the API key, and optionally the API version.

Then set them as environment variables:

export AZURE_OPENAI_API_KEY='<your-azure-api-key>'
export AZURE_OPENAI_ENDPOINT='<your-azure-endpoint>'
export AZURE_OPENAI_API_VERSION='<your-azure-api-version>' # default is '2024-02-01'

Tip: To distinguish between the two ways of accessing OpenAI models, you need to add -azure to the model name when using OpenAI on Azure. For example, gpt-3.5-turbo-azure will use OpenAI on Azure, while gpt-3.5-turbo will use OpenAI on OpenAI.
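
The naming rule in the tip can be sketched as a check that tells you which variables a given OpenAI model name needs. The helper names below are hypothetical and the sketch only covers OpenAI model names; it is not the framework's own logic.

```python
import os


def required_env_vars(model_name):
    """Return the environment variables an OpenAI model name needs, per the tip above."""
    if model_name.endswith("-azure"):
        # AZURE_OPENAI_API_VERSION is optional, so it is not listed here.
        return ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]
    return ["OPENAI_API_KEY"]


def missing_env_vars(model_name, env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required_env_vars(model_name) if not env.get(name)]
```

For example, `missing_env_vars("gpt-4o-azure")` returns an empty list only when both Azure variables are set in the current environment.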

Running experiments

To generate and evaluate the fuzz targets in a benchmark set via local experiments:

./run_all_experiments.py \
    --model=<model-name> \
    --benchmarks-directory='./benchmark-sets/comparison' \
    [--ai-binary=<llm-access-binary>] \
    [--template-directory=prompts/custom_template] \
    [--work-dir=results-dir] \
    [...]
# E.g., generate fuzz targets for TinyXML-2 with default template and fuzz for 30 seconds.
# ./run_all_experiments.py -y ./benchmark-sets/all/tinyxml2.yaml

where <model-name> must be the name of one of the supported models. The list of models supported by OSS-Fuzz-gen expands on a regular basis, and all of the models can be listed with run_all_experiments.py --help. At the time of writing the following models are supported, where vertex in the name means the model is accessed through Vertex AI:

vertex_ai_code-bison
vertex_ai_code-bison-32k
vertex_ai_gemini-pro
vertex_ai_gemini-1-5-chat
vertex_ai_gemini-1-5
vertex_ai_gemini-experimental
vertex_ai_gemini-ultra
vertex_ai_claude-3-5-sonnet
vertex_ai_claude-3-opus
vertex_ai_claude-3-haiku
gpt-3.5-turbo-azure
gpt-3.5-turbo
gpt-4
gpt-4o
gpt-4o-azure
gpt-4-azure

Experiments can also be run on Google Cloud using Google Cloud Build. You can do this by passing --cloud <experiment-name> --cloud-experiment-bucket <bucket>, where <bucket> is the name of a Google Cloud Storage bucket in your Google Cloud project.
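
If you launch experiments from a script, the command line can be assembled programmatically. The sketch below only uses flags that appear in this section; the function `build_command` and its parameter names are hypothetical, and the requirement that both cloud values be given together is an assumption.

```python
def build_command(model, benchmarks_directory, template_directory=None,
                  work_dir=None, cloud_experiment=None, cloud_bucket=None):
    """Assemble an argument list for run_all_experiments.py (hypothetical helper)."""
    cmd = [
        "./run_all_experiments.py",
        f"--model={model}",
        f"--benchmarks-directory={benchmarks_directory}",
    ]
    if template_directory:
        cmd.append(f"--template-directory={template_directory}")
    if work_dir:
        cmd.append(f"--work-dir={work_dir}")
    # Assumption: a cloud run needs both the experiment name and the bucket.
    if bool(cloud_experiment) != bool(cloud_bucket):
        raise ValueError("cloud_experiment and cloud_bucket must be given together")
    if cloud_experiment:
        cmd += ["--cloud", cloud_experiment,
                "--cloud-experiment-bucket", cloud_bucket]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.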

Benchmarks

To leverage LLMs for harness generation, a set of code targets is needed. In OFG terminology these are called "benchmarks": target functions or test cases in a given OSS-Fuzz project. We need these benchmarks to direct the auto-harness approach towards a specific part of a project.

We currently offer a variety of benchmark sets:

comparison: A small selection of OSS-Fuzz C/C++ projects.
all: All benchmarks across all OSS-Fuzz C/C++ projects.
c-specific: A benchmark set focused on C projects.
from-test-large: A benchmark set comprising many test-cases for test-to-harness LLM generation.
from-test-small: A benchmark set used for test-to-harness generation, including a limited number of projects.
jvm-all: A large set of Java targets.
jvm-medium: A medium set of Java targets.
jvm-small: A small set of Java targets.
python-small: A small set of Python targets.
test-and-func-mix: A set of targets that mixes function-level targets and test-to-harness targets.
test-to-harness-jvm-small: A small set of Java targets focused on test-to-harness generation.

Visualizing Results

Once finished, the framework will output experiment results like this:

================================================================================
*<project-name>, <function-name>*
build success rate: <build-rate>, crash rate: <crash-rate>, max coverage: <max-coverage>, max line coverage diff: <max-coverage-diff>
max coverage sample: <results-dir>/<benchmark-dir>/fixed_targets/<LLM-generated-fuzz-target>
max coverage diff sample: <results-dir>/<benchmark-dir>/fixed_targets/<LLM-generated-fuzz-target>

where <build-rate> is the number of fuzz targets that compile divided by the total number of fuzz targets generated by the LLM (e.g., 0.5 if 4 out of 8 fuzz targets build), <crash-rate> is the run-time crash rate, <max-coverage> is the maximum line coverage across all targets, and <max-coverage-diff> is the maximum new line coverage of LLM-generated targets relative to existing human-written targets in OSS-Fuzz.
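
The build rate arithmetic can be illustrated with a short sketch. The function name and the list-of-booleans input are assumptions made for illustration, not the framework's internal representation.

```python
def build_success_rate(build_results):
    """Return the fraction of generated fuzz targets that compiled.

    build_results holds one boolean per generated target (True if it built).
    """
    if not build_results:
        return 0.0
    return sum(build_results) / len(build_results)


# 4 of 8 targets build, matching the example in the text above.
print(build_success_rate([True] * 4 + [False] * 4))  # prints 0.5
```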

Note that <max-coverage> and <max-coverage-diff> are computed based on the code linked against the fuzz target, not the whole project. For example:

================================================================================
*tinyxml2, tinyxml2::XMLDocument::Print*
build success rate: 1.0, crash rate: 0.125, max coverage: 0.29099427381572096, max line coverage diff: 0.11301753077209996
max coverage sample: <results-dir>/output-tinyxml2-tinyxml2-xmldocument-print/fixed_targets/08.cpp
max coverage diff sample: <results-dir>/output-tinyxml2-tinyxml2-xmldocument-print/fixed_targets/08.cpp
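
If you want to post-process this console output, the metrics line can be parsed with a regular expression. This is a minimal sketch that assumes the line format shown above stays stable; the names `SUMMARY_RE` and `parse_summary` are hypothetical.

```python
import re

# Matches the metrics line of the summary shown above.
SUMMARY_RE = re.compile(
    r"build success rate: (?P<build_rate>[\d.]+), "
    r"crash rate: (?P<crash_rate>[\d.]+), "
    r"max coverage: (?P<max_coverage>[\d.]+), "
    r"max line coverage diff: (?P<max_coverage_diff>[\d.]+)"
)


def parse_summary(line):
    """Return the four metrics as floats, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}
```

Lines that are not metrics lines, such as the separator or the sample paths, yield `None` and can be skipped.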

Results report

To visualize these results via a web UI, with more details on the exact prompts used, samples generated, and other logs, run:

python -m report.web -r <results-dir> -o <output-dir>
python -m http.server <port> -d <output-dir>

Where <results-dir> is the directory passed to --work-dir in your experiments (default value ./results).

Then navigate to http://localhost:<port> to view the result in a table.

Detailed workflows

Configure and use the framework in the following steps:

Configure benchmark
Setup prompt template
Generate fuzz target
Fix compilation error
Evaluate fuzz target
Using local Fuzz Introspector instance

Configure Benchmark

Prepare a benchmark YAML that specifies the function to test; here is an example. Follow the link above to automatically generate one for a C/C++ project in OSS-Fuzz. Note that the project under test must be integrated into OSS-Fuzz in order to build.

Setup Prompt Templates

Prepare prompt templates. The LLM prompt will be constructed based on the files in this directory. It starts with a priming that defines the main goal and important notices, followed by some example problems and solutions. Each example problem is in the same format as the final problem (i.e., a function signature to fuzz), and the solution is the corresponding human-written fuzz target for a different function from the same project or another project. The prompt can also include more information about the function (e.g., its usage, source code, or parameter type definitions), and model-specific notes (e.g., common pitfalls to avoid).

You can pass an alternative template directory via --template-directory. The new template directory does not have to include all files: The framework will use files from template_xml/ by default when they are missing. The default prompt is structured as follows:

<Priming>
<Model-specific notes>
<Examples>
<Final question + Function information>
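
The fallback and ordering described above can be sketched with plain dictionaries. This is an illustration under stated assumptions: the part names in `PROMPT_ORDER` and the function `build_prompt` are hypothetical and do not correspond to the actual template file names or to the framework's prompt builder.

```python
# Hypothetical part names, in the order of the default prompt structure above.
PROMPT_ORDER = ["priming", "model_notes", "examples", "final_question"]


def build_prompt(default_parts, custom_parts):
    """Join prompt parts in order, preferring custom parts over defaults.

    Both arguments map a part name to its text. Any part missing from
    custom_parts falls back to the entry in default_parts.
    """
    parts = {**default_parts, **custom_parts}
    return "\n\n".join(parts[name] for name in PROMPT_ORDER if parts.get(name))
```

A custom mapping that overrides only the priming keeps the default notes, examples, and final question.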

Generate Fuzz Target

The script run_all_experiments.py will generate fuzz targets via LLM using the prompt constructed above and measure their code coverage. All experiment data will be saved into the --work-dir.

Fix Compilation Error

When a fuzz target fails to build, the framework will automatically make five attempts to fix it before terminating. Each attempt asks the LLM to fix the fuzz target based on the build failure from OSS-Fuzz, parses source code from the response, and recompiles it.
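
The retry behavior can be sketched as a bounded loop. The callables `try_build` and `ask_llm_to_fix` are hypothetical stand-ins for the OSS-Fuzz build and the LLM request; this is an outline of the idea under those assumptions, not the framework's code.

```python
MAX_FIX_ATTEMPTS = 5  # the number of attempts stated above


def build_with_fixes(source, try_build, ask_llm_to_fix,
                     max_attempts=MAX_FIX_ATTEMPTS):
    """Build `source`, asking for a fix after each failure.

    try_build(source) returns (ok, error_text).
    ask_llm_to_fix(source, error_text) returns revised source code.
    Returns (final_source, built_ok, fix_attempts_used).
    """
    ok, error = try_build(source)
    attempts = 0
    while not ok and attempts < max_attempts:
        source = ask_llm_to_fix(source, error)
        attempts += 1
        ok, error = try_build(source)
    return source, ok, attempts
```

Passing the two steps in as callables keeps the loop testable without Docker or network access.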

Evaluate Fuzz Target

If the fuzz target compiles successfully, the framework fuzzes it with libFuzzer and measures its line coverage. The fuzzing timeout is specified by the --run-timeout flag. Its line coverage is also compared against existing human-written fuzz targets from OSS-Fuzz in production.
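
One way to think about the comparison is in terms of sets of covered lines. The sketch below assumes coverage is available as sets of line identifiers and that the diff is the share of lines reached only by the new target; the framework's exact formula is not given here, so treat both functions as hypothetical.

```python
def line_coverage(covered_lines, total_lines):
    """Return the fraction of lines covered by one fuzz target."""
    return len(covered_lines) / total_lines if total_lines else 0.0


def line_coverage_diff(new_covered, existing_covered, total_lines):
    """Return the fraction of lines the new target reaches that existing targets do not."""
    newly_covered = set(new_covered) - set(existing_covered)
    return len(newly_covered) / total_lines if total_lines else 0.0


# The new target covers lines 1-4; existing targets cover lines 1-2 of 10 lines.
print(line_coverage({1, 2, 3, 4}, 10))               # prints 0.4
print(line_coverage_diff({1, 2, 3, 4}, {1, 2}, 10))  # prints 0.2
```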

Using Local Fuzz Introspector Instance

OSS-Fuzz-gen relies on Fuzz Introspector to extract information about the projects under analysis. This is done by querying https://introspector.oss-fuzz.com which offers a set of APIs to inspect OSS-Fuzz projects in a programmatic way.

It may be preferable to run a local version of the Fuzz Introspector web application instead of querying https://introspector.oss-fuzz.com directly. This can be useful when testing an extension to OSS-Fuzz-gen that requires new program analysis data, when network bandwidth needs to be limited, or when the website is down. To make OSS-Fuzz-gen use a local version of https://introspector.oss-fuzz.com, pass the -e flag to run_all_experiments.py. Before doing so, you need to start a local instance of the Fuzz Introspector endpoint. This is simple to do, and we reference the Fuzz Introspector guide here for this.

Development

Contribution process

Development environment

Auto Format / Lint

You can add a Git pre-push hook to auto-format/-lint your code:

./helper/add_pre-push_hook

Or run the formatter/linter manually:

.github/helper/presubmit

Updating Dependencies

We use https://github.com/jazzband/pip-tools to manage our Python dependencies.

# Edit requirements.in
pip install pip-tools  # Required to re-generate requirements.txt from requirements.in
pip-compile requirements.in > requirements.txt
pip install -r requirements.txt
