Simula-COMPLEX/DriveRLR

A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios

Abstract

In recent years, autonomous driving systems (ADS) have made significant progress, yet ensuring their safety remains a key challenge. Scenario-based testing offers a practical solution, and simulation-based methods have gained traction because real-world testing is costly and risky. However, evaluating the realism of simulated scenarios remains difficult, which creates demand for effective assessment methods. Recent advances show that Large Language Models (LLMs) possess strong reasoning and generalization capabilities, suggesting that they could assess scenario realism from scenario-related textual prompts. Motivated by this, we propose DriveRLR, a benchmark tool to assess the robustness of LLMs in evaluating the realism of driving scenarios. DriveRLR generates mutated scenario variants and constructs prompts from them, which are then used to assess a given LLM's ability and robustness in determining the realism of driving scenarios. We validate DriveRLR on the DeepScenario dataset using three state-of-the-art LLMs: GPT-5, Llama 4 Maverick, and Mistral Small 3.2. Results show that DriveRLR effectively reveals differences in robustness among these LLMs, demonstrating its effectiveness and practical value in scenario realism assessment. Beyond LLM robustness evaluation, DriveRLR can serve as a practical component in other applications, for example as an objective function to guide scenario generation, supporting simulation-based ADS testing workflows.

[Figure: DriveRLR pipeline]

Setup

Follow the instructions below to set up and configure the environment.

# 1) Enter the project directory
cd DriveRLR

# 2) Create and activate a conda environment
conda create -n DriveRLR python=3.9 -y
conda activate DriveRLR

# 3) Install dependencies
pip install -r requirements.txt

# 4) Install build tooling and build the wheel/sdist from source
python -m pip install --upgrade pip build
python -m build

# 5) Install the built wheel locally
pip install dist/driverlr-0.1.0-py3-none-any.whl
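
To confirm that the wheel landed in the active environment, a short check like the following may help. It is only a sketch: the distribution name `driverlr` is an assumption inferred from the wheel filename above, and the helper `installed_version` is hypothetical and not part of DriveRLR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is named "driverlr", as suggested by
    # the wheel filename driverlr-0.1.0-py3-none-any.whl.
    found = installed_version("driverlr")
    if found is None:
        print("driverlr is not installed in this environment")
    else:
        print(f"driverlr version: {found}")
```

If the check reports that the package is missing, make sure the `DriveRLR` conda environment is active before repeating step 5.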

Usage

There are three ways to use this tool:

1. Run in Terminal

Run the tool directly without writing any code. Each parameter is set through command-line input, and its default value is shown. The final output location is also displayed.

python tool.py

2. Use in Python Script

Call the provided functions from your own Python code. See example.py for a usage example, which you can run with:

python example.py

3. Modify the Source Code

You can modify the source code to fit your needs. For example:

  • Change how scenario parameters are mutated
  • Modify prompt templates
  • Add new evaluation metrics

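After modifying the source, rebuild and reinstall the package (steps 4 and 5 in Setup) if you use the installed wheel, or run the scripts directly.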
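
To make these three extension points concrete, here is a minimal, self-contained sketch of what a custom mutation, a custom prompt template, and a custom metric could look like. Everything in it is hypothetical: the names `mutate`, `build_prompt`, `agreement_rate`, the parameter keys, and the template wording are illustrative assumptions, not DriveRLR's actual API, prompts, or robustness metric. The LLM is stood in for by a plain callable so the example runs offline.

```python
import random
from typing import Callable, Dict

# Hypothetical scenario representation: parameter name -> numeric value.
Scenario = Dict[str, float]

# Hypothetical prompt template; DriveRLR's real templates may differ.
PROMPT_TEMPLATE = (
    "A driving scenario has the following parameters:\n{params}\n"
    "Is this scenario realistic? Answer 'yes' or 'no'."
)


def mutate(scenario: Scenario, rel_noise: float, rng: random.Random) -> Scenario:
    """Perturb every parameter by a relative amount drawn from [-rel_noise, rel_noise]."""
    return {
        name: value * (1.0 + rng.uniform(-rel_noise, rel_noise))
        for name, value in scenario.items()
    }


def build_prompt(scenario: Scenario) -> str:
    """Render a scenario into the textual prompt sent to the model."""
    params = "\n".join(f"- {name}: {value:.2f}" for name, value in sorted(scenario.items()))
    return PROMPT_TEMPLATE.format(params=params)


def agreement_rate(
    scenario: Scenario,
    judge: Callable[[str], str],
    n_variants: int = 10,
    rel_noise: float = 0.05,
    seed: int = 0,
) -> float:
    """Fraction of mutated variants that receive the same verdict as the original."""
    rng = random.Random(seed)
    baseline = judge(build_prompt(scenario))
    variants = [mutate(scenario, rel_noise, rng) for _ in range(n_variants)]
    same = sum(judge(build_prompt(variant)) == baseline for variant in variants)
    return same / n_variants


if __name__ == "__main__":
    # Stub judge standing in for an LLM call; it always answers "yes".
    def always_yes(prompt: str) -> str:
        return "yes"

    base = {"ego_speed_mps": 12.0, "gap_m": 30.0}
    print(build_prompt(base))
    print("agreement:", agreement_rate(base, always_yes))
```

In this sketch, a model counts as more robust when small perturbations of the scenario rarely flip its verdict. Swapping `always_yes` for a function that queries a real LLM, or replacing `agreement_rate` with another measure, would follow the same pattern.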

Project Structure

DriveRLR/
├── assets/                     # Images or other static resources
├── data/                       # Input/output data files
├── dist/                       # Built distributions (.whl, etc.)
├── src/                        # Source code directory
├── example.py                  # Example usage script
├── LICENSE                     # License file
├── pyproject.toml              # Build/configuration file for Python packaging
├── README.md                   # Project documentation
├── requirements.txt            # Python dependencies
├── scenario-toolset.tar.gz     # Archived toolset package
└── tool.py                     # Main script to run the tool
