An AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration. It automatically discovers and runs all tests in the `tests/` directory, verifying LLM-generated Svelte components against test suites.
To install dependencies:

```bash
bun install
```

To set up `.env`:

```bash
cp .env.example .env
```

Then configure your API keys and model in `.env`:
```bash
# Required: Choose your model
MODEL=anthropic/claude-sonnet-4
ANTHROPIC_API_KEY=your_key_here

# Optional: Enable MCP integration (leave empty to disable)
MCP_SERVER_URL=https://mcp.svelte.dev/mcp
```

Required:

- `MODEL`: The AI model to use (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-5`, `openrouter/anthropic/claude-sonnet-4`, `lmstudio/model-name`)
- Corresponding API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`)
  - Note: No API key required for `lmstudio/*` models (runs locally)

Optional:

- `MCP_SERVER_URL`: MCP server URL (leave empty to disable MCP integration)
Cloud Providers:

- `anthropic/*` - Direct Anthropic API (requires `ANTHROPIC_API_KEY`)
- `openai/*` - Direct OpenAI API (requires `OPENAI_API_KEY`)
- `openrouter/*` - OpenRouter unified API (requires `OPENROUTER_API_KEY`)

Local Providers:

- `lmstudio/*` - LM Studio local server (requires LM Studio running on `http://localhost:1234`)
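The prefix before the first `/` picks the provider. As a rough illustration only, not necessarily how this repo wires it up, resolution with the Vercel AI SDK provider packages could look like this sketch (the OpenRouter and LM Studio branches assume their OpenAI-compatible endpoints):

```ts
// Hypothetical sketch of provider selection from the MODEL prefix.
// The actual resolution logic in this repo may differ.
import { createAnthropic } from '@ai-sdk/anthropic';
import { createOpenAI } from '@ai-sdk/openai';

function resolveModel(model: string) {
  const [provider, ...rest] = model.split('/');
  const id = rest.join('/'); // openrouter ids contain a second slash

  switch (provider) {
    case 'anthropic':
      return createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY })(id);
    case 'openai':
      return createOpenAI({ apiKey: process.env.OPENAI_API_KEY })(id);
    case 'openrouter': // OpenRouter exposes an OpenAI-compatible API
      return createOpenAI({
        apiKey: process.env.OPENROUTER_API_KEY,
        baseURL: 'https://openrouter.ai/api/v1',
      })(id);
    case 'lmstudio': // local server, placeholder key, nothing leaves the machine
      return createOpenAI({
        apiKey: 'lm-studio',
        baseURL: 'http://localhost:1234/v1',
      })(id);
    default:
      throw new Error(`Unknown provider prefix: ${provider}`);
  }
}
```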
Example configurations:
```bash
# Anthropic
MODEL=anthropic/claude-sonnet-4
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
MODEL=openai/gpt-5
OPENAI_API_KEY=sk-...

# OpenRouter
MODEL=openrouter/anthropic/claude-sonnet-4
OPENROUTER_API_KEY=sk-or-...

# LM Studio (local)
MODEL=lmstudio/llama-3-8b
# No API key needed - make sure LM Studio is running!
```

To run the benchmark (automatically discovers and runs all tests):

```bash
bun run index.ts
```

The benchmark will:
- Discover all tests in the `tests/` directory
- For each test:
  - Run the AI agent with the test's prompt
  - Extract the generated Svelte component
  - Verify the component against the test suite
- Generate a combined report with all results
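Discovery itself is straightforward: every subdirectory of `tests/` is one test case. A minimal sketch of that step (the actual `index.ts` may do more):

```ts
// Hypothetical sketch of test discovery; the real index.ts may differ.
import { readdir } from 'node:fs/promises';

// Every directory under tests/ is treated as one test case.
async function discoverTests(): Promise<string[]> {
  const entries = await readdir('tests', { withFileTypes: true });
  return entries.filter((e) => e.isDirectory()).map((e) => `tests/${e.name}`);
}
```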
Results are saved to the `results/` directory with timestamped filenames:

- `results/result-2024-12-07-14-30-45.json` - Full execution trace with all test results
- `results/result-2024-12-07-14-30-45.html` - Interactive HTML report with expandable test sections
The HTML report includes:
- Summary bar showing passed/failed/skipped counts
- Expandable sections for each test
- Step-by-step execution trace
- Generated component code
- Test verification results with pass/fail details
- Token usage statistics
- MCP status badge
- Dark/light theme toggle
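The JSON schema isn't spelled out here, but based on the report contents above it plausibly resembles the shape below. Every field name is an assumption for illustration, not the project's actual format:

```ts
// Assumed shape only - inferred from the report contents, not the real schema.
interface BenchmarkResult {
  timestamp: string;
  mcpEnabled: boolean; // surfaced as the MCP status badge
  usage: { inputTokens: number; outputTokens: number };
  tests: Array<{
    name: string;
    status: 'passed' | 'failed' | 'skipped';
    component: string; // generated Svelte source
    steps: string[]; // step-by-step execution trace
  }>;
}
```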
To regenerate an HTML report from a JSON file:
```bash
# Regenerate most recent result
bun run generate-report.ts

# Regenerate specific result
bun run generate-report.ts results/result-2024-12-07-14-30-45.json
```

Each test in the `tests/` directory should have:
```
tests/
  {test-name}/
    Reference.svelte   - Reference implementation (known-good solution)
    test.ts            - Vitest test file (imports "./Component.svelte")
    prompt.md          - Prompt for the AI agent
```
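For illustration, a `test.ts` for a hypothetical counter test might look like the following. The `./Component.svelte` import name is the fixed convention from above; everything else here is made up for the example and assumes `@testing-library/svelte` is available:

```ts
// tests/counter/test.ts - hypothetical example, not an actual test in this repo
import { render, screen } from '@testing-library/svelte';
import { expect, test } from 'vitest';
// Every test imports the generated component under this fixed name:
import Component from './Component.svelte';

test('renders an initial count of 0', () => {
  render(Component);
  expect(screen.getByText('0')).toBeTruthy();
});
```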
The benchmark:
- Reads the prompt from `prompt.md`
- Asks the agent to generate a component
- Writes the generated component to a temporary location
- Runs the tests against the generated component
- Reports pass/fail status
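The "extract the generated Svelte component" step has to pull the component out of the agent's free-form reply. One plausible approach, assuming the agent returns a fenced `svelte` code block (the benchmark's real extraction logic may differ):

```ts
// Hypothetical sketch: pull the first fenced svelte code block out of the reply.
// The benchmark's actual extraction logic may differ.
function extractSvelteComponent(reply: string): string | null {
  const match = reply.match(/```svelte\s([\s\S]*?)```/);
  return match ? match[1].trim() : null;
}
```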
To verify that all reference implementations pass their tests:
```bash
bun run verify-tests
```

This temporarily copies each `Reference.svelte` to `Component.svelte` and runs the tests.
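A sketch of what that verification step might do for a single test directory, assuming Vitest is invoked per directory (the actual script may differ):

```ts
// Hypothetical sketch of verify-tests for one test directory.
import { execSync } from 'node:child_process';
import { copyFile, rm } from 'node:fs/promises';

async function verifyReference(testDir: string) {
  // Stand the reference solution in for the generated component...
  await copyFile(`${testDir}/Reference.svelte`, `${testDir}/Component.svelte`);
  try {
    // ...and run the same Vitest suite the benchmark uses.
    execSync(`bunx vitest run ${testDir}`, { stdio: 'inherit' });
  } finally {
    await rm(`${testDir}/Component.svelte`); // clean up the temporary copy
  }
}
```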
The tool supports optional integration with MCP (Model Context Protocol) servers:
- Enabled: Set `MCP_SERVER_URL` to a valid MCP server URL
- Disabled: Leave `MCP_SERVER_URL` empty or unset
MCP status is recorded in the JSON metadata and displayed as a badge in the HTML report.
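With the Vercel AI SDK, conditional wiring could look roughly like this sketch, using the SDK's experimental MCP client (an assumption on my part; the repo's actual integration may differ):

```ts
// Hypothetical sketch: expose MCP tools only when MCP_SERVER_URL is set.
import { experimental_createMCPClient } from 'ai';

async function loadMcpTools() {
  const url = process.env.MCP_SERVER_URL;
  if (!url) return {}; // MCP disabled: the agent runs without extra tools

  const client = await experimental_createMCPClient({
    transport: { type: 'sse', url },
  });
  return client.tools(); // tool definitions the agent can call
}
```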
Exit codes:

- `0`: All tests passed
- `1`: One or more tests failed
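This makes the benchmark easy to gate in CI. Conceptually (a sketch, with an assumed result shape):

```ts
// Assumed result shape, for illustration only.
type TestResult = { passed: boolean };

// Exit 0 only when every test passed, so CI can gate on the status.
function exitCode(results: TestResult[]): 0 | 1 {
  return results.every((r) => r.passed) ? 0 : 1;
}
```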
See AGENTS.md for detailed documentation on:
- Architecture and components
- Environment variables and model configuration
- MCP integration details
- Development commands
- Multi-test result format
This project was created using `bun init` in bun v1.3.3. [Bun](https://bun.sh) is a fast all-in-one JavaScript runtime.