[feature] AI Agent Evaluations #4553

## Summary

Add an automated evaluation system for ByteChef AI agents: test scenarios, LLM and deterministic judges, asynchronous run execution, and a dedicated UI panel, integrated with `spring-ai-community/agent-judge`.

## Key Features

- **Test Scenarios**: Single-turn (message → response) and multi-turn (simulated conversation) scenarios
- **Two-Level Judges**: Agent-level judges (run on every scenario) + scenario-level judges (scoped to a single scenario)
- **Judge Types**: LLM rule-based + deterministic (contains text, regex, response length, JSON schema, similarity)
- **Async Execution**: Runs execute asynchronously with progress tracking and cancellation
- **Results & History**: Score tracking, judge verdicts with explanations, conversation transcript storage
- **UI Panel**: New "Evals" tab in AI Agent Editor with Tests/Judges/Runs sub-tabs
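
To make the judge model above concrete, here is a minimal sketch of three of the deterministic judge types (contains text, regex, response length) and of how agent-level and scenario-level judges might be combined for one scenario. All type and method names (`Judge`, `Verdict`, `ScenarioEvaluator`, and so on) are hypothetical and are not the `agent-judge` API. Scoring a scenario as the fraction of passing verdicts and measuring length in characters are also assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical types for illustration only; not the agent-judge API.
record Verdict(String judge, boolean passed, String explanation) {}

interface Judge {
    Verdict evaluate(String response);
}

// Passes when the response contains the expected text verbatim.
final class ContainsTextJudge implements Judge {
    private final String expected;

    ContainsTextJudge(String expected) { this.expected = expected; }

    public Verdict evaluate(String response) {
        boolean passed = response.contains(expected);
        return new Verdict("contains-text", passed,
            (passed ? "found" : "missing") + " \"" + expected + "\"");
    }
}

// Passes when the pattern matches anywhere in the response.
final class RegexJudge implements Judge {
    private final Pattern pattern;

    RegexJudge(String regex) { this.pattern = Pattern.compile(regex); }

    public Verdict evaluate(String response) {
        boolean passed = pattern.matcher(response).find();
        return new Verdict("regex", passed,
            (passed ? "matched " : "no match for ") + pattern.pattern());
    }
}

// Passes when the response length in characters is within [min, max].
final class ResponseLengthJudge implements Judge {
    private final int min;
    private final int max;

    ResponseLengthJudge(int min, int max) { this.min = min; this.max = max; }

    public Verdict evaluate(String response) {
        int length = response.length();
        boolean passed = length >= min && length <= max;
        return new Verdict("response-length", passed,
            "length " + length + ", allowed " + min + ".." + max);
    }
}

final class ScenarioEvaluator {
    // Agent-level judges run on every scenario; scenario-level judges
    // are appended only for the scenario they belong to.
    static List<Verdict> evaluate(String response, List<Judge> agentJudges, List<Judge> scenarioJudges) {
        List<Judge> all = new ArrayList<>(agentJudges);
        all.addAll(scenarioJudges);
        return all.stream().map(judge -> judge.evaluate(response)).toList();
    }

    // Assumed scoring rule: fraction of judges that passed (0.0 when there are none).
    static double score(List<Verdict> verdicts) {
        if (verdicts.isEmpty()) {
            return 0.0;
        }
        long passed = verdicts.stream().filter(Verdict::passed).count();
        return (double) passed / verdicts.size();
    }
}
```

In this sketch, each verdict carries its own explanation, which mirrors the "judge verdicts with explanations" item above. An LLM rule-based judge could implement the same hypothetical `Judge` interface.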

## Design Spec

`docs/superpowers/specs/2026-03-15-agent-evaluations-design.md`

## Implementation Plan

`docs/superpowers/plans/2026-03-15-agent-evaluations.md`

## Phase 1 Scope

- Agent-level evaluations only (workflow-level deferred)
- Sequential scenario execution (parallel deferred)
- Informational results only (save gating deferred)
- TOOL_USAGE judge deferred (requires structured tool event capture)
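
The combination of asynchronous runs, progress tracking, cancellation, and sequential scenario execution could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: `EvalRun`, its methods, and the use of `CompletableFuture` on the common pool are all hypothetical choices, and scenarios are represented as plain strings for brevity.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical run handle for illustration; names are not taken from the ByteChef codebase.
final class EvalRun {
    enum Status { COMPLETED, CANCELLED }

    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    private final AtomicInteger completed = new AtomicInteger(0);
    private final int total;
    private CompletableFuture<Status> result;

    private EvalRun(int total) { this.total = total; }

    // Starts the run on a background thread and returns immediately.
    static EvalRun start(List<String> scenarios, Consumer<String> scenarioExecutor) {
        List<String> snapshot = List.copyOf(scenarios);
        EvalRun run = new EvalRun(snapshot.size());
        run.result = CompletableFuture.supplyAsync(
            () -> run.executeSequentially(snapshot, scenarioExecutor));
        return run;
    }

    // Scenarios run one at a time; cancellation is checked between scenarios,
    // so a scenario that is already executing is allowed to finish.
    private Status executeSequentially(List<String> scenarios, Consumer<String> scenarioExecutor) {
        for (String scenario : scenarios) {
            if (cancelRequested.get()) {
                return Status.CANCELLED;
            }
            scenarioExecutor.accept(scenario);
            completed.incrementAndGet();
        }
        return Status.COMPLETED;
    }

    void cancel() { cancelRequested.set(true); }

    int completedCount() { return completed.get(); }

    // Fraction of scenarios finished so far; an empty run counts as fully done.
    double progress() { return total == 0 ? 1.0 : (double) completed.get() / total; }

    // Blocks until the run finishes or is cancelled.
    Status await() { return result.join(); }
}
```

Checking the cancellation flag only between scenarios keeps the sketch simple. Interrupting a scenario that is in flight would need cooperation from the scenario executor, which is outside what this example covers.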

## Tech Stack

- Backend: Spring Boot 4, Spring Data JDBC, Spring AI, spring-ai-community/agent-judge
- Frontend: React 19, TypeScript, Zustand, TanStack Query, GraphQL
