heiko-hotz
Description

This pull request introduces a new evaluation harness designed to bridge the gap between agents built with Google's Agent Development Kit (ADK) and the Tau2 Bench evaluation framework.

This initial version includes the core components needed for end-to-end evaluation:

  1. Main Evaluation Runner (run_evaluation.py):
  • Orchestrates the conversational flow between the Tau2 User Simulator and the ADK Agent.
  • Dynamically loads ADK agents from a specified file path.
  • Injects the task-specific Tau2 domain policy into the ADK agent's instructions at runtime.
  2. Tool Mapping & Translation Layer (harness/tool_mapper.py):
  • Provides a simple, extensible system for mapping tool names and arguments from the ADK agent's perspective to the Tau2 environment's implementation.
  • Intercepts FunctionCall events from the ADK agent, translates them, and executes the real tool within the Tau2 environment.
  3. Sample ADK Agent (sample_adk_agent/):
  • Includes a fully functional example agent for the airline domain.
  • Serves as a template for structuring an ADK agent so that it is compatible with this harness.
  4. Comprehensive Documentation (README.md):
  • A detailed README.md explains the project's purpose and architecture, and provides instructions for setup, usage, and extension to new domains.
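
To make the tool mapping and translation step more concrete, here is a minimal sketch of the idea in Python. It is not the code in harness/tool_mapper.py: the names `ToolMapping`, `ToolMapper`, `lookup_booking`, `booking_id`, and the stub `get_reservation_details` function are all hypothetical, and the sketch uses plain dictionaries and callables rather than real ADK or Tau2 types.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToolMapping:
    """Hypothetical rule: how one agent-side tool maps to an environment tool."""
    target_name: str                                   # tool name on the environment side
    arg_renames: Dict[str, str] = field(default_factory=dict)  # agent arg -> env arg


class ToolMapper:
    """Illustrative translator; an assumption, not the PR's actual class."""

    def __init__(
        self,
        mappings: Dict[str, ToolMapping],
        env_tools: Dict[str, Callable[..., Any]],
    ) -> None:
        self._mappings = mappings
        self._env_tools = env_tools

    def translate(self, name: str, args: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
        """Return the environment-side tool name and arguments."""
        mapping = self._mappings.get(name)
        if mapping is None:
            # No rule registered: pass the call through unchanged.
            return name, dict(args)
        renamed = {mapping.arg_renames.get(k, k): v for k, v in args.items()}
        return mapping.target_name, renamed

    def execute(self, name: str, args: Dict[str, Any]) -> Any:
        """Translate an intercepted call, then run the environment tool."""
        target, target_args = self.translate(name, args)
        tool = self._env_tools.get(target)
        if tool is None:
            raise KeyError(f"No environment tool named {target!r}")
        return tool(**target_args)


# Stub standing in for a tool exposed by the evaluation environment.
def get_reservation_details(reservation_id: str) -> Dict[str, str]:
    return {"reservation_id": reservation_id, "status": "confirmed"}


mapper = ToolMapper(
    mappings={
        "lookup_booking": ToolMapping(
            target_name="get_reservation_details",
            arg_renames={"booking_id": "reservation_id"},
        )
    },
    env_tools={"get_reservation_details": get_reservation_details},
)

result = mapper.execute("lookup_booking", {"booking_id": "ABC123"})
```

In this sketch the agent keeps its own tool vocabulary, and supporting a new domain would mean registering more `ToolMapping` entries rather than changing the agent. Unmapped tool names pass through unchanged, which is one possible default; raising an error instead would surface missing mappings earlier.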

Collaborator

@mstyer-google left a comment

Generally looks great. Check your line lengths across the board. Most of my other comments are just suggestions and not requirements - if you decide not to implement them just ack the comment.
