# User Simulation

<div class="language-support-tag">
  <span class="lst-supported">Supported in ADK</span><span class="lst-python">Python v1.18.0</span>
</div>

When evaluating conversational agents, it is not always practical to use a fixed
set of user prompts, as the conversation can proceed in unexpected ways.
For example, if the agent needs the user to supply two values to perform a task,
it may ask for those values one at a time or both at once.
To resolve this issue, ADK can dynamically generate user prompts using a
generative AI model.

To use this feature, you must specify a
[`ConversationScenario`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/conversation_scenarios.py),
which describes the user's goals for their conversation with the agent.
A sample conversation scenario for the
[`hello_world`](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world)
agent is shown below:

```json
{
  "starting_prompt": "What can you do for me?",
  "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
}
```

The `starting_prompt` in a conversation scenario specifies a fixed initial
prompt that the user uses to start the conversation with the agent.
Fixed prompts are not practical for subsequent turns, because the agent may
respond in different ways.
Instead, the `conversation_plan` provides a guideline for how the rest of the
conversation with the agent should proceed.
An LLM uses this conversation plan, along with the conversation history, to
dynamically generate user prompts until it judges that the conversation is
complete.

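If you prefer to construct scenarios in Python instead of JSON, the sketch below
builds the same scenario programmatically. It assumes that `ConversationScenario`
is a Pydantic model whose field names mirror the JSON keys shown above.

```python
# A minimal sketch (assumption): defining the sample scenario in Python.
# ConversationScenario is assumed to expose starting_prompt and
# conversation_plan fields, matching the JSON keys above.
from google.adk.evaluation.conversation_scenarios import ConversationScenario

scenario = ConversationScenario(
    starting_prompt="What can you do for me?",
    conversation_plan=(
        "Ask the agent to roll a 20-sided die. After you get the result, "
        "ask the agent to check if it is prime."
    ),
)
```
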
## Example: Evaluating the [`hello_world`](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world) agent with conversation scenarios

To add evaluation cases containing conversation scenarios to a new or existing
[`EvalSet`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py),
first create a list of conversation scenarios to test the agent against.

Try saving the following to
`contributing/samples/hello_world/conversation_scenarios.json`:

```json
{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a 6-sided die. Finally, ask the agent to do the same with 2 20-sided dice."
    }
  ]
}
```

You will also need a session input file that provides the session information
(the app name and user ID) used during evaluation.
Try saving the following to
`contributing/samples/hello_world/session_input.json`:

```json
{
  "app_name": "hello_world",
  "user_id": "user"
}
```

Then, you can add the conversation scenarios to an `EvalSet`:

```bash
# (optional) create a new EvalSet
adk eval_set create \
  contributing/samples/hello_world \
  eval_set_with_scenarios

# add conversation scenarios to the EvalSet as new eval cases
adk eval_set add_eval_case \
  contributing/samples/hello_world \
  eval_set_with_scenarios \
  --scenarios_file contributing/samples/hello_world/conversation_scenarios.json \
  --session_input_file contributing/samples/hello_world/session_input.json
```

By default, ADK runs evaluations with metrics that require the agent's expected
response to be specified.
Since a dynamic conversation scenario has no fixed expected responses, we will
use an
[`EvalConfig`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_config.py)
with alternate metrics that do not require them.

Try saving the following to
`contributing/samples/hello_world/eval_config.json`:

```json
{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    },
    "safety_v1": {
      "threshold": 0.8
    }
  }
}
```

Finally, you can use the `adk eval` command to run the evaluation:

```bash
adk eval \
  contributing/samples/hello_world \
  --config_file_path contributing/samples/hello_world/eval_config.json \
  eval_set_with_scenarios \
  --print_detailed_results
```

## User simulator configuration

You can override the default user simulator configuration to change the model,
the model's internal behavior, and the maximum number of user-agent interactions.
The `EvalConfig` below shows the default user simulator configuration:

```json
{
  "criteria": {
    # same as before
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}
```

* `model`: The model backing the user simulator.
* `model_configuration`: A
  [`GenerateContentConfig`](https://github.com/googleapis/python-genai/blob/6196b1b4251007e33661bb5d7dc27bafee3feefe/google/genai/types.py#L4295)
  that controls the model's behavior (see the sketch after this list).
* `max_allowed_invocations`: The maximum number of user-agent interactions
  allowed before the conversation is forcefully terminated. Set this higher than
  the longest reasonable user-agent conversation in your `EvalSet`.
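
As a reference for what the `model_configuration` block above corresponds to,
the sketch below expresses the same settings with google-genai's Python types.
This is only an illustration of the structure; the simulator itself is
configured through the `EvalConfig` JSON shown earlier.

```python
# A minimal sketch: the default model_configuration expressed in Python.
from google.genai import types

model_configuration = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(
        include_thoughts=True,   # include the model's thought summaries
        thinking_budget=10240,   # cap the number of thinking tokens
    )
)
```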