Stage 2 dataset issue #20

Open
korbinian-hoermann opened this issue Feb 9, 2025 · 1 comment

@korbinian-hoermann

Hi!

I had a look at the guiact-web-multi datasets (l1-l3).
The conversations I saw look like this (e.g. for uid_record_07674_step_00):

[{'end_turn': None, 'from': 'system', 'recipient': None, 'value': 'You are a GUI agent. You are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.\n\nYou have access to the following functions:\n- {"name": "answer", "description": "Answer a question", "parameters": {"type": "object", "properties": {"answer": {"type": "string", "description": "The answer to the question"}}, "required": ["answer"]}}\n'}
 {'end_turn': None, 'from': 'human', 'recipient': None, 'value': "\nPlease generate the next move according to the UI screenshot, instruction and previous actions.\n\nInstruction: What information does the site provide about Judith Lauand's career, works and exhibitions?\n\nPrevious actions:\nNone"}
 {'end_turn': False, 'from': 'gpt', 'recipient': 'all', 'value': "Observation: The screenshot shows a dropdown menu on MutualArt with search results for 'Judith Lauand'. There are multiple entries detailing her exhibitions and mentions, such as in São Paulo, London, and various art reviews.\nThought: The goal is to gather information about Judith Lauand’s career, works, and exhibitions. The list provides various leads, each likely directing to detailed pages about specific aspects of her career. Selecting an option from the dropdown is essential to access more detailed information.\nAction: Click on the link labeled 'Judith Lauand: Brazilian 1922-2022' to explore more about her career and exhibitions.\n"}
 {'end_turn': True, 'from': 'gpt', 'recipient': 'os', 'value': 'pyautogui.click(x=0.41, y=0.178)'}]

Why is only the answer function specified in the system prompt, and not the other ones (clicking, scrolling, ...)?

@Timothyxxx
Member

Hi, as mentioned in the paper, clicking and scrolling are built-in pyautogui functions, so we don't need to describe them explicitly in the system prompt; they are not additional knowledge. In other words, we train this knowledge into the model itself instead of teaching it to read it from the system prompt.
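
To make the distinction concrete, here is a minimal sketch (not code from the project) that compares the functions declared in the system prompt with the pyautogui call the model actually emits, using the strings from the sample record in this issue. The helper names `declared_functions` and `pyautogui_calls` are hypothetical, and the parsing assumes each declared function appears on its own line as `- {json}`, as it does in that sample.

```python
import json
import re


def declared_functions(system_prompt: str) -> list[str]:
    """Return the names of functions listed as '- {json}' lines in the prompt.

    Assumption: every declared function sits on its own line, prefixed by '- '.
    """
    names = []
    for line in system_prompt.splitlines():
        line = line.strip()
        if line.startswith("- {"):
            # Drop the leading '- ' and parse the JSON function schema.
            names.append(json.loads(line[2:])["name"])
    return names


def pyautogui_calls(action: str) -> list[str]:
    """Return the pyautogui function names invoked in an action string."""
    return re.findall(r"pyautogui\.(\w+)\(", action)


# Strings copied from the sample record quoted in this issue.
system_prompt = (
    "You are a GUI agent. You are given a task and a screenshot of the screen. "
    "You need to perform a series of pyautogui actions to complete the task.\n\n"
    "You have access to the following functions:\n"
    '- {"name": "answer", "description": "Answer a question", '
    '"parameters": {"type": "object", "properties": {"answer": '
    '{"type": "string", "description": "The answer to the question"}}, '
    '"required": ["answer"]}}\n'
)
action = "pyautogui.click(x=0.41, y=0.178)"

declared = declared_functions(system_prompt)
used = pyautogui_calls(action)

print("declared in system prompt:", declared)
print("pyautogui calls in action:", used)
```

On this sample, only `answer` is declared in the prompt, while the action uses `click`, which matches the explanation above: custom functions are listed in the prompt, and pyautogui functions are expected to be known by the model from training.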
