Description
Review the finetuning experiments we've run, using the relevant notes in this order:
- Finetuning experiment sections of the blog outline
- Analysis
- Raw experiment notes: the temporary notes I kept while running the experiments. Some of these notes are outdated, so treat the blog outline and analysis as the higher-level source of truth.
Review these papers:
- SpatialLadder
- GTA1 paper
- An Efficient Training Pipeline For Graphical User Interface Reasoning Models
- AutoGUI
- ZonUI
- Any other relevant papers you find
Identify the next steps for improving 7B CUA models on our perturbation evaluation, especially on the spatial relational instructions. I have listed some candidate directions, such as improving the training data mix and using RL training methods instead of SFT with LoRA.
Investigate all plausible directions breadth-first. For each one, gather the supporting evidence, justifications, and open uncertainties, so the directions can be prioritized by risk and time required.
The next step after this ticket will be to investigate the top-priority direction in more depth and design minimal, rapid experiments to validate specific hypotheses (e.g., "doing X in this way will improve the 7B model on the spatial relational instructions of our perturbation eval").