test-to-harness: initial set up #511
Signed-off-by: David Korczynski <[email protected]>
Will add a benchmark corpus before making this ready for review.
/gcbrun exp -n dk-test-to-harenss-1 -m vertex_ai_gemini-1-5 -b comparison from-test-small
/gcbrun exp -n dk-test-to-harenss-2 -m vertex_ai_gemini-1-5 -b from-test-small
Experiment from test-to-harness conversion: https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-07-27-511-dk-test-to-harenss-2-from-test-small/index.html
/gcbrun exp -n dk-1231231432-m vertex_ai_gemini-1-5 -b comparison
/gcbrun exp -n dk-comparison-jj1 -m vertex_ai_gemini-1-5 -b comparison
Regular comparison benchmark: https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-07-27-511-dk-comparison-jj1-comparison//index.html
/gcbrun skip
This is ready for review. The experiment https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-07-27-511-dk-test-to-harenss-2-from-test-small/index.html shows improvements in several projects where we previously had no gains: https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-07-06-weekly-all/index.html

I think this is a promising direction: we are already seeing improvements, and there is plenty of room for technical improvement, since at the moment we more or less just copy-paste the test into the prompt. I think the PR is in a state where we can improve on it incrementally. In particular, I think improvements are needed in (1) the architecture around benchmarks; (2) more context around tests; and (3) more experiments around tests, e.g. we currently copy whole files in, which we could probably refine (e.g. where a file contains multiple tests, we can extract the individual tests).
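To make point (3) concrete, here is a rough sketch of what extracting individual tests from a file could look like. It is not code from this PR. It assumes GoogleTest-style `TEST(Suite, Name) { ... }` macros, and the helper name `extract_tests` is made up for illustration. The brace matching is deliberately naive and ignores braces inside strings or comments.

```python
import re

# Assumption: tests use GoogleTest-style macros (TEST, TEST_F, TEST_P).
# Other test frameworks would need a different header pattern.
_TEST_HEADER = re.compile(
    r'\bTEST(?:_F|_P)?\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*\{')


def extract_tests(source: str) -> dict[str, str]:
  """Maps 'Suite.Name' to the full text of each test found in `source`.

  Hypothetical helper: braces inside strings or comments are not handled.
  """
  tests = {}
  for match in _TEST_HEADER.finditer(source):
    depth = 1
    pos = match.end()  # Just past the opening brace of the test body.
    while pos < len(source) and depth > 0:
      if source[pos] == '{':
        depth += 1
      elif source[pos] == '}':
        depth -= 1
      pos += 1
    if depth == 0:  # Only keep tests whose body is properly closed.
      name = f'{match.group(1)}.{match.group(2)}'
      tests[name] = source[match.start():pos]
  return tests
```

Each extracted test could then go into its own prompt instead of the whole file, which might be one way to run the "more experiments around tests" item.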
cppify_headers=cppify_headers,
commit=commit,
use_context=use_context,
function_dict=function))
OK for now, but would you please merge the duplicated code in the if/else blocks later to reduce repetition?
Thanks
Yep, will do
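For reference, one possible shape for that follow-up, as a sketch only: build the shared keyword arguments once and let each branch add only what differs. The `Benchmark` class below is a stand-in, not the project's real class, and `make_benchmark`, `test_file` and `test_file_path` are hypothetical names. Only `cppify_headers`, `commit`, `use_context` and `function_dict` come from the snippet under review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benchmark:
  """Stand-in for illustration only; not the project's real class."""
  cppify_headers: bool
  commit: Optional[str]
  use_context: bool
  function_dict: Optional[dict] = None
  test_file_path: str = ''  # Hypothetical branch-specific field.


def make_benchmark(cppify_headers, commit, use_context,
                   function=None, test_file=None):
  # Arguments common to both branches are collected exactly once.
  shared = dict(cppify_headers=cppify_headers,
                commit=commit,
                use_context=use_context)
  if test_file is not None:
    return Benchmark(**shared, test_file_path=test_file)
  return Benchmark(**shared, function_dict=function)
```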
/gcbrun exp -n dk-comparisonasdf12 -m vertex_ai_gemini-1-5 -b minor-for-ci
/gcbrun exp -n dk-comparisonasfdf12 -m vertex_ai_gemini-1-5 -b minor-for-ci
This PR adds JVM project support for the test-to-harness approach initiated in #511. It also adds a new benchmark set using the test-to-harness approach on Java projects.

Signed-off-by: Arthur Chan <[email protected]>
Ref: #494
Some more comments on this PR in #511 (comment)