Add benchmark automation tool #563
Conversation
/hold

@@ -0,0 +1,39 @@
#!/bin/bash
This is a lot of bash. What's the principle underpinning how this will evolve? If you need to do analysis of the output, bash will quickly hit limits.
I kind of expected either a Go code / go test-like structure, or a Python / Colab-style experimentation setup. I agree some of these things are a good fit for bash, but not all of them.
The bash is mostly kubectl commands to set up and tear down the benchmarks; output analysis is done with a Jupyter notebook.
But agreed, there is quite a lot of bash, and some of it can/should be done in Python or Go for better readability and testing.
I think a key principle to discuss is the benchmark config API, as well as the benchmark results API (currently implicitly defined by the LPG tool's JSON format). With those defined, people can come up with different implementations of automating the benchmark, and bash, Go, or a Jupyter notebook becomes more of an implementation detail.
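To make the config API point concrete, here is a minimal sketch (illustrative only, with assumed names, not an interface defined by this PR) of what a declarative benchmark config consumed by the automation could look like, independent of whether the runner is bash, Go, or a notebook:

```bash
#!/bin/bash
# Hypothetical benchmark config, sourced by whatever runner implements the automation.
# Every name below is an assumption for illustration, not part of this PR.
BENCHMARK_ID="vllm-llama2-7b-inference-gateway"   # used for the namespace and output paths
BACKEND="inference-gateway"                       # or "k8s-service" for the baseline
MODEL_SERVER_REPLICAS=3
REQUEST_RATES="1 5 10 20"                         # QPS levels swept by the load generator
BENCHMARK_TIME_SECONDS=120                        # duration per request rate
OUTPUT_DIR="output/${BENCHMARK_ID}"               # where the LPG JSON results land
```

With a config like this and a documented results schema, the Jupyter notebook (or any other analysis step) only needs to care about the JSON files in `OUTPUT_DIR`.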
What are we thinking with this one?
I think most folks are busy with other things and haven't had time to try this out. I know @kaushikmitr was planning to try it. I am happy to put this on a branch and defer the decision until we have more concrete feedback.
This is a tool that automatically generates benchmark manifests, deploys vLLM model servers together with either an inference gateway or a k8s Service, and runs benchmarks against them. It deploys the resources in a new namespace, collects results, and tears down the resources.
This tool is experimental. I tested it on GKE; in theory it should work on any cluster, though it may require minor modifications.
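For context, here is a rough sketch of the lifecycle described above, using assumed paths and resource names rather than the actual script contents:

```bash
#!/bin/bash
# Illustrative sketch of the deploy / run / collect / tear-down flow (not the real script).
set -euo pipefail

NAMESPACE="benchmark-$(date +%s)"          # fresh namespace per run
MANIFEST_DIR="manifests/${NAMESPACE}"      # hypothetical location of generated manifests
OUTPUT_DIR="output/${NAMESPACE}"           # hypothetical location for collected results

# 1. Deploy the vLLM model servers and the inference gateway (or k8s Service).
kubectl create namespace "${NAMESPACE}"
kubectl apply -n "${NAMESPACE}" -f "${MANIFEST_DIR}/"
kubectl wait -n "${NAMESPACE}" --for=condition=Available deployment --all --timeout=15m

# 2. Run the benchmark job against the deployed stack and wait for completion.
kubectl apply -n "${NAMESPACE}" -f "${MANIFEST_DIR}/benchmark-job.yaml"
kubectl wait -n "${NAMESPACE}" --for=condition=Complete job/benchmark --timeout=60m

# 3. Collect the results for offline analysis (the Jupyter notebook reads this JSON).
mkdir -p "${OUTPUT_DIR}"
kubectl logs -n "${NAMESPACE}" job/benchmark > "${OUTPUT_DIR}/results.json"

# 4. Tear everything down.
kubectl delete namespace "${NAMESPACE}"
```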
EDIT:
This tool currently has many limitations and will require non-trivial improvements to become a stable tool for long-term maintenance. In the meantime, I have found it very useful and it has saved me a lot of time, so I'd like to share it in case others find it useful too. I am seeking feedback on whether we should merge it now, or keep it in a branch and start thinking about changes that benefit us in the long term.
Future improvements