Add benchmark automation tool #563

Open · liu-cong wants to merge 1 commit into main
Conversation

liu-cong (Contributor) commented Mar 24, 2025

This is a tool that automatically generates benchmark manifests for deploying vLLM model servers behind either an inference gateway or a Kubernetes Service, and runs benchmarks against them. It deploys the resources in a new namespace, collects the results, and tears the resources down.

This tool is experimental. I have tested it on GKE; in theory it should work on any cluster, though it may require minor modifications.
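A minimal sketch of the lifecycle the tool automates, assuming a cluster reachable with kubectl; the namespace, manifest paths, and job name below are hypothetical placeholders, not the actual files in this PR:

```bash
#!/bin/bash
# Sketch of the deploy -> benchmark -> collect -> tear down loop.
# NOTE: namespace, manifest paths, and job name are hypothetical examples.
set -euo pipefail

NAMESPACE="benchmark-$(date +%s)"

# 1. Deploy the vLLM model server and gateway/service into a fresh namespace.
kubectl create namespace "${NAMESPACE}"
kubectl apply -n "${NAMESPACE}" -f manifests/vllm-model-server.yaml
kubectl apply -n "${NAMESPACE}" -f manifests/inference-gateway.yaml
kubectl wait -n "${NAMESPACE}" --for=condition=Available deployment --all --timeout=15m

# 2. Run the benchmark job and wait for it to complete.
kubectl apply -n "${NAMESPACE}" -f manifests/benchmark-job.yaml
kubectl wait -n "${NAMESPACE}" --for=condition=Complete job/benchmark --timeout=60m

# 3. Collect results from the benchmark pod's logs.
mkdir -p results
kubectl logs -n "${NAMESPACE}" job/benchmark > "results/${NAMESPACE}.json"

# 4. Tear everything down by deleting the namespace.
kubectl delete namespace "${NAMESPACE}"
```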

EDIT:
This tool currently has many limitations and will require non-trivial improvements to become a stable tool that can be maintained long term. In the meantime, I have found it very useful and it has saved me a lot of time, so I'd like to share it in case others find it useful too. I am seeking feedback on whether we should merge it now, or keep it in a branch and start thinking about changes that benefit us in the long term.

Future improvements

  • This tool currently has its own Helm charts, and it takes effort to keep them up to date. It would be nice to have these charts standardized across this repo.
  • The tool makes heavy use of bash scripts, which are hard to test and maintain long term.
  • Improve code structure, readability, and testing.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 24, 2025
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and kfswain March 24, 2025 17:01
k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liu-cong
Once this PR has been reviewed and has the lgtm label, please assign arangogutierrez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 24, 2025
netlify bot commented Mar 24, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 0d5e02c
🔍 Latest deploy log: https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e19c0cc386b000085761ea
😎 Deploy Preview: https://deploy-preview-563--gateway-api-inference-extension.netlify.app

kfswain (Collaborator) commented Mar 24, 2025

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2025
liu-cong (Contributor, Author) commented:

cc @kaushikmitr @achandrasekar

@@ -0,0 +1,39 @@
#!/bin/bash
A reviewer (Contributor) commented on this diff:

This is a lot of bash. What's the principle underpinning how this will evolve? If you need to do analysis of the output, bash will quickly hit limits.

I kind of expected either a Go code / Go test-like structure, or a Python / Colab-style experimentation setup. I agree some of these things are good fits for bash, but not all of them are.

liu-cong (Contributor, Author) replied:

Bash is mostly kubectl commands to set up and tear down the benchmarks, output analysis is done with a Jupyter notebook.

But agree there is quite a lot of bash, and some of it can/should be done with python or go for better readability and testing.

I think a key principle to discuss is the benchmark config API, as well as the benchmark results API (currently implicitly defined by the LPG tool's JSON format). With those defined, people can come up with different implementations of automating the benchmark, and bash, Go, or a Jupyter notebook becomes more of an implementation detail.
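As an illustration of why the results API matters more than the implementation language, here is a hypothetical sketch of a results-summary step that depends only on the JSON output format; the field names are placeholders, since the actual LPG schema is not shown here:

```bash
#!/bin/bash
# Hypothetical sketch: summarize benchmark results regardless of how they were produced.
# The JSON field names below are placeholders, not the real LPG tool schema.
set -euo pipefail

for f in results/*.json; do
  # Pull a couple of headline metrics out of each run's result file.
  jq -r '[input_filename, .throughput_tokens_per_s, .p90_latency_ms] | @tsv' "$f"
done | column -t
```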

kfswain (Collaborator) commented Apr 9, 2025

What are we thinking with this one?

liu-cong (Contributor, Author) commented Apr 9, 2025

I think most folks are busy with other things and didn't have time to try this out. I know @kaushikmitr was planning to try this.
Some solid feedback will help us decide whether we want to maintain this.

I am happy to put this in a branch and defer the decision until we have more concrete feedback.
