helm support for sidecar injection in EPP #1821
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Thanks @capri-xiyue! Can we make this a distinct helm chart? Since we call this current chart
@kfswain For this stage, EPP standalone mode still uses InferencePool; that's why I didn't put it in a distinct chart. Later, if we decide to have a standalone EPP without any k8s CRD, I will refactor it into a distinct helm chart. To clarify, this PR only removes the Gateway API dependency; the InferencePool API is still used.
/assign @ahg-g
That may lead to confusing UX when this is deployed in a cluster that has a GW controller, as the controller will attempt to reconcile the InferencePool and integrate it into the GW system. Is modifying EPP to just accept a selector not a viable path forward?
I talked with @ahg-g before; modifying EPP to just accept a selector needs further discussion. He therefore suggested I finalize EPP with Envoy proxy first via a helm chart. Curious, though: what happens when an InferencePool is deployed in a cluster with two GW controllers (for example kgateway and Istio)? Will that cause issues here? Initially I thought each GW controller could handle this case.
Yeah, my suggestion is to take a gradual approach; a gateway controller should not care about an InferencePool that is not referenced by an HTTPRoute.
As an update, I'm now working on another PR to modify EPP to just accept a selector, and I will refactor the helm chart into a distinct one, since no InferencePool is needed in that PR. The EPP refactor will probably take a little while, as fixing a bunch of unit tests takes time. Will let you know when I send the PR. @kfswain
Yes. I've updated it to enable additional sidecar proxy injection.
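To illustrate what an opt-in sidecar injection toggle in the chart values might look like (the key names below are hypothetical, not necessarily the chart's actual schema):

```yaml
# Hypothetical values.yaml fragment: opt-in sidecar injection for the EPP pod.
# Key names are illustrative only; check the chart's values.yaml for the real schema.
sidecar:
  enabled: false          # default: no extra sidecar is injected
  containers: []          # extra container specs appended to the EPP pod when enabled
```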
ahg-g
left a comment
Great!
@danehans can you please make sure this works for agentgateway?
# # Because the template just dumps this section, the keys become filenames.
# # The values MUST be strings (note the literal block scalar '|')
# data:
#   envoy.yaml: |
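For context, the "keys become filenames" behavior comes from Helm dumping a values map straight into a ConfigMap's `data` section; a minimal sketch, with hypothetical names:

```yaml
# Hypothetical template: every key under .Values.sidecarConfig becomes a file
# in the mounted ConfigMap, so the values must be strings (hence the '|' block scalar).
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-sidecar-config
data:
  {{- toYaml .Values.sidecarConfig | nindent 2 }}
```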
What assumptions are we making in this configuration with regard to the EPP configuration, the configuration of the model servers, and the InferencePool? Put another way, does this configuration assume that the EPP/model servers/InferencePool are configured in a specific way?
The example Envoy config only makes assumptions about the communication between the Envoy proxy and the EPP. It has nothing to do with the model server or the InferencePool. By the way, this Envoy config can be customized by users; they can use whatever config they want.
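A minimal sketch of the Envoy-to-EPP wiring being assumed here (the service name and port are illustrative, not taken from this PR):

```yaml
# Hypothetical Envoy fragment: requests are processed by the EPP via the
# ext_proc filter; only the Envoy<->EPP link is assumed.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: epp          # cluster pointing at the EPP gRPC endpoint
clusters:
- name: epp
  type: STRICT_DNS
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}   # EPP speaks gRPC (HTTP/2)
  load_assignment:
    cluster_name: epp
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: epp.default.svc.cluster.local, port_value: 9002 }
```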
Ideally we want a config that just works, and the helm chart we provide should not even offer any customization for this config.
Yep. The current default config works since I just disable sidecar injection. By the way, the example Envoy config was originally commented out already; now I've removed it, since it will live in llm-d.
> The current default config works since I just disable sidecar injection

I was referring to the Envoy config; the config in llm-d should just work and not be customizable.
Yep. The Envoy config in llm-d will just work and won't be customizable. I tested it with no InferencePool dependency. Will ping you when I open a PR in llm-d.
/lgtm Great!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, capri-xiyue. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
What this PR does / why we need it:
see #1778
Which issue(s) this PR fixes:
Fixes #1778
Does this PR introduce a user-facing change?: