Commit 701b9ab

Release v0.1.1
Signed-off-by: kerthcet <[email protected]>
1 parent f4541b0 commit 701b9ab

9 files changed, +723 -701 lines changed

chart/Chart.yaml

+2 -2
@@ -13,9 +13,9 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.6
+version: 0.0.7
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
 # follow Semantic Versioning. They should reflect the version the application is using.
 # It is recommended to use it with quotes.
-appVersion: 0.1.0
+appVersion: 0.1.1

chart/crds/backendruntime-crd.yaml

+623 -615
Large diffs are not rendered by default.

chart/crds/openmodel-crd.yaml

+17 -17
@@ -70,6 +70,23 @@ spec:
 - Pod scheduling with node selectors specified.
 - Cluster autoscaling with essential parameters provided.
 properties:
+limits:
+additionalProperties:
+anyOf:
+- type: integer
+- type: string
+pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+x-kubernetes-int-or-string: true
+description: |-
+Limits defines the required accelerators to serve the model for each replica,
+like <nvidia.com/gpu: 8>. For multi-hosts cases, the limits here indicates
+the resource requirements for each replica, usually equals to the TP size.
+Not recommended to set the cpu and memory usage here:
+- if using playground, you can define the cpu/mem usage at backendConfig.
+- if using inference service, you can define the cpu/mem at the container resources.
+However, if you define the same accelerator resources at playground/service as well,
+the resources will be overwritten by the flavor limit here.
+type: object
 name:
 description: Name represents the flavor name, which will
 be used in model claim.
@@ -92,23 +109,6 @@ spec:
 with <INSTANCE-TYPE: p4d.24xlarge> for AWS.
 Preset parameters: TP, PP, INSTANCE-TYPE.
 type: object
-requests:
-additionalProperties:
-anyOf:
-- type: integer
-- type: string
-pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
-x-kubernetes-int-or-string: true
-description: |-
-Requests defines the required accelerators to serve the model for each replica,
-like <nvidia.com/gpu: 8>. For multi-hosts cases, the requests here indicates
-the resource requirements for each replica, usually equals to the TP size.
-Not recommended to set the cpu and memory usage here:
-- if using playground, you can define the cpu/mem usage at backendConfig.
-- if using inference service, you can define the cpu/mem at the container resources.
-However, if you define the same accelerator requests at playground/service as well,
-the requests will be overwritten by the flavor requests.
-type: object
 required:
 - name
 type: object
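
For illustration, a flavor entry written against the schema after this change might look like the sketch below. Only the `name` and `limits` keys come from the CRD in this diff; the flavor name `a100` and the GPU count are hypothetical values chosen for the example.

```yaml
# Hypothetical flavor entry; the name and the count are illustrative assumptions.
- name: a100
  limits:
    nvidia.com/gpu: 8   # accelerators per replica, set under `limits` instead of the removed `requests`
```

Since the `requests` property is removed from the flavor schema, manifests that still set `requests` on a flavor would presumably need to rename the key to `limits`.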

chart/crds/playground-crd.yaml

+55 -63
@@ -47,22 +47,23 @@ spec:
 properties:
 args:
 description: |-
-Args represents the specified arguments of the backendRuntime,
-will be append to the backendRuntime.spec.Args.
-properties:
-flags:
-description: |-
-Flags represents all the preset configurations.
-Flag around with {{ .CONFIG }} is a configuration waiting for render.
-items:
-type: string
-type: array
-name:
-default: default
-description: Name represents the identifier of the backendRuntime
-argument.
-type: string
-type: object
+Args represents all the arguments for the command.
+Argument around with {{ .CONFIG }} is a configuration waiting for render.
+Args defined here will "append" the args in the recommendedConfig.
+items:
+type: string
+type: array
+backendName:
+default: vllm
+description: BackendName represents the inference backend under
+the hood, e.g. vLLM.
+type: string
+configName:
+description: |-
+ConfigName represents the recommended configuration name for the backend,
+It will be inferred from the models in the runtime if not specified, e.g. default,
+speculative-decoding or model-parallelism.
+type: string
 envs:
 description: Envs represents the environments set to the container.
 items:
@@ -183,16 +184,12 @@ spec:
 - name
 type: object
 type: array
-name:
-default: vllm
-description: Name represents the inference backend under the hood,
-e.g. vLLM.
-type: string
 resources:
 description: |-
 Resources represents the resource requirements for backend, like cpu/mem,
 accelerators like GPU should not be defined here, but at the model flavors,
 or the values here will be overwritten.
+Resources defined here will "overwrite" the resources in the recommendedConfig.
 properties:
 limits:
 additionalProperties:
@@ -219,38 +216,11 @@ spec:
 More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
 type: object
 type: object
-version:
-description: |-
-Version represents the backend version if you want a different one
-from the default version.
-type: string
-type: object
-elasticConfig:
-description: |-
-ElasticConfig defines the configuration for elastic usage,
-e.g. the max/min replicas.
-Note: this requires to install the HPA first or will report error.
-properties:
-maxReplicas:
-description: |-
-MaxReplicas indicates the maximum number of inference workloads based on the traffic.
-Default to nil means there's no limit for the instance number.
-format: int32
-type: integer
-minReplicas:
-default: 1
-description: |-
-MinReplicas indicates the minimum number of inference workloads based on the traffic.
-Default to 1.
-MinReplicas couldn't be 0 now, will support serverless in the future.
-format: int32
-type: integer
 scaleTrigger:
 description: |-
-ScaleTrigger defines a set of triggers to scale the workloads.
-If not defined, trigger configured in backendRuntime will be used,
-otherwise, trigger defined here will overwrite the defaulted ones.
-ScaleTriggerRef and ScaleTrigger can't be set at the same time.
+ScaleTrigger defines the rules to scale the workloads.
+Only one trigger cloud work at a time, mostly used in Playground.
+ScaleTrigger defined here will "overwrite" the scaleTrigger in the recommendedConfig.
 properties:
 hpa:
 description: HPA represents the trigger configuration of the
@@ -859,19 +829,41 @@ spec:
 type: array
 type: object
 type: object
-scaleTriggerRef:
+sharedMemorySize:
+anyOf:
+- type: integer
+- type: string
 description: |-
-ScaleTriggerRef refers to the configured scaleTrigger in the backendRuntime
-with tuned target value.
-ScaleTriggerRef and ScaleTrigger can't be set at the same time.
-properties:
-name:
-description: Name represents the scale trigger name defined
-in the backendRuntime.scaleTriggers.
-type: string
-required:
-- name
-type: object
+SharedMemorySize represents the size of /dev/shm required in the runtime of
+inference workload.
+SharedMemorySize defined here will "overwrite" the sharedMemorySize in the recommendedConfig.
+pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+x-kubernetes-int-or-string: true
+version:
+description: |-
+Version represents the backend version if you want a different one
+from the default version.
+type: string
+type: object
+elasticConfig:
+description: |-
+ElasticConfig defines the configuration for elastic usage,
+e.g. the max/min replicas.
+properties:
+maxReplicas:
+description: |-
+MaxReplicas indicates the maximum number of inference workloads based on the traffic.
+Default to nil means there's no limit for the instance number.
+format: int32
+type: integer
+minReplicas:
+default: 1
+description: |-
+MinReplicas indicates the minimum number of inference workloads based on the traffic.
+Default to 1.
+MinReplicas couldn't be 0 now, will support serverless in the future.
+format: int32
+type: integer
 type: object
 modelClaim:
 description: |-
chart/templates/manager-rbac.yaml

+12
@@ -14,6 +14,18 @@ rules:
 - list
 - update
 - watch
+- apiGroups:
+- ""
+resources:
+- services
+verbs:
+- create
+- delete
+- get
+- list
+- patch
+- update
+- watch
 - apiGroups:
 - admissionregistration.k8s.io
 resources:

chart/values.yaml

+1 -1
@@ -33,7 +33,7 @@ controllerManager:
 - ALL
 image:
 repository: inftyai/llmaz
-tag: v0.1.0
+tag: v0.1.1
 resources:
 limits:
 cpu: 500m

config/manager/kustomization.yaml

+1 -1
@@ -5,4 +5,4 @@ kind: Kustomization
 images:
 - name: controller
 newName: inftyai/llmaz
-newTag: v0.1.0
+newTag: v0.1.1

docs/installation.md

+1 -1
@@ -12,7 +12,7 @@
 ```cmd
 helm repo add inftyai https://inftyai.github.io/llmaz
 helm repo update
-helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.6
+helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.7
 ```
 
 ### Uninstall

index.yaml

+11 -1
@@ -1,6 +1,16 @@
 apiVersion: v1
 entries:
 llmaz:
+- apiVersion: v2
+appVersion: 0.1.1
+created: "2025-02-18T14:46:30.474789+08:00"
+description: A Helm chart for llmaz
+digest: b30ba8a78986cba95256d4869f4f5bd0bd79c5d25867497021b80ae5f1ee04f0
+name: llmaz
+type: application
+urls:
+- https://inftyai.github.io/llmaz/llmaz-0.0.7.tgz
+version: 0.0.7
 - apiVersion: v2
 appVersion: 0.1.0
 created: "2025-01-25T01:22:38.666093+08:00"
@@ -61,4 +71,4 @@ entries:
 urls:
 - https://inftyai.github.io/llmaz/llmaz-0.0.1.tgz
 version: 0.0.1
-generated: "2025-01-25T01:22:38.647336+08:00"
+generated: "2025-02-18T14:46:30.460221+08:00"
