
Commit 088928c

Author: thomassong (committed)
Update README and deploy yaml
Signed-off-by: thomassong <[email protected]>
1 parent e4b3d85 commit 088928c


6 files changed, 92 additions, 16 deletions


Makefile

Lines changed: 0 additions & 5 deletions
@@ -6,11 +6,6 @@ all:
 clean:
 	rm -rf ./go

-.PHONY: vendor
-vendor:
-	rm -rf vendor
-	hack/glide.sh
-
 .PHONY: test
 test:
 	hack/build.sh "test"

README.md

Lines changed: 84 additions & 4 deletions
@@ -1,18 +1,41 @@
 # GPU Manager

+[![Build Status](https://travis-ci.org/tkestack/gpu-manager.svg?branch=master)](https://travis-ci.org/tkestack/gpu-manager)
+
 GPU Manager is used to manage NVIDIA GPU devices in a Kubernetes cluster. It implements the `DevicePlugin` interface
 of Kubernetes, so it is compatible with Kubernetes 1.9 and later.

 Compared with the combined solution of `nvidia-docker`
 and `nvidia-k8s-plugin`, GPU manager uses the native `runc` without modification, whereas the NVIDIA solution requires a modified one.
 Besides, GPU manager also reports metrics without deploying additional components.

-To schedule a GPU payload correctly, GPU manager should work with `gpu-quota-admission` which is a kubernetes scheduler plugin.
+To schedule a GPU payload correctly, GPU manager should work with [gpu-admission](https://github.com/tkestack/gpu-admission), which is a
+Kubernetes scheduler plugin.

 GPU manager also supports payloads that request a fraction of a GPU device, such as 0.1 card or 100MiB of GPU device memory.
 If you want this kind of feature, please refer to the [vcuda-controller](https://github.com/tkestack/vcuda-controller) project.

-# How to deploy GPU Manager
+## Build
+
+**1.** Build binary
+
+- Prerequisite
+  - CUDA toolkit
+
+```
+make
+```
+
+**2.** Build image
+
+- Prerequisite
+  - Docker
+
+```
+make img
+```
+
+## Deploy

 GPU Manager runs as a daemonset, and because of the RBAC restriction and hybrid cluster,
 you need to perform the following steps for the daemonset to run correctly.
@@ -30,6 +53,63 @@ kubectl create clusterrolebinding gpu-manager-role --clusterrole=cluster-admin -
 kubectl label node <node> nvidia-device-enable=enable
 ```

-- change gpu-manager.yaml and submit
+## Pod template example
+
+There is nothing special about submitting a Pod, except that its GPU resources are no longer
+described as whole devices. Instead, 100 `tencent.com/vcuda-core` units equal 1 GPU, and N
+`tencent.com/vcuda-memory` units describe GPU memory (1 `tencent.com/vcuda-memory` unit means
+256Mi of GPU memory). Because Kubernetes requires the request and limit of an extended resource
+to be equal, a separate GPU utilization limit is set by adding `tencent.com/vcuda-core-limit: XX`
+to the `annotations` field of the Pod, where XX uses the same scale as `tencent.com/vcuda-core`
+(for example, 50 means 0.5 GPU).
+
+**Notice: the value of `tencent.com/vcuda-core` must be either a multiple of 100 or a value
+smaller than 100. For example, 100, 200 and 20 are valid values, but 150 and 250 are invalid.**
+
+- Submit a Pod with 0.5 GPU utilization, 7680MiB of GPU memory and a 0.5 GPU utilization limit

-change --incluster-mode from `false` to `true`, change image field to `<your repository>/public/gpu-manager:latest`, add serviceAccount filed to `gpu-manager-role`
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: vcuda
+  annotations:
+    tencent.com/vcuda-core-limit: "50"
+spec:
+  restartPolicy: Never
+  hostNetwork: true
+  containers:
+  - image: <test-image>
+    name: nvidia
+    command: ['/usr/local/nvidia/bin/nvidia-smi']
+    resources:
+      requests:
+        tencent.com/vcuda-core: 50
+        tencent.com/vcuda-memory: 30
+      limits:
+        tencent.com/vcuda-core: 50
+        tencent.com/vcuda-memory: 30
+```
+
+- Submit a Pod with 2 GPU cards
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: vcuda
+spec:
+  restartPolicy: Never
+  hostNetwork: true
+  containers:
+  - image: <test-image>
+    name: nvidia
+    command: ['/usr/local/nvidia/bin/nvidia-smi']
+    resources:
+      requests:
+        tencent.com/vcuda-core: 200
+        tencent.com/vcuda-memory: 60
+      limits:
+        tencent.com/vcuda-core: 200
+        tencent.com/vcuda-memory: 60
+```
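The unit rules in the README's Pod template example can be checked before a manifest is submitted. The following is a minimal bash sketch that assumes only what the README states: 100 `tencent.com/vcuda-core` units make one GPU, one `tencent.com/vcuda-memory` unit is 256Mi, and a core value must be a multiple of 100 or smaller than 100. The function names (`vcuda_core_valid`, `vcuda_memory_units`, `vcuda_resources`) are hypothetical and are not part of gpu-manager.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of gpu-manager) encoding the vcuda
# resource rules stated in the README.

# vcuda_core_valid CORES
# Succeeds when CORES is a positive integer that is either below 100
# (a fraction of one GPU) or an exact multiple of 100 (whole GPUs).
# Rejecting 0 is an assumption of this sketch, not a documented rule.
vcuda_core_valid() {
  local cores=$1
  [[ $cores =~ ^[1-9][0-9]*$ ]] || return 1
  (( cores < 100 || cores % 100 == 0 ))
}

# vcuda_memory_units MIB
# Prints how many 256Mi vcuda-memory units cover MIB mebibytes,
# rounding up so the request is never smaller than MIB.
vcuda_memory_units() {
  local mib=$1
  echo $(( (mib + 255) / 256 ))
}

# vcuda_resources CORES MIB
# Prints a container "resources" stanza with identical requests and
# limits, shaped like the README examples. Fails on an invalid CORES.
vcuda_resources() {
  local cores=$1 mib=$2 mem
  if ! vcuda_core_valid "$cores"; then
    echo "invalid tencent.com/vcuda-core value: ${cores}" >&2
    return 1
  fi
  mem=$(vcuda_memory_units "$mib")
  cat <<EOF
resources:
  requests:
    tencent.com/vcuda-core: ${cores}
    tencent.com/vcuda-memory: ${mem}
  limits:
    tencent.com/vcuda-core: ${cores}
    tencent.com/vcuda-memory: ${mem}
EOF
}
```

For the first README example, `vcuda_resources 50 7680` prints 50 core units and 30 memory units (7680 / 256 = 30), while `vcuda_resources 150 1024` fails because 150 is neither below 100 nor a multiple of 100. Rounding memory up rather than down is a design choice of the sketch, so a Pod never asks for less memory than intended.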

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.2.0
+1.0.3

gpu-manager.yaml

Lines changed: 3 additions & 2 deletions
@@ -15,6 +15,7 @@ spec:
 labels:
 name: gpu-manager-ds
 spec:
+serviceAccount: gpu-manager
 tolerations:
 # This toleration is deprecated. Kept here for backward compatibility
 # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
@@ -33,7 +34,7 @@ spec:
 nvidia-device-enable: enable
 hostPID: true
 containers:
-- image: gpu-manager:latest
+- image: tkestack/gpu-manager:1.0.3
 imagePullPolicy: Always
 name: gpu-manager
 securityContext:
securityContext:
@@ -62,7 +63,7 @@ spec:
 - name: LOG_LEVEL
 value: "4"
 - name: EXTRA_FLAGS
-value: "--incluster-mode=false"
+value: "--incluster-mode=true"
 - name: NODE_NAME
 valueFrom:
 fieldRef:

hack/build.sh

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ function plugin::build_binary() {
 function plugin::generate_img() {
 readonly local commit=$(git log --no-merges --oneline | wc -l | sed -e 's,^[ \t]*,,')
 readonly local version=$(<"${ROOT}/VERSION")
-readonly local base_img=${BASE_IMG:-"centos:7"}
+readonly local base_img=${BASE_IMG:-"tkestack/vcuda:1.0"}

 mkdir -p "${ROOT}/go/build"
 tar czf "${ROOT}/go/build/gpu-manager-source.tar.gz" --transform 's,^,/gpu-manager-'${version}'/,' $(plugin::source_targets)
@@ -55,7 +55,7 @@ function plugin::generate_img() {
 --build-arg version=${version} \
 --build-arg commit=${commit} \
 --build-arg base_img=${base_img} \
--t $IMAGE_FILE .
+-t "${IMAGE_FILE}:${version}" .
 )
 }

hack/common.sh

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 readonly PACKAGE="tkestack.io/gpu-manager"
 readonly BUILD_IMAGE_REPO=plugin-build
 readonly LOCAL_OUTPUT_IMAGE_STAGING="${ROOT}/go/images"
-readonly IMAGE_FILE=${IMAGE_FILE:-"gpu-manager:latest"}
+readonly IMAGE_FILE=${IMAGE_FILE:-"tkestack/gpu-manager"}
 readonly PROTO_IMAGE="proto-generater"

 function plugin::cleanup() {
@@ -75,4 +75,4 @@ function plugin::fmt_targets() {
 )
 )
 echo "${targets[@]}"
-}
+}
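After this commit the image name and the version live in separate places: `hack/common.sh` defaults `IMAGE_FILE` to `tkestack/gpu-manager`, `hack/build.sh` tags the image as `${IMAGE_FILE}:${version}` with the version read from `VERSION`, and `gpu-manager.yaml` pins `tkestack/gpu-manager:1.0.3`. The bash sketch below reproduces only the tag computation and adds a hypothetical check that a DaemonSet manifest references the same tag. `image_tag` and `manifest_uses_tag` are illustrative names, not functions from the repository, and the build arguments and source tarball handled by `plugin::generate_img` are left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the image-tag logic from hack/common.sh and
# hack/build.sh; this is not code from the gpu-manager repository.

# image_tag ROOT
# Prints "<IMAGE_FILE>:<version>". IMAGE_FILE falls back to the default
# from hack/common.sh; the version is the content of ROOT/VERSION
# ($(<file) drops the trailing newline).
image_tag() {
  local root=$1
  local image_file=${IMAGE_FILE:-"tkestack/gpu-manager"}
  local version
  version=$(<"${root}/VERSION")
  echo "${image_file}:${version}"
}

# manifest_uses_tag MANIFEST TAG
# Succeeds if some line of MANIFEST, after stripping leading whitespace
# and an optional YAML list dash, is exactly "image: TAG".
manifest_uses_tag() {
  local manifest=$1 tag=$2 line
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line#"${line%%[![:space:]]*}"}  # strip leading whitespace
    line=${line#- }                        # strip a list dash, if any
    if [[ $line == "image: ${tag}" ]]; then
      return 0
    fi
  done < "$manifest"
  return 1
}
```

Run from the repository root with `IMAGE_FILE` unset, `manifest_uses_tag gpu-manager.yaml "$(image_tag .)"` should succeed for this commit, since both sides resolve to `tkestack/gpu-manager:1.0.3`. The sketch uses plain `local` declarations: in `readonly local name=...` as written in `build.sh`, bash treats `local` as just another variable name passed to `readonly`, so it does not declare a function-local variable.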
