Commit 7ef699b

[Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉 (#979)
* Introduce longhorn chart
* Further longhorn configuration
* Longhorn: further settings configuration
* Fix longhorn configuration bugs

  Extra: introduce longhorn pv values for portainer
* Add comment for deletion longhorn
* Further longhorn configuration
* Add README.md for Longhorn with FAQ
* Update Longhorn readme
* Update readme
* Further LH configuration
* Update LH's Readme
* Update Longhorn Readme
* Improve LH's Readme
* LH: Reduce reserved default disk space to 5%

  Since we use a dedicated disk for LH, we can go ahead with 5%
* Use values to set Longhorn storage class
* Update LH's Readme
* LH Readme: add requirements reference
* PR Review: bring back portainer s3 pv
* LH: decrease portainer volume size
1 parent ec1d38f commit 7ef699b

6 files changed

+148
-1
lines changed

charts/Makefile

-1
@@ -49,7 +49,6 @@ helmfile-sync: .check-helmfile-installed helmfile.yaml ## Syncs the helmfile con
 		$(MAKE) -s .helmfile-local-post-install; \
 	fi
 
-
 .PHONY: configure-local-hosts
 configure-local-hosts: ## Adds local hosts entries for the machine
 	@echo "Adding $(MACHINE_FQDN) hosts to /etc/hosts ..."

charts/longhorn/README.md

+50
@@ -0,0 +1,50 @@
+# Longhorn (LH) Knowledge Base
+
+### Can LH be used for critical services (e.g., Databases)?
+
+No (as of now). We should not use it for volumes of critical services.
+
+Instead, we should rely on easier-to-maintain solutions (e.g., application-level replication [Postgres Operators], S3, etc.). Once we have hands-on experience, extensive monitoring, and the ability to scale LH, we can consider using it for critical services.
+
+LH uses networking to keep replicas in sync, and IO-heavy workloads may easily overload it, leading to unpredictable consequences. Until we can extensively monitor LH and scale it properly on demand, it should not be used for critical or IO-heavy services.
+
+### How does LH decide which node's disk to use as storage?
+
+It depends on the configuration. There are three possibilities:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+When using the `Create Default Disk on Labeled Nodes` option, it relies on the `node.longhorn.io/create-default-disk` Kubernetes node label.
+
+Source: https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
+
+### Will LH pick up storage from a newly added node?
+
+By default, LH will use storage on all nodes (including newly created ones) where it runs. If `createDefaultDiskLabeledNodes` is configured, it will depend on the label of the node.
+
+Source:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+* https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
+
+### Can workloads be run on nodes where LH is not installed?
+
+Workloads can run on nodes without LH as long as LH is not restricted to specific nodes via the `nodeSelector` or `systemManagedComponentsNodeSelector` settings. If LH is configured to run on specific nodes, workloads can only run on those nodes.
+
+Note: There is an [ongoing bug](https://github.com/longhorn/longhorn/discussions/7312#discussioncomment-13030581) where LH will raise warnings when workloads run on nodes without LH. However, it will still function correctly.
+
+Source: https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+### Adding new volumes to (PVs that rely on) LH
+
+Monitor carefully whether LH is capable of handling new volumes. Test the new volume under load (when many read/write operations occur) and ensure LH does not fail due to insufficient resource capacities (e.g., network or CPU). See also the "LH's performance / resources" section of this README.
+
+LH's minimum recommended resource requirements:
+* https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
+
+### LH's performance / resources
+
+Insights into LH's performance:
+* https://longhorn.io/blog/performance-scalability-report-aug-2020/
+* https://github.com/longhorn/longhorn/wiki/Performance-Benchmark
+
+Resource requirements:
+* https://github.com/longhorn/longhorn/issues/1691
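
The `node.longhorn.io/create-default-disk` label mentioned in the README can be illustrated with a Node manifest fragment. This is only a sketch: the node name `worker-1` is hypothetical and not part of this commit, and in practice the label would normally be added to an existing node (for example with `kubectl label`) rather than by creating the Node object.

```yaml
# Sketch only: the node name is a placeholder, not taken from this commit.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    # With createDefaultDiskLabeledNodes: true, Longhorn is expected to create
    # its default disk only on nodes that carry this label.
    node.longhorn.io/create-default-disk: "true"
```

Nodes without the label should not receive a default disk, which is how the storage footprint can be limited to a chosen set of nodes.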

charts/longhorn/values.yaml.gotmpl

+68
@@ -0,0 +1,68 @@
+# Values documentation:
+# https://github.com/longhorn/longhorn/tree/v1.8.1/chart#values
+
+global:
+  # Warning: updating node selectors (after installation) will cause downtime
+  # https://longhorn.io/docs/archives/1.2.2/advanced-resources/deploy/node-selector/#setting-up-node-selector-after-longhorn-has-been-installed
+  #
+  # Warning: using node selectors will restrict our workloads to the same nodes
+  # https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/#deploy-longhorn-components-only-on-a-specific-set-of-nodes
+  nodeSelector: {}
+  systemManagedComponentsNodeSelector: {}
+
+defaultSettings:
+  replicaAutoBalance: best-effort
+
+  # control on which nodes LH will use disks
+  # use `node.longhorn.io/create-default-disk` node label for control
+  createDefaultDiskLabeledNodes: true
+  # use dedicated folder (disk) for storage
+  defaultDataPath: /longhorn
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#minimal-available-storage-and-over-provisioning
+  storageMinimalAvailablePercentage: 10
+
+  # Prevent LH deletion. Set to true if you want to delete LH
+  deletingConfirmationFlag: false
+
+  # do not let replicas of the same volume be scheduled on the same node
+  replicaSoftAntiAffinity: false
+
+  # we always use dedicated disks. 5% is a good value
+  storageReservedPercentageForDefaultDisk: 5
+
+persistence:
+  # use only for non-critical ops workloads
+  # for critical workloads (e.g. database)
+  # use application replication (e.g. postgres HA operator)
+  defaultClass: false
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#io-performance
+  defaultDataLocality: best-effort
+  defaultClassReplicaCount: 2
+
+  # minimum volume size is 300Mi
+  # https://github.com/longhorn/longhorn/issues/8488
+  defaultFsType: xfs
+
+resources: # https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
+  requests:
+    cpu: 0.5
+    memory: 128Mi
+  limits:
+    cpu: 4
+    memory: 4Gi
+
+ingress:
+  enabled: true
+  className: ""
+  annotations:
+    namespace: {{ .Release.Namespace }}
+    cert-manager.io/cluster-issuer: "cert-issuer"
+    traefik.ingress.kubernetes.io/router.entrypoints: websecure
+    traefik.ingress.kubernetes.io/router.middlewares: traefik-traefik-basic-auth@kubernetescrd,traefik-longhorn-strip-prefix@kubernetescrd # namespace + middleware name
+  tls: true
+  tlsSecret: monitoring-tls
+  host: {{ requiredEnv "K8S_MONITORING_FQDN" }}
+  path: /longhorn
+  pathType: Prefix
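
To make the `persistence` values above more concrete, the following StorageClass is a rough sketch of what they correspond to on the Kubernetes side. It is not the chart's rendered output: the object name `longhorn` and the `reclaimPolicy`, `volumeBindingMode`, and `allowVolumeExpansion` fields are assumptions, and the object the chart actually creates may carry additional parameters.

```yaml
# Sketch only: an approximation of a StorageClass matching the persistence
# values in this commit, not the manifest rendered by the Longhorn chart.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn                  # assumed name
  annotations:
    # mirrors defaultClass: false (PVCs must name the class explicitly)
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true        # assumption
reclaimPolicy: Delete             # assumption
volumeBindingMode: Immediate      # assumption
parameters:
  numberOfReplicas: "2"           # defaultClassReplicaCount
  dataLocality: "best-effort"     # defaultDataLocality
  fsType: "xfs"                   # defaultFsType
```

Because the class is not the cluster default, only volumes that request it by name end up on Longhorn, which matches the intent of keeping critical workloads off it.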
@@ -0,0 +1,4 @@
+persistence:
+  enabled: true
+  size: "300Mi" # cannot be lower https://github.com/longhorn/longhorn/issues/8488
+  storageClass: "{{.Values.longhornStorageClassName}}"
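
For charts that do not expose a `persistence` block like the one above, a volume could be requested directly with a PersistentVolumeClaim. This is a minimal sketch under stated assumptions: the claim name is hypothetical, and `longhorn` is assumed as the storage class name because the value of `longhornStorageClassName` is not shown in this commit.

```yaml
# Sketch only: claim name and storage class name are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data              # hypothetical
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn      # assumed; use the configured class name
  resources:
    requests:
      storage: 300Mi              # minimum noted in longhorn/longhorn#8488
```

The class is named explicitly because the Longhorn values set `defaultClass: false`, so a claim without `storageClassName` would not be provisioned by Longhorn.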

charts/traefik/values.insecure.yaml.gotmpl

+14
@@ -14,6 +14,7 @@ extraObjects:
         name: traefik
         targetPort: 9000
         protocol: TCP
+
   - apiVersion: v1
     kind: Secret
     metadata:
@@ -22,13 +23,15 @@ extraObjects:
     data:
       users: |2
         {{ requiredEnv "TRAEFIK_K8S_AUTHORIZED_USER" }}
+
   - apiVersion: traefik.io/v1alpha1
     kind: Middleware
     metadata:
       name: traefik-basic-auth
     spec:
       basicAuth:
         secret: traefik-authorized-users # https://doc.traefik.io/traefik/middlewares/http/basicauth/#users
+
   - apiVersion: traefik.io/v1alpha1
     kind: Middleware
     metadata:
@@ -38,6 +41,17 @@ extraObjects:
       stripPrefix:
         prefixes:
           - /portainer
+
+  - apiVersion: traefik.io/v1alpha1
+    kind: Middleware
+    metadata:
+      name: longhorn-strip-prefix
+      namespace: {{.Release.Namespace}}
+    spec:
+      stripPrefix:
+        prefixes:
+          - /longhorn
+
   - apiVersion: networking.k8s.io/v1
     kind: Ingress
     metadata:

charts/traefik/values.secure.yaml.gotmpl

+12
@@ -39,6 +39,7 @@ extraObjects:
     spec:
       basicAuth:
         secret: traefik-authorized-users # https://doc.traefik.io/traefik/middlewares/http/basicauth/#users
+
   - apiVersion: traefik.io/v1alpha1
     kind: Middleware
     metadata:
@@ -48,6 +49,17 @@ extraObjects:
       stripPrefix:
         prefixes:
           - /portainer
+
+  - apiVersion: traefik.io/v1alpha1
+    kind: Middleware
+    metadata:
+      name: longhorn-strip-prefix
+      namespace: {{.Release.Namespace}}
+    spec:
+      stripPrefix:
+        prefixes:
+          - /longhorn
+
   - apiVersion: traefik.io/v1alpha1
     kind: Middleware
     metadata:
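
The `longhorn-strip-prefix` middleware is referenced from the Longhorn ingress annotations as `<middleware namespace>-<middleware name>@kubernetescrd`. The Ingress below is a sketch of how such a reference fits together. It assumes the Traefik release lives in a namespace called `traefik` (inferred from the `traefik-longhorn-strip-prefix` annotation value in this commit); the Ingress name, namespace, and host are hypothetical, and the backend assumes the Longhorn UI service `longhorn-frontend` on port 80.

```yaml
# Sketch only: name, namespace, and host are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ui                     # hypothetical
  namespace: longhorn                   # hypothetical
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    # "traefik" = namespace of the Middleware, "longhorn-strip-prefix" = its name
    traefik.ingress.kubernetes.io/router.middlewares: traefik-longhorn-strip-prefix@kubernetescrd
spec:
  rules:
    - host: monitoring.example.com      # hypothetical
      http:
        paths:
          - path: /longhorn
            pathType: Prefix
            backend:
              service:
                name: longhorn-frontend # assumed Longhorn UI service
                port:
                  number: 80
```

With the prefix stripped, a request to `/longhorn/...` reaches the UI service as `/...`, which is why the middleware and the ingress `path` use the same `/longhorn` value.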
