[Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉 (#979)

YuryHrytsuk · web-flow · commit 7ef699bc5aac · 2025-05-06T14:57:03.000+02:00
* Introduce longhorn chart

* Further longhorn configuration

* Longhorn: further settings configuration

* Fix longhorn configuration bugs

Extra: introduce longhorn pv values for portainer

* Add comment for longhorn deletion

* Further longhorn configuration

* Add README.md for Longhorn with FAQ

* Update Longhorn readme

* Update readme

* Further LH configuration

* Update LH's Readme

* Update Longhorn Readme

* Improve LH's Readme

* LH: Reduce reserved default disk space to 5%

Since we use a dedicated disk for LH, we can go ahead with 5%

* Use values to set Longhorn storage class

* Update LH's Readme

* LH Readme: add requirements reference

* PR Review: bring back portainer s3 pv

* LH: decrease portainer volume size
diff --git a/charts/Makefile b/charts/Makefile
@@ -49,7 +49,6 @@ helmfile-sync: .check-helmfile-installed helmfile.yaml ## Syncs the helmfile con
 		$(MAKE) -s .helmfile-local-post-install; \
 	fi
 
-
 .PHONY: configure-local-hosts
 configure-local-hosts: ## Adds local hosts entries for the machine
 	@echo "Adding $(MACHINE_FQDN) hosts to /etc/hosts ..."
diff --git a/charts/longhorn/README.md b/charts/longhorn/README.md
@@ -0,0 +1,50 @@
+# Longhorn (LH) Knowledge Base
+
+### Can LH be used for critical services (e.g., Databases)?
+
+No (as of now).
+
+For now, we should avoid using LH for volumes of critical services. Instead, we should rely on easier-to-maintain solutions (e.g., application-level replication [Postgres Operators], S3, etc.). Once we have hands-on experience, extensive monitoring, and the ability to scale LH, we can consider using it for critical services.
+
+LH uses networking to keep replicas in sync, and IO-heavy workloads may easily overload it, leading to unpredictable consequences. Until we can extensively monitor LH and scale it properly on demand, it should not be used for critical or IO-heavy services.
+
+### How does LH decide which node's disk to use as storage?
+
+It depends on the configuration. The three possibilities are described in:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+When using the `Create Default Disk on Labeled Nodes` option, it relies on the `node.longhorn.io/create-default-disk` Kubernetes node label.
+
+Source: https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
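+
+A minimal sketch of what the label looks like on a node, assuming `createDefaultDiskLabeledNodes` is enabled. The node name `worker-1` is hypothetical; only the label key comes from the Longhorn documentation referenced above.
+
+```yaml
+# Fragment of a Node object (illustrative only; node name is hypothetical)
+apiVersion: v1
+kind: Node
+metadata:
+  name: worker-1
+  labels:
+    # LH creates a default disk on this node when the label is set to "true"
+    node.longhorn.io/create-default-disk: "true"
+```
+
+In practice the label is usually applied to an existing node (e.g., with `kubectl label`) rather than by creating a Node manifest.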
+
+### Will LH pick up storage from a newly added node?
+
+By default, LH will use storage on all nodes (including newly created ones) where it runs. If `createDefaultDiskLabeledNodes` is configured, it will depend on the label of the node.
+
+Source:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+* https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
+
+### Can workloads be run on nodes where LH is not installed?
+
+Workloads can run on nodes without LH as long as LH is not restricted to specific nodes via the `nodeSelector` or `systemManagedComponentsNodeSelector` settings. If LH is configured to run on specific nodes, workloads can only run on those nodes.
+
+Note: There is an [ongoing bug](https://github.com/longhorn/longhorn/discussions/7312#discussioncomment-13030581) where LH will raise warnings when workloads run on nodes without LH. However, it will still function correctly.
+
+Source: https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+### Adding new volumes to (PVs that rely on) LH
+
+Monitor carefully whether LH is capable of handling new volumes. Test each new volume under load (many concurrent read/write operations) and ensure LH does not fail due to insufficient resources (e.g., network or CPU). See also the "LH's performance / resources" section of this README.
+
+LH's minimum recommended resource requirements:
+* https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
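+
+A minimal sketch of a volume claim backed by LH, for testing under load. The claim name `example-lh-pvc` and the storage class name `longhorn` are assumptions; use the class name configured for this deployment. The size respects the 300Mi minimum noted in the chart values.
+
+```yaml
+# Illustrative PVC; name and storageClassName are assumptions
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: example-lh-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: longhorn
+  resources:
+    requests:
+      storage: 300Mi  # sizes below 300Mi are not supported (see longhorn issue 8488)
+```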
+
+### LH's performance / resources
+
+Insights into LH's performance:
+* https://longhorn.io/blog/performance-scalability-report-aug-2020/
+* https://github.com/longhorn/longhorn/wiki/Performance-Benchmark
+
+Resource requirements:
+* https://github.com/longhorn/longhorn/issues/1691
diff --git a/charts/longhorn/values.yaml.gotmpl b/charts/longhorn/values.yaml.gotmpl
@@ -0,0 +1,68 @@
+# Values documentation:
+# https://github.com/longhorn/longhorn/tree/v1.8.1/chart#values
+
+global:
+  # Warning: updating node selectors (after installation) will cause downtime
+  # https://longhorn.io/docs/archives/1.2.2/advanced-resources/deploy/node-selector/#setting-up-node-selector-after-longhorn-has-been-installed
+  #
+  # Warning: using node selectors will restrict our workloads to the same nodes
+  # https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/#deploy-longhorn-components-only-on-a-specific-set-of-nodes
+  nodeSelector: {}
+  systemManagedComponentsNodeSelector: {}
+
+defaultSettings:
+  replicaAutoBalance: best-effort
+
+  # control on which nodes LH will use disks
+  # use `node.longhorn.io/create-default-disk` node label for control
+  createDefaultDiskLabeledNodes: true
+  # use dedicated folder (disk) for storage
+  defaultDataPath: /longhorn
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#minimal-available-storage-and-over-provisioning
+  storageMinimalAvailablePercentage: 10
+
+  # Prevent LH deletion. Set to true if you want to delete LH
+  deletingConfirmationFlag: false
+
+  # do not allow replicas of the same volume to be scheduled on the same node
+  replicaSoftAntiAffinity: false
+
+  # we always use dedicated disks. 5% is a good value
+  storageReservedPercentageForDefaultDisk: 5
+
+persistence:
+  # use only for non-critical ops workloads
+  # for critical workloads (e.g. database)
+  # use application replication (e.g. postgres HA operator)
+  defaultClass: false
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#io-performance
+  defaultDataLocality: best-effort
+  defaultClassReplicaCount: 2
+
+  # minimum volume size is 300Mi (xfs filesystem limit)
+  # https://github.com/longhorn/longhorn/issues/8488
+  defaultFsType: xfs
+
+resources: # https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
+    requests:
+      cpu: 0.5
+      memory: 128Mi
+    limits:
+      cpu: 4
+      memory: 4Gi
+
+ingress:
+    enabled: true
+    className: ""
+    annotations:
+      namespace: {{ .Release.Namespace }}
+      cert-manager.io/cluster-issuer: "cert-issuer"
+      traefik.ingress.kubernetes.io/router.entrypoints: websecure
+      traefik.ingress.kubernetes.io/router.middlewares: traefik-traefik-basic-auth@kubernetescrd,traefik-longhorn-strip-prefix@kubernetescrd  # namespace + middleware name
+    tls: true
+    tlsSecret: monitoring-tls
+    host: {{ requiredEnv "K8S_MONITORING_FQDN" }}
+    path: /longhorn
+    pathType: Prefix
diff --git a/charts/portainer/values.longhorn-pv.yaml.gotmpl b/charts/portainer/values.longhorn-pv.yaml.gotmpl
@@ -0,0 +1,4 @@
+persistence:
+  enabled: true
+  size: "300Mi" # cannot be lower https://github.com/longhorn/longhorn/issues/8488
+  storageClass: "{{.Values.longhornStorageClassName}}"
diff --git a/charts/traefik/values.insecure.yaml.gotmpl b/charts/traefik/values.insecure.yaml.gotmpl
@@ -14,6 +14,7 @@ extraObjects:
       name: traefik
       targetPort: 9000
       protocol: TCP
+
 - apiVersion: v1
   kind: Secret
   metadata:
@@ -22,13 +23,15 @@ extraObjects:
   data:
     users: |2
       {{ requiredEnv "TRAEFIK_K8S_AUTHORIZED_USER" }}
+
 - apiVersion: traefik.io/v1alpha1
   kind: Middleware
   metadata:
     name: traefik-basic-auth
   spec:
     basicAuth:
       secret: traefik-authorized-users  # https://doc.traefik.io/traefik/middlewares/http/basicauth/#users
+
 - apiVersion: traefik.io/v1alpha1
   kind: Middleware
   metadata:
@@ -38,6 +41,17 @@ extraObjects:
     stripPrefix:
       prefixes:
       - /portainer
+
+- apiVersion: traefik.io/v1alpha1
+  kind: Middleware
+  metadata:
+    name: longhorn-strip-prefix
+    namespace: {{.Release.Namespace}}
+  spec:
+    stripPrefix:
+      prefixes:
+      - /longhorn
+
 - apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
diff --git a/charts/traefik/values.secure.yaml.gotmpl b/charts/traefik/values.secure.yaml.gotmpl
@@ -39,6 +39,7 @@ extraObjects:
   spec:
     basicAuth:
       secret: traefik-authorized-users  # https://doc.traefik.io/traefik/middlewares/http/basicauth/#users
+
 - apiVersion: traefik.io/v1alpha1
   kind: Middleware
   metadata:
@@ -48,6 +49,17 @@ extraObjects:
     stripPrefix:
       prefixes:
       - /portainer
+
+- apiVersion: traefik.io/v1alpha1
+  kind: Middleware
+  metadata:
+    name: longhorn-strip-prefix
+    namespace: {{.Release.Namespace}}
+  spec:
+    stripPrefix:
+      prefixes:
+      - /longhorn
+
 - apiVersion: traefik.io/v1alpha1
   kind: Middleware
   metadata: