---
title: "Longhorn_as_storage_solution"
date: "2025-03-18"
---

| status: | date: | decision-makers: |
| --- | --- | --- |
| proposed | 2025-03-18 | Sofus Albertsen |

## Context and Problem Statement

### Do I need it?

Even though Kubernetes (and its Container Storage Interface) is production-ready for persistent, stateful workloads, keeping your cluster stateless has several advantages:

* Dead simple disaster recovery (and duplication) of your cluster: everything is defined as code, so (re)creating a cluster is as easy as running your setup scripts once more and waiting for everything to come up.
* Backup and restore have never been simple, and Kubernetes does not solve this for you.

At its core, we want stateless [cloud native applications](https://kodekloud.com/blog/cloud-native-principles-explained/).
Remember the distinction between the need for persistence and the need for ephemeral storage: your caching service needs ephemeral storage, but does not need backup/restore of that data. Such workloads are a perfect fit for Kubernetes.
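The caching case above maps to a plain `emptyDir` volume, which lives and dies with the Pod — no persistence, no backup, nothing for the storage layer to manage. A minimal sketch (the image name and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache
spec:
  containers:
    - name: redis
      image: redis:7
      volumeMounts:
        - name: cache-data
          mountPath: /data    # the cache writes its working set here
  volumes:
    - name: cache-data
      emptyDir: {}            # ephemeral: wiped when the Pod is deleted
```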

Often your database sits on dedicated hardware and is looked after by people with specialized competences.
Keep it that way, and connect to the database from your cluster.
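Connecting to such an external database can be as simple as an `ExternalName` Service, giving applications a stable in-cluster DNS name for it (the names and hostname here are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-db               # pods resolve orders-db.<namespace>.svc
spec:
  type: ExternalName
  externalName: db.example.com  # CNAME to the externally managed database
```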

### What are the criteria for choosing?

* **Performance Requirements:** What are the expected Input/Output Operations Per Second (IOPS) that your applications will demand? What about throughput?
* **Scalability Needs:** Are you able to add storage seamlessly after the initial creation?
* **Data Availability and Durability:** What are your requirements for replicas, time to recover, etc.?
* **Team Expertise and Comfort Level:** What is your team's existing knowledge and experience with the specific storage solutions you are considering?

### What are our weights for making a choice?

While all criteria are important, choosing one persistence tool over another can place vastly different demands on the expertise of the team.

Therefore, the primary weight is this: if you already have a storage solution that supports Kubernetes with a CSI driver, evaluate that one before anything else.

Secondarily, we will focus on the complexity a solution introduces.
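An existing CSI-backed solution typically surfaces in the cluster as a `StorageClass`, and workloads stay vendor-neutral by requesting storage through it. A sketch with a placeholder provisioner name standing in for whatever driver your platform already ships:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: csi.vendor.example.com      # placeholder: your platform's CSI driver
reclaimPolicy: Delete
allowVolumeExpansion: true               # relates to the scalability criterion
volumeBindingMode: WaitForFirstConsumer  # bind when a pod actually needs it
```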

## Considered Options

* **Longhorn:** A lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes, built by Rancher (now SUSE). Key features include built-in snapshots, backups, replication, and a user-friendly GUI. It's designed specifically *for* Kubernetes and integrates deeply.

* **Rook Ceph:** Rook is a storage *operator* for Kubernetes, and Ceph is a highly scalable, distributed storage system offering object, block, and file storage. Rook automates deployment, management, and scaling of Ceph within Kubernetes. This combination is powerful but complex.

* **OpenEBS:** A containerized storage solution that provides persistent block storage for Kubernetes applications. It offers several different "storage engines" (cStor, Jiva, Local PV, Mayastor), each with different performance and feature characteristics. It offers flexibility but can require careful selection of the right engine.

* **Portworx:** A commercial (paid) storage platform designed for Kubernetes. It offers high performance, high availability, and advanced features like data encryption, storage-level snapshots, and automated scaling. It's a mature and feature-rich solution, but comes with licensing costs.

## Decision Outcome

Chosen option: **Longhorn**, because it provides a good balance of features, ease of use, and integration with Kubernetes, while minimizing the complexity overhead for our team.

It's a strong, open-source option that aligns well with our focus on simplicity.
It meets our needs for persistent storage within the cluster without introducing the operational overhead of a more complex solution like Rook/Ceph.
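Once Longhorn is installed (for example via its Helm chart from charts.longhorn.io), it registers a StorageClass named `longhorn`, and workloads claim replicated block storage through an ordinary PersistentVolumeClaim. A minimal sketch, with an illustrative claim name and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce          # Longhorn volumes are block devices, single-node attach
  storageClassName: longhorn # default class created by the Longhorn install
  resources:
    requests:
      storage: 10Gi
```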

Based on different scenarios, the following general recommendations can be made:

- For organizations that require massive scalability, support for block, object, and file storage within a unified system, and have a team with the necessary expertise to manage a complex distributed storage platform, Ceph/Rook is a powerful option.
- For users who are looking for a straightforward and reliable distributed block storage solution that is easy to deploy and manage within Kubernetes, especially for small to medium-sized environments, Longhorn is an excellent choice.
- If you lack the required skill set altogether and need a high-performance, feature-rich, and commercially supported solution, and the licensing costs are justified, then Portworx is our recommendation.

### Consequences

* Good, because it's relatively easy to deploy and manage, leading to lower operational overhead and faster time to value. It has good community support and active development.
* Good, because Longhorn's performance is generally good for typical workloads, meeting our initial performance requirements.

* Bad, because Longhorn is primarily focused on block storage. If we need robust support for shared filesystems (ReadWriteMany access mode with full POSIX compliance) *within* the cluster, we might need to wait until newer versions of the tool support this (see their [roadmap](https://github.com/longhorn/longhorn/wiki/Roadmap#longhorn-v111-january-2026) for more information).
* Bad, because, although Longhorn has a growing community, it's not as mature as Ceph. While this is less of a direct "consequence" and more of a relative comparison, it's worth keeping in mind for long-term planning.