| {"proposed \| rejected \| accepted \| deprecated \| … \| superseded by ADR-0123"} | {YYYY-MM-DD when the decision was last updated} | {list everyone involved in the decision} |
## Context and Problem Statement
{Describe the context and problem statement, e.g., in free form using two to three sentences or in the form of an illustrative story. You may want to articulate the problem in the form of a question and add links to collaboration boards or issue management systems.}
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
**contributing.md**
# Contributing
This guide is meant to hold all of the experience from Eficodeans working with Kubernetes, distilled into one easily readable guide.
This means that we welcome all contributions to 'what is the right tech stack'.
There are fundamentally two ways to contribute to this guide: recommend a tool, and …
## Recommend a tool
If you want to recommend a tool, the place to start is to write an Architecture Decision Record (ADR). All tools recommended in the guide are reflected in an ADR.
To add an ADR, do the following:
```shell
hugo new --kind adr <DesiredFolder>/ADRs/<NameOfADRFile>.md --source .pages
```
Fill out the sections in the generated ADR.
You can use either Devbox or Dev Containers to set up a consistent development environment.
To preview the website locally while making changes:
1. Run the Hugo development server:
   ```sh
   hugo server --source .pages
   ```
2. Open your browser and navigate to `http://localhost:1313/On-prem_Kubernetes_Guide/`
3. The website will automatically refresh when you make changes to the source files.
An opinionated guide to building and running your on-prem tech-stack for running Kubernetes.
## Introduction
Deploying and operating Kubernetes on-premises is fundamentally different from doing so in the cloud. Without a managed control plane or provider-operated infrastructure, organizations must take full ownership of networking, security, and operational automation to ensure a stable and secure environment. The complexity of these decisions can quickly lead to fragmentation, inefficiency, and technical debt if not approached with a well-defined strategy.
This guide delivers an opinionated, battle-tested roadmap for building a production-grade on-prem Kubernetes environment, structured around three foundational pillars:
- [Getting your hardware ready to work with Kubernetes](hardware_ready/_index.md)
- [Getting your software ready to work with Kubernetes](software_ready/_index.md)
- [Working with Kubernetes](working_with_k8s/_index.md)
Instead of presenting endless options, we provide clear, prescriptive recommendations.
By following this approach, organizations can confidently design, deploy, and sustain an optimized, resilient, and future-compatible Kubernetes cluster, making informed decisions that balance control, flexibility, and operational efficiency from day one.
## Key differences between On-prem and Cloud Kubernetes
One of the biggest challenges of running Kubernetes on-prem is the absence of elastic cloud-based scaling, where compute and storage resources can be provisioned on demand. Instead, on-prem environments require careful capacity planning to avoid resource contention while minimizing unnecessary infrastructure costs. Additionally, the operational burden extends beyond initial deployment—day-two operations such as upgrades, observability, disaster recovery, and compliance enforcement demand greater automation and proactive management to maintain stability and performance. Without cloud-native integrations, teams must build and maintain their own ecosystem of networking, storage, and security solutions, ensuring that each component is optimized for reliability and maintainability. These factors make on-prem Kubernetes deployments more complex but also provide greater control over cost, security, and regulatory compliance.
## Document Structure
With the introduction and key differences out of the way, we can now get into the important parts of the document. As mentioned in the introduction, the document is structured around three foundational pillars, namely:
- [Getting your hardware ready to work with Kubernetes](hardware_ready/_index.md)
- [Getting your software ready to work with Kubernetes](software_ready/_index.md)
- [Working with Kubernetes](working_with_k8s/_index.md)
**docs/guide.md**
---
title: On-premises Kubernetes guide
---
An opinionated guide to building and running your on-prem tech-stack for running Kubernetes.
## Introduction
Deploying and operating Kubernetes on-premises is fundamentally different from doing so in the cloud. Without a managed control plane or provider-operated infrastructure, organizations must take full ownership of networking, security, and operational automation to ensure a stable and secure environment. The complexity of these decisions can quickly lead to fragmentation, inefficiency, and technical debt if not approached with a well-defined strategy.
This guide delivers an opinionated, battle-tested roadmap for building a production-grade on-prem Kubernetes environment, structured around three foundational pillars:
- Getting your hardware ready to work with Kubernetes
- Getting your software ready to work with Kubernetes
- Working with Kubernetes
Instead of presenting endless options, we provide clear, prescriptive recommendations.
By following this approach, organizations can confidently design, deploy, and sustain an optimized, resilient, and future-compatible Kubernetes cluster, making informed decisions that balance control, flexibility, and operational efficiency from day one.
## Key differences between On-prem and Cloud Kubernetes
One of the biggest challenges of running Kubernetes on-prem is the absence of elastic cloud-based scaling, where compute and storage resources can be provisioned on demand. Instead, on-prem environments require careful capacity planning to avoid resource contention while minimizing unnecessary infrastructure costs. Additionally, the operational burden extends beyond initial deployment—day-two operations such as upgrades, observability, disaster recovery, and compliance enforcement demand greater automation and proactive management to maintain stability and performance. Without cloud-native integrations, teams must build and maintain their own ecosystem of networking, storage, and security solutions, ensuring that each component is optimized for reliability and maintainability. These factors make on-prem Kubernetes deployments more complex but also provide greater control over cost, security, and regulatory compliance.
## Document Structure
With the introduction and key differences out of the way, we can now get into the important parts of the document. As mentioned in the introduction, the document is structured around three foundational pillars, namely:
- Getting your hardware ready to work with Kubernetes
- Getting your software ready to work with Kubernetes
- Working with Kubernetes
For each of these pillars, we will be providing you with primary and secondary recommendations regarding tech-stack and any accompanying tools. These recommendations will go over the tools themselves and provide you with arguments for choosing them, as well as listing out common pitfalls and important points of consideration.
## Getting your hardware ready to work with Kubernetes
### Virtualisation or bare metal
One important aspect is to determine whether the clusters should run on an OS directly on the machines, or if it makes sense to add a virtualisation layer.
Running directly on the hardware gives you a 1-1 relationship between the machines and the nodes, which is not always advisable if the machines are particularly powerful. Running directly on the hardware will of course have lower latency than adding a virtualisation layer.
A virtualisation layer can benefit you by abstracting the actual hardware, and enabling …
If virtualisation is chosen, the recommendations below are what you would run in your VMs. For setting up your VMs, we recommend Talos with KubeVirt.
### Decision Matrix
| Problem domain | Description | Reason for importance | Primary tool recommendation | Secondary tool recommendation |
|:---:|:---:|:---:|:---:|:---:|
| Kubernetes Node Operating System | The Operating System running on each of the hosts that will be part of your Kubernetes cluster | Choosing the right OS will be the foundation for building a production-grade Kubernetes cluster | Talos Linux | Flatcar Linux |
| Storage solution | The underlying storage capabilities which Kubernetes will leverage to provide persistence for stateful workloads | Choosing the right storage solution for your cluster's needs is important, as there are many tradeoffs associated with it, e.g. redundancy vs. complexity | Longhorn (iSCSI) or OpenEBS (iSCSI) | Rook Ceph |
| Container Runtime (CRI) | The software that is responsible for running containers | You need a working container runtime on each node in your cluster, so that the kubelet can launch pods and their containers | containerd (bundled with Talos Linux) ||
| Network plugin (CNI) | Plugin used for cluster networking | A CNI plugin is required to implement the Kubernetes network model | Cilium | Calico |
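To make the storage recommendation concrete, workloads consume whichever solution you pick through a StorageClass. A minimal sketch for Longhorn might look like this (the class name and replica count are illustrative, not prescriptive):

```yaml
# Hypothetical StorageClass for Longhorn; tune numberOfReplicas
# to balance redundancy against raw capacity.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
reclaimPolicy: Delete
allowVolumeExpansion: true
```

PersistentVolumeClaims that reference this class by name would then be provisioned with three-way replicated volumes.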
## Getting your software ready to work with Kubernetes
<!-- markdownlint-disable MD024 -->
### Decision Matrix
| Problem domain | Description | Reason for importance | Primary tool recommendation | Secondary tool recommendation |
|:---:|:---:|:---:|:---:|:---:|
| Image Registry | A common place to store and fetch images | High availability, secure access control | Harbor | Sonatype Nexus |
**docs/hardware_ready/ADRs/Cilium_as_network_plugin.md**
| Status | Date | Deciders |
| --- | --- | --- |
| proposed | 2025-02-18 | Alexandra Aldershaab, Steffen Petersen |
## Context and Problem Statement
A CNI plugin is required to implement the Kubernetes network model by assigning IP addresses from preallocated CIDR ranges to pods and nodes. The CNI plugin is also responsible for enforcing network policies that control how traffic flows between namespaces as well as between the cluster and the internet.
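As a concrete illustration of the enforcement a CNI plugin must implement, here is a minimal Kubernetes NetworkPolicy (all names are hypothetical) that only admits traffic to `app: api` pods from `role: frontend` pods in the same namespace; without a policy-enforcing CNI, such an object has no effect:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api   # hypothetical name
  namespace: demo               # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
```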
## Considered Options
* Cilium
* Calico
* Flannel
## Decision Outcome
Chosen option: **Cilium**, because it is a fully conformant CNI plugin that works in both cloud and on-premises environments while also providing support for network policies as well as more advanced networking features. Cilium has also gained rapid adoption in the Kubernetes community and is considered the future standard of CNI plugins.
Flannel was considered, but it does not support network policies, which are considered a hard requirement.
Calico, while supporting network policies, falls short compared to Cilium in terms of …
### Consequences
* Good, because Cilium provides support for network policies on L7 as well as the usual L3/L4.
* Good, because Cilium provides support for BGP controlplane integration, allowing for seamless integration with existing networking infrastructure.
* Good, because Cilium provides a feature called Egress Gateway which allows for traffic exiting the cluster to be routed through specific nodes, facilitating smooth integration with existing security infrastructure such as IP-based firewalls.
* Good, because Cilium comes with a utility called Hubble which provides deep observability into the network traffic, allowing for easy debugging and troubleshooting of network issues.
* Bad, because Cilium requires you to understand both Kubernetes networking and traditional networking concepts to fully utilize its advanced features.
* Bad, because Cilium does not come installed by default on any flavor of Kubernetes, requiring additional steps to install it and provide necessary custom configuration.
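To illustrate the L7 capability noted in the consequences above, Cilium extends policies beyond L3/L4 through its own CRD. A sketch of a CiliumNetworkPolicy (all names and ports are hypothetical) that only permits HTTP GET requests to `/healthz` on the selected pods:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-healthz-only   # hypothetical name
spec:
  endpointSelector:
    matchLabels:
      app: api               # hypothetical label
  ingress:
    - toPorts:
        - ports:
            - port: "8080"   # hypothetical port
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /healthz
```

Any other method or path to these pods would be rejected at L7, something a plain Kubernetes NetworkPolicy cannot express.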
0 commit comments