Commit 05d450f

Add ADR for using Talos OS as the preferred operating system for Kube… (#15)
1 parent adf3876 commit 05d450f

2 files changed, +41 -1 lines changed

+40
@@ -0,0 +1,40 @@
---
title: "Use Talos OS as the Preferred Operating System for Kubernetes Operations"
date: "2025-02-25"
---

| status: | date: | decision-makers: |
| --- | --- | --- |
| proposed | 2025-02-25 | Sofus Albertsen |

## Context and Problem Statement

Choosing the right operating system for your Kubernetes cluster is crucial for stability, security, and operational efficiency. The OS should be optimized for container workloads, minimize overhead, and integrate well with Infrastructure as Code (IaC) practices.

## Considered Options

* Talos OS
* Red Hat OpenShift
* SUSE Rancher (RancherOS/RKE)

## Decision Outcome

Chosen option: **Talos OS**, because its minimal footprint, API-driven configuration, and singular focus on Kubernetes make it ideal for automated infrastructure management and reduce operational overhead.

Talos OS's immutable architecture and security-focused design further enhance its suitability for Kubernetes deployments, giving you a minimal attack surface at the OS level. For example, the OS does not include a shell, so no bash scripts can be executed.

OpenShift and Rancher were considered, but their comprehensive feature sets, while beneficial in some scenarios, introduce increased complexity and overhead.

While their dashboards can simplify initial setup, they can also encourage "click-ops" and deviations from IaC best practices. These platforms might be suitable if existing Red Hat or SUSE expertise is a primary driver, but because they are built on fully fledged operating systems, they introduce more operational overhead than Talos.

### Consequences

* **Good:** Talos OS's minimal package selection gives it a smaller attack surface.
* **Good:** The API-driven configuration of Talos OS allows for seamless integration with IaC tools like Terraform, enabling fully automated cluster provisioning and management.
* **Good:** The immutable infrastructure of Talos OS simplifies updates and adds resiliency because of its dual boot bank setup.
* **Good:** The "two package" approach simplifies maintenance (day 2 operations) and reduces the likelihood of OS-related issues, as all known package combinations can be tested by the vendor.

* **Bad:** The learning curve for Talos OS might be steeper initially for teams unfamiliar with its API-driven approach.
* **Bad:** The lack of a graphical user interface might be a drawback for some users accustomed to traditional OS management.
* **Bad:** Talos is a relatively new project compared to OpenShift or Rancher, so its community support and available resources might be smaller.

docs/hardware_ready/_index.md

+1-1
@@ -13,7 +13,7 @@ In case virtualisation is chosen, the below recommendations are what you would r
 ## Decision Matrix
 | Problem domain | Description | Reason for importance | Tool recommendation |
 |:---:|:---:|:---:|:---:|
-| Kubernetes Node Operating System | The Operating System running on each of the hosts that will be part of your Kubernetes cluster | Choosing the right OS will be the foundation for building a production-grade Kubernetes cluster | |
+| Kubernetes Node Operating System | The Operating System running on each of the hosts that will be part of your Kubernetes cluster | Choosing the right OS will be the foundation for building a production-grade Kubernetes cluster | [Talos OS](hardware_ready/ADRs/talos_as_os.md) |
 | Storage solution | The underlying storage capabilities which Kubernetes will leverage to provide persistence for stateful workloads | Choosing the right storage solution for your clusters needs is important as there is a lot of balance tradeoffs associated with it, e.g redundancy vs. complexity | |
 | Container Runtime (CRI) | The software that is responsible for running containers | You need a working container runtime on each node in your cluster, so that the kubelet can launch pods and their containers | |
 | Network plugin (CNI) | Plugin used for cluster networking | A CNI plugin is required to implement the Kubernetes network model | [Cilium](Cilium_as_network_plugin.md) |
