# Multi-Cluster Support
Author: @sttts
Initial implementation: @vincepri

Last Updated on: 03/26/2024

## Table of Contents

<!--ts-->
- [Multi-Cluster Support](#multi-cluster-support)
  - [Table of Contents](#table-of-contents)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Examples](#examples)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [Multi-Cluster-Compatible Reconcilers](#multi-cluster-compatible-reconcilers)
  - [User Stories](#user-stories)
    - [Controller Author with no interest in multi-cluster wanting to keep the old behaviour](#controller-author-with-no-interest-in-multi-cluster-wanting-to-keep-the-old-behaviour)
    - [Multi-Cluster Integrator wanting to support cluster managers like Cluster-API or kind](#multi-cluster-integrator-wanting-to-support-cluster-managers-like-cluster-api-or-kind)
    - [Multi-Cluster Integrator wanting to support apiservers with logical clusters (like kcp)](#multi-cluster-integrator-wanting-to-support-apiservers-with-logical-clusters-like-kcp)
    - [Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups](#controller-author-without-self-interest-in-multi-cluster-but-open-for-adoption-in-multi-cluster-setups)
    - [Controller Author who wants to support certain multi-cluster setups](#controller-author-who-wants-to-support-certain-multi-cluster-setups)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Alternatives](#alternatives)
  - [Implementation History](#implementation-history)

<!--te-->

## Summary

Controller-runtime today only allows writing controllers against a single cluster.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.

## Motivation

This change is important because:
- multi-cluster use-cases are becoming more and more common; see projects
  like Karmada, Crossplane or kcp. They all need to write (controller-runtime)
  controllers that operate on multiple clusters.
- writing controllers for such systems in a **portable** way is hard today.
  Consequently, there is no multi-cluster controller ecosystem, but there could
  and should be one.
- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
  because adding support on top leads to an inefficient controller design, and,
  even more importantly, to divergence in the ecosystem.

### Goals

- Provide a way to natively write controllers that
  1. (UNIFORM MULTI-CLUSTER CONTROLLER) operate on multiple clusters in a uniform way,
     i.e. reconciling the same resources on multiple clusters, **optionally**
     - sourcing information from one central hub cluster
     - sourcing information cross-cluster.

     Example: distributed `ReplicaSet` controller, reconciling `ReplicaSets` on multiple clusters.
  2. (AGGREGATING MULTI-CLUSTER CONTROLLER) operate on one central hub cluster, aggregating information from multiple clusters.

     Example: distributed `Deployment` controller, aggregating `ReplicaSets` back into the `Deployment` object.
- Allow clusters to dynamically join and leave the set of clusters a controller operates on.
- Allow event sources to be cross-cluster:
  1. Multi-cluster events that trigger reconciliation in the central hub cluster.
  2. Central hub cluster events that trigger reconciliation on multiple clusters.
- Allow (informer) indexes that span multiple clusters.
- Allow logical clusters where a set of clusters is actually backed by one physical informer store.
- Allow 3rd parties to plug their multi-cluster adapter (in source code) into
  an existing multi-cluster-compatible code-base.
- Minimize the changes needed to make a controller-runtime controller
  multi-cluster-compatible, in a way that 3rd-party projects have no reason to
  object to these kinds of changes.

Here we call a controller multi-cluster-compatible if its reconcilers receive
reconcile requests for cluster `X` and do all reconciliation in cluster `X`. This
is less than being multi-cluster-aware, where reconcilers implement cross-cluster
logic.

### Examples

- Run a controller-runtime controller against a kubeconfig with arbitrarily many contexts, all being reconciled.
- Run a controller-runtime controller against cluster managers like kind, Cluster-API, Open Cluster Management or Hypershift.
- Run a controller-runtime controller against a kcp shard with a wildcard watch.

### Non-Goals/Future Work

- Ship integrations for different multi-cluster setups. These should become
  out-of-tree subprojects that can evolve individually and be vendored by controller authors.
- Make controller-runtime controllers "binary pluggable".
- Manage one manager per cluster.
- Manage one controller per cluster with dedicated workqueues.

## Proposal

The `ctrl.Manager` _SHOULD_ be extended to accept an optional `cluster.Provider` via
`ctrl.Options`, implementing:

```golang
// pkg/cluster
type Provider interface {
	Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error)
	List(ctx context.Context) ([]string, error)
	Watch(ctx context.Context) (Watcher, error)
}
```
The `cluster.Cluster` _SHOULD_ be extended with a unique name identifier:
```golang
// pkg/cluster:
type Cluster interface {
	Name() string
	...
}
```

The `ctrl.Manager` will use the provider to watch clusters coming and going, and
will inform runnables implementing the `cluster.AwareRunnable` interface:

```golang
// pkg/cluster
type AwareRunnable interface {
	Engage(context.Context, Cluster) error
	Disengage(context.Context, Cluster) error
}
```
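
For illustration, a minimal custom runnable (a hypothetical sketch against the proposed API, assuming
the usual `context`, `sync`, and `pkg/cluster` imports) could satisfy this interface by simply tracking
which clusters are currently engaged:

```golang
// clusterSet is a hypothetical runnable that tracks engaged clusters.
type clusterSet struct {
	mu       sync.Mutex
	clusters map[string]cluster.Cluster
}

func (s *clusterSet) Start(ctx context.Context) error {
	<-ctx.Done() // nothing to do between membership changes
	return nil
}

func (s *clusterSet) Engage(ctx context.Context, cl cluster.Cluster) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.clusters == nil {
		s.clusters = map[string]cluster.Cluster{}
	}
	s.clusters[cl.Name()] = cl
	return nil
}

func (s *clusterSet) Disengage(ctx context.Context, cl cluster.Cluster) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.clusters, cl.Name())
	return nil
}
```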
In particular, controllers implement the `AwareRunnable` interface. They react
to engaged clusters by duplicating and starting their registered `source.Source`s
and `handler.EventHandler`s for each cluster, through implementations of
```golang
// pkg/source
type DeepCopyableSyncingSource interface {
	SyncingSource
	DeepCopyFor(cluster cluster.Cluster) DeepCopyableSyncingSource
}

// pkg/handler
type DeepCopyableEventHandler interface {
	EventHandler
	DeepCopyFor(c cluster.Cluster) DeepCopyableEventHandler
}
```
The standard implementing types, in particular `internal.Kind`, will adhere to
these interfaces.

The `ctrl.Manager` _SHOULD_ be extended by a `cluster.Cluster` getter:
```golang
// pkg/manager
type Manager interface {
	// ...
	GetCluster(ctx context.Context, clusterName string) (cluster.Cluster, error)
}
```
The embedded `cluster.Cluster` corresponds to `GetCluster(ctx, "")`. We call the
clusters with a non-empty name "provider clusters" or "engaged clusters", while
the embedded cluster of the manager is called the "default cluster" or "hub
cluster".

The `reconcile.Request` _SHOULD_ be extended by an optional `ClusterName` field:
```golang
// pkg/reconcile
type Request struct {
	ClusterName string
	types.NamespacedName
}
```

With these changes, the behaviour of controller-runtime without a configured cluster
provider is unchanged.

### Multi-Cluster-Compatible Reconcilers

Reconcilers can be made multi-cluster-compatible by changing client- and
cache-accessing code from directly using `mgr.GetClient()` and `mgr.GetCache()` to
going through `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and
`mgr.GetCluster(ctx, req.ClusterName).GetCache()`.
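
As an illustration (a sketch against the proposed API, assuming the usual `context`, `appsv1`,
`client`, `manager`, and `reconcile` imports; `ReplicaSetReconciler` is a hypothetical type),
such a reconciler resolves its cluster first and then proceeds as usual:

```golang
// ReplicaSetReconciler reconciles ReplicaSets in whichever cluster a request
// originates from. Only GetCluster and req.ClusterName are proposed additions.
type ReplicaSetReconciler struct {
	Manager manager.Manager
}

func (r *ReplicaSetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// An empty ClusterName resolves to the default ("hub") cluster, so the same
	// code keeps working in single-cluster setups.
	cl, err := r.Manager.GetCluster(ctx, req.ClusterName)
	if err != nil {
		return reconcile.Result{}, err
	}

	rs := &appsv1.ReplicaSet{}
	if err := cl.GetClient().Get(ctx, req.NamespacedName, rs); err != nil {
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// ... reconcile rs, reading and writing only via cl.GetClient() / cl.GetCache() ...
	return reconcile.Result{}, nil
}
```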

When a controller is built like
```golang
builder.NewControllerManagedBy(mgr).
  For(&appsv1.ReplicaSet{}).
  Owns(&v1.Pod{}).
  Complete(reconciler)
```
and the reconciler uses `GetCluster(ctx, req.ClusterName)` as described, it will automatically
act as a *uniform multi-cluster controller*: it reconciles resources from cluster `X`
in cluster `X`.

For a manager with a `cluster.Provider`, the builder _SHOULD_ create a controller
that sources events **ONLY** from the provider clusters that got engaged with
the controller.

Controllers that should be triggered by events on the hub cluster will have to
opt in, as in this example:

```golang
builder.NewControllerManagedBy(mgr).
  For(&appsv1.Deployment{}, builder.InDefaultCluster).
  Owns(&v1.ReplicaSet{}).
  Complete(reconciler)
```
As this example shows, a mixed set of sources is possible: `Deployment` events come from
the default (hub) cluster, while `ReplicaSet` events come from the engaged provider clusters.

## User Stories

### Controller Author with no interest in multi-cluster wanting to keep the old behaviour

- Do nothing. Controller-runtime behaviour is unchanged.

### Multi-Cluster Integrator wanting to support cluster managers like Cluster-API or kind

- Implement the `cluster.Provider` interface, either by polling the cluster registry
  or by watching objects in the hub cluster.
- For every new cluster, create an instance of `cluster.Cluster` (see the sketch after this list).
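
For illustration, a deliberately static provider over a fixed set of `rest.Config`s
(e.g. one per kubeconfig context) could look like the following sketch. It targets the
proposed `pkg/cluster` API; the package name is hypothetical, and the `Watcher` contract
is left open by this proposal and therefore elided:

```golang
// Package staticprovider is a hypothetical cluster.Provider over a fixed set of clusters.
package staticprovider

import (
	"context"
	"fmt"
	"sort"

	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

type Provider struct {
	configs map[string]*rest.Config // cluster name -> rest.Config
}

func New(configs map[string]*rest.Config) *Provider {
	return &Provider{configs: configs}
}

func (p *Provider) Get(ctx context.Context, clusterName string, opts ...cluster.Option) (cluster.Cluster, error) {
	cfg, ok := p.configs[clusterName]
	if !ok {
		return nil, fmt.Errorf("unknown cluster %q", clusterName)
	}
	// cluster.New exists today; the Name() accessor proposed above would be
	// provided by the extended cluster.Cluster implementation.
	return cluster.New(cfg, opts...)
}

func (p *Provider) List(ctx context.Context) ([]string, error) {
	names := make([]string, 0, len(p.configs))
	for name := range p.configs {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func (p *Provider) Watch(ctx context.Context) (cluster.Watcher, error) {
	// A static set never changes; a dynamic provider (Cluster-API, kind, ...) would
	// emit cluster join/leave events here.
	return nil, fmt.Errorf("watching is not implemented in this sketch")
}
```

A real provider for Cluster-API or kind would populate and update this set from the
cluster registry instead of a fixed map.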

### Multi-Cluster Integrator wanting to support apiservers with logical clusters (like kcp)

- Implement the `cluster.Provider` interface by watching the apiserver for logical cluster objects
  (the `LogicalCluster` CRD in kcp).
- Return a facade `cluster.Cluster` that scopes all operations (client, cache, indexers)
  to the logical cluster, but is backed by one physical `cluster.Cluster` (see the sketch after this list).
- Add cross-cluster indexers to the physical `cluster.Cluster` object.
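
A facade could, for example, embed the shared physical `cluster.Cluster` and present it
under the logical cluster's name; how client, cache and indexers are actually scoped to
the logical cluster is kcp-specific and elided in this hypothetical sketch:

```golang
// logicalCluster is a hypothetical facade: one shared physical cluster presented
// under a logical cluster name.
type logicalCluster struct {
	cluster.Cluster // the shared physical cluster
	logicalName string
}

func (c *logicalCluster) Name() string { return c.logicalName }

// GetClient, GetCache, GetFieldIndexer, ... would be overridden to return views
// scoped to the logical cluster, backed by the embedded physical cluster's
// shared informers.
```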

### Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups

- Replace `mgr.GetClient()` and `mgr.GetCache()` with `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and `mgr.GetCluster(ctx, req.ClusterName).GetCache()`.
- Make the manager and controller plumbing vendorable to allow plugging in a multi-cluster provider.

### Controller Author who wants to support certain multi-cluster setups

- Do the `GetCluster` plumbing as described above.
- Vendor 3rd-party multi-cluster providers and wire them up in `main.go`, as sketched after this list.
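
Wiring in `main.go` could then look roughly as follows; the provider module path and the
`Options.WithExperimentalClusterProvider` method (see Risks and Mitigations) are both
hypothetical at this point:

```golang
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"

	kindprovider "example.com/multicluster-provider-kind" // hypothetical vendored provider
)

func main() {
	ctrl.SetLogger(zap.New())

	// Attaching a cluster.Provider switches the manager into multi-cluster mode.
	opts := ctrl.Options{}.WithExperimentalClusterProvider(kindprovider.New())

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), opts)
	if err != nil {
		ctrl.Log.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Register multi-cluster-compatible controllers against mgr as usual, then:
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		ctrl.Log.Error(err, "manager exited non-zero")
		os.Exit(1)
	}
}
```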

## Risks and Mitigations

- The standard behaviour of controller-runtime is unchanged for single-cluster controllers.
- Multi-cluster mode is activated by attaching the `cluster.Provider` to the manager.
  To make it clear that the semantics are experimental, we make the `Options.provider` field private
  and add an `Options.WithExperimentalClusterProvider` method.
- We only extend these interfaces and structs:
  - `ctrl.Manager` with `GetCluster(ctx, clusterName string) (cluster.Cluster, error)`
  - `cluster.Cluster` with `Name() string`
  - `reconcile.Request` with `ClusterName string`

  We think that the behaviour of these extensions is well understood and hence low risk.
  Everything else behind the scenes is an implementation detail that can be changed
  at any time.

## Alternatives

- Multi-cluster support could be built outside of core controller-runtime. This would
  likely lead to a design with one manager per cluster. This has a number of problems:
  - only one manager can serve webhooks or metrics
  - cluster management must be custom built
  - logical cluster support would still require a fork of controller-runtime and
    with that a divergence in the ecosystem. The reason is that logical clusters
    require a shared workqueue because they share the same apiserver. So for
    fair queueing, this needs deep integration into one manager.
  - informer facades are not supported in today's cluster/cache implementation.
- We could deep-copy the builder instead of the sources and handlers. This would
  lead to one controller and one workqueue per cluster. For the reasons outlined
  in the previous alternative, this is not desirable.
- We could skip adding `ClusterName` to `reconcile.Request` and instead pass the
  cluster through in the context. On the one hand, this looks attractive as it
  would avoid having to touch reconcilers at all to make them multi-cluster-compatible.
  On the other hand, with `cluster.Cluster` embedded into `manager.Manager`, not
  every method of `cluster.Cluster` carries a context. So virtualizing the cluster
  in the manager leads to contradictions in the semantics.

  For example, every cluster may well have a different REST mapping because the
  installed CRDs differ. Without a context, we cannot return the right
  REST mapper.

  An alternative would be to add a context to every method of `cluster.Cluster`,
  which is a much bigger and uglier change than what is proposed here.

## Implementation History

- [PR #2207 by @vincepri: WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2207) – with extensive review
- [PR #2208 by @sttts, replacing #2207: WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2726) –
  picking up #2207, addressing lots of comments and extending the approach to what kcp needs, with a `fleet-namespace` example that demonstrates a setup similar to kcp with real logical clusters.
- [github.com/kcp-dev/controller-runtime](https://github.com/kcp-dev/controller-runtime) – the kcp controller-runtime fork