1 | 1 | # Federation
2 | 2 |
3 | | -A Kubernetes controller that provides discovery service for Istio mesh federation.
4 | | -
5 | | -Mesh federation allows exposing services between meshes and enabling communication across mesh boundaries.
6 | | -Each mesh may expose a subset of its services to allow other meshes to connect to the exposed services.
7 | | -
8 | | -Controllers utilize XDS protocol to discover exported services in federated meshes.
9 | | -Controllers are deployed with sidecars, so cross-cluster connections between controllers are secured with Istio mTLS.
10 | | -
11 | | -## Motivation
12 | | -
13 | | -In this deployment model, independent meshes deployed in different clusters can connect services without configuring
14 | | -access to the k8s api-server in remote clusters. This allows to achieve multi-cluster connectivity for meshes managed
15 | | -by different teams in different clusters.
16 | | -
17 | | -## Development
18 | | -
19 | | -### Prerequisites
20 | | -1. Go 1.22+
21 | | -2. protoc 3.19.0+
22 | | -3. protoc-gen-go v1.30.0+
23 | | -4. protoc-get-golang-deepcopy
24 | | -
25 | | -### Commands
26 | | -
27 | | -1. Compile controller:
28 | | - ```shell
29 | | - make
30 | | - ```
31 | | -1. Run unit tests:
32 | | - ```shell
33 | | - make test
34 | | - ```
35 | | -1. Build image:
36 | | - ```shell
37 | | - HUB=quay.io/maistra-dev TAG=test make docker-build
38 | | - ```
39 | | -1. Push image:
40 | | - ```shell
41 | | - HUB=quay.io/maistra-dev TAG=test make docker-push
42 | | - ```
43 | | -1. Run e2e tests:
44 | | - ```shell
45 | | - make e2e
46 | | - ```
47 | | -1. Run e2e tests with specific Istio version and custom controller image:
48 | | - ```shell
49 | | - HUB=quay.io/maistra-dev TAG=test ISTIO_VERSION=1.23.0 make e2e
50 | | - ```
51 | | -1. Run specific test suites:
52 | | - ```shell
53 | | - TEST_SUITES="spire" make e2e
54 | | - ```
55 | | -1. Customize federation controller image used in tests (`TAG` is ignored if `USE_LOCAL_IMAGE=true` or not set):
56 | | - ```shell
57 | | - USE_LOCAL_IMAGE=false HUB=quay.io/maistra-dev TAG=0.1 make e2e
58 | | - ```
| 3 | +This project implements Istio mesh federation using a Kubernetes controller that provides an API |
| 4 | +for managing multi-mesh communication, implements service discovery and automates the management of Istio configuration. |
| 5 | + |
| 6 | +Mesh federation enables secure communication between applications across mesh boundaries using mTLS. |
| 7 | +Each mesh can federate a subset of its services, allowing applications from other meshes to connect securely. |
| 8 | +With end-to-end mTLS, authorization can be handled directly by the federated application.
| 9 | + |
| 10 | +## Multi-primary vs federation |
| 11 | + |
| 12 | +[Multi-primary](https://istio.io/latest/docs/setup/install/multicluster/multi-primary_multi-network/) and |
| 13 | +[primary-remote](https://istio.io/latest/docs/setup/install/multicluster/primary-remote_multi-network/) topologies |
| 14 | +are great solutions for expanding a single mesh to multiple k8s clusters, giving better system resiliency and higher availability.
| 15 | +However, they do not fit well in the following cases: |
| 16 | + |
| 17 | +1. Decentralized control and ownership of clusters. |
| 18 | + |
| 19 | + **Use case**: Different teams or departments manage their own clusters and control planes independently. |
| 20 | + |
| 21 | + **Reason**: Federation allows each team to maintain autonomy over their cluster’s Istio configurations while still enabling |
| 22 | + selective cross-mesh communication. |
| 23 | + |
| 24 | +1. Simplified networking between clusters. |
| 25 | + |
| 26 | + **Use case**: Clusters communicate over public networks without a shared private network (e.g., VPC peering). |
| 27 | + |
| 28 | + **Reason**: Federation and multi-primary both use gateway-based communication for data-plane traffic. |
| 29 | + Multi-primary deployments, however, require control planes to access remote kube-apiservers. |
| 30 | + This often requires extra network configuration, as users typically do not want to expose kube-apiservers to the internet. |
| 31 | + |
| 32 | +1. Limited service sharing. |
| 33 | + |
| 34 | + **Use case**: Only a subset of services needs to be shared across clusters (e.g., common APIs or external-facing services). |
| 35 | + |
| 36 | + **Reason**: Federation allows you to expose and consume specific services across meshes using service entries, |
| 37 | +without fully integrating the control planes. This is partially possible in a multi-primary deployment,
| 38 | +but exported services can only be limited to namespaces matching the configured discovery selectors.
| 39 | + |
| 40 | +1. Operational simplicity for isolated meshes. |
| 41 | + |
| 42 | + **Use case**: Simplified troubleshooting and upgrades by isolating cluster-specific issues. |
| 43 | + |
| 44 | + **Reason**: Since federated meshes don’t rely on a shared control plane, issues are localized to individual clusters. |
| 45 | + |
| 46 | +## High-level architecture |
| 47 | + |
| 48 | + |
| 49 | + |
| 50 | +## How it works |
| 51 | + |
| 52 | +### Service discovery |
| 53 | + |
| 54 | +#### Import |
| 55 | + |
| 56 | +Controllers connect to each other using the gRPC protocol and subscribe to the `FederatedService` API.
| 57 | +When a controller receives an update, it creates a `ServiceEntry` or a `WorkloadEntry`, depending on the local cluster state.
| 58 | +It also applies client-side configuration using a `DestinationRule` if the mesh federation requires customizing SNI for cross-cluster traffic.
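
As a rough illustration of the import side, an imported service could be represented by a `ServiceEntry` like the one below. This is only a sketch: the resource name, namespace, host, endpoint address, network name, and port `15443` are all assumptions made for the example, and the resource the controller actually generates may differ.

```yaml
# Hypothetical ServiceEntry for a service imported from a remote mesh.
# Every name, address, and port here is an assumption for illustration.
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: import-ratings            # hypothetical name
  namespace: istio-system         # assumed controller namespace
spec:
  hosts:
  - ratings.bookinfo.svc.cluster.local
  ports:
  - number: 9080
    name: http
    protocol: HTTP
  location: MESH_INTERNAL         # traffic stays inside the mesh trust boundary (mTLS)
  resolution: STATIC
  endpoints:
  - address: 192.0.2.10           # assumed address of the remote federation ingress gateway
    network: west-network         # hypothetical network name of the remote mesh
    ports:
      http: 15443                 # assumed auto-passthrough port on the remote gateway
```

The endpoint points at the remote gateway rather than at remote pods, which is why no access to the remote kube-apiserver is needed.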
| 59 | + |
| 60 | +#### Export |
| 61 | + |
| 62 | +Controllers connect to the local kube-apiserver to discover local services matching export rules. |
| 63 | +When a controller receives an update from Kubernetes about a `Service` matching export rules,
| 64 | +it exposes that service on a federation ingress gateway. The federation ingress gateway is very similar to the east-west gateway
| 65 | +in multi-primary and primary-remote deployments, but it exposes only one TLS auto-passthrough port. |
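
To make the single auto-passthrough port more concrete, a `Gateway` exposing one exported service might look roughly like the following. The gateway name, selector label, port number, and host are assumptions chosen for the example, not values taken from this project.

```yaml
# Hypothetical Gateway for a federation ingress gateway.
# Name, selector, port number, and host are assumptions for illustration.
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: federation-ingress-gateway
  namespace: istio-system
spec:
  selector:
    app: federation-ingress-gateway   # assumed label on the gateway pods
  servers:
  - port:
      number: 15443                   # assumed port; the text only says "one TLS auto-passthrough port"
      name: tls
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH          # route by SNI without terminating mTLS
    hosts:
    - ratings.bookinfo.svc.cluster.local   # only exported services are listed
```

With `AUTO_PASSTHROUGH`, the gateway forwards the connection based on SNI and leaves the TLS session intact, so mTLS remains end-to-end between the applications.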
| 66 | + |
| 67 | +### Security |
| 68 | + |
| 69 | +The federation controller is deployed within each federated mesh with a sidecar like any other application. |
| 70 | +Each controller creates a `PeerAuthentication` to enable strict mTLS for itself and configures an `AuthorizationPolicy`
| 71 | +to allow traffic only from the configured remote controllers. |
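
The pair of resources described above could look something like this. It is a sketch under assumptions: the `app: federation-controller` label, the `istio-system` namespace, the `west.local` trust domain, and the service account name are all hypothetical.

```yaml
# Hypothetical policies protecting the controller itself.
# Labels, namespace, trust domain, and service account are assumptions.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: federation-controller
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: federation-controller
  mtls:
    mode: STRICT                      # reject plaintext connections to the controller
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: federation-controller
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: federation-controller
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        # identity of an assumed remote controller: <trust domain>/ns/<namespace>/sa/<service account>
        - west.local/ns/istio-system/sa/federation-controller
```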
| 72 | + |
| 73 | +Controllers **DO NOT** enforce any authorization policy at the mesh boundaries to avoid mTLS termination between applications. |
| 74 | +Application or cluster admins are responsible for configuring their authz policies, and it is highly recommended |
| 75 | +to deny all traffic by default and allow only selected services. |
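
One possible way to follow this recommendation is an allow-nothing policy for the namespace plus a narrow `ALLOW` policy per shared service. The namespace, labels, trust domain, and service account below are assumptions made for the example.

```yaml
# Deny all traffic to workloads in the namespace by default:
# an ALLOW policy with no rules never matches any request.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: bookinfo                 # hypothetical application namespace
spec: {}
---
# Then allow one selected service to be called by a specific remote identity.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-remote-productpage
  namespace: bookinfo
spec:
  selector:
    matchLabels:
      app: ratings                    # assumed label of the federated service
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        # assumed identity of the remote client workload
        - west.local/ns/bookinfo/sa/bookinfo-productpage
```

Matching on `principals` relies on the client certificate reaching the destination sidecar, which is why mTLS must not be terminated at the mesh boundary.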
| 76 | + |
| 77 | +## Identity and trust model |
| 78 | + |
| 79 | +This controller does not provide any mechanism to share trust bundles between meshes using different CAs. |
| 80 | +It can only enable mTLS communication between meshes when all clusters use the same root CA or use SPIRE |
| 81 | +with trust bundle federation enabled.
| 82 | + |
| 83 | +## Getting started |
| 84 | + |
| 85 | +Follow these guides to see how it works in practice: |
| 86 | +1. [Simple multi-mesh bookinfo deployment](examples/README.md). |
| 87 | +2. [Integration with SPIRE](examples/spire/README.md). |
| 88 | + |
| 89 | +## Comparing to other projects |
| 90 | + |
| 91 | +### Admiral
| 92 | + |
| 93 | +While [Admiral](https://github.com/istio-ecosystem/admiral) is designed to manage multi-cluster service discovery |
| 94 | +and traffic distribution in Istio, it focuses on scenarios where clusters operate as part of a single logical mesh, |
| 95 | +such as multi-primary or primary-remote topologies. Since it does not natively support mesh federation, |
| 96 | +it serves a fundamentally different purpose, making it an unsuitable fit for implementing multi-mesh APIs. |
| 97 | + |
| 98 | +### Emcee
| 99 | + |
| 100 | +[Emcee](https://github.com/istio-ecosystem/emcee), a proof-of-concept for mesh federation, has been inactive for over five years. |
| 101 | +Reviving the project would require significant changes to its APIs and underlying assumptions, making it more practical |
| 102 | +to start fresh rather than build on the existing codebase. |