This document is the detailed design and architecture of the Flomesh Service Mesh being built in this repository.
Flomesh Service Mesh (FSM) is a simple, complete, and standalone service mesh solution. FSM provides a fully featured control plane. It leverages an architecture based on the Pipy reverse-proxy sidecar. While FSM ships with Pipy by default, the design relies on interfaces, which enable integration with any other reverse proxy.
FSM relies on SMI Spec to reference services that will participate in the service mesh. FSM ships out-of-the-box with all necessary components to deploy a complete service mesh spanning multiple compute platforms.
As an operator of services spanning diverse compute platforms (Kubernetes and Virtual Machines on public and private clouds) I need an open-source solution, which will dynamically:
- Apply policies governing TCP & HTTP access between peer services
- Encrypt traffic between services leveraging mTLS and short-lived certificates with a custom CA
- Rotate certificates as often as necessary to make these short-lived and remove the need for certificate revocation management
- Collect traces and metrics to provide visibility into the health and operation of the services
- Implement traffic split between various versions of the services deployed as defined via SMI Spec
The system must be:
- easy to understand
- simple to install
- effortless to maintain
- painless to troubleshoot
- configurable via SMI Spec
When a new Pod creation is initiated, FSM's Sidecar Injector intercepts the create pod operations for namespaces joined to the mesh, and forwards these API calls to the FSM control plane. The FSM control plane patches the Pod spec with two new containers: an init container and the Pipy sidecar. The init container is ephemeral: it executes a set of iptables commands and terminates. The init container requires the NET_ADMIN kernel capability for the iptables changes to be applied. FSM uses iptables to ensure that all inbound and outbound traffic is intercepted and redirected to the Pipy sidecar. The init container Docker image is passed as a string pointing to a container registry, via the spec.sidecar.initContainerImage field of the MeshConfig. The default value is defined in the chart values.
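The patch step described above can be sketched as a JSON Patch with two "add" operations. This is only an illustration under stated assumptions: the `container` struct is a simplified stand-in for the real Kubernetes container spec, and the container names and image references are placeholders rather than FSM's actual values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// container is a deliberately simplified, hypothetical stand-in for the
// Kubernetes container spec; it is not the real corev1.Container type.
type container struct {
	Name         string   `json:"name"`
	Image        string   `json:"image"`
	Capabilities []string `json:"capabilities,omitempty"`
}

// buildSidecarPatch returns the two "add" operations: the init container that
// programs iptables (hence NET_ADMIN) and the proxy sidecar itself.
// The "/-" suffix appends to an array, which must already exist in the Pod spec.
func buildSidecarPatch(initImage, sidecarImage string) []patchOp {
	return []patchOp{
		{
			Op:   "add",
			Path: "/spec/initContainers/-",
			Value: container{
				Name:         "fsm-init", // hypothetical container name
				Image:        initImage,
				Capabilities: []string{"NET_ADMIN"},
			},
		},
		{
			Op:    "add",
			Path:  "/spec/containers/-",
			Value: container{Name: "sidecar", Image: sidecarImage},
		},
	}
}

func main() {
	// Image references are placeholders, not FSM's published images.
	patch := buildSidecarPatch("registry.example/init:dev", "registry.example/pipy:dev")
	out, err := json.MarshalIndent(patch, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In a real admission webhook the patch would be returned in the admission response; that plumbing is omitted here.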
The Flomesh Service Mesh project is composed of the following five high-level components:
1. Proxy control plane - handles gRPC connections from the service mesh sidecar proxies
2. Certificate manager - handles issuance and management of certificates
3. Endpoints providers - components capable of introspecting the participating compute platforms; these retrieve the IP addresses of the compute backing the services in the mesh
4. Mesh specification - wrapper around the SMI Spec's Go SDK; this facility provides simple methods to retrieve SMI Spec resources, abstracting away cluster and storage specifics
5. Mesh catalog - the service mesh's heart; this is the central component that collects inputs from all other components and dispatches configuration to the proxy control plane
Let's take a look at each component:
The Proxy Control Plane plays a key part in operating the service mesh. All proxies are installed as sidecars and establish an mTLS gRPC connection to the Proxy Control Plane. The proxies continuously receive configuration updates. This component implements the interfaces required by the specific reverse proxy chosen. FSM implements Pipy Repo API. The Pipy Plugin mechanism is used to extend the functionality.
The Proxy Control Plane's availability is of foremost importance when it comes to traffic policy enforcement and connectivity management between services. Some of the Control Plane design decisions are heavily influenced by that fact, such as its stateless nature. To read more on the design decisions behind the High Availability design of the Control Plane, please refer to the HA design doc.
Certificate Manager is a component that provides each service participating in the service mesh with a TLS certificate. These service certificates are used to establish and encrypt connections between services using mTLS.
Endpoints Providers are one or more components that communicate with the compute platforms (Kubernetes clusters, on-prem machines, or cloud-providers' VMs) participating in the service mesh. Endpoints providers resolve service names into lists of IP addresses. The Endpoints Providers understand the specific primitives of the compute provider they are implemented for, such as virtual machines, virtual machine scale sets, and Kubernetes clusters.
Mesh Specification is a wrapper around the existing SMI Spec components. This component abstracts the specific storage chosen for the YAML definitions. This module is effectively a wrapper around SMI Spec's Kubernetes informers, currently abstracting away the storage (Kubernetes/etcd) specifics.
Mesh Catalog is the central component of FSM, which combines the outputs of all other components into a structure, which can then be transformed into proxy configuration and dispatched to all listening proxies via the proxy control plane. This component:
- Communicates with the mesh specification module (4) to detect when a Kubernetes service was created, changed, or deleted via SMI Spec.
- Reaches out to the certificate manager (2) and requests a new TLS certificate for the newly discovered service.
- Retrieves the IP addresses of the mesh workloads by observing the compute platforms via the endpoints providers (3).
- Combines the outputs of the three steps above into a data structure, which is then passed to the proxy control plane (1), serialized, and sent to all relevant connected proxies.
This section outlines the conventions adopted and guiding the development of the Flomesh Service Mesh (FSM). Components discussed in this section:
- (A) Proxy sidecar - Pipy or other reverse-proxy with service-mesh capabilities
- (B) Proxy Certificate - unique X.509 certificate issued to the specific proxy by the Certificate Manager
- (C) Service - Kubernetes service resource referenced in SMI Spec
- (D) Service Certificate - X.509 certificate issued to the service
- (E) Policy - SMI Spec traffic policy enforced by the target service's proxy
- Examples of service endpoints handling traffic for the given service:
- (F) Azure VM - process running on an Azure VM, listening for connections on IP 1.2.3.11, port 81.
- (G) Kubernetes Pod - container running on a Kubernetes cluster, listening for connections on IP 1.2.3.12, port 81.
- (H) On-prem compute - process running on a machine within the customer's private data center, listening for connections on IP 1.2.3.13, port 81.
The Service (C) is assigned a Certificate (D) and is associated with an SMI Spec Policy (E). Traffic for Service (C) is handled by the Endpoints (F, G, H) where each endpoint is augmented with a Proxy (A). The Proxy (A) has a dedicated Certificate (B), which is different from the Service Cert (D), and is used for mTLS connection from the Proxy to the proxy control plane.
Service in the diagram above is a Kubernetes service resource referenced in SMI Spec. An example is the bookstore service defined below and referenced by a TrafficSplit policy:
apiVersion: v1
kind: Service
metadata:
  name: bookstore
  labels:
    app: bookstore
spec:
  ports:
    - port: 14001
      targetPort: 14001
      name: web-port
  selector:
    app: bookstore
---
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: bookstore-traffic-split
spec:
  service: bookstore
  backends:
    - service: bookstore-v1
      weight: 100
In FSM, Proxy is defined as an abstract logical component, which:
- fronts a mesh service process (container or binary running on Kubernetes or a VM)
- maintains a connection to a proxy control plane (xDS server)
- continuously receives configuration updates from the proxy control plane (the Pipy Repo implementation)
FSM ships out of the box with a Pipy reverse-proxy implementation.
Within the FSM codebase, Endpoint is defined as the IP address and port number tuple of a container or a virtual machine that hosts a proxy fronting a process; that process is a member of a service and as such participates in the service mesh.
The service endpoints (F,G,H) are the actual binaries serving traffic for the service (C).
An endpoint uniquely identifies a container, binary, or a process.
It has an IP address, port number, and belongs to a service.
A service can have zero or more endpoints, and each endpoint can have only one sidecar proxy. Since an endpoint must belong to a single service, it follows that an associated proxy must also belong to a single service.
Proxies, fronting endpoints, which form a given service will share the certificate for the given service.
This certificate is used to establish mTLS connection with peer proxies fronting endpoints of other services within the service mesh.
The service certificate is short-lived.
Each service certificate's lifetime will be approximately 48 hours, which eliminates the need for a certificate revocation facility.
FSM declares a type ServiceCertificate for these certificates.
ServiceCertificate is how this kind of certificate is referred to in the Interfaces section of this document.
The proxy TLS certificate is an X.509 certificate issued to each individual proxy by the Certificate Manager.
This kind of certificate is different than the service certificate and is used exclusively for proxy-to-control-plane mTLS communication.
Each Pipy proxy will be bootstrapped with a proxy certificate, which will be used for the xDS mTLS communication.
This kind of certificate is different than the one issued for service-to-service mTLS communication.
FSM declares a type ProxyCertificate for these certificates.
We refer to these certificates as ProxyCertificate in the interfaces declarations section of this document.
This certificate's Common Name leverages the DNS-1123 standard with the following format: <proxy-UUID>.<service-name>.<service-namespace>. The chosen format allows us to uniquely identify the connected proxy (proxy-UUID) and the namespaced service, which this proxy belongs to (service-name.service-namespace).
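As an illustration of why this Common Name format is convenient, the sketch below splits a CN of the form <proxy-UUID>.<service-name>.<service-namespace> into its three parts. The type and function names (`proxyIdentity`, `parseProxyCN`) are hypothetical and are not taken from the FSM codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyIdentity holds the three labels encoded in the proxy certificate CN.
type proxyIdentity struct {
	ProxyUUID   string
	ServiceName string
	Namespace   string
}

// parseProxyCN splits "<proxy-UUID>.<service-name>.<service-namespace>".
// It rejects CNs with a different number of labels or with empty labels.
func parseProxyCN(cn string) (proxyIdentity, error) {
	parts := strings.Split(cn, ".")
	if len(parts) != 3 {
		return proxyIdentity{}, fmt.Errorf("CN %q: expected 3 dot-separated labels, got %d", cn, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return proxyIdentity{}, fmt.Errorf("CN %q: empty label", cn)
		}
	}
	return proxyIdentity{ProxyUUID: parts[0], ServiceName: parts[1], Namespace: parts[2]}, nil
}

func main() {
	// The UUID below is an arbitrary example value.
	id, err := parseProxyCN("3f2a9c1e-0000-4000-8000-000000000001.bookstore.default")
	if err != nil {
		panic(err)
	}
	fmt.Printf("proxy=%s service=%s namespace=%s\n", id.ProxyUUID, id.ServiceName, id.Namespace)
}
```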
The policy component referenced in the diagram above (E) is any SMI Spec resource referencing the service (C). For instance, a TrafficSplit referencing the services bookstore and bookstore-v1:
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: bookstore-traffic-split
spec:
  service: bookstore
  backends:
    - service: bookstore-v1
      weight: 100
The service certificates issued by the Certificate Manager are short-lived certificates, with a validity of approximately 48 hours. The short certificate expiration eliminates the need for an explicit revocation mechanism. A given certificate's expiration will be randomly shortened or extended from the 48 hours, so that certificates do not all expire at once and cause a thundering herd of renewals against the underlying certificate management system. Proxy certificates, on the other hand, are long-lived certificates.
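A minimal sketch of the randomized validity follows. The 48-hour base comes from the text above; the two-hour jitter bound (`maxJitter`) and the helper names are assumptions chosen for illustration, since the document does not state how large the random offset is.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseValidity = 48 * time.Hour // from the design text
	maxJitter    = 2 * time.Hour  // hypothetical bound, not specified by the design
)

// newRNG returns a seeded random source so that callers control determinism.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// jitteredValidity returns base shifted by a uniformly random offset in
// [-jitter, +jitter]. With jitter <= 0 it returns base unchanged.
func jitteredValidity(base, jitter time.Duration, rng *rand.Rand) time.Duration {
	if jitter <= 0 {
		return base
	}
	// Int63n(n) yields [0, n), so the offset spans [-jitter, +jitter].
	offset := time.Duration(rng.Int63n(int64(2*jitter)+1)) - jitter
	return base + offset
}

func main() {
	rng := newRNG(time.Now().UnixNano())
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredValidity(baseValidity, maxJitter, rng))
	}
}
```

Spreading expirations this way means proxies sharing an issuance time do not all request rotation in the same instant.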
- ProxyCertificate is issued by FSM for a Proxy, which is expected to connect to the proxy control plane sometime in the future. After the certificate is issued, and before the proxy connects to the proxy control plane, the certificate is in the unclaimed state. The state of the certificate changes to claimed after a proxy has connected to the control plane using the certificate.
- Proxy is the reverse-proxy, which attempts to connect to the proxy control plane; the Proxy may or may not be allowed to connect to the proxy control plane.
- Endpoint is fronted by a Proxy, and is a member of a Service. FSM may have discovered endpoints, via the endpoints providers, which belong to a given service, but FSM has not seen any proxies, fronting these endpoints, connect to the proxy control plane yet.
The intersection of the set of issued ProxyCertificates ∩ connected Proxies ∩ discovered Endpoints is the set of participants in the service mesh.
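The intersection can be sketched with three sets keyed by a shared identifier. Keying all three sets by one proxy identifier is an assumption made here for illustration; the document does not specify how certificates, proxies, and endpoints are correlated in code.

```go
package main

import (
	"fmt"
	"sort"
)

// participants returns the identifiers present in all three sets, sorted
// for stable output. Each map acts as a set: a key mapped to true is a member.
func participants(issuedCerts, connectedProxies, discoveredEndpoints map[string]bool) []string {
	var out []string
	for id := range issuedCerts {
		if issuedCerts[id] && connectedProxies[id] && discoveredEndpoints[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	issued := map[string]bool{"proxy-a": true, "proxy-b": true, "proxy-c": true}
	connected := map[string]bool{"proxy-a": true, "proxy-b": true}
	discovered := map[string]bool{"proxy-b": true, "proxy-c": true}

	// Only proxy-b has a certificate, a live connection, and a known endpoint.
	fmt.Println(participants(issued, connected, discovered)) // prints [proxy-b]
}
```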
- Each Proxy is issued a unique ProxyCertificate, which is dedicated to xDS mTLS communication.
- ProxyCertificate has a per-proxy unique Subject CN, which identifies the Proxy and the respective Pod it resides on.
- The Proxy's service membership is determined by the Pod's service membership. FSM identifies the Pod when Pipy establishes a gRPC connection to xDS and presents its client certificate. The CN of the certificate contains a unique ID assigned to the pod and the Kubernetes namespace where the pod resides. Once xDS parses the CN of the connected Pipy, the Pod context is available. From the Pod we determine Service membership, the Pod's ServiceAccount, and other Kubernetes context.
- There is one unique ProxyCertificate issued to one Proxy, which is dedicated to one unique Endpoint (pod). FSM limits the number of services served by a proxy to 1 for simplicity.
- A mesh Service is constructed by one or more ProxyCertificate + Proxy + Endpoint.
When a new pod is created (via deployment) the creation is intercepted by a MutatingWebhookConfiguration. The actual web server handling webhook requests is the FSM pod itself. A request to create a new pod results in a patch operation adding the Pipy sidecar. The webhook handler server creates a bootstrap configuration for the Pipy proxy with two critical components:
- address of the XDS server (this is the FSM pod itself)
- mTLS certificate to connect to XDS (this certificate is different than the service-to-service certificates issued by FSM)
The webhook server also generates a UUID for the pod and the pod is labeled with it. The UUID is also used in the certificate. This UUID links the pod and the certificate.
The Pipy bootstrap certificate issued has a CN of the following format: <pod-uuid>.<pod-namespace>. The pod-uuid is a UUID generated by the webhook handler. This UUID is added as a label to the pod as well as in the certificate CN. The key for the Pod label is unique to the FSM instance.
When the Pipy proxy connects to xDS it presents the client certificate. FSM / xDS parses the certificate and, from the pod UUID and the namespace (obtained from the cert CN), fetches the Kubernetes Pod object. From the Pod object FSM determines the ServiceAccount and Service membership. Based on this context and SMI policies, FSM generates Pipy config.
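The lookup described above can be sketched as follows: split the CN into pod UUID and namespace, then find the pod carrying that UUID as a label. The `pod` struct, the label key `proxyUUIDLabel`, and the in-memory pod list are all hypothetical; a real implementation would query the Kubernetes API instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// proxyUUIDLabel is a hypothetical label key; the text only says that the
// real key is unique to the FSM instance.
const proxyUUIDLabel = "example.io/proxy-uuid"

var errPodNotFound = errors.New("no pod matches the certificate CN")

// pod is a simplified stand-in for the Kubernetes Pod object.
type pod struct {
	Name           string
	Namespace      string
	ServiceAccount string
	Labels         map[string]string
}

// findPodForCN parses "<pod-uuid>.<pod-namespace>" and returns the pod in
// that namespace whose UUID label matches.
func findPodForCN(cn string, pods []pod) (pod, error) {
	uuid, namespace, ok := strings.Cut(cn, ".")
	if !ok || uuid == "" || namespace == "" {
		return pod{}, fmt.Errorf("malformed CN %q", cn)
	}
	for _, p := range pods {
		if p.Namespace == namespace && p.Labels[proxyUUIDLabel] == uuid {
			return p, nil
		}
	}
	return pod{}, errPodNotFound
}

func main() {
	pods := []pod{{
		Name:           "bookstore-v1-abc",
		Namespace:      "bookstore",
		ServiceAccount: "bookstore",
		Labels:         map[string]string{proxyUUIDLabel: "uuid-1"},
	}}
	p, err := findPodForCN("uuid-1.bookstore", pods)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name, p.ServiceAccount)
}
```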
The FSM leverages the communicating sequential processes (CSP) pattern for sending messages between the various components of the system. This section describes the signaling mechanism used by FSM. ** TBD **
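The signaling mechanism itself is still marked TBD. Purely as an illustration of the CSP pattern in Go, the sketch below merges announcement channels from several components into one channel that a consumer (for example, the part of the system that refreshes proxies) could range over. The `announcement` type and the `fanIn` helper are hypothetical and do not describe FSM's actual design.

```go
package main

import (
	"fmt"
	"sync"
)

// announcement is a hypothetical message saying that some component changed.
type announcement struct {
	Source string
}

// fanIn forwards every message from the source channels to a single output
// channel, and closes the output once all sources are closed and drained.
func fanIn(sources ...<-chan announcement) <-chan announcement {
	out := make(chan announcement)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(c <-chan announcement) {
			defer wg.Done()
			for a := range c {
				out <- a
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	certs := make(chan announcement, 1)
	endpoints := make(chan announcement, 1)
	certs <- announcement{Source: "certificate-manager"}
	endpoints <- announcement{Source: "endpoints-provider"}
	close(certs)
	close(endpoints)

	// The order of the two lines is not deterministic.
	for a := range fanIn(certs, endpoints) {
		fmt.Println("change announced by", a.Source)
	}
}
```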
This section defines the Go Interfaces needed for the development of the FSM in this repository.
This section adopts the following assumptions:
- 1:1 relationship between a proxy and an instance of a service. (No more than one service fronted by the same proxy.)
- 1:1 relationship between an endpoint (port and IP) and a proxy
The Proxy control plane handles gRPC connections from the service mesh sidecar proxies and implements Pipy's Repo.
For a fully functional Pipy-based service mesh, the proxy control plane must implement the following interface:
- Pipy Repo - source
// AggregatedDiscoveryServiceServer is the server API for AggregatedDiscoveryService service.
type AggregatedDiscoveryServiceServer interface {
	StreamAggregatedResources(AggregatedDiscoveryService_StreamAggregatedResourcesServer) error
	DeltaAggregatedResources(AggregatedDiscoveryService_DeltaAggregatedResourcesServer) error
}
The StreamAggregatedResources method is the entrypoint into the ADS vertical of FSM. This is declared in the AggregatedDiscoveryServiceServer interface, which is provided by the Pipy Go control plane.
Method DeltaAggregatedResources is used by the ADS REST API. This project implements gRPC only and this method will not be implemented.
When the Pipy Go control plane evaluates StreamAggregatedResources it passes an AggregatedDiscoveryService_StreamAggregatedResourcesServer server. The implementation of the StreamAggregatedResources method will then use server.Send(response) to send a pipy.DiscoveryResponse to all connected proxies.
An MVP implementation of StreamAggregatedResources would require:
- Depending on the DiscoveryRequest.TypeUrl, the DiscoveryResponse struct for CDS, EDS, RDS, LDS or SDS is created. This will provide connected Pipy proxies with a list of clusters, mapping of service name to list of routable IP addresses, list of permitted routes, listeners and secrets respectively.
- A method of notifying the system when the method described in the first item needs to be evaluated to refresh the connected Pipy proxies with the latest available resources (endpoints, clusters, routes, listeners or secrets).
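The first requirement amounts to a dispatch on the request's type. The sketch below shows one possible shape. The `DiscoveryRequest` and `DiscoveryResponse` structs are simplified placeholders, and the short `TypeURI` constants are hypothetical labels standing in for the real type URLs, which are not given in this document.

```go
package main

import "fmt"

// TypeURI identifies which kind of resource a proxy is asking for.
type TypeURI string

// Hypothetical short identifiers; real type URLs are longer strings.
const (
	TypeCDS TypeURI = "CDS"
	TypeEDS TypeURI = "EDS"
	TypeRDS TypeURI = "RDS"
	TypeLDS TypeURI = "LDS"
	TypeSDS TypeURI = "SDS"
)

// DiscoveryRequest and DiscoveryResponse are simplified placeholders.
type DiscoveryRequest struct {
	TypeUrl TypeURI
}

type DiscoveryResponse struct {
	TypeUrl TypeURI
	Payload string
}

// payloadFor names what each response carries, per the list above.
var payloadFor = map[TypeURI]string{
	TypeCDS: "clusters",
	TypeEDS: "service name to routable IP addresses",
	TypeRDS: "permitted routes",
	TypeLDS: "listeners",
	TypeSDS: "secrets",
}

// newResponse builds the response matching the request type, or fails for
// a type this sketch does not know about.
func newResponse(req DiscoveryRequest) (*DiscoveryResponse, error) {
	payload, ok := payloadFor[req.TypeUrl]
	if !ok {
		return nil, fmt.Errorf("unsupported type %q", req.TypeUrl)
	}
	return &DiscoveryResponse{TypeUrl: req.TypeUrl, Payload: payload}, nil
}

func main() {
	resp, err := newResponse(DiscoveryRequest{TypeUrl: TypeEDS})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s\n", resp.TypeUrl, resp.Payload)
}
```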
In the previous section, we proposed implementation of the StreamAggregatedResources method. This provides connected Pipy proxies with a list of clusters, mapping of service name to list of routable IP addresses, list of permitted routes, listeners and secrets for CDS, EDS, RDS, LDS and SDS respectively.
The ListEndpointsForService method will be provided by the FSM component, which we refer to as the Mesh Catalog in this document.
The Mesh Catalog will have access to the MeshSpec, CertificateManager, and the list of EndpointsProviders and ServiceProviders.
// MeshCataloger is the mechanism by which the Service Mesh controller discovers all Pipy proxies connected to the catalog.
type MeshCataloger interface {
// GetSMISpec returns the SMI spec
GetSMISpec() smi.MeshSpec
// ListInboundServiceIdentities lists the downstream service identities that can connect to the given service account
ListInboundServiceIdentities(service.K8sServiceAccount) ([]service.K8sServiceAccount, error)
// ListOutboundServiceIdentities lists the upstream service identities the given service account can connect to
ListOutboundServiceIdentities(service.K8sServiceAccount) ([]service.K8sServiceAccount, error)
// ListServiceIdentitiesForService lists the service identities associated with the given service
ListServiceIdentitiesForService(service.MeshService) ([]service.K8sServiceAccount, error)
// ListSMIPolicies lists SMI policies.
ListSMIPolicies() ([]*split.TrafficSplit, []service.K8sServiceAccount, []*spec.HTTPRouteGroup, []*target.TrafficTarget)
// ListEndpointsForService returns the list of provider endpoints corresponding to a service
ListEndpointsForService(service.MeshService) ([]endpoint.Endpoint, error)
// ExpectProxy catalogs the fact that a certificate was issued for a Pipy proxy and this is expected to connect to XDS.
ExpectProxy(certificate.CommonName)
// GetServicesForProxy returns a list of services the given Pipy is a member of based on its certificate,
// which is a cert issued to a Pipy for XDS communication (not Pipy-to-Pipy).
GetServicesForProxy(*pipy.Proxy) ([]service.MeshService, error)
// RegisterProxy registers a newly connected proxy with the service mesh catalog.
RegisterProxy(*pipy.Proxy)
// UnregisterProxy unregisters an existing proxy from the service mesh catalog
UnregisterProxy(*pipy.Proxy)
// GetServicesForServiceIdentity returns a list of services corresponding to a service identity
GetServicesForServiceIdentity(identity.ServiceIdentity) ([]service.MeshService, error)
// GetIngressPoliciesForService returns the inbound traffic policies associated with an ingress service
GetIngressPoliciesForService(service.MeshService) ([]*trafficpolicy.InboundTrafficPolicy, error)
}
Additional types needed for this interface:
// MeshService is a type for a namespaced service
type MeshService struct {
Namespace string
Name string
}
// NamespacedServiceAccount is a type for a namespaced service account
type NamespacedServiceAccount struct {
Namespace string
ServiceAccount string
}
// TrafficPolicy is a struct of the allowed RoutePaths from sources to a destination
type TrafficPolicy struct {
PolicyName string
Destination TrafficResource
Source TrafficResource
PolicyRoutePaths []RoutePolicy
}
// Proxy is a representation of a Pipy proxy connected to the xDS server.
// This should at some point have a 1:1 match to an Endpoint (which is a member of a meshed service).
type Proxy struct {
certificate.CommonName
net.IP
ServiceName service.MeshService
announcements chan announcements.Announcement
lastSentVersion map[TypeURI]uint64
lastAppliedVersion map[TypeURI]uint64
lastNonce map[TypeURI]string
}
The Endpoints providers component provides abstractions around the Go SDKs of various Kubernetes clusters, or cloud vendor's virtual machines and other compute, which participate in the service mesh. Each endpoint provider is responsible for either a particular Kubernetes cluster, or a cloud vendor subscription. The Mesh catalog will query each Endpoints provider for a particular service, and obtain the IP addresses and ports of the endpoints handling traffic for service.
The Endpoints providers are aware of:
- Kubernetes Service and their own CRD
- vendor-specific APIs and methods to retrieve IP addresses and Port numbers for Endpoints
The Endpoints providers have no awareness of:
- what SMI Spec is
- what Proxy or sidecar is
Note: As of this iteration of FSM we deliberately choose to leak the Mesh Specification implementation into the EndpointsProvider. The Endpoints Providers are responsible for implementing a method to resolve an SMI-declared service to the provider's specific resource definition. For instance, when the Azure EndpointsProvider's ListEndpointsForService is invoked with a service name, the provider would use its own method to resolve the service to a list of Azure URIs (example: /resource/subscriptions/e3f0/resourceGroups/mesh-rg/providers/Microsoft.Compute/virtualMachineScaleSets/baz). These URIs are unique identifiers of Azure VMs, VMSS, or other compute with Pipy reverse-proxies, participating in the service mesh.
In the sample ListEndpointsForService implementation, the Mesh Catalog loops over a list of Endpoints providers:
for _, provider := range catalog.ListEndpointsProviders() {
For each provider registered in the Mesh Catalog, we invoke ListEndpointsForService.
The function will be provided a ServiceName, which is an SMI-declared service. The provider will resolve the service to its own resource ID. For example, ListEndpointsForService invoked on the Azure EndpointsProvider with service webservice will resolve webservice to the URI of an Azure VM hosting an instance of the service: /resource/subscriptions/e3f0/resourceGroups/mesh-rg/providers/Microsoft.Compute/virtualMachineScaleSets/baz.
From the URI the provider will resolve the list of IP addresses of participating Pipy proxies.
package FSM
// EndpointsProvider is an interface to be implemented by components abstracting Kubernetes, Azure, and other compute/cluster providers.
type EndpointsProvider interface {
// ListEndpointsForService fetches the IPs and Ports for the given service
ListEndpointsForService(ServiceName) []Endpoint
}
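To make the loop over providers concrete, here is a minimal sketch of how a catalog might aggregate endpoints from every registered provider. The `staticProvider` type and the free function `listEndpointsForService` are hypothetical helpers for illustration; real providers would query Kubernetes or a cloud API rather than a map.

```go
package main

import (
	"fmt"
	"net"
)

// Types mirroring the ones declared in this document.
type ServiceName string
type Port int

type Endpoint struct {
	net.IP
	Port
}

type EndpointsProvider interface {
	ListEndpointsForService(ServiceName) []Endpoint
}

// staticProvider is a hypothetical provider backed by a fixed map.
type staticProvider map[ServiceName][]Endpoint

func (p staticProvider) ListEndpointsForService(svc ServiceName) []Endpoint {
	return p[svc]
}

// listEndpointsForService asks each provider in turn and concatenates
// whatever endpoints they report for the service.
func listEndpointsForService(svc ServiceName, providers []EndpointsProvider) []Endpoint {
	var all []Endpoint
	for _, provider := range providers {
		all = append(all, provider.ListEndpointsForService(svc)...)
	}
	return all
}

func main() {
	azure := staticProvider{"bookstore": {{IP: net.ParseIP("1.2.3.11"), Port: 81}}}
	kube := staticProvider{"bookstore": {{IP: net.ParseIP("1.2.3.12"), Port: 81}}}

	endpoints := listEndpointsForService("bookstore", []EndpointsProvider{azure, kube})
	for _, e := range endpoints {
		// Print the fields explicitly: the embedded net.IP promotes its
		// String method, so printing e directly would show only the IP.
		fmt.Printf("%s:%d\n", e.IP, e.Port)
	}
}
```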
This component provides an abstraction around the SMI Spec Go SDK.
The abstraction hides the Kubernetes primitives. This allows us to implement SMI Spec providers that do not rely exclusively on Kubernetes for storage etc. The MeshSpec interface provides a set of methods, listing all services, traffic splits, and policy definitions for the entire service mesh.
The MeshSpec implementation has no awareness of:
- what Pipy or reverse-proxy is
- what IP address, Port number, or Endpoint is
- what Azure, Azure Resource Manager etc. is or how it works
// MeshSpec is an interface declaring functions, which provide the specs for a service mesh declared with SMI.
type MeshSpec interface {
// ListTrafficSplits lists SMI TrafficSplit resources
ListTrafficSplits() []*split.TrafficSplit
// ListServiceAccounts lists Service Account resources specified in SMI TrafficTarget resources
ListServiceAccounts() []service.K8sServiceAccount
// GetService fetches a Kubernetes Service resource for the given MeshService
GetService(service.MeshService) *corev1.Service
// ListServices lists Kubernetes Service resources that are part of monitored namespaces
ListServices() []*corev1.Service
// ListServiceAccounts lists Kubernetes Service Account resources that are part of monitored namespaces
ListServiceAccounts() []*corev1.ServiceAccount
// ListHTTPTrafficSpecs lists SMI HTTPRouteGroup resources
ListHTTPTrafficSpecs() []*spec.HTTPRouteGroup
// ListTrafficTargets lists SMI TrafficTarget resources
ListTrafficTargets() []*target.TrafficTarget
}
The certificate.Manager interface, shown below, is deliberately small: it declares methods for issuing, retrieving, rotating, listing, and releasing certificates, and one for obtaining a notification channel.
package certificate
// Manager is the interface declaring the methods for the Certificate Manager.
type Manager interface {
// IssueCertificate issues a new certificate.
IssueCertificate(CommonName, time.Duration) (*Certificate, error)
// GetCertificate returns a certificate given its Common Name (CN)
GetCertificate(CommonName) (*Certificate, error)
// RotateCertificate rotates an existing certificate.
RotateCertificate(CommonName) (*Certificate, error)
// GetRootCertificate returns the root certificate.
GetRootCertificate() (*Certificate, error)
// ListCertificates lists all certificates issued
ListCertificates() ([]*Certificate, error)
// ReleaseCertificate informs the underlying certificate issuer that the given cert will no longer be needed.
// This method could be called when a given payload is terminated. Calling this should remove certs from cache and free memory if possible.
ReleaseCertificate(CommonName)
// GetAnnouncementsChannel returns a channel, which is used to announce when changes have been made to the issued certificates.
GetAnnouncementsChannel() <-chan interface{}
}
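As a rough illustration of part of this interface, the sketch below caches certificates in memory and covers only IssueCertificate, GetCertificate, and ReleaseCertificate. The `Certificate` struct is a placeholder without any key material, and `memoryManager` is a hypothetical name; this is not how FSM's certificate providers are implemented.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type CommonName string

// Certificate is a placeholder: it records only the CN and the expiration.
type Certificate struct {
	CommonName CommonName
	Expiration time.Time
}

// memoryManager is a hypothetical in-memory cache of issued certificates.
type memoryManager struct {
	mu    sync.Mutex
	cache map[CommonName]*Certificate
}

func newMemoryManager() *memoryManager {
	return &memoryManager{cache: make(map[CommonName]*Certificate)}
}

// IssueCertificate records a new certificate valid for the given duration.
func (m *memoryManager) IssueCertificate(cn CommonName, validity time.Duration) (*Certificate, error) {
	if validity <= 0 {
		return nil, fmt.Errorf("validity must be positive, got %s", validity)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	cert := &Certificate{CommonName: cn, Expiration: time.Now().Add(validity)}
	m.cache[cn] = cert
	return cert, nil
}

// GetCertificate returns the cached certificate for the CN, if any.
func (m *memoryManager) GetCertificate(cn CommonName) (*Certificate, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	cert, ok := m.cache[cn]
	if !ok {
		return nil, fmt.Errorf("no certificate for %q", cn)
	}
	return cert, nil
}

// ReleaseCertificate drops the certificate from the cache.
func (m *memoryManager) ReleaseCertificate(cn CommonName) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.cache, cn)
}

func main() {
	m := newMemoryManager()
	cert, err := m.IssueCertificate("bookstore.default", 48*time.Hour)
	if err != nil {
		panic(err)
	}
	fmt.Println(cert.CommonName, "expires", cert.Expiration.Format(time.RFC3339))
	m.ReleaseCertificate("bookstore.default")
}
```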
The following types are referenced in the interfaces proposed in this document:
- Port
// Port is a numerical port of a Pipy proxy
type Port int
- ServiceName
// ServiceName is the name of a service defined via SMI
type ServiceName string
- ServiceAccount
// ServiceAccount is a type for a service account
type ServiceAccount string
- Endpoint
// Endpoint is a tuple of IP and Port, representing a Pipy proxy, fronting an instance of a service
type Endpoint struct {
	net.IP `json:"ip"`
	Port   `json:"port"`
}
- ClusterName
// ClusterName is a type for a service name
type ClusterName string
- RoutePolicy
// RoutePolicy is a struct of a path and the allowed methods on a given route
type RoutePolicy struct {
	PathRegex string   `json:"path_regex,omitempty"`
	Methods   []string `json:"methods,omitempty"`
}
- WeightedCluster
// WeightedCluster is a struct of a cluster and its weight that is backing a service
type WeightedCluster struct {
	ClusterName ClusterName `json:"cluster_name,omitempty"`
	Weight      int         `json:"weight,omitempty"`
}
- TrafficResource
// TrafficResource is a struct of the various resources of a source/destination in the TrafficPolicy
type TrafficResource struct {
	ServiceAccount ServiceAccount    `json:"service_account,omitempty"`
	Namespace      string            `json:"namespace,omitempty"`
	Services       []MeshService     `json:"services,omitempty"`
	Clusters       []WeightedCluster `json:"clusters,omitempty"`
}
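A note on the struct tags: in Go's encoding/json, options such as omitempty are separated from the field name by a comma. A tag written with a colon, like "path_regex:omitempty", is treated as a literal field name and no field is ever omitted. The short sketch below demonstrates the comma form on RoutePolicy; the `encode` helper is a hypothetical convenience for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RoutePolicy uses the comma form, so empty fields are left out of the JSON.
type RoutePolicy struct {
	PathRegex string   `json:"path_regex,omitempty"`
	Methods   []string `json:"methods,omitempty"`
}

// encode marshals a RoutePolicy and returns the JSON as a string.
func encode(r RoutePolicy) string {
	out, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// Methods is nil, so only path_regex appears in the output.
	fmt.Println(encode(RoutePolicy{PathRegex: "/buy"})) // prints {"path_regex":"/buy"}
	fmt.Println(encode(RoutePolicy{PathRegex: "/buy", Methods: []string{"GET"}}))
}
```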