Adding GEP-3539: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP Gateway) #3608

Open · wants to merge 4 commits into base: main
18 changes: 18 additions & 0 deletions examples/standard/clusterip-gateway/clusterip-gateway.yaml

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-cluster-ip-gateway
spec:
  addresses:
  - value: 10.12.0.15
  gatewayClassName: cluster-ip
  listeners:
  - name: example-service
    protocol: TCP
    port: 8080
    allowedRoutes:
      namespaces:
        from: Same
      kinds:
      - kind: TCPRoute
      - kind: CustomRoute
```
6 changes: 6 additions & 0 deletions examples/standard/clusterip-gateway/clusterip-gatewayclass.yaml

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-ip
spec:
  controllerName: "networking.k8s.io/cluster-ip-controller"
```

Review comment: Is this name "special" or can it be anything?

Reply (Contributor): It's intended that GatewayClass names can be any valid Kubernetes object name.
Binary file added geps/gep-3539/images/LB-NP-Clusterip.png
Binary file added geps/gep-3539/images/gatewayclasses-lb-np.png
Binary file added geps/gep-3539/images/service-model-gw-api-model.png
240 changes: 240 additions & 0 deletions geps/gep-3539/index.md
# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address

Review comment: This might have started out as "ClusterIP Gateways" but at this point it's really more like "Service-equivalent functionality via Gateway API".


* Issue: [#3539](https://github.com/kubernetes-sigs/gateway-api/issues/3539)
* Status: Provisional

## TLDR

Gateway API enables advanced traffic routing and can be used to expose a
logical set of pods on a single IP address within a cluster. It can be seen
as a next-generation ClusterIP, providing more flexibility and composability
than the Service API, at the expense of some additional configuration and
manageability burden.

## Goals

* Define Gateway API usage to accomplish ClusterIP Service style behavior

Review comment: Beyond the fact that it's not just ClusterIP, I think there are at least 3 use cases hiding in that sentence.

1. "Gateway as new-and-improved Service" - Providing an API that does generally the same thing that v1.Service does, but in a cleaner and more orthogonally-extensible way, so that when people have feature requests like "I want externalTrafficPolicy: Local Services without allocating healthCheckNodePorts" (to pick the most recent example), they can do that without us needing to add Yet Another ServiceSpec Flag.
2. "Gateway as a backend for v1.Service" - Providing an API that can do everything that v1.Service can do (even the deprecated parts and the parts we don't like), so that you can programmatically turn Services into Gateways and then the backend proxies/loadbalancers/etc would not need to look at Service objects at all.
3. "MultiNetworkService" - Providing an API that lets users do v1.Service-equivalent things in multi-network contexts.

The GEP talks about case 2 some, but it doesn't really explain why we'd want to do that (other than via the link to Tim's KubeCon lightning talk).

* Propose DNS layout and record format for ClusterIP Gateway
Review comment (Contributor): It doesn't seem like we have fleshed this out. Compared to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ we have just 1-2 sentences with a lot of ambiguity here.

Reply (Author): That is correct, we haven't. The DNS bit has been identified as a good candidate to be split out into its own GEP.

Review comment: I've expressed elsewhere, but will say again - my ideal outcome is that users could 1-for-1 convert Services (maybe not "while running", but convert YAML) into this new form of Gateway and not change their clients at all. Ideally this new thing gets the same name as the equivalent Service.

* Extend the use of Gateway API to provide NodePort and LoadBalancer Service types of functionality

## Non-Goals

* Make significant changes to Gateway API
* Provide a path for existing ClusterIP Services in a cluster to migrate to
the Gateway API model

## API Changes

* EndpointSelector is recognized as a backend
* DNS record format for ClusterIP Gateways

## Introduction

Gateway API provides a generic and composable model for defining L4 and L7
routing in Kubernetes. Very simply, it describes how to get traffic into pods.
ClusterIP provides similar functionality: an ingress point for routing traffic
into pods. As Gateway API has evolved, there have been discussions around whether
it can substitute for the increasingly complex and overloaded Service API. This
document describes what that could look like in practice, with a focus on
ClusterIP and a brief commentary on how the design can be extended to
accommodate LoadBalancer and NodePort Services.

## Overview

Gateway API can be thought of as decomposing Service API into multiple separable
components that allow for definition of the ClusterIP address and listener configuration
(Gateway resource), implementation specifics and common configuration (GatewayClass
resource), and routing traffic to backends (Route resource).

### Limitations of Service API
Review comment (Contributor): I think we need to be realistic here and acknowledge the benefits of the Service API from a user's POV - which I think we could summarize as that, for simple use cases, it's very simple. It's only one object, as opposed to (at minimum) four in the simplest case here (GatewayClass, Gateway, Route, and EndpointSelector).

I completely agree that breaking Service apart for more advanced use cases is useful, but we should acknowledge the reason why it's stuck around for so long - the level of simplicity and flexibility it has allows folks to get started much more easily. Additionally, Service is a GA API that's not going anywhere, so we need to be very clear that we're not talking about deprecating or replacing Service with this. As with Gateway API north/south and Ingress, the GA core resource is going to stick around, but this proposal is about giving us a better base to look at adding features to rather than trying to fit them into the existing, overloaded Service construct.

Speaking from experience, putting a section outlining this into this document now will save a lot of discussion later.

Reply (Contributor): 💯 agree, which is why, IMO, the best path forward here is making Service the higher level API that decomposes into these resources. Then you can choose to break the abstraction and manually configure the underlying resources (as you can do today by manually creating EndpointSlice!).

Otherwise we end up with a split universe indefinitely.

Reply: Whether literally core/v1.Service gets decomposed into these or some new type which is "familiar but better" or some embedding into Gateway itself, I think I agree. We need not fear layering.


Beyond the Service API maintainability, evolvability, and complexity concerns that have been
discussed in the past (see https://www.youtube.com/watch?v=Oslwx3hj2Eg), we ran into
additional practical concerns that rendered the Service API insufficient for the needs at hand.

Service IPs can only be assigned out of the ServiceCIDR range configured for the API server.
While Kubernetes 1.31 added a Beta feature that allows for the extension of Service IP ranges,
there have been use cases where multi-NIC pods (pods with multiple network interfaces) require
the flexibility of specifying different ServiceCIDR ranges to be used for ClusterIP services
corresponding to the multiple different networks. There are strict traffic splitting and network
isolation requirements that demand non-overlapping ServiceCIDR ranges for per-network ClusterIP
service groups. Because of the way service definition and IP address allocation are tightly
coupled in the API server, it is not possible to use the current Service API to achieve this model
without resorting to inelegant and kludgy implementations.

Review comment: (This is GA now.)
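To make that requirement concrete, here is an illustrative sketch (not part of the proposal) of two Gateways on different pod networks. The per-network GatewayClass names and CIDR ranges are hypothetical; the point is only that each class's implementation could allocate from its own non-overlapping range:

```yaml
# Hypothetical per-network classes; names and ranges are illustrative only.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-net-a
spec:
  gatewayClassName: cluster-ip-net-a   # implementation allocates from e.g. 10.96.0.0/16
  listeners:
  - name: svc
    protocol: TCP
    port: 8080
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-net-b
spec:
  gatewayClassName: cluster-ip-net-b   # implementation allocates from e.g. 172.20.0.0/16
  listeners:
  - name: svc
    protocol: TCP
    port: 8080
```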

Gateway API also satisfies, in a user-friendly and uncomplicated manner, the need for advanced
routing and load balancing capabilities, enabling canary rollouts, weighted traffic
distribution, and isolation of access and configuration.

Review comment: 🤣

### Service Model to Gateway API Model

![image displaying service model to gateway api model mapping](images/service-model-gw-api-model.png "image displaying service model to gateway api model mapping")

### EndpointSelector as Backend

A Route can forward traffic to the endpoints selected via selector rules defined in EndpointSelector.
Review comment: FWIW, I can imagine a path toward maybe making this a regular core feature. I am sure that it would be tricky but I don't think it's impossible.

E.g. define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing. Same as we do with IP.

Reply (Author): Interesting thought. For starters at least, there seemed to be agreement on having a GEP for EndpointSelector as the next step.

Reply: As always, Gateway proves something is a good idea, then core steals the spotlight.

Reply:
> Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing.

FWIW NetworkPolicies also contain selectors that need to be resolved to Pods, and we've occasionally talked about how nice it would be if the selector-to-pod mapping could be handled centrally, rather than every NP impl needing to implement that itself, often doing it redundantly on every node.

I guess in theory, we could do that with EndpointSlice even, since kube-proxy will ignore EndpointSlices that don't have a label pointing back to a Service, so we could just have another set of EndpointSlices for NetworkPolicies... (EndpointSlice has a bunch of fields that are wrong for NetworkPolicy but most of them are optional and could just be left unset...)

Though this also reminds me of my theory that EndpointSlice should have been a gRPC API rather than an object stored in etcd. The EndpointSlice controller can re-derive the entire (controller-generated) EndpointSlice state from Services and Pods at any time, and it needs to keep all that state in memory while it's running anyway. So it should just serve that information out to the controllers that need it (kube-proxy, gateways) in an efficient use-case-specific form (kind of like the original kpng idea) rather than writing it all out to etcd.

(Alternate version: move discovery.k8s.io to an aggregated apiserver that is part of the EndpointSlice controller, and have it serve EndpointSlices out of memory rather than out of etcd.)

While Service is the default resource kind for a backendRef referent, EndpointSelector is
suggested as an example of a custom resource that implementations could support to attach pods
(or potentially other resource kinds) directly to a Route via backendRef.

```yaml
apiVersion: networking.gke.io/v1alpha1
kind: EndpointSelector
metadata:
  name: front-end-pods
spec:
  kind: Pod
  selector:
  - key: app
    value: frontend
    operator: In
```

Review comment: (not sure if this apiVersion is just a placeholder for now or if it should have been replaced with something else?)

Review comment: probably want this to work the same way EndpointSlice does, where the name is not meaningful (so as to avoid conflicts), and there's a label (or something) that correlates it with its Service

The EndpointSelector object is defined as follows. It allows the user to specify which endpoints
should be targeted for the Route.

```yaml
apiVersion: networking.gke.io/v1alpha1
kind: EndpointSelector
metadata:
  name: front-end-pods
spec:
  kind: Pod
  selector:
  - key: app
    value: frontend
    operator: In
```

Review comment (Member): Would it make sense to have a config field so we can have implementation specific parameters?
For example, if I create a Layer 3/4 load balancer type of gateway, I would like to express how the traffic will be distributed (what algorithm is being used), the maximum number of endpoints and from where IP addresses should be selected (ResourceClaim (Multi-Network), Annotations (Multus)...).

Reply (Author):
> Would it make sense to have a config field so we can have implementation specific parameters?

Yes, I think something like config would be needed.

> For example, if I create a Layer 3/4 load balancer type of gateway, I would like to express how the traffic will be distributed (what algorithm is being used),

How the traffic will be distributed seems to be more of a Route level config than EndpointSelector level config, e.g. BackendRefs already have a weight field today.

> the maximum number of endpoints and from where IP addresses should be selected (ResourceClaim (Multi-Network), Annotations (Multus)...).

publishNotReadyAddresses could be one more thing that could go here.

Reply (Member):
> Yes, I think something like config would be needed.

The Gateway has a similar field in .spec.infrastructure.parametersRef pointing to another object holding the configuration (e.g. configmap).
Otherwise, it is also possible to use runtime.RawExtension to embed arbitrary parameters in the EndpointSelector.
DRA uses it for example: https://github.com/kubernetes/api/blob/release-1.33/resource/v1beta2/types.go#L1032

> How the traffic will be distributed seems to be more of a Route level config than EndpointSelector level config, e.g. BackendRefs already have a weight field today

Not sure. To me, this is a combination of both Route and Backend (Service, EndpointSelector...). The Route steers traffic to backends via some characteristics (L7 (HTTP...), L3/L4 (IPs, Ports, protocols)) and the backend (Service, EndpointSelector...) defines how to distribute it (load-balance it over a set of IPs).

> publishNotReadyAddresses could be one more thing that could go here

Yes, to me publishNotReadyAddresses would also make sense there in the EndpointSelector.

To allow more granular control over traffic routing, there have been discussions around adding
support for using Kubernetes resources other than Service (or external endpoints) directly as backendRefs.
Gateway API allows for this flexibility, so supporting a generic EndpointSelector resource as a
backendRef would be a good evolutionary step.

### User Journey

The infrastructure provider supplies a GatewayClass corresponding to the type of Service-like
behavior to be supported.

Below is an example of a GatewayClass for ClusterIP support:
```yaml
{% include 'standard/clusterip-gateway/clusterip-gatewayclass.yaml' %}
```

The user must then create a Gateway in order to configure and enable the behavior as per their intent:
```yaml
{% include 'standard/clusterip-gateway/clusterip-gateway.yaml' %}
```

By default, IP address(es) from a pool specified by a CIDR block will be assigned unless a static IP is
configured in the _addresses_ field as shown above. The CIDR block may be configured using a custom CR.
Subject to further discussion, it may make sense to have a GatewayCIDR resource available upstream to
specify an IP address range for Gateway IP allocation.

Review comment: I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.

Reply (Author, Apr 11, 2025):
> I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.

Two problems (depending on whether we think of this having a path to being a regular core feature):

1. The current ServiceCIDR is coupled with core, managed by core controllers
2. The name 'ServiceCIDR' sounds aligned to Service API

Maybe there's more?

Reply (Contributor): Commented in https://github.com/kubernetes-sigs/gateway-api/pull/3608/files#r2053572715 - ServiceCIDR acts as a definition of ranges and can be decoupled and serve other implementations to express IP ranges, especially since there are no cross object checks that may complicate it ... we just need to follow the existing convention of adding a managed-by label so the Services controllers handle their own resources.
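For illustration only, such a GatewayCIDR resource might look roughly like the sketch below. The group, kind, and every field are hypothetical and subject to the discussion above:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha1  # hypothetical group/version
kind: GatewayCIDR                               # hypothetical resource name
metadata:
  name: cluster-ip-pool
spec:
  # Illustrative: scoping the pool to a GatewayClass would let different
  # classes allocate from non-overlapping ranges.
  gatewayClassName: cluster-ip
  cidrs:
  - 10.12.0.0/16
```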

Finally, the specific Route and EndpointSelector resources must be created in order to set up the backend
pods for the configured ClusterIP:
```yaml
kind: [TCPRoute|CustomRoute]
metadata:
  name: service-route
spec:
  config:
    sessionAffinity: false
  parentRefs:
  - name: example-cluster-ip-gateway
  rules:
  - backendRefs:
    - kind: EndpointSelector
      port: 8080
      name: exampleapp-app-pods
---
apiVersion: gateway.networking.k8s.io/v1alpha1
kind: EndpointSelector
metadata:
  name: exampleapp-app-pods
spec:
  selector:
  - key: app
    value: exampleapp
    operator: In
```

### Backends on Listeners

As seen above, Gateway API requires at least three CRs to be defined, which introduces some complexity.
GEP-1713 proposes the addition of a ListenerSet resource to allow sets of listeners to attach to a Gateway.
As part of the discussions around this topic, the idea of directly adding backendRefs to listeners has come
up. Allowing backendRefs directly on listeners eliminates the need for Route objects in simple cases, as
sketched below. More complex traffic splitting and advanced load balancing cases can still use Route
attachments via allowedRoutes.
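A speculative sketch of that idea, reusing the example Gateway from above; the backendRefs-on-listener field placement is hypothetical and not part of the current API:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-cluster-ip-gateway
spec:
  gatewayClassName: cluster-ip
  listeners:
  - name: example-service
    protocol: TCP
    port: 8080
    # Hypothetical field: backends attached directly to the listener,
    # removing the need for a separate Route in simple cases.
    backendRefs:
    - kind: EndpointSelector
      name: exampleapp-app-pods
      port: 8080
```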

### DNS

ClusterIP Gateways in the cluster need to have consistent DNS names assigned to allow ClusterIP lookup by
name rather than IP address. DNS A and/or AAAA record creation needs to happen when Kubernetes publishes
information about Gateways, in a manner similar to ClusterIP Service creation behavior. DNS nameservers
in pods’ /etc/resolv.conf need to be programmed accordingly by kubelet.

```
<name-of-gateway>.<gateway-namespace>.gw.cluster.local
```

Review comment: I think DNS is a fraught topic. We REALLY REALLY do not want to add more search paths, especially if they could cause ambiguous names. We could just lean on the "svc" space for this, since these are effectively services. We would need to define how to avoid collisions and I'd be lying if I said I had a great answer.

Maybe, like IPAddress, we extract ServiceName to a new resource, and whomever gets there first wins? That sort of transaction doesn't work well for CRDs but I guess it could be async. Weird failure modes.

Reply (Author): Agree on this. Another search path is not ideal. NodeLocal DNSCache may help. But the alternative of using "svc", as you pointed out, requires a solution for collision avoidance which could get messy. Another issue is that the "svc" space is used for the 'Service' API -- more of a semantic issue.

@aojea had some ideas here.

cc: @bowei

Reply (Contributor):
> Maybe, like IPAddress, we extract ServiceName to new resource, and whomever gets there first wins?

Agree, @thockin. I do think we need to manage DNS and ClusterIP differently; the two are always different implementations (kube-proxy and CoreDNS, for example), so having the same resource is good for the happy scenario, but once each of them requires more specific features we end up conflating everything in the same object, with problems like kubernetes/kubernetes#105986 (comment).

This results in the following search option entries in Pods’ /etc/resolv.conf:
```
search <ns>.gw.cluster.local gw.cluster.local cluster.local
```
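For example, with those search entries, a pod in the default namespace could reach the example Gateway from the User Journey by the short name example-cluster-ip-gateway, which would expand to:

```
example-cluster-ip-gateway.default.gw.cluster.local
```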

### Cross-namespace References

Gateway API allows for Routes in different namespaces to attach to the Gateway.

When modeling ClusterIP service networking, the simplest recommendation might be to keep Gateway and Routes
within the same namespace. While cross-namespace routing would work and allow for evolved functionality,
it may make supporting certain cases tricky. One specific example is pod DNS resolution
support of the following format:

```
pod-ipv4-address.gateway-name.my-namespace.gw.cluster-domain.example
```

If Gateway and Routes (and hence the backing pods) are in different namespaces, there arises ambiguity in
whether and how to support this pod DNS resolution format.

Review comment (Contributor): Given we are making a new DNS name, do we actually care to support this POD-IP DNS name?

Reply (Author): The pod-ip DNS name is for headless only: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#:~:text=Any%20Pods%20exposed,domain.example.
And that was before the DNS specification. After the DNS specification, the record is of the format
pod-hostname.service-name.my-namespace.svc.cluster-domain.example

Headless, in the case of Gateway API, would likely be expressed using a separate GatewayClass, in order to avoid conflating ClusterIP and Headless and provide a clean separation of concerns. So, you are right, it does not necessarily need to be supported for ClusterIP Gateway.
Headless and externalName, which are more DNS specific, warrant more discussion.

Review comment: If we end up in a new DNS space, this whole proposal's value is diminished. IMO.

## LoadBalancer and NodePort Services

Extending the concept further to LoadBalancer and NodePort type services follows a similar pattern. The idea
is to have a GatewayClass corresponding to each type of service networking behavior that needs to be modeled
and supported.

![image displaying gatewayclasses to represent different service types](images/gatewayclasses-lb-np.png "image displaying gatewayclasses to represent different service types")

Note that Gateway API allows flexibility and clear separation of concerns so that one would not need to
configure cluster-ip and node-port when configuring a load-balancer.
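For illustration, minimal sketches of GatewayClasses for these service types, assuming hypothetical controller names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: load-balancer
spec:
  controllerName: "networking.example.net/load-balancer-controller"  # hypothetical
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: node-port
spec:
  controllerName: "networking.example.net/node-port-controller"  # hypothetical
```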

But for completeness, the case shown below demonstrates how load balancer functionality analogous to the
LoadBalancer Service API can be achieved using Gateway API.

![image displaying LB Service API analogous GW API objects](images/LB-NP-Clusterip.png "image displaying LB Service API analogous GW API objects")

Review comment (Contributor): All of this proposal makes sense as a logical way to solve "If you had to implement Service using Gateway API primitives, how would you do it".

What doesn't make sense to me is the why and the how this becomes something practically useful - from a proposal to a thing in the real world.

The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?

Which projects are expected to, and which are committed to, supporting these? Kube-proxy? CoreDNS? Various 3p CNIs (Cilium, Calico, etc)? Service meshes? All gateway implementations?

Reply (Author):
> All of this proposal makes sense as a logical way to solve "If you had to implement Service using Gateway API primitives, how would you do it".

Yes, that is how it started as a Memorandum GEP, but it has evolved into 'this should be implemented'.

> What doesn't make sense to me is the why and the how this becomes something practically useful - from a proposal to a thing in the real world.

Linking this question back to a similar discussion thread here that has some good comments: #3608 (comment)

> The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?

This diagram was mainly for completeness, in case someone wanted to use a common Route across different types of Gateway to mimic the exact LB Service behavior. Ideally though, with the Gateway API model, you won't need to mimic that behavior, and configuring an LB Gateway won't require you to configure ClusterIP and NodePort Gateways as well.
Not denying that you would still be going from 1 to 4 objects, but that's an ongoing discussion that doesn't get old.

> Which projects are expected to, and which are committed to, supporting these? Kube-proxy? CoreDNS? Various 3p CNIs (Cilium, Calico, etc)? Service meshes? All gateway implementations?

Don't have an answer to that yet.

Reply: Yeah, I think this is where "Gateways that are doing something similar to Service" and "Gateways that are trying to be an exact translation of v1.Service" really diverge.

If you are doing something similar to Service, you don't need an API-compatible implementation of NodePorts, warts and all.

But if you're doing an exact translation of Service, you don't need orthogonal, composable API pieces. We could just have "ServiceRoute" that mirrors the entire Service API all in one.

Review comment (Contributor): The example from the image uses load-balancer as the class. The cloud providers usually have a few variants of LBs, and preferably these would have their own classes.

But these would be somewhat unique, since the cloud provider controller would act on them but kube-proxy, Cilium and others would also need to do some setup on their side. Maybe we could set something on the GatewayClass that would indicate it's a L4 LB class, so that the node networking part has to treat it as a LB?

Reply (Author): Linking a similar discussion: #3608 (comment)

## Additional Service API Features

Services natively provide additional features as listed below (not an exhaustive list). Gateway API can be
extended to provide some of these features natively, while others may be left up to the specifics of
implementations.

| Feature | Service API options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
| allocateLoadBalancerNodePorts | True <br /> False | Not supported? (No need for LB Gateway type to also create NodePort) |
| externalIPs | List of externalIPs for service | Not supported? |
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level |
| internalTrafficPolicy | Local <br /> Cluster | Supported for ClusterIP Gateways only, Route level |
| ipFamily | IPv4 <br /> IPv6 | Route level |
| publishNotReadyAddresses | True <br /> False | Route or EndpointSelector level |
| ClusterIP (headless service) | IPAddress <br /> None | GatewayClass definition for Headless Service type |
| externalName | External name reference <br /> (e.g. DNS CNAME) | GatewayClass definition for ExternalName Service type |

Review comment (on sessionAffinity): Does L4 Gateway support affinity?

Reply (Contributor): Not currently, no.

Reply (Contributor): It would be great to have options for more than ClientIP as in the Services case. We've had people asking about 2-tuple, 3-tuple, 5-tuple affinities.

Review comment (on externalTrafficPolicy): These are all interesting challenges which maybe need something more than a plain TCPRoute?

Reply (Author): A more generic L4Route that combines TCP/UDPRoutes?

Review comment (Member, on ipFamily): If the IPs are specified only in the Gateway and not in the Routes, why would the ipFamily be at the Route level?
Or should there be a Layer 3 and 4 Route that contains the Ports, Protocol and IPs (+ipFamily)?

Reply (Author, Apr 14, 2025): I think you are right, ipFamily makes sense at the Gateway level -- even ports and protocols are under Listeners at the Gateway level.

Reply (Author): I also just realized why I had some of these at the Route level. I started off with wanting to minimize changes to the API, and Route is extensible while Gateway is not. Things are evolving a bit differently now though.

Reply (Author, May 1, 2025): More here: #3608 (comment)

Review comment:

- sessionAffinity - As noted elsewhere, this is not implemented compatibly by all service proxies. It's also not implemented by many LoadBalancers because historically we have mostly not done any e2e testing for non-GCE LoadBalancers.
- externalIPs - bad alternative implementation of LoadBalancers. Needed for "exactly equivalent to Service" Gateways but not wanted for "similar to Service" Gateways.
- externalTrafficPolicy: Local - overly-opinionated combined implementation of two separate features (preserve source IP / route traffic more efficiently). We should do this better for the "similar to Service" case.
- publishNotReadyAddresses - is this just an early attempt to solve the problem that was later solved better by ProxyTerminatingEndpoints?

Not mentioned here:

- trafficDistribution - I'm not sure what Gateway already has for topology, but this is definitely something that should be exposed generically.


## References

* [Original Doc](https://docs.google.com/document/d/1N-C-dBHfyfwkKufknwKTDLAw4AP2BnJlnmx0dB-cC4U/edit)
7 changes: 7 additions & 0 deletions geps/gep-3539/metadata.yaml

```yaml
apiVersion: internal.gateway.networking.k8s.io/v1alpha1
kind: GEPDetails
number: 3539
name: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address
status: Provisional
authors:
  - ptrivedi
```
1 change: 1 addition & 0 deletions mkdocs.yml

```diff
@@ -168,6 +168,7 @@ nav:
       - geps/gep-2659/index.md
       - geps/gep-2722/index.md
       - geps/gep-2907/index.md
+      - geps/gep-3539/index.md
     - Declined:
       - geps/gep-735/index.md
       - geps/gep-1282/index.md
```