Adding GEP-3539: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP Gateway) #3608
Conversation
Hi @ptrivedi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Adding this comment here for tracking a few open items resulting from the comments on the google doc here: https://docs.google.com/document/d/1N-C-dBHfyfwkKufknwKTDLAw4AP2BnJlnmx0dB-cC4U/edit?tab=t.0
* Fix missing image
* Change GEP status to Memorandum
* Make GEP navigable
* Crop trailing whitespace from images

Signed-off-by: Pooja [email protected]
/assign @thockin
First: LOVE IT
The questions I keep coming back to all are around how the node-proxy knows to pay attention to THIS gateway so it can implement the clusterIP or nodePort or externalTrafficPolicy or ...
```yaml
- name: example-cluster-ip-gateway
  rules:
  config:
    sessionAffinity: false
```
Is the indent wrong on this?
Not just the indent, this config section was meant to be under spec. Likely a copy/paste foobar, will fix.
To explain further, I had brought this up as an open question/discussion point during the community meeting presentation. For Service features like internalTrafficPolicy, sessionAffinity, etc. (and possibly other things in the future), does it make sense to have a RouteConfig section under RouteSpec? FWIW, for the internal implementation of ClusterIP Gateway we are doing for multi-network environments, we use a custom route where we had some debate around whether these parameters should go directly under RouteSpec or in a separate section under RouteSpec.
The table towards the end of the document lists Service API features and the possibility of translating them to Gateway API. But the discussion has evolved since, and it may make sense to think holistically about Gateway API functionality enhancements rather than simply in terms of Service API mapping/parity. For example, Gateway could aim at a more generic model of topology-aware routing than Service API features like internalTrafficPolicy, which are restrictive and somewhat confusing. We likely need an overall discussion around having something like 'trafficDistribution' (or whatever we decide to call it in GW world) and where it should live (do we propose a Route extension? Or does it belong at Gateway level?). Same for some of the other features. A hypothetical sketch follows.
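To make that open question concrete, here is a rough sketch of what a grouped per-Route config section might look like. This is purely illustrative: the `config` stanza and both of its fields are placeholders for the discussion above, not part of any accepted API.

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: example-route
spec:
  parentRefs:
  - name: example-cluster-ip-gateway
  # Hypothetical: Service-style knobs grouped under one section of
  # RouteSpec instead of being scattered as top-level fields.
  config:
    sessionAffinity: ClientIP
    trafficDistribution: PreferClose   # placeholder name, see discussion
  rules:
  - backendRefs:
    - name: front-end-pods
      port: 8080
```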
There are parts of the service API that are deeply entangled with the implementation details. If we can untangle them, great, but they likely are not "generic" enough to apply to just any Gateway, or even to any L4 or L3 Gateway.
Let's look at the table below:

- sessionAffinity: seems generally applicable to any L3 or L4 GW
- allocateLoadBalancerNodePorts: should be moot if NodePort is a different gateway than clusterIP or LB
- externalIPs: a gross hack that perhaps we can abandon
- externalTrafficPolicy: by definition does not apply to clusterIPs
- internalTrafficPolicy: by definition ONLY applies to cluster IPs. It's not clear to me if we could have a generic L4Route.TrafficPolicy sort of thing. It makes sense for ClusterIP but not for a generic L4 or L3 route? I think? It feels "special", but I am not 100% sure
- trafficDistribution: definitely matters for clusterIP, maybe is generic L4?
- ipFamily: this is a request to the GW IP allocator, so it probably makes sense as a parameter to the Gateway? How do we express v4 vs v6 today?
- publishNotReadyAddresses: this is related to endpoint selection, which you are trying to factor out, but it IS used.
- headless service: I feel like this is a different "kind" of Gateway. The fact that it was implemented as a variation of ClusterIP rather than a distinct value for Type is a historical thing.
- externalName: again, this is probably a different kind of gateway
- type: obviated - the class of Gateway should capture the same concept space
I can't tell if I just argued for a generic L4 Route or for a special VirtualService route or something else?
> ipFamily: this is a request to the GW IP allocator, so it probably makes sense as a parameter to the Gateway? How do we express v4 vs v6 today?
We don't. When requesting a static address, you can request either type (by string conversion, not by structured field), and it's up to the implementation to figure out if it can fulfill the request, and also what addresses to give if you don't specify anything. Note that implementations are totally allowed to give out multiple addresses here if no static addresses are specified.

We do have a `type` field in the `addresses` config, so we can extend there as needed though. We also have #3616 to make the `value` optional, so that you can request a dynamically-allocated address of a specific `type`. Combine those two together, and we could add v4- and v6-specific types that could be validated more clearly or have you request a specific address type only.
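A sketch of how that could look on a Gateway, assuming #3616 lands; the first entry is the API as it exists today, while the second entry's family-specific type name is invented here for illustration:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-cluster-ip-gateway
spec:
  gatewayClassName: cluster-ip
  addresses:
  # Static request today: the family is implied by the string value.
  - type: IPAddress
    value: "10.12.0.15"
  # Hypothetical: value omitted (per #3616) plus a family-specific type,
  # requesting a dynamically allocated IPv6 address.
  - type: networking.example.com/IPv6Address
```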
Yeah, pre-allocated IPs matter a lot more for "north-south" than "east-west", but it seems inevitable that people want to express things like "IPv6 but you choose the address" or "dual-stack if possible, but single-stack is OK".
The Service ipFamilyPolicy covers that and I will be shocked if you don't need essentially the same gamut of config.
> internalTrafficPolicy: by definition ONLY applies to cluster IPs. It's not clear to me if we could have a generic `L4Route.TrafficPolicy` sort of thing. It makes sense for ClusterIP but not for a generic L4 or L3 route? I think? It feels "special", but I am not 100% sure

Could it be thought of as being rolled under a broader topology-aware routing type of thing, where routing behavior can be adjusted to keep traffic on the node, in the zone/region, across the cluster, and so on?

> trafficDistribution: definitely matters for clusterIP, maybe is generic L4?
> Yeah, pre-allocated IPs matter a lot more for "north-south" than "east-west", but it seems inevitable that people want to express things like "IPv6 but you choose the address" or "dual-stack if possible, but single-stack is OK".
>
> The Service ipFamilyPolicy covers that and I will be shocked if you don't need essentially the same gamut of config.

I did not separately mention ipFamilies and ipFamilyPolicy. But the matrix of possibilities, with ipFamilies specifying the list of families (IPv4, IPv6) and ipFamilyPolicy dictating the requested/required dual-stack properties ("SingleStack", "PreferDualStack", "RequireDualStack"), is definitely something that may be needed.
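For reference, this is the Service-side matrix being described; `ipFamilies` and `ipFamilyPolicy` are existing `v1.Service` fields, and a Gateway-level equivalent would presumably need the same gamut:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  # Dual-stack requested but single-stack acceptable; IPv6 preferred first.
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
  - IPv6
  - IPv4
  selector:
    foo: bar
  ports:
  - port: 8080
```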
### EndpointSelector as Backend

A Route can forward traffic to the endpoints selected via selector rules defined in EndpointSelector.
FWIW, I can imagine a path toward maybe making this a regular core feature. I am sure that it would be tricky but I don't think it's impossible.
E.g.:
Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing. Same as we do with IP.
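A minimal sketch of that flow; `PodSelector` is a hypothetical kind (and the group/version a placeholder) invented here to illustrate the idea:

```yaml
# Hypothetical intermediate object, generated from a Service's selector.
# The endpoints controller(s) would watch these rather than Services,
# so a Gateway/Route could create one directly without a Service.
apiVersion: networking.k8s.io/v1alpha1   # placeholder group/version
kind: PodSelector
metadata:
  name: generated-from-my-service
  namespace: default
spec:
  selector:
    matchLabels:
      foo: bar
```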
Interesting thought.
For starters at least, there seemed to be agreement on having a GEP for EndpointSelector as the next step.
As always, Gateway proves something is a good idea, then core steals the spotlight.
> Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing.
FWIW NetworkPolicies also contain selectors that need to be resolved to Pods, and we've occasionally talked about how nice it would be if the selector-to-pod mapping could be handled centrally, rather than every NP impl needing to implement that itself, often doing it redundantly on every node.
I guess in theory, we could do that with EndpointSlice even, since kube-proxy will ignore EndpointSlices that don't have a label pointing back to a Service, so we could just have another set of EndpointSlices for NetworkPolicies... (EndpointSlice has a bunch of fields that are wrong for NetworkPolicy but most of them are optional and could just be left unset...)
Though this also reminds me of my theory that EndpointSlice should have been a gRPC API rather than an object stored in etcd. The EndpointSlice controller can re-derive the entire (controller-generated) EndpointSlice state from Services and Pods at any time, and it needs to keep all that state in memory while it's running anyway. So it should just serve that information out to the controllers that need it (kube-proxy, gateways) in an efficient use-case-specific form (kind of like the original kpng idea) rather than writing it all out to etcd.
(Alternate version: move `discovery.k8s.io` to an aggregated apiserver that is part of the EndpointSlice controller, and have it serve EndpointSlices out of memory rather than out of etcd.)
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-ip
```
Is this name "special" or can it be anything?
It's intended that GatewayClass names can be any valid Kubernetes object name.
See expanded question under https://github.com/kubernetes-sigs/gateway-api/pull/3608/files#r1964558745
```yaml
metadata:
  name: cluster-ip
spec:
  controllerName: "cluster-ip-controller"
```
Is this name "special" or can it be anything?
The name can be anything, but implementations must only reconcile GatewayClasses that have a `controllerName` that they expect. Implementations must ignore GatewayClass objects that do not match their `controllerName` completely, and not update them at all (to prevent fighting on `status`).

Some implementations allow configuration of this string (for example, Contour allows it so that you can run multiple instances of Contour in a cluster).
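For example (the controller names below are illustrative):

```yaml
# An implementation configured with controller name
# "example.net/cluster-ip-controller" reconciles only the first class and
# must leave the second class, including its status, completely untouched.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-ip
spec:
  controllerName: example.net/cluster-ip-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: some-other-lb
spec:
  controllerName: other.example.net/lb-controller
```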
Is that the behavior we want here? In Service, it's a single object with many controllers consuming it. If I want my service exposed to the CNI, kube-proxy, service mesh, observability platform, ... do I need to make N Gateways?
See expanded question under https://github.com/kubernetes-sigs/gateway-api/pull/3608/files#r1964558745
Agree with John's question, and I think it betrays a fundamental difference in perspective. I see this idea as "Services with a better API"
Because we're using the same object that can be used in other contexts though (ie Gateway), we need a way to disambiguate, and the way we have is GatewayClass. I'd be happy to see proposals around alternatives to GatewayClass, but I haven't seen anything to date that handles the problem that implementations of Gateway API almost always need multiple-namespace access, and the only currently available thing we have that's bigger than a single namespace is cluster-wide.
```yaml
name: example-cluster-ip-gateway
spec:
  addresses:
  - 10.12.0.15
```
How does kube-proxy (or Cilium or Antrea or ...) know which Gateways it should be capturing traffic for?
Normally that's handled by the rollup of Gateway -> GatewayClass. Implementations own GatewayClasses that specify the correct string in GatewayClass `spec.controllerName`. All Gateways in that GatewayClass would need to be serviced by an implementation that can fulfill the request (that is, it both has the required functionality and, in this case of requesting a static address, is actually able to assign that address). In the case that an implementation cannot fulfill a Gateway for some reason, the Gateway must be marked as not Accepted (by having an `Accepted` type condition in the Gateway's status with `status: false`).
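A sketch of what that rejection looks like in status (the reason and message here are illustrative):

```yaml
status:
  conditions:
  - type: Accepted
    status: "False"
    reason: UnsupportedAddress    # illustrative reason
    message: cannot assign requested static address 10.12.0.15
```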
I can't tell if you are giving me a hard time or not :)
What I meant to ask is:
`Service` as a built-in API is (more or less) universally implemented by on-node agents (kube-proxy, cilium or antrea, ovn, etc). If we are trying to offer a form of ClusterIP `Gateway` which replaces part of the `Service` API, how does a user express "this is a cluster IP gateway" in a portable way such that all of the implementations know "this is for me"?

If each implementation has its own `controllerName`, and the `GatewayClass` can be named anything the cluster admin wants, how does our poor beleaguered app operator know what to put in their YAML?
Today they can say:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  selector:
    foo: bar
  ports:
  - port: 8080
```
...and be confident that ANY cluster, regardless of which CNI, will allocate a virtual IP and route traffic.
I'd like to write a generic tool which does:
```
for each service S in `kubectl get svc -A` {
  evaluate template with S to produce an equivalent Gateway
}
```
Yeah, okay, I see the use case, but this is the problem with extensions vs. core - we left the flexibility there for implementations (for good reason), and now we don't have a way to define a default GatewayClass at all, even for specific use cases.

I think that practically, a tool like you describe would need to know the GatewayClass it was targeting, and output Gateways based on that.

We could conceivably have a convention and pick a reserved name (like `cni-clusterip` or something), but we've been reluctant in the past to do that, preferring the increased specificity of requiring people to specify something (even though there is a friction cost to be paid there).

(And I wasn't trying to give you a hard time - I have details get pushed out of my head all the time, so wanted to make sure this hadn't happened here. 😄 But also, I wanted to help other readers understand too)
> I think that practically, a tool like you describe would need to know the GatewayClass it was targeting,
Hence my questions about "is this name special". One answer is "thou shalt use the name 'clusterip' and the 'clusterip' is the name thou shalt use", and just hope not to collide with users. Another answer is to define a sub-space of names that users can't currently use, or are exceedingly unlikely to be using e.g. k8s.io:clusterip. This is an appropriate place to ideate, right?
Since 1.33 you can use the IPAddress object to represent a unique IP address in the cluster.
Official names look like a good idea, but I do not think we should make this exclusive. We already have `service.kubernetes.io/service-proxy-name` for Services, so it makes sense that we may consider multiple implementations of clusterIP, and we can delegate our prefix to indicate that this is a Service IP --- the relation with the IPAddress object will guarantee the consistency... IPAddress already has a reference field and a `managed-by` label.

I think my strawman approach is:

- gateway class prefixed with `clusterip.kubernetes.io/kube-proxy` or `clusterip.kubernetes.io/cilium`, `antrea`, `ovn-kubernetes`
- the gateway allocates the corresponding IPAddress on the cluster to avoid conflicts
```go
&networking.IPAddress{
	ObjectMeta: metav1.ObjectMeta{
		Name: "192.168.2.2",
		Labels: map[string]string{
			"ipaddress.kubernetes.io/managed-by": "kube-proxy",
		},
	},
	Spec: networking.IPAddressSpec{
		ParentRef: &networking.ParentReference{
			Group:     "gateway.networking.k8s.io",
			Resource:  "gateways", // lowercase plural resource name
			Name:      "foo",
			Namespace: "bar",
		},
	},
}
```
this is even more complicated for type LoadBalancer
Currently for services and passthrough LBs, part of the setup belongs to kube-proxy, cilium etc. (routing on the nodes) and part to the cloud providers. You also have the `.loadBalancerClass` Service API field so that you can instruct the LB controller to provision a specific kind of LB on the cloud provider side.

Wouldn't it be best to leave the GatewayClass to be equivalent to loadBalancerClass in this case? There would need to be something that instructs kube-proxy to do the routing on its side.
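For reference, the existing Service-side mechanism being compared against; `loadBalancerClass` is a real `v1.Service` field, though the class name below is illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-lb
spec:
  type: LoadBalancer
  loadBalancerClass: example.com/internal-nlb   # illustrative class name
  selector:
    foo: bar
  ports:
  - port: 443
```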
```
{% include 'standard/clusterip-gateway/clusterip-gateway.yaml' %}
```

By default, IP address(es) from a pool specified by a CIDR block will be assigned unless a static IP is
I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.
> I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.
Two problems (depending on whether we think of this having a path to being a regular core feature):
- The current ServiceCIDR is coupled with core, managed by core controllers
- The name 'ServiceCIDR' sounds aligned to Service API
Maybe there's more?
Commented in https://github.com/kubernetes-sigs/gateway-api/pull/3608/files#r2053572715: ServiceCIDR acts as a definition of ranges and can be decoupled and serve other implementations that need to express IP ranges, especially since there are no cross-object checks that may complicate it... we just need to follow the existing convention of adding a managed-by label so the Services controllers handle their own resources.
in pods’ /etc/resolv.conf need to be programmed accordingly by kubelet.

```
<name of gateway>.<gateway-namespace>.gw.cluster.local
```
I think DNS is a fraught topic. We REALLY REALLY do not want to add more search paths, especially if they could cause ambiguous names. We could just lean on the "svc" space for this, since these are effectively services. We would need to define how to avoid collisions and I'd be lying if I said I had a great answer.
Maybe, like IPAddress, we extract ServiceName to a new resource, and whoever gets there first wins? That sort of transaction doesn't work well for CRDs but I guess it could be async. Weird failure modes.
Agree on this. Another search path is not ideal. NodeLocal DNSCache may help. But the alternative of using "svc", as you pointed out, requires a solution for collision avoidance which could get messy. Another issue is using "svc" space which is used for 'Service' API -- more of a semantic issue
@aojea had some ideas here.
cc: @bowei
> Maybe, like IPAddress, we extract ServiceName to a new resource, and whoever gets there first wins?
Agree. @thockin I do think we need to manage DNS and ClusterIP differently; the two are always different implementations (kube-proxy and CoreDNS, for example), so having the same resource is good for the happy scenario, but once each of them requires more specific features we end up conflating everything in the same object, with problems like kubernetes/kubernetes#105986 (comment).
| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
Does L4 Gateway support affinity?
Not currently, no.
It would be great to have options for more than `ClientIP` as in the Services case. We've had people asking about 2-tuple, 3-tuple, and 5-tuple affinities.
| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
| allocateLoadBalancerNodePorts | True <br /> False | Not supported for ClusterIP Gateway <br /> Supported for LoadBalancer Gateway |
| externalIPs | List of externalIPs for service | Not supported? |
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level |
These are all interesting challenges which maybe need something more than a plain TCP Route?
A more generic L4Route that combines TCP/UDPRoutes?
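A very rough sketch of that idea; `L4Route` does not exist in Gateway API today, so the kind, version, and per-rule `protocol` field are all invented here for illustration:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha1   # placeholder
kind: L4Route                                    # hypothetical kind
metadata:
  name: example-l4-route
spec:
  parentRefs:
  - name: example-cluster-ip-gateway
  rules:
  - protocol: TCP      # hypothetical: protocol per rule instead of per kind
    backendRefs:
    - name: front-end-pods
      kind: EndpointSelector
      port: 8080
  - protocol: UDP
    backendRefs:
    - name: front-end-pods
      kind: EndpointSelector
      port: 8053
```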
When modeling ClusterIP service networking, the simplest recommendation might be to keep Gateway and Routes within the same namespace. While cross namespace routing would work and allow for evolved functionality, it may make supporting certain cases tricky. One specific example for this case is the pod DNS resolution
Given we are making a new DNS name, do we actually care to support this POD-IP DNS name?
The pod-ip DNS name is for headless only: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#:~:text=Any%20Pods%20exposed,domain.example.
And that was before the DNS specification. After the DNS specification, the record is of the format `pod-hostname.service-name.my-namespace.svc.cluster-domain.example`.

Headless, in the case of Gateway API, would likely be expressed using a separate GatewayClass, in order to avoid conflating ClusterIP and Headless and provide clean separation of concerns. So, you are right, it does not necessarily need to be supported for ClusterIP Gateway.

Headless and externalName, which are more DNS specific, warrant more discussion.
If we end up in a new DNS space, this whole proposal's value is diminished. IMO.
Note that Gateway API allows flexibility and clear separation of concerns so that one would not need to configure cluster-ip and node-port when configuring a load-balancer.

But for completeness, the case shown below demonstrates how load balancer functionality analogous to
All of this proposal makes sense as a logical way to solve "If you had to implement Service using Gateway API primitives, how would you do it?"

What doesn't make sense to me is the why, and how this becomes something practically useful: from a proposal to a thing in the real world.

The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?

Which projects are expected to, and which are committed to, supporting these? Kube-proxy? CoreDNS? Various 3p CNIs (Cilium, Calico, etc.)? Service meshes? All gateway implementations?
> All of this proposal makes sense as a logical way to solve "If you had to implement Service using Gateway API primitives, how would you do it?"

Yes, that is how it started as a Memorandum GEP, but it has evolved into 'this should be implemented'.

> What doesn't make sense to me is the why, and how this becomes something practically useful: from a proposal to a thing in the real world.

Linking this question back to a similar discussion thread here that has some good comments: #3608 (comment)

> The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?

This diagram was mainly for completeness, in case someone wanted to use a common Route across different types of Gateway to mimic the exact LB Service behavior. Ideally though, with the Gateway API model, you won't need to mimic that behavior and configuring an LB Gateway won't require you to configure ClusterIP and NodePort Gateways as well.

Not denying that you would still be going from 1 to 4 objects, but that's an ongoing discussion that doesn't get old.

> Which projects are expected to, and which are committed to, supporting these? Kube-proxy? CoreDNS? Various 3p CNIs (Cilium, Calico, etc.)? Service meshes? All gateway implementations?

Don't have an answer to that yet.
Yeah, I think this is where "Gateways that are doing something similar to Service" and "Gateways that are trying to be an exact translation of `v1.Service`" really diverge.

If you are doing something similar to Service, you don't need an API-compatible implementation of NodePorts, warts and all.

But if you're doing an exact translation of Service, you don't need orthogonal, composable API pieces. We could just have "ServiceRoute" that mirrors the entire Service API all in one.
## Goals

* Define Gateway API usage to accomplish ClusterIP Service style behavior
* Propose DNS layout and record format for ClusterIP Gateway
It doesn't seem like we have fleshed this out. Compared to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ we have just 1-2 sentences with a lot of ambiguity here.
That is correct, we haven't. The DNS bit has been identified as a good candidate to be split out into its own GEP.
I've expressed elsewhere, but will say again - my ideal outcome is that users could 1-for-1 convert Services (maybe not "while running", but convert YAML) into this new form of Gateway and not change their clients at all. Ideally this new thing gets the same name as the equivalent Service.
Initial pass over the first half or so; still thinking through some of the latter half. Will be back with more in the next few days.
geps/gep-3539/index.md
Outdated
# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address

* Issue: [#3539](https://github.com/kubernetes-sigs/gateway-api/issues/3539)
* Status: Memorandum
This should currently be `Provisional`, as it's the first iteration and we are still deciding on the approach here. The `Memorandum` status is for registering general agreement about things, not for features that will require actual code changes to the Gateway API specification (which this definitely will).

This also needs to be changed in the corresponding `metadata.yaml` file - the YAML file is actually the canonical place for the status, this is just to remind everyone. I'll suggest the same change there.
(Gateway resource), implementation specifics and common configuration (GatewayClass resource), and routing traffic to backends (Route resource).

### Limitations of Service API
I think we need to be realistic here and acknowledge the benefits of the Service API from a user's POV - which I think we could summarize as: for simple use cases, it's very simple. It's only one object, as opposed to (at minimum) four in the simplest case here (GatewayClass, Gateway, Route, and EndpointSelector).
I completely agree that breaking Service apart for more advanced use cases is useful, but we should acknowledge the reason why it's stuck around for so long - the level of simplicity and flexibility it has allows folks to get started much more easily. Additionally, Service is a GA API that's not going anywhere, so we need to be very clear that we're not talking about deprecating or replacing Service with this. As with Gateway API north/south and Ingress, the GA core resource is going to stick around, but this proposal is about giving us a better base to look at adding features to rather than trying to fit them into the existing, overloaded Service construct.
Speaking from experience, putting a section outlining this into this document now will save a lot of discussion later.
💯 agree, which is why, IMO, the best path forward here is making Service the higher level API that decomposes into these resources. Then you can choose to break the abstraction and manually configure the underlying resources (as you can do today by manually creating EndpointSlice!).
Otherwise we end up with a split universe indefinitely
Whether literally `core/v1.Service` gets decomposed into these, or some new type which is "familiar but better", or some embedding into Gateway itself, I think I agree. We need not fear layering.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ptrivedi. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
```
{% include 'standard/clusterip-gateway/tcproute-with-endpointselector.yaml' %}
```

The EndpointSelector object is defined as follows. It allows the user to specify which endpoints
Would it make sense to have a config field so we can have implementation specific parameters?
For example, if I create a Layer 3/4 load balancer type of gateway, I would like to express how the traffic will be distributed (what algorithm is being used), the maximum number of endpoints and from where IP addresses should be selected (ResourceClaim (Multi-Network), Annotations (Multus)...).
> Would it make sense to have a config field so we can have implementation specific parameters?

Yes, I think something like config would be needed.

> For example, if I create a Layer 3/4 load balancer type of gateway, I would like to express how the traffic will be distributed (what algorithm is being used),

How the traffic will be distributed seems to be more of a Route level config than an EndpointSelector level config. E.g., BackendRefs already have a weight field today.

> the maximum number of endpoints and from where IP addresses should be selected (ResourceClaim (Multi-Network), Annotations (Multus)...).

publishNotReadyAddresses could be one more thing that could go here.
> Yes, I think something like config would be needed.

The `Gateway` has a similar field in `.spec.infrastructure.parametersRef`, pointing to another object holding the configuration (e.g. a ConfigMap). Otherwise, it is also possible to use `runtime.RawExtension` to embed arbitrary parameters in the `EndpointSelector`. DRA uses it, for example: https://github.com/kubernetes/api/blob/release-1.33/resource/v1beta2/types.go#L1032

> How the traffic will be distributed seems to be more of a Route level config than an EndpointSelector level config. E.g., BackendRefs already have a weight field today.

Not sure. To me, this is a combination of both Route and Backend (Service, EndpointSelector...). The Route steers traffic to backends via some characteristics (L7 (HTTP...), L3/L4 (IPs, ports, protocols)) and the backend (Service, EndpointSelector...) defines how to distribute it (load-balance it over a set of IPs).

> publishNotReadyAddresses could be one more thing that could go here

Yes, to me `publishNotReadyAddresses` would also make sense there in the `EndpointSelector`.
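Pulling these threads together, a hypothetical EndpointSelector shape; the `publishNotReadyAddresses` and `config` fields (and everything under `config`) are placeholders for this discussion, not a settled API:

```yaml
apiVersion: networking.gke.io/v1alpha1   # placeholder version used elsewhere in the GEP
kind: EndpointSelector
metadata:
  name: front-end-pods
spec:
  selector:
    matchLabels:
      app: front-end
  publishNotReadyAddresses: true    # hypothetical, mirrors the Service field
  config:                           # hypothetical implementation-specific parameters
    maxEndpoints: 500
    addressSource: pod-primary-ip   # e.g. vs. a Multus/DRA-provided address
```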
| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| externalIPs | List of externalIPs for service | Not supported? |
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level |
| internalTrafficPolicy | Local <br /> Cluster | Supported for ClusterIP Gateways only, Route level |
| ipFamily | IPv4 <br /> IPv6 | Route level |
If the IPs are specified only in the Gateway and not in the Routes, why would the ipFamily be at the Route Level?
Or should there be a Layer 3 and 4 Route that contains the Ports, Protocol and IPs (+ipFamily)?
> If the IPs are specified only in the Gateway and not in the Routes, why would the ipFamily be at the Route Level? Or should there be a Layer 3 and 4 Route that contains the Ports, Protocol and IPs (+ipFamily)?
I think you are right, ipFamily makes sense at the Gateway level -- even ports and protocols are under Listeners at Gateway level
> If the IPs are specified only in the Gateway and not in the Routes, why would the ipFamily be at the Route Level? Or should there be a Layer 3 and 4 Route that contains the Ports, Protocol and IPs (+ipFamily)?
I also just realized why I had some of these at the Route level. I started off with wanting to minimize changes to the API and Route is extensible while Gateway is not. Things are evolving a bit differently now though.
More here: #3608 (comment)
configure cluster-ip and node-port when configuring a load-balancer.

But for completeness, the case shown below demonstrates how load balancer functionality analogous to
LoadBalancer Service API can be achieved using Gateway API.
The example from the image uses load-balancer as the class. The cloud providers usually have a few variants of LBs, and preferably these would have their own classes.

But these would be somewhat unique, since the cloud provider controller would act on them, but also kube-proxy, Cilium, and others would need to do some setup on their side. Maybe we could set something on the GatewayClass that would indicate it's an L4 LB class, so that the node networking part has to treat it as an LB?
> The example from the image uses load-balancer as the class. The cloud providers usually have a few variants of LBs, and preferably these would have their own classes.
>
> But these would be somewhat unique, since the cloud provider controller would act on them, but also kube-proxy, Cilium, and others would need to do some setup on their side. Maybe we could set something on the GatewayClass that would indicate it's an L4 LB class, so that the node networking part has to treat it as an LB?

Linking a similar discussion: #3608 (comment)
/cc
/ok-to-test
/assign
late to the party...
# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address
This might have started out as "ClusterIP Gateways" but at this point it's really more like "Service-equivalent functionality via Gateway API".
## Goals

* Define Gateway API usage to accomplish ClusterIP Service style behavior
Beyond the fact that it's not just ClusterIP, I think there are at least 3 use cases hiding in that sentence.

- "Gateway as new-and-improved Service" - Providing an API that does generally the same thing that `v1.Service` does, but in a cleaner and more orthogonally-extensible way, so that when people have feature requests like "I want `externalTrafficPolicy: Local` Services without allocating `healthCheckNodePort`s" (to pick the most recent example), they can do that without us needing to add Yet Another ServiceSpec Flag.
- "Gateway as a backend for `v1.Service`" - Providing an API that can do everything that `v1.Service` can do (even the deprecated parts and the parts we don't like), so that you can programmatically turn Services into Gateways, and then the backend proxies/loadbalancers/etc. would not need to look at Service objects at all.
- "MultiNetworkService" - Providing an API that lets users do `v1.Service`-equivalent things in multi-network contexts.

The GEP talks about case 2 some, but it doesn't really explain why we'd want to do that (other than via the link to Tim's KubeCon lightning talk).
additional practical concerns that rendered Service API insufficient for the needs at hand.

Service IPs can only be assigned out of the ServiceCIDR range configured for the API server.
While Kubernetes 1.31 added a Beta feature that allows for the Extension of Service IP Ranges,
(This is GA now.)
coupled in API server, it is not possible to use the current Service API to achieve this model
without resorting to inelegant and klugey implementations.

Gateway API also satisfies, in a user-friendly and uncomplicated manner, the need for advanced
🤣
potentially other resource kinds) directly to a Route via backendRef.

```yaml
apiVersion: networking.gke.io/v1alpha1
```
(not sure if this apiVersion is just a placeholder for now or if it should have been replaced with something else?)
```yaml
apiVersion: networking.gke.io/v1alpha1
kind: EndpointSelector
metadata:
  name: front-end-pods
```
Probably want this to work the same way EndpointSlice does, where the `name` is not meaningful (so as to avoid conflicts), and there's a `label` (or something) that correlates it with its Service.
| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| ipFamily | IPv4 <br /> IPv6 | Route level |
| publishNotReadyAddresses | True <br /> False | Route or EndpointSelector level |
| ClusterIP (headless service) | IPAddress <br /> None | GatewayClass definition for Headless Service type |
| externalName | External name reference <br /> (e.g. DNS CNAME) | GatewayClass definition for ExternalName Service type |
- `sessionAffinity` - As noted elsewhere, this is not implemented compatibly by all service proxies. It's also not implemented by many LoadBalancers because historically we have mostly not done any e2e testing for non-GCE LoadBalancers.
- `externalIPs` - bad alternative implementation of LoadBalancers. Needed for "exactly equivalent to Service" Gateways but not wanted for "similar to Service" Gateways.
- `externalTrafficPolicy: Local` - overly-opinionated combined implementation of two separate features (preserve source IP / route traffic more efficiently). We should do this better for the "similar to Service" case.
- `publishNotReadyAddresses` - is this just an early attempt to solve the problem that was later solved better by ProxyTerminatingEndpoints?

Not mentioned here:

- `trafficDistribution` - I'm not sure what Gateway already has for topology, but this is definitely something that should be exposed generically.
I still haven't had the bandwidth to come back and give this a full, proper pass, but I did want to point out that, while this PR is currently targeting "Provisional" status, which isn't bound by Gateway API's release cycle, if you did want to look at moving this to Experimental (and thus, having something be implementable) this year, an item needs to be added to the Scoping discussion at #3760 to cover including it there.

If folks don't feel there will be bandwidth to push this forward, we can concentrate on getting this into Provisional in the v1.4 timeframe, then look at Experimental for v1.5.
Recommend reviewing deploy preview so examples are inlined: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/
Signed-off-by: Pooja Trivedi [email protected]
What type of PR is this?
/kind gep
What this PR does / why we need it:
This defines via documentation how Gateway API can be used to accomplish ClusterIP Service behavior. It also proposes DNS record format for ClusterIP Gateway, proposes an EndpointSelector resource, and briefly touches upon Gateway API usage to define LoadBalancer and NodePort behaviors.
Which issue(s) this PR fixes:
Fixes #3539
Does this PR introduce a user-facing change?: