Validate privileges in Windows environments #192

Open
aojea opened this issue Jan 20, 2025 · 12 comments

@aojea
Contributor

aojea commented Jan 20, 2025

Thank you so much! I forgot to enable privileged mode in Windows and WSL2. Now it works properly.

I am using Docker Desktop shared between WSL2 and Windows, and I’ve tried both NAT and mirrored WSL2 network configurations.

Notably:

  • When using NAT mode, cloud-provider-kind needs to be started in WSL2 with privileged mode, and the service can only be accessed via curl from WSL2, not from Windows.

  • When using mirrored mode, cloud-provider-kind needs to be started in Windows with privileged mode, and the service can only be accessed via curl from Windows, not from WSL2.

Both configurations can assign an external IP to the LoadBalancer service. However, is there a way to make the LoadBalancer service accessible from both WSL2 and Windows?

Originally posted by @d2461795341 in #189

It would be good to add a check for Windows environments that validates there are enough privileges and fails otherwise, the same as we do for macOS:

// Process on macOS must run using sudo
if runtime.GOOS == "darwin" && syscall.Geteuid() != 0 {
	klog.Fatalf("Please run this again with `sudo`.")
}
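
A minimal sketch of how the same check could be extended to Windows. The names privilegeError and isElevated are hypothetical and not part of cloud-provider-kind. The elevation probe is left as a stub; on Windows it could be backed by Token.IsElevated from golang.org/x/sys/windows, assuming an elevated token is the privilege that is actually required here.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// privilegeError returns a non-empty message when the process lacks the
// privileges assumed to be required on the given OS, and "" otherwise.
// euid is the effective user ID (meaningful on Unix-like systems only);
// elevated reports whether a Windows process runs with an elevated token.
func privilegeError(goos string, euid int, elevated bool) string {
	switch goos {
	case "darwin":
		if euid != 0 {
			return "Please run this again with `sudo`."
		}
	case "windows":
		if !elevated {
			return "Please run this again from an elevated (Administrator) prompt."
		}
	}
	return ""
}

// isElevated is a stub that always reports false. A real build for Windows
// would need an actual probe, for example Token.IsElevated from
// golang.org/x/sys/windows (assumption: that matches what the tool needs).
func isElevated() bool {
	return false
}

func main() {
	if msg := privilegeError(runtime.GOOS, os.Geteuid(), isElevated()); msg != "" {
		fmt.Fprintln(os.Stderr, msg)
		os.Exit(1)
	}
	fmt.Println("privileges look sufficient")
}
```

Keeping the decision in a pure function makes it testable on any platform; only the probe itself has to be OS-specific.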

@aojea added the help wanted label Jan 20, 2025
@priyanshikhetwani
Contributor

/assign

@dprotaso
Contributor

I just hit this :)

@dprotaso
Contributor

Am I supposed to run cloud-provider-kind with sudo, and then it should work?

@dprotaso
Contributor

Reading the subsequent comment in the linked thread - it requires sudo and --enable-lb-port-mapping=true

@dprotaso
Contributor

Though I'm unable to curl the Service LB IP - it doesn't seem to work hmm.

@aojea
Contributor Author

aojea commented Jan 24, 2025

Though I'm unable to curl the Service LB IP - it doesn't seem to work hmm.

You should not need --enable-lb-port-mapping. That flag is for environments that do not allow direct routing to the containers, so that people can still find the port mappings on the host and use them. It was added only as a special option for these edge cases in #126; read the description of that PR to understand the multiple combinations of deployments that can exist.

If you want to use the Service LB IP, you must not set that flag:

// some platforms require to enable tunneling for the LoadBalancers
if runtime.GOOS == "darwin" || runtime.GOOS == "windows" || isWSL2() {
	config.DefaultConfig.LoadBalancerConnectivity = config.Tunnel
}
// flag overrides autodetection
if enableLBPortMapping {
	config.DefaultConfig.LoadBalancerConnectivity = config.Portmap
}
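
The snippet above calls isWSL2() without showing it. As an illustration only, one common way to detect WSL2 is to look at the kernel version string; the sketch below is not the project's implementation. The function name looksLikeWSL2 is hypothetical, and the "microsoft" plus "WSL2" markers are an assumption about stock WSL2 kernels (custom kernels may not carry them).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// looksLikeWSL2 reports whether a kernel version string (the contents of
// /proc/version) carries the markers that stock WSL2 kernels are assumed
// to include. The comparison is case-insensitive.
func looksLikeWSL2(kernelVersion string) bool {
	v := strings.ToLower(kernelVersion)
	return strings.Contains(v, "microsoft") && strings.Contains(v, "wsl2")
}

func main() {
	data, err := os.ReadFile("/proc/version")
	if err != nil {
		// No /proc/version (for example on macOS or Windows): treat as not WSL2.
		fmt.Println("cannot read /proc/version, assuming not WSL2:", err)
		return
	}
	fmt.Println("WSL2 detected:", looksLikeWSL2(string(data)))
}
```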

@dprotaso
Contributor

If you want to use the Service LB IP, you must not set that flag

Understood. Then it must be something different. I think my WSL2 gets into a state where cloud-provider-kind stops working and just repeats:

I0124 22:55:31.138874  555465 server.go:121] updating loadbalancer tunnels on userspace
I0124 22:55:31.172692  555465 tunnel.go:34] found port maps map[10000:58098 443:58099 80:58100] associated to container kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS
I0124 22:55:31.206579  555465 tunnel.go:41] setting IPv4 address 172.18.0.3 associated to container kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS
E0124 22:55:31.208417  555465 controller.go:301] "Unhandled Error" err="error processing service envoy-gateway-system/envoy-knit-system-external-gateway-5ffdd475 (retrying with exponential backoff): failed to ensure load balancer: exit status 2" logger="UnhandledError"
I0124 22:55:31.208572  555465 event.go:389] "Event occurred" object="envoy-gateway-system/envoy-knit-system-external-gateway-5ffdd475" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: exit status 2"

I'm running it with sudo on a vanilla kind cluster (kind create cluster).

Using Ubuntu-24.04 (Default) and Docker Desktop 4.37.1 (178610)

I think what's happening is the docker container that the cloud provider creates (envoy) isn't being cleaned up properly.

I find I have to reboot wsl to get out of this state.

@aojea
Contributor Author

aojea commented Jan 25, 2025

Check with docker ps and docker logs: look at the logs from the containers that fail to start to see why they fail.

@dprotaso
Contributor

dprotaso commented Jan 28, 2025

ok so I have

$ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED      STATUS      PORTS                                                                     NAMES
e9bb90cb3a4f   kindest/node:v1.32.0       "/usr/local/bin/entr…"   2 days ago   Up 2 days   127.0.0.1:45823->6443/tcp                                                 kind-control-plane
9952c14547c6   envoyproxy/envoy:v1.30.1   "/docker-entrypoint.…"   2 days ago   Up 2 days   0.0.0.0:59093->80/tcp, 0.0.0.0:59092->443/tcp, 0.0.0.0:59091->10000/tcp   kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS

docker logs shows the health check succeeding

{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"172.18.0.3","port_value":30213,"resolver_name":"","ipv4_compat":false}},"cluster_name":"cluster_IPv4_80_TCP","timestamp":"2025-01-26T22:41:26.047Z","locality":{"region":"","zone":"","sub_zone":""},"successful_health_check_event":{}}
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"172.18.0.3","port_value":30851,"resolver_name":"","ipv4_compat":false}},"cluster_name":"cluster_IPv4_443_TCP","timestamp":"2025-01-26T22:41:31.346Z","locality":{"region":"","zone":"","sub_zone":""},"successful_health_check_event":{}}
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"172.18.0.3","port_value":30213,"resolver_name":"","ipv4_compat":false}},"cluster_name":"cluster_IPv4_80_TCP","timestamp":"2025-01-26T22:41:31.374Z","locality":{"region":"","zone":"","sub_zone":""},"successful_health_check_event":{}}
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"172.18.0.3","port_value":30213,"resolver_name":"","ipv4_compat":false}},"cluster_name":"cluster_IPv4_80_TCP","timestamp":"2025-01-26T22:41:34.378Z","locality":{"region":"","zone":"","sub_zone":""},"successful_health_check_event":{}}

Interestingly, the K8s Service has a different IP:

envoy-gateway-system   envoy-knit-system-external-gateway-5ffdd475   LoadBalancer   10.96.12.59    172.18.0.2    80:30213/TCP,443:30851/TCP                2d22h

cloud-provider-kind logs show

I0127 21:23:46.443039  779724 proxy.go:271] envoy config info: &{HealthCheckPort:31636 ServicePorts:map[IPv4_443_TCP:{Listener:{Address:0.0.0.0 Port:443 Protocol:TCP} Cluster:[{Address:172.18.0.3 Port:30851 Protocol:TCP}]} IPv4_80_TCP:{Listener:{Address:0.0.0.0 Port:80 Protocol:TCP} Cluster:[{Address:172.18.0.3 Port:30213 Protocol:TCP}]}] SessionAffinity:None SourceRanges:[]}
I0127 21:23:46.443968  779724 proxy.go:289] updating loadbalancer with config 
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: listener_IPv4_443_TCP
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 443
      protocol: TCP
  filter_chains:
  - filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        access_log:
        - name: envoy.file_access_log
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
        stat_prefix: tcp_proxy
        cluster: cluster_IPv4_443_TCP
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: listener_IPv4_80_TCP
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 80
      protocol: TCP
  filter_chains:
  - filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        access_log:
        - name: envoy.file_access_log
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
        stat_prefix: tcp_proxy
        cluster: cluster_IPv4_80_TCP
I0127 21:23:46.556231  779724 proxy.go:300] updating loadbalancer with config 
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: cluster_IPv4_443_TCP
  connect_timeout: 5s
  type: STATIC
  lb_policy: RANDOM
  health_checks:
  - timeout: 5s
    interval: 3s
    unhealthy_threshold: 2
    healthy_threshold: 1
    no_traffic_interval: 5s
    always_log_health_check_failures: true
    always_log_health_check_success: true
    event_log_path: /dev/stdout
    http_health_check:
      path: /healthz
  load_assignment:
    cluster_name: cluster_IPv4_443_TCP
    endpoints:
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 31636
            address:
              socket_address:
                address: 172.18.0.3
                port_value: 30851
                protocol: TCP
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: cluster_IPv4_80_TCP
  connect_timeout: 5s
  type: STATIC
  lb_policy: RANDOM
  health_checks:
  - timeout: 5s
    interval: 3s
    unhealthy_threshold: 2
    healthy_threshold: 1
    no_traffic_interval: 5s
    always_log_health_check_failures: true
    always_log_health_check_success: true
    event_log_path: /dev/stdout
    http_health_check:
      path: /healthz
  load_assignment:
    cluster_name: cluster_IPv4_80_TCP
    endpoints:
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 31636
            address:
              socket_address:
                address: 172.18.0.3
                port_value: 30213
                protocol: TCP
I0127 21:23:46.811008  779724 server.go:121] updating loadbalancer tunnels on userspace
I0127 21:23:46.847518  779724 tunnel.go:34] found port maps map[10000:59091 443:59092 80:59093] associated to container kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS
I0127 21:23:46.884891  779724 tunnel.go:41] setting IPv4 address 172.18.0.2 associated to container kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS
E0127 21:23:46.887681  779724 controller.go:301] "Unhandled Error" err="error processing service envoy-gateway-system/envoy-knit-system-external-gateway-5ffdd475 (retrying with exponential backoff): failed to ensure load balancer: exit status 2" logger="UnhandledError"

@dprotaso
Contributor

docker inspect on the envoy container shows a different IP address, 172.18.0.2:

 "Networks": {
                "kind": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "MacAddress": "02:42:ac:12:00:02",
                    "DriverOpts": null,
                    "NetworkID": "7861de7e728cb4e42331afdf842e6319003aad9a03f618391e30838a6f7d27e8",
                    "EndpointID": "18365511b7a35b94053d04286fb53e0937c93f2c23af9442b97e75b993fbc5ac",
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "fc00:f853:ccd:e793::1",
                    "GlobalIPv6Address": "fc00:f853:ccd:e793::2",
                    "GlobalIPv6PrefixLen": 64,
                    "DNSNames": [
                        "kindccm-RPWYB6HN42ZHBDLSU4MXX34VENT34LIWFJOVYVFS",
                        "cf044aecb4fe"
                    ]
                }
            }
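
To compare this address against the one in the cloud-provider-kind logs without reading the JSON by eye, the field can be extracted programmatically. This is a sketch that parses only a fragment shaped like the excerpt above; the helper name networkIP is hypothetical, and in full docker inspect output the "Networks" object sits under "NetworkSettings".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkSettings mirrors only the fields of the excerpt that are needed.
type networkSettings struct {
	Networks map[string]struct {
		IPAddress string `json:"IPAddress"`
	} `json:"Networks"`
}

// networkIP returns the IPv4 address a container has on the named Docker
// network, given a JSON object with a top-level "Networks" key.
func networkIP(raw []byte, network string) (string, error) {
	var ns networkSettings
	if err := json.Unmarshal(raw, &ns); err != nil {
		return "", fmt.Errorf("parsing inspect fragment: %w", err)
	}
	n, ok := ns.Networks[network]
	if !ok {
		return "", fmt.Errorf("network %q not found", network)
	}
	return n.IPAddress, nil
}

func main() {
	// Values copied from the excerpt above, trimmed to the relevant keys.
	fragment := []byte(`{"Networks": {"kind": {"Gateway": "172.18.0.1", "IPAddress": "172.18.0.2"}}}`)
	ip, err := networkIP(fragment, "kind")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // prints 172.18.0.2
}
```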

@dprotaso
Contributor

I think the repro steps might then be

  1. Create a kind cluster & run cloud-provider-kind
  2. Create a Service type=LB
  3. Stop the kind cluster and cloud-provider-kind
  4. Start the kind cluster
  5. Start cloud-provider-kind (see it fail)

@aojea
Contributor Author

aojea commented Jan 28, 2025

The IP is correct; the node seems to have this config:

              port_value: 31636   <<<< health check node port
            address:
              socket_address:
                address: 172.18.0.3   <<<< kind node IP
                port_value: 30851   <<<< node port

cloud-provider-kind spawns a new load balancer that has IP 172.18.0.2 and adds that IP as the external IP of the Service:

envoy-gateway-system envoy-knit-system-external-gateway-5ffdd475 LoadBalancer 10.96.12.59 172.18.0.2 80:30213/TCP,443:30851/TCP

So you can poll 172.18.0.2:80 and the envoy container forwards to 172.18.0.3:30213 (same for port 443, which goes to 30851).
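
The mapping described above can be written out as a small table. The sketch below derives it from the PORT(S) column of the Service listing; the type forward and the function forwards are hypothetical names for illustration, and the addresses are the ones quoted in this thread.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// forward describes one hop: traffic arriving at From (load balancer IP and
// Service port) is proxied to To (node IP and node port).
type forward struct {
	From     string
	To       string
	Protocol string
}

// forwards parses a PORT(S) value such as "80:30213/TCP,443:30851/TCP" and
// pairs each Service port on lbIP with its node port on nodeIP.
func forwards(lbIP, nodeIP, ports string) ([]forward, error) {
	var out []forward
	for _, entry := range strings.Split(ports, ",") {
		entry = strings.TrimSpace(entry)
		pair, proto, ok := strings.Cut(entry, "/")
		if !ok {
			return nil, fmt.Errorf("missing protocol in %q", entry)
		}
		lbPort, nodePort, ok := strings.Cut(pair, ":")
		if !ok {
			return nil, fmt.Errorf("missing node port in %q", entry)
		}
		for _, p := range []string{lbPort, nodePort} {
			if _, err := strconv.ParseUint(p, 10, 16); err != nil {
				return nil, fmt.Errorf("invalid port %q in %q", p, entry)
			}
		}
		out = append(out, forward{
			From:     net.JoinHostPort(lbIP, lbPort),
			To:       net.JoinHostPort(nodeIP, nodePort),
			Protocol: proto,
		})
	}
	return out, nil
}

func main() {
	fs, err := forwards("172.18.0.2", "172.18.0.3", "80:30213/TCP,443:30851/TCP")
	if err != nil {
		panic(err)
	}
	for _, f := range fs {
		fmt.Printf("%s -> %s (%s)\n", f.From, f.To, f.Protocol)
	}
}
```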

@dprotaso does it work before step 3? Docker may change the IPs of containers across stop/start cycles, so I would not be surprised if something is carrying an old value. The part that puzzles me is:

E0127 21:23:46.887681 779724 controller.go:301] "Unhandled Error" err="error processing service envoy-gateway-system/envoy-knit-system-external-gateway-5ffdd475 (retrying with exponential backoff): failed to ensure load balancer: exit status 2" logger="UnhandledError"

if you have a repro ping me in slack and we can screenshare
