Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envoy crash using CDS with dynamic configuration from filesystem #34195

Closed
aojea opened this issue May 16, 2024 · 5 comments · Fixed by #37151
Closed

Envoy crash using CDS with dynamic configuration from filesystem #34195

aojea opened this issue May 16, 2024 · 5 comments · Fixed by #37151
Labels
backport/review Request to backport to stable releases no stalebot Disables stalebot from closing an issue

Comments

@aojea
Copy link

aojea commented May 16, 2024

I have a crash when using CDS with UDP, it is completely reproducible on Kubernetes CI

How to reproduce

  1. Crete the cds.yaml and lds.yaml files
  2. Start envoy with the following configuration
node:
  cluster: cloud-provider-kind
  id: cloud-provider-kind-id

dynamic_resources:
  cds_config:
    resource_api_version: V3
    path: /home/envoy/cds.yaml
  lds_config:
    resource_api_version: V3
    path: /home/envoy/lds.yaml

admin:
  access_log_path: /dev/stdout
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
  1. Modify the cds.yaml and lds.yaml two times with the configurations below (changing only the ports)

Details

kubernetes/kubernetes#124729

Crashdump on CDS

[2024-05-16 07:11:34.464][1][info][upstream] [source/common/upstream/cds_api_helper.cc:32] cds: add 1 cluster(s), remove 0 cluster(s)
[2024-05-16 07:11:34.465][1][info][upstream] [source/common/upstream/cds_api_helper.cc:71] cds: added/updated 1 cluster(s), skipped 0 unmodified cluster(s)
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x18
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: 816188b86a0a52095b116b107f576324082c7c02/1.30.1/Clean/RELEASE/BoringSSL
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 5585541ca000-558556b72000 /usr/local/bin/envoy
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7f9a5f4e1520]
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:121] #1: [0x5585548d4caa]
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:121] #2: [0x5585548d2bbf]
[2024-05-16 07:11:43.483][101][critical][backtrace] [./source/server/backtrace.h:121] #3: [0x5585561ed36d]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #4: [0x5585561f1600]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #5: [0x55855651bcc1]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #6: [0x55855651998f]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #7: [0x55855651ab2f]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #8: [0x5585561f0fa6]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #9: [0x5585561f0d4e]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #10: [0x5585562ea081]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #11: [0x5585562eb62d]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #12: [0x55855653bd40]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #13: [0x55855653a681]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #14: [0x558555b79f9f]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #15: [0x5585565b7d03]
[2024-05-16 07:11:43.484][101][critical][backtrace] [./source/server/backtrace.h:121] #16: [0x7f9a5f533ac3]

Config applied
LDS

resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: listener_IPv4_80_UDP
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 80
      protocol: UDP
  udp_listener_config:
    downstream_socket_config:
      max_rx_datagram_size: 9000
  listener_filters:
  - name: envoy.filters.udp_listener.udp_proxy
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
      access_log:
      - name: envoy.file_access_log
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
      stat_prefix: udp_proxy
      matcher:
        on_no_match:
          action:
            name: route
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.Route
              cluster: cluster_IPv4_80_UDP
      upstream_socket_config:
        max_rx_datagram_size: 9000

CDS

resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: cluster_IPv4_80_UDP
  connect_timeout: 5s
  type: STATIC
  lb_policy: RANDOM
  health_checks:
  - timeout: 5s
    interval: 3s
    unhealthy_threshold: 2
    healthy_threshold: 1
    no_traffic_interval: 5s
    always_log_health_check_failures: true
    always_log_health_check_success: true
    event_log_path: /dev/stdout
    http_health_check:
      path: /healthz
  load_assignment:
    cluster_name: cluster_IPv4_80_UDP
    endpoints:
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 192.168.8.4
                port_value: 32557
                protocol: UDP
      - lb_endpoints:
        - endpoint:
            health_check_config:
              port_value: 10256
            address:
              socket_address:
                address: 192.168.8.2
                port_value: 32557
                protocol: UDP

Originally posted by @aojea in #14866 (comment)

I've tried with all the latest stable images and it still crashes.

Found #33824 that seems related, but after trying with a dev image that should contain the fix the crash still happens

 9dce75e79df0   envoyproxy/envoy:dev-190f9e0cfe16e779f622c16dce8e833600e5fb45   "/docker-entrypoint.…"   About a minute ago   Exited (1) About a minute ago                               kindccm-FIVC6DYE7XD6AIEWUYSGG4EXHENYBGTPO4KYLJLD
@eranr
Copy link

eranr commented Jun 11, 2024

A seemingly related crash with additional repro information can be found here:
#26206

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 11, 2024
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 18, 2024
@phlax phlax added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Jul 18, 2024
@phlax phlax reopened this Jul 18, 2024
railwaycat added a commit to railwaycat/envoy that referenced this issue Oct 22, 2024
railwaycat added a commit to railwaycat/envoy that referenced this issue Nov 14, 2024
@aojea
Copy link
Author

aojea commented Jan 14, 2025

@howardjohn @keithmattix you are more familiar with this project than me, do you know what is the version that contains this fix?

I need this for cloud provider kind kubernetes-sigs/cloud-provider-kind#176

@phlax
Copy link
Member

phlax commented Jan 14, 2025

@aojea the fix is not in any published versions afaict - there is a release imminent that will contain the fix - we could also potentially backport the fix

/backport

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/review Request to backport to stable releases no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants