Description
In what area(s)?
/area autoscale
What version of Knative?
1.13.1
Expected Behavior
The Knative Service pod and queue-proxy respect the revision timeout that is passed (spec.template.spec.timeoutSeconds, 18500 seconds here).
Actual Behavior
The Knative queue-proxy throws an error:
{"severity":"ERROR","timestamp":"2024-06-25T23:47:00.022775794Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 13\nTCP: inuse 3 orphan 4 tw 14 alloc 2483 mem 309\nUDP: inuse 0 mem 8\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"41769de","knative.dev/key":"test/plugin-github-00008","knative.dev/pod":"plugin-github-00008-deployment-6749cf875b-v9dhz","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/[email protected]/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
This causes an endless number of retries, resulting in:
plugin-github-00008-deployment-6749cf875b-jm55n                   1/2     Terminating         0               123m
plugin-github-00008-deployment-6749cf875b-lbpsf                   1/2     Terminating         0               5h16m
plugin-github-00008-deployment-6749cf875b-qxnhz                   1/2     Terminating         0               157m
plugin-github-00008-deployment-6749cf875b-rpm5g                   1/2     Terminating         0               3h58m
plugin-github-00008-deployment-6749cf875b-v9dhz                   2/2     Running             0               15m
# curl --location 'http://plugin-github-test.test.svc.cluster.local' \
  --header 'Ce-Id: replay-189098f' \
  --header 'Ce-Specversion: 1.0' \
  --header 'Ce-Type: attestation.test' \
  --header 'Ce-Source: ci.commit.codereview.test' \
  --header 'Content-Type: application/json' \
  --data '{}'
activator request timeout
and a bunch of pods that have to be manually deleted.
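Cleaning these up manually means force-deleting each stuck pod, roughly like this (pod name taken from the listing above; the --grace-period=0 --force flags skip the normal graceful termination):
kubectl delete pod plugin-github-00008-deployment-6749cf875b-jm55n -n test --grace-period=0 --force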
Steps to Reproduce the Problem
Apply the following configmap:
apiVersion: v1
data:
  enable-service-links: "false"
  max-revision-timeout-seconds: "604800"
  revision-idle-timeout-seconds: "0"
  revision-response-start-timeout-seconds: "600"
  revision-timeout-seconds: "18500"
kind: ConfigMap
metadata:
  annotations:
    knative.dev/example-checksum: e7973912
  creationTimestamp: "2023-11-28T16:59:46Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: knative-serving
    app.kubernetes.io/version: 1.13.1
  name: config-defaults
  namespace: knative-serving
  resourceVersion: "488803732"
  uid: 2496e813-166c-4756-aa7f-f1a83f946cdd
Restart Knative Serving:
kubectl delete pods -n knative-serving --all
Apply a new Knative Service that has a long delay in it (a minimal sketch of such a handler follows these steps)
Apply a new trigger reading from a broker
Send a request to that broker, which will activate the trigger.
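The actual plugin code isn't included in this report; a minimal sketch of a handler with the same shape (log a message, sleep well past 10 minutes but well under the 18500 s revision timeout, then respond) could look roughly like the following. The sleep duration, response body, and port handling are assumptions for illustration only:

package main

import (
    "log"
    "net/http"
    "os"
    "time"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Sleep longer than the observed ~10 minute cutoff, but far below the
        // configured revision timeoutSeconds of 18500 (~5h08m).
        log.Println("beginning to sleep.")
        time.Sleep(15 * time.Minute)
        w.WriteHeader(http.StatusOK)
        _, _ = w.Write([]byte("done"))
    })

    // Knative Serving injects PORT into the user container; fall back to 8080.
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }
    log.Fatal(http.ListenAndServe(":"+port, nil))
}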
Logs from the service:
Defaulted container "github-plugin" out of: github-plugin, queue-proxy
2024/06/26 00:58:16 /var/run/ko
2024/06/26 00:58:16 /var/run/ko
Using config file: /var/run/ko/.config.yaml
time="2024-06-26T00:58:17Z" level=info msg="beginnig to sleep."
time="2024-06-26T01:08:17Z" level=info msg="beginnig to sleep."
Logs from the queue-proxy:
{"severity":"ERROR","timestamp":"2024-06-26T01:08:15.397562115Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 9\nTCP: inuse 1 orphan 2 tw 12 alloc 207 mem 9\nUDP: inuse 0 mem 0\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"41769de","knative.dev/key":"test/plugin-github-test-00001","knative.dev/pod":"plugin-github-test-00001-deployment-76cd4695bf-nfzch","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/[email protected]/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
Timeout difference: 2024-06-26T00:58:17Z vs 2024-06-26T01:08:15.397562115Z (roughly 598 seconds), indicating a 10 minute timeout.
Error from the activator:
{"severity":"ERROR","timestamp":"2024-06-26T01:08:15.397202332Z","logger":"activator","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 146\nTCP: inuse 8 orphan 2 tw 99 alloc 2264 mem 285\nUDP: inuse 0 mem 8\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-b7789d69d-lcwb7","knative.dev/key":"test/plugin-github-test-00001","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/[email protected]/network/error_handler.go:33\nknative.dev/serving/pkg/activator/handler.(*activationHandler).proxyRequest.func1\n\tknative.dev/serving/pkg/activator/handler/handler.go:141\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/activator/handler.(*activationHandler).proxyRequest\n\tknative.dev/serving/pkg/activator/handler/handler.go:144\nknative.dev/serving/pkg/activator/handler.(*activationHandler).ServeHTTP.func1\n\tknative.dev/serving/pkg/activator/handler/handler.go:97\nknative.dev/serving/pkg/activator/net.(*revisionThrottler).try.func1\n\tknative.dev/serving/pkg/activator/net/throttler.go:237\nknative.dev/serving/pkg/activator/net.(*infiniteBreaker).Maybe\n\tknative.dev/serving/pkg/activator/net/throttler.go:767\nknative.dev/serving/pkg/activator/net.(*revisionThrottler).try\n\tknative.dev/serving/pkg/activator/net/throttler.go:227\nknative.dev/serving/pkg/activator/net.(*Throttler).Try\n\tknative.dev/serving/pkg/activator/net/throttler.go:522\nknative.dev/serving/pkg/activator/handler.(*activationHandler).ServeHTTP\n\tknative.dev/serving/pkg/activator/handler/handler.go:90\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}
The activator's terminationGracePeriodSeconds:
terminationGracePeriodSeconds: 604800
The line of code in the activator that starts the timeout (the one that produces the "activator request timeout" response) seems to be referencing https://github.com/knative/serving/blob/main/pkg/apis/config/defaults.go#L45 (the max timeout) instead of the max timeout I specified.
More Validation
My Knative Service spec:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
  creationTimestamp: "2024-06-25T02:32:13Z"
  generation: 8
  labels:
    app.fianu.io/type: occurrence
    app.kubernetes.io/component: adapter
    app.kubernetes.io/instance: github
    app.kubernetes.io/managed-by: fianu-controller
    app.kubernetes.io/name: plugin
    app.kubernetes.io/part-of: fianu
    argocd.argoproj.io/compare-options: IgnoreExtraneous
    destinations.fianu.io/broker: fianu-default
    networking.knative.dev/visibility: cluster-local
  name: plugin-github-test
  namespace: test
spec:
  template:
    metadata:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
        sidecar.istio.io/proxyCPU: 50m
        sidecar.istio.io/proxyCPULimit: 150m
        sidecar.istio.io/proxyMemory: 128Mi
        sidecar.istio.io/proxyMemoryLimit: 256Mi
      creationTimestamp: null
    spec:
      containerConcurrency: 0
      containers:
      - env:
        - name: FIANU_PLATFORM_SEMANTIC_VERSION
          valueFrom:
            configMapKeyRef:
              key: fianu_semantic_version
              name: fianu-hosts
        - configMapRef:
            name: fianu-hosts-env
        image: docker.io
        imagePullPolicy: Always
        name: github-plugin
        readinessProbe:
          successThreshold: 1
          tcpSocket:
            port: 0
        resources: {}
        securityContext: {}
      responseStartTimeoutSeconds: 600
      serviceAccountName: github
      timeoutSeconds: 18500
timeoutSeconds doesn't seem to be respected.
I also built the activator from source and added logging. This is what I added:
log.Println("making request with values ", time.Duration(*rev.Spec.TimeoutSeconds)*time.Second, responseStartTimeout, idleTimeout)
From the broker:
2024/06/26 01:46:14 makign request with values  168h0m0s 10m0s 0s
With curl:
2024/06/26 01:47:53 makign request with values  168h0m0s 10m0s 0s
So it looks like the timeout is being set properly from spec.TimeoutSeconds.
My only remaining thought is that my delivery timeout is not being respected:
spec:
  broker: default
  delivery:
    timeout: PT2H
    retry: 1
  filter:
    attributes:
      source: ci.commit.codereview.test
      type: attestation.test
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: plugin-github-test
      namespace: test
But this does not make sense, because I can reproduce it with curl:
# curl --location 'http://plugin-github-test.test.svc.cluster.local' \
  --header 'Ce-Id: replay-189098f' \
  --header 'Ce-Specversion: 1.0' \
  --header 'Ce-Type: attestation.test' \
  --header 'Ce-Source: ci.commit.codereview.test' \
  --header 'Content-Type: application/json' \
  --data '{}'
activator request timeout
which wouldn't even go through Kafka, and I get the same timeout.
Is there something I need to do on the Kafka side?