
Helm chart fails to create collector deployment and initContainer image is missing #2269

Closed
FredrikAugust opened this issue Oct 24, 2023 · 9 comments
Labels
bug, needs triage

Comments

@FredrikAugust

Component(s)

operator, collector, instrumentation

What happened?

Description

When syncing the Helm chart for the operator (latest version: v0.41.0), it fails to create the collector deployment. I've attached logs from the operator; the operator itself appears to see nothing wrong.

When I try to delete the collector CR from ArgoCD, it enters a very strange loop in which sub-resources of the collector are created and destroyed very rapidly. The opentelemetry-operator creates k8s events:

failed to create objects for collector: Operation cannot be fulfilled on deployments.apps "collector-collector": the object has been modified; please apply your changes to the latest version and try again

When I delete the collector resource manually from Kubernetes (using kubectl or the Lens IDE) and re-sync with ArgoCD, the deployment is created successfully.
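
For reference, a minimal sketch of that manual workaround, assuming the collector CR is named "collector" in the "otel" namespace (names taken from the operator logs below; adjust to your setup):

# Delete the stuck OpenTelemetryCollector CR directly...
kubectl -n otel delete opentelemetrycollector collector
# ...then re-sync the ArgoCD application so it recreates the CR cleanly.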

This is only half the problem, though; I'm filing a single issue because I assume the two are related, given that the fix appears to be the same.

The second problem is that, with the operator and collector running, creating a new pod with the autoinstrumentation annotations fails with the message:

Error creating: Pod "************" is invalid: spec.initContainers[1].image: Required value
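
For context, injection is triggered by a pod-template annotation along these lines (a minimal sketch; the workload name, language, and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # illustrative name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Asks the operator's webhook to inject the Java auto-instrumentation
        # init container into pods created from this template.
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
      - name: my-app
        image: my-app:latest  # illustrative image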

The error is once again fixed by deleting and recreating the Instrumentation CR; after that, injection works as expected.
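
A hedged sketch of that delete/recreate workaround, assuming the Instrumentation CR is named "my-instrumentation" in the "otel" namespace and its manifest is at hand (names are illustrative):

kubectl -n otel delete instrumentation my-instrumentation
kubectl -n otel apply -f instrumentation.yaml
# On recreation, the operator's defaulting webhook should populate the
# auto-instrumentation image fields that were previously missing.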

Expected Result

The deployment should be created automatically, and the instrumentation should inject the correct image tag.

Kubernetes Version

1.26.5-gke.2100

Operator version

b6c75a

Collector version

0.87.0

Environment information

Environment

  • ArgoCD v2.8.4 for GitOps
  • linkerd 2.14 for service mesh

Log output

{"level":"info","ts":"2023-10-24T10:52:28Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"collector","namespace":"otel","version":"0.87.0","latest":"0.61.0"}
{"level":"info","ts":"2023-10-24T10:52:28Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"collector","namespace":"otel"}

Additional context

No response

@FredrikAugust added the bug and needs triage labels on Oct 24, 2023
@jaronoff97
Contributor

It's possible this is related to the issue my PR fixed here; which operator version did you upgrade to? What probably happened is that you didn't upgrade the collector CRD correctly. See here for why this can be thorny.
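
For anyone hitting this: Helm deliberately does not upgrade CRDs shipped in a chart's crds/ directory, so the collector CRD can lag behind the operator after a chart upgrade. A hedged sketch of one way to update it manually, assuming you apply the CRD manifest from the operator release that matches your chart (the file path is illustrative):

# Server-side apply avoids the annotation size limit that large CRDs can hit.
kubectl apply --server-side -f config/crd/bases/opentelemetry.io_opentelemetrycollectors.yaml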

@FredrikAugust
Author

FredrikAugust commented Oct 26, 2023

@jaronoff97 Hi! Sorry, I forgot to respond. We upgraded to Helm chart v0.41.0, which should correspond to v0.87.0. I also tried removing the Instrumentation and Collector CRDs after deleting the Helm deployment.

@jaronoff97
Contributor

You said that deleting and recreating the instrumentation CR made this work... can you reiterate the current issue for me? I'm a bit lost with this one.

@yuriolisa
Contributor

@FredrikAugust, did you have the chance to perform what @jaronoff97 suggested?

@FredrikAugust
Author

@yuriolisa Hi. Sadly, I don't remember what I did to fix it, but it works now, so I don't know whether it's still a problem.

@jaronoff97
Contributor

Closing for now... please re-open if you see this again. Thanks!

@hfranz-gebit

hfranz-gebit commented Oct 1, 2024

FYI: I ran into the same problem today (initContainer image missing).
Using operator Helm chart 0.58.0, operator 0.107.0, and autoinstrumentation-java:2.7.0.
This was the original broken instrumentation:

> k get instrumentation -o yaml
apiVersion: v1
items:
- apiVersion: opentelemetry.io/v1alpha1
  kind: Instrumentation
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"opentelemetry.io/v1alpha1","kind":"Instrumentation","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"otel-instrumentations"},"name":"ocs-instrumentation","namespace":"observability"},"spec":{"exporter":{"endpoint":"http://otel-collector-collector.observability:4318"},"propagators":["tracecontext","baggage"],"sampler":{"argument":"1","type":"parentbased_traceidratio"}}}
    creationTimestamp: "2024-10-01T10:45:41Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: otel-instrumentations
    name: ocs-instrumentation
    namespace: observability
    resourceVersion: "109189638"
    uid: 00288096-f00b-4f47-86cc-ae08e846a861
  spec:
    exporter:
      endpoint: http://otel-collector-collector.observability:4318
    propagators:
    - tracecontext
    - baggage
    sampler:
      argument: "1"
      type: parentbased_traceidratio
kind: List
metadata:
  resourceVersion: ""

After deletion/recreation, the instrumentation contains the required image specs:

apiVersion: v1
items:
- apiVersion: opentelemetry.io/v1alpha1
  kind: Instrumentation
  metadata:
    annotations:
      instrumentation.opentelemetry.io/default-auto-instrumentation-apache-httpd-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4
      instrumentation.opentelemetry.io/default-auto-instrumentation-dotnet-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.2.0
      instrumentation.opentelemetry.io/default-auto-instrumentation-go-image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha
      instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.7.0
      instrumentation.opentelemetry.io/default-auto-instrumentation-nginx-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4
      instrumentation.opentelemetry.io/default-auto-instrumentation-nodejs-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1
      instrumentation.opentelemetry.io/default-auto-instrumentation-python-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.47b0
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"opentelemetry.io/v1alpha1","kind":"Instrumentation","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"otel-instrumentations"},"name":"ocs-instrumentation","namespace":"observability"},"spec":{"exporter":{"endpoint":"http://otel-collector-collector.observability:4318"},"propagators":["tracecontext","baggage"],"sampler":{"argument":"1","type":"parentbased_traceidratio"}}}
    creationTimestamp: "2024-10-01T14:30:13Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: otel-instrumentations
      app.kubernetes.io/managed-by: opentelemetry-operator
    name: ocs-instrumentation
    namespace: observability
    resourceVersion: "109280943"
    uid: bc1f5551-c3b3-46b6-9da0-9eef4e2248bb
  spec:
    apacheHttpd:
      configPath: /usr/local/apache2/conf
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 1m
          memory: 128Mi
      version: "2.4"
    dotnet:
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.2.0
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 50m
          memory: 128Mi
    exporter:
      endpoint: http://otel-collector-collector.observability:4318
    go:
      image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 32Mi
        requests:
          cpu: 50m
          memory: 32Mi
    java:
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.7.0
      resources:
        limits:
          cpu: 500m
          memory: 64Mi
        requests:
          cpu: 50m
          memory: 64Mi
    nginx:
      configFile: /etc/nginx/nginx.conf
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 1m
          memory: 128Mi
    nodejs:
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 50m
          memory: 128Mi
    propagators:
    - tracecontext
    - baggage
    python:
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.47b0
      resourceRequirements:
        limits:
          cpu: 500m
          memory: 32Mi
        requests:
          cpu: 50m
          memory: 32Mi
    resource: {}
    sampler:
      argument: "1"
      type: parentbased_traceidratio
kind: List
metadata:
  resourceVersion: ""
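
A hedged way to spot this broken state without dumping full YAML is to check for the image annotation the operator's defaulting webhook stamps (column spec below is illustrative):

kubectl get instrumentation -A -o custom-columns=\
'NAME:.metadata.name,JAVA_IMAGE:.metadata.annotations.instrumentation\.opentelemetry\.io/default-auto-instrumentation-java-image'
# An empty JAVA_IMAGE column suggests the defaulting webhook never processed
# that Instrumentation.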

It might be a timing issue (I have deployed the operator, collector, and instrumentation with ArgoCD as an app-of-apps, without e.g. sync-wave adjustments).
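
If it is ordering, ArgoCD sync waves are one way to make sure the operator (and its admission webhook) is healthy before the Instrumentation CRs are applied; a minimal sketch with illustrative names:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: otel-operator                        # illustrative name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"        # operator (and webhook) first
spec:
  project: default
  source:
    repoURL: https://example.com/deploy.git  # illustrative repo
    path: otel/operator
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: observability
# Give the collector/instrumentation Applications a later wave, e.g.
#   argocd.argoproj.io/sync-wave: "1"
# so they only sync once the operator reports healthy.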

@jaronoff97
Contributor

@hfranz-gebit this could be because of a bug I fixed in #3074, which is in release 0.108.0. If you upgrade to the latest release this shouldn't be an issue anymore (I hope!)
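
A hedged upgrade sketch, assuming the chart was installed as release "opentelemetry-operator" from the open-telemetry Helm repo into the "observability" namespace (names are illustrative):

helm repo update
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator -n observability
# Pick a chart version that ships operator v0.108.0 or newer, and remember
# that helm upgrade does not touch CRDs in the chart's crds/ directory.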

@hfranz-gebit

@jaronoff97 Great! I'll test with the new release a few times in a dev environment!
