Manually instrumented traces aren't generated, but auto-instrumented ones are #4180

Closed
FredrikAugust opened this issue Oct 3, 2023 · 2 comments

Comments

@FredrikAugust

What happened?

Steps to Reproduce

We're currently using the Instrumentation CRD from the Kubernetes Operator to inject the OpenTelemetry bootstrap code. The code for that looks like this:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: init-instrumentation
  namespace: otel
spec:
  exporter:
    endpoint: http://collector-collector.otel.svc.cluster.local:4317
  propagators:
    - b3
  nodejs:
    env:
      - name: OTEL_SERVICE_NAME
        value: "$(APP)"
      - name: OTEL_METRICS_EXPORTER
        value: "none"
      - name: OTEL_NODE_RESOURCE_DETECTORS
        value: "env,host,os,process,container,gcp"

Nothing special to see here.

For the Kubernetes Deployment (or Argo Rollout in our case), we have the following annotations:

instrumentation.opentelemetry.io/inject-nodejs: "otel/init-instrumentation"
instrumentation.opentelemetry.io/container-names: "{{ .Chart.Name }}"

This successfully picks up our Redis-4 and Google Datastore SDKs (among others) and generates traces for them without any problems. Apart from a bug with Redis-4 that I've outlined in another issue in the contrib repo, this works fine all the way from generating the traces to shipping them to the collector.

We also generate our own traces using the @opentelemetry/api package, and have been doing so for a couple of months, but a couple of weeks ago they stopped working (dates are provided under Additional Details). The code we use to generate these spans looks like this:

    return api.trace
            .getTracer("command-processor")
            .startActiveSpan(`command-handler:${commandName}`, async (span) => {
                try {
                    /* ... */
                } catch (err) {
                    /* ... */
                } finally {
                    span.end();
                }
            });

We know the code is being run, as its result is used elsewhere (frankly, it's the core of our application).
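
One sanity check that can be added next to this (a minimal sketch using only @opentelemetry/api calls; the span name here is made up) is to see whether the spans are actually recording, i.e. whether the API is wired up to a real SDK rather than the no-op default:

    const api = require("@opentelemetry/api");

    api.trace.getTracer("command-processor").startActiveSpan("sanity-check", (span) => {
        // If no SDK has registered a global TracerProvider (or an incompatible
        // copy of @opentelemetry/api is being resolved), this logs `false` and
        // the span is silently dropped instead of being exported.
        console.log("span recording?", span.isRecording());
        span.end();
    });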

I've tried enabling the debug logs for the instrumentation, but I still see zero trace (no pun intended) of the traces.
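
For reference, this is roughly how the debug logging was enabled (a minimal sketch; in our setup the equivalent happens in the injected bootstrap, and the OTEL_LOG_LEVEL environment variable may also work depending on the SDK version):

    const { diag, DiagConsoleLogger, DiagLogLevel } = require("@opentelemetry/api");

    // Route OpenTelemetry's internal diagnostic messages to the console at
    // debug level so exporter and instrumentation activity becomes visible.
    diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);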

Frankly I've been trying to debug this for a while now, but I'm absolutely clueless as to what could have caused this.

Expected Result

Just like the auto-instrumented traces, the manually instrumented ones should work fine.

Actual Result

None of the manually instrumented traces are generated.

Additional Details

We first observed the problem in our staging environment on the 15th of September. This environment is generally updated every day, so if the problem was introduced through an upgrade, it was likely published around that date.

The only change to our code that could affect this was upgrading from @opentelemetry/api 1.4.1 to 1.6.0 that day, but reverting this change does not seem to fix it.

We deployed to production a couple of days later, and that seemed to introduce the problem there as well. Could this be because a new deployment would trigger the auto-instrumentation to fetch the latest version?
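
To rule out a duplicated or mismatched API package, it may also be worth checking (a minimal sketch; output is illustrative) which copy of @opentelemetry/api the application code actually resolves at runtime:

    // Print the version and on-disk location of the @opentelemetry/api copy the
    // application resolves. If the injected auto-instrumentation registers its
    // global TracerProvider through an incompatible copy, manual spans can end
    // up on the no-op provider while auto-instrumented ones keep working.
    console.log(
        require("@opentelemetry/api/package.json").version,
        require.resolve("@opentelemetry/api")
    );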

OpenTelemetry Setup Code

"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const auto_instrumentations_node_1 = require("@opentelemetry/auto-instrumentations-node");
const exporter_trace_otlp_grpc_1 = require("@opentelemetry/exporter-trace-otlp-grpc");
const exporter_metrics_otlp_grpc_1 = require("@opentelemetry/exporter-metrics-otlp-grpc");
const exporter_prometheus_1 = require("@opentelemetry/exporter-prometheus");
const sdk_metrics_1 = require("@opentelemetry/sdk-metrics");
const resource_detector_alibaba_cloud_1 = require("@opentelemetry/resource-detector-alibaba-cloud");
const resource_detector_aws_1 = require("@opentelemetry/resource-detector-aws");
const resource_detector_container_1 = require("@opentelemetry/resource-detector-container");
const resource_detector_gcp_1 = require("@opentelemetry/resource-detector-gcp");
const resources_1 = require("@opentelemetry/resources");
const api_1 = require("@opentelemetry/api");
const sdk_node_1 = require("@opentelemetry/sdk-node");
function getMetricReader() {
    switch (process.env.OTEL_METRICS_EXPORTER) {
        case undefined:
        case '':
        case 'otlp':
            api_1.diag.info('using otel metrics exporter');
            return new sdk_metrics_1.PeriodicExportingMetricReader({
                exporter: new exporter_metrics_otlp_grpc_1.OTLPMetricExporter(),
            });
        case 'prometheus':
            api_1.diag.info('using prometheus metrics exporter');
            return new exporter_prometheus_1.PrometheusExporter({});
        case 'none':
            api_1.diag.info('disabling metrics reader');
            return undefined;
        default:
            throw Error(`no valid option for OTEL_METRICS_EXPORTER: ${process.env.OTEL_METRICS_EXPORTER}`);
    }
}
const sdk = new sdk_node_1.NodeSDK({
    autoDetectResources: true,
    instrumentations: [(0, auto_instrumentations_node_1.getNodeAutoInstrumentations)()],
    traceExporter: new exporter_trace_otlp_grpc_1.OTLPTraceExporter(),
    metricReader: getMetricReader(),
    resourceDetectors: [
        // Standard resource detectors.
        resource_detector_container_1.containerDetector,
        resources_1.envDetector,
        resources_1.hostDetector,
        resources_1.osDetector,
        resources_1.processDetector,
        // Cloud resource detectors.
        resource_detector_alibaba_cloud_1.alibabaCloudEcsDetector,
        // Ordered AWS Resource Detectors as per:
        // https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md#ordering
        resource_detector_aws_1.awsEksDetector,
        resource_detector_aws_1.awsEc2Detector,
        resource_detector_gcp_1.gcpDetector,
    ],
});
sdk.start();
//# sourceMappingURL=autoinstrumentation.js.map

package.json

As we're using the Kubernetes injection method, I don't think this is relevant.

The @opentelemetry/api version is `1.6.0`, as mentioned above.

Relevant log output

No response

@FredrikAugust added the bug and triage labels on Oct 3, 2023
@FredrikAugust
Author

I tried reverting to an old build and that appears to work, so I suppose this is caused by something on our end. Will close for now.

@nrichardson-akasa

@FredrikAugust
I'm getting the same issue on the latest version. Have you re-upgraded and seen the issue fixed or found a workaround?
