Redis 4.x.x instrumentation causes application to crash when watched keys change as errors are thrown twice #1708

FredrikAugust · 2023-10-02T09:09:24Z

What version of OpenTelemetry are you using?

Using the image otel/opentelemetry-collector-contrib:0.83.0

What version of Node are you using?

Node 18.17.1

What did you do?

We use auto-instrumentation for our application, and it correctly injects the redis-4 instrumentation. We use the multi command throughout our application.

What did you expect to see?

When a watched key changes within a multi command, we expect it to behave the same way as it would without the instrumentation.

What did you see instead?

If we enable the instrumentation, we see the message `no original function multi to wrap` printed to the console when we start the application, and the application crashes when a watched key changes in a multi command. This does not happen if we disable the instrumentation.

/app/node_modules/@redis/client/dist/lib/multi-command.js:60
            throw new errors_1.WatchError();
                  ^

WatchError: One (or more) of the watched keys has been changed
    at RedisMultiCommand.handleExecReplies (/app/node_modules/@redis/client/dist/lib/multi-command.js:60:19)
    at Commander.exec (/app/node_modules/@redis/client/dist/lib/client/multi-command.js:81:82)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Additional context

#1672 is related: it requests the ability to disable a single instrumentation, and it is where we first identified this issue.


drob · 2023-10-02T22:57:07Z

Seeing the same behavior on node 16 as well.

FredrikAugust · 2023-10-04T14:32:22Z

In the only two places we use .watch() in our backend, the code looks roughly like this:

try {
  await getClient().executeIsolated(async (client) => {
    await client.watch(`lock`);
  
    const owner = await client.get(`lock`);

    return await client.multi().set(`lock`, ..., { PXAT: ... }).exec();
  });
} catch (err) {
  if (err instanceof WatchError) throw new LockInUseError();
}

Even so, our application crashes with an uncaught WatchError. I can't find any other place in our application where this could be thrown, and unfortunately the stack trace doesn't reveal where it originates.

FredrikAugust · 2023-10-04T15:01:57Z

I managed to patch this in a fork, but I'm unsure if it's the right way to proceed.

In instrumentation.ts, the function _getPatchMultiCommandsExec returns the execRes promise. As it stands, errors from this promise are surfaced twice, although I'm not quite sure how. This means that if you provoke a watch error in a test and try to catch it, you will see two responses.

To illustrate this further: if you change the delayed multi test in this library's test suite to fail by watching a key and then modifying it before the exec, mocha registers this as two tests, one failing and one passing. I honestly have no idea how this happens, but again, the proposed solution gets rid of it.

So a solution to our problem was to add .catch((err: Error) => {}) to the .then-chain of this execRes variable.
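A minimal sketch of this workaround, assuming a simplified wrapper. `WatchError`, `fakeExec`, and `execPatchWithCatch` are hypothetical local stand-ins, not the @redis/client or instrumentation APIs. The empty `.catch()` silences only the wrapper's internal chain; the caller still receives the rejection through the returned promise:

```typescript
// Local stand-in for @redis/client's WatchError (hypothetical, for illustration).
class WatchError extends Error {
  constructor() {
    super("One (or more) of the watched keys has been changed");
    this.name = "WatchError";
  }
}

// Stand-in for RedisClientMultiCommand.prototype.exec() when a watched key changed.
function fakeExec(): Promise<unknown[]> {
  return Promise.reject(new WatchError());
}

// Simplified wrapper: side chain for span bookkeeping, original promise returned.
function execPatchWithCatch(): Promise<unknown[]> {
  const execRes = fakeExec();
  execRes
    .then(() => {
      // End the open spans using the replies (elided in this sketch).
    })
    .catch(() => {
      // Swallow the rejection on this internal side chain only.
    });
  // The caller still sees the WatchError exactly once, via this promise.
  return execRes;
}

// Usage: the caller can handle the error as it would without instrumentation.
execPatchWithCatch().catch((err: Error) => {
  console.log(`caught ${err.name}`); // prints "caught WatchError"
});
```

Because the side chain has its own handler, no unhandled rejection is reported. The trade-off is that the open spans are simply never ended on failure.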

With this change, we can catch the exception from running .exec() on a multi chain as usual. Let me know if this made any sense. I'd be happy to help as this is quite a severe problem for our product!:)

FredrikAugust · 2023-10-04T15:03:55Z

@blumamir do you have any insights into this? I see that you're a frequent contributor to this package:)

FredrikAugust · 2023-10-05T09:10:20Z

I've created a PR to more clearly show what fixes the issue for us. Would you be able to confirm it fixes it for you as well, @drob?

trentm · 2023-10-12T00:12:27Z

I think I have a fix. Details in a bit.

@pichlermarc I'm a member of the OTel org now. Should that mean I should be able to assign myself to this issue?

trentm · 2023-10-12T03:04:46Z

@FredrikAugust Thanks very much for your PR, it provides a repro and the source of the problem.

1. the crash

This is the wrapper for RedisClientMultiCommand.prototype.exec():

  private _getPatchMultiCommandsExec() {
    const plugin = this;
    return function execPatchWrapper(original: Function) {
      return function execPatch(this: any) {
        const execRes = original.apply(this, arguments);
        // ...
        execRes
          .then((redisRes: unknown[]) => {
            const openSpans = this[OTEL_OPEN_SPANS];
            // ...
            for (let i = 0; i < openSpans.length; i++) {
              // ...
              plugin._endSpanWithResponse(...);
            }
          });

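The issue is that if the execRes promise rejects, the new promise created by `execRes.then(...)` (which has no rejection handler) rejects with the same error. Nothing handles that promise, so Node emits an unhandledRejection, which crashes the process by default since Node 15. So the two observed errors are (a) the rejection of the returned execRes and (b) this unhandled one.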
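A minimal sketch of the mechanism, using hypothetical stand-ins (`fakeExec`, `buggyExecPatch`) rather than the real instrumentation code. For demonstration only, `buggyExecPatch` also returns the side-chain promise so this sketch can observe it; the real wrapper drops it:

```typescript
// Stand-in for the original exec() rejecting with a WatchError.
function fakeExec(): Promise<unknown[]> {
  const err = new Error("One (or more) of the watched keys has been changed");
  err.name = "WatchError";
  return Promise.reject(err);
}

function buggyExecPatch(): { execRes: Promise<unknown[]>; sideChain: Promise<void> } {
  const execRes = fakeExec();
  // .then() with only a fulfillment handler returns a new promise that
  // rejects with the same reason when execRes rejects.
  const sideChain = execRes.then(() => {
    // End the open spans (elided).
  });
  // The real wrapper returns only execRes; sideChain is discarded, so its
  // rejection is never handled and Node reports an unhandledRejection.
  return { execRes, sideChain };
}

const { execRes, sideChain } = buggyExecPatch();
// Handle both here so this sketch itself does not crash.
Promise.allSettled([execRes, sideChain]).then(([a, b]) => {
  console.log(a.status, b.status); // prints "rejected rejected"
});
```

A `try`/`catch` around `await client.multi()...exec()` only ever sees `execRes`, which is why application code cannot catch the second error.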

You are right that it can be suppressed, as you did in your patch by adding an empty .catch(() => {}). However, I think the right answer will be to close the open spans (marking them as failed). I'll open a PR to do that.
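A minimal sketch of that direction, assuming a simplified span model. `FakeSpan` and `patchExec` are hypothetical stand-ins, not the OpenTelemetry API or the actual patch. The side chain gets a rejection handler that fails every open span, so it no longer rejects, while the caller still receives the original error:

```typescript
// Hypothetical stand-in for a span: records how it was ended.
class FakeSpan {
  status: "open" | "ok" | "error" = "open";
  error?: unknown;

  endOk(): void {
    this.status = "ok";
  }

  endWithError(err: unknown): void {
    this.status = "error";
    this.error = err;
  }
}

function patchExec<T>(
  original: () => Promise<T[]>,
  openSpans: FakeSpan[],
): () => Promise<T[]> {
  return function execPatch(): Promise<T[]> {
    const execRes = original();
    execRes.then(
      (replies) => {
        // One span per queued command; an Error reply fails its span.
        openSpans.forEach((span, i) => {
          const reply = replies[i];
          if (reply instanceof Error) span.endWithError(reply);
          else span.endOk();
        });
      },
      (err: unknown) => {
        // The whole transaction failed (e.g. a WatchError): fail every span.
        // Because this handler exists, the side chain no longer rejects.
        openSpans.forEach((span) => span.endWithError(err));
      },
    );
    return execRes; // the caller still receives the rejection exactly once
  };
}

// Usage: a transaction that fails with a simulated WatchError.
const spans = [new FakeSpan(), new FakeSpan()];
const exec = patchExec<unknown>(() => Promise.reject(new Error("watch failed")), spans);
exec().catch(() => {
  console.log(spans.map((s) => s.status).join(",")); // prints "error,error"
});
```

Passing both handlers to a single `.then(onFulfilled, onRejected)` call, rather than appending an empty `.catch()`, keeps the spans from being left open when the transaction fails.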

2. the `no original function multi to wrap` message

This is an unrelated issue.

This is a log message from the shimmer dependency (https://github.com/othiym23/shimmer/blob/master/index.js#L32), which is used to wrap the RedisClient.prototype.multi method.
In https://github.com/redis/node-redis/pull/2324/files, the RedisClient definition of MULTI and multi changed to this:

export default class RedisClient<...> {
    ...
    MULTI(): RedisClientMultiCommandType<M, F, S> {
        return new (this as any).Multi(
            this.multiExecutor.bind(this),
            this.#options?.legacyMode
        );
    }

    multi = this.MULTI;

The resulting built JS is:

class RedisClient extends events_1.EventEmitter {
    ...
    constructor(options) {
        super();
        ...
        Object.defineProperty(this, "multi", {
            enumerable: true,
            configurable: true,
            writable: true,
            value: this.MULTI
        });

So multi cannot be wrapped, because it is not on the prototype; it is a property of each instance. I think wrapping can be guarded with an `if (redisClientPrototype?.multi) {` check to avoid the log message. I'll open a separate PR for that.
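A minimal sketch of the guard, using a simplified model of the compiled class. `FakeRedisClient` and `wrapIfPresent` are hypothetical stand-ins; the real instrumentation uses shimmer. The helper wraps a method only if it is present on the prototype, instead of attempting the wrap and logging a warning:

```typescript
// Simplified model: MULTI is a prototype method, while `multi` is a class
// field, i.e. an own property assigned to each instance at construction.
class FakeRedisClient {
  MULTI(): string {
    return "multi-command";
  }
  multi = this.MULTI;
}

type Method = (...args: unknown[]) => unknown;

// Wrap proto[name] only if it resolves to a function on the prototype,
// mirroring an `if (redisClientPrototype?.multi)` style guard.
function wrapIfPresent(
  proto: Record<string, unknown>,
  name: string,
  makeWrapper: (original: Method) => Method,
): boolean {
  const original = proto[name];
  if (typeof original !== "function") return false;
  proto[name] = makeWrapper(original as Method);
  return true;
}

const proto = FakeRedisClient.prototype as unknown as Record<string, unknown>;
const wrapped = ["multi", "MULTI"].filter((name) =>
  wrapIfPresent(proto, name, (original) =>
    function (this: unknown, ...args: unknown[]) {
      return `traced:${String(original.apply(this, args))}`;
    },
  ),
);

console.log(wrapped.join(",")); // prints "MULTI"
// Instances created after patching copy the wrapped MULTI into `multi`.
console.log(new FakeRedisClient().multi()); // prints "traced:multi-command"
```

In this model, wrapping MULTI on the prototype also covers `multi` for any client constructed after patching, because the field initializer reads `this.MULTI` at construction time.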

…) handling. The instrumentation was not handling a rejection of the promise from client.multi(), resulting in unended spans and an unhandledRejection event. Fixes: open-telemetry#1708

pichlermarc · 2023-10-12T08:35:27Z

> I think I have a fix. Details in a bit.
>
> @pichlermarc I'm a member of the OTel org now. Should that mean I should be able to assign myself to this issue?

I think only approvers and maintainers (users with write access to the repo) can do that. I assigned you :)

FredrikAugust · 2023-10-12T10:29:48Z

@trentm Thanks a lot for the great explanations! I arrived at the same conclusions so happy to see that I wasn't completely lost:) Apologies for not updating the issue as I understood the problem(s) in greater detail.

As for the second issue, I noticed that the message alternates between "multi" and "MULTI" depending on which redis SDK version you're running. This can be observed by running the test suite for this repo. So I assume it's trying to patch both of the commands (even though they're the same?) and soft-fails when it can't find the "other"?

trentm · 2023-10-12T15:11:08Z

> So I assume it's trying to patch both of the commands (even though they're the same?) and soft-fails when it can't find the "other"?

Yes, exactly. In #1729 I change it to only attempt to patch multi or MULTI if it actually exists on the prototype -- so that should work for all @redis/client versions.

FredrikAugust · 2023-10-16T07:27:30Z

Thanks for the great work, @trentm!<3

drob · 2023-10-17T22:26:08Z

I only had the `no original function multi to wrap` log spam, not the crash. Confirming that it's fixed for me in @opentelemetry/instrumentation-redis-4 v0.35.3.

Thank you, @trentm!!

FredrikAugust added the bug Something isn't working label Oct 2, 2023

FredrikAugust mentioned this issue Oct 3, 2023

Manually instrumented traces aren't generated, but auto-instrumented ones are open-telemetry/opentelemetry-js#4180

Closed

FredrikAugust changed the title ~~Redis-4 instrumentation causes application to crash when multi command soft-fails~~ Redis 4.x.x instrumentation causes application to crash when watched keys change as errors are thrown twice Oct 4, 2023

FredrikAugust mentioned this issue Oct 5, 2023

fix(instrumentation-redis-4): multi.exec patch should only throw error once #1717

Closed

pichlermarc added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies pkg:instrumentation-redis-4 labels Oct 11, 2023

trentm mentioned this issue Oct 12, 2023

fix(instrumentation-redis-4): avoid shimmer warning by only wrapping multi/MULTI if they exist #1729

Merged

trentm mentioned this issue Oct 12, 2023

fix(instrumentation-redis-4): fix unhandledRejection in client.multi(...) handling #1730

Merged

pichlermarc assigned trentm Oct 12, 2023

pichlermarc added the has:reproducer This bug/feature has a minimal reproduction provided label Oct 12, 2023

pichlermarc closed this as completed in #1730 Oct 13, 2023
