Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis 4.x.x instrumentation causes application to crash when watched keys change as errors are thrown twice #1708

Closed
FredrikAugust opened this issue Oct 2, 2023 · 12 comments · Fixed by #1730
Assignees
Labels
bug Something isn't working has:reproducer This bug/feature has a minimal reproduction provided pkg:instrumentation-redis-4 priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies

Comments

@FredrikAugust
Copy link

What version of OpenTelemetry are you using?

Using the image otel/opentelemetry-collector-contrib:0.83.0

What version of Node are you using?

Node 18.17.1

What did you do?

We use the auto-instrumentation to instrument our application, and this correctly injects redis-4. We use the multi command throughout our application.

What did you expect to see?

When a watched key changes within a multi command we expect it to behave the same way as it would without the instrumentation.

What did you see instead?

If we enable the instrumentation, we see a message

no original function multi to wrap

Printed to the console when we start the application, and the application will crash when a watched key changes in a multi command. This does not happen if we disable the instrumentation.

/app/node_modules/@redis/client/dist/lib/multi-command.js:60
            throw new errors_1.WatchError();
                  ^

WatchError: One (or more) of the watched keys has been changed
    at RedisMultiCommand.handleExecReplies (/app/node_modules/@redis/client/dist/lib/multi-command.js:60:19)
    at Commander.exec (/app/node_modules/@redis/client/dist/lib/client/multi-command.js:81:82)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Additional context

#1672

This issue is related as it requests the ability to disable a single instrumentation, and it is where we first identified the issue.

@FredrikAugust FredrikAugust added the bug Something isn't working label Oct 2, 2023
@drob
Copy link

drob commented Oct 2, 2023

Seeing the same behavior on node 16 as well.

@FredrikAugust
Copy link
Author

FredrikAugust commented Oct 4, 2023

The only two places we use .watch() in our backend, the code looks roughly like this:

try {
  await getClient().executeIsolated(async (client) => {
    await client.watch(`lock`);
  
    const owner = await client.get(`lock`);

    return await client.multi().set(`lock`, ..., { PXAT: ... }).exec();
  });
} catch (err) {
  if (err instanceof WatchError) throw new LockInUseError();
}

But our application crashes because of an uncaught WatchError error. I can't see any places in our application where this could occur, and sadly the stacktrace doesn't reveal where it's thrown from.

@FredrikAugust
Copy link
Author

FredrikAugust commented Oct 4, 2023

I managed to patch this in a fork, but I'm unsure if it's the right way to proceed.

In instrumentation.ts in the function _getPatchMultiCommandsExec there is a return of the execRes promise. As it stands right now, and I'm not quite sure how this works — errors from this promise are returned twice. That means that if you try to catch it in a test by provoking a watch error, you will see two responses.

To further illustrate this. If you change the delayed multi test in the test suite for this library to fail by watching and then modifying the key before the exec, mocha will register this as two tests, where one will fail and one will succeed. I honestly have no clue how this works, but again the proposed solution gets rid of this.

So a solution to our problem was to add .catch((err: Error) => {}) to the .then-chain of this execRes variable.

Thus we can catch the exception from running .exec on a multi chain normally. Let me know if this made any sense. I'd be happy to help as this is quite a severe problem for our product!:)

@FredrikAugust FredrikAugust changed the title Redis-4 instrumentation causes application to crash when multi command soft-fails Redis 4.x.x instrumentation causes application to crash when watched keys change as errors are thrown twice Oct 4, 2023
@FredrikAugust
Copy link
Author

@blumamir do you have any insights into this? I see that you're a frequent contributor to this package:)

@FredrikAugust
Copy link
Author

I've created a PR to more clearly show what fixes the issue for us. Would you be able to confirm it fixes it for you as well, @drob?

@pichlermarc pichlermarc added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies pkg:instrumentation-redis-4 labels Oct 11, 2023
@trentm
Copy link
Contributor

trentm commented Oct 12, 2023

I think I have a fix. Details in a bit.

@pichlermarc I'm a member of the OTel org now. Should that mean I should be able to assign myself to this issue?

@trentm
Copy link
Contributor

trentm commented Oct 12, 2023

@FredrikAugust Thanks very much for your PR, it provides a repro and the source of the problem.

1. the crash

This is the wrapper for RedisClientMultiCommand.prototype.exec()

  private _getPatchMultiCommandsExec() {
    const plugin = this;
    return function execPatchWrapper(original: Function) {
      return function execPatch(this: any) {
        const execRes = original.apply(this, arguments);
        // ...
        execRes
          .then((redisRes: unknown[]) => {
            const openSpans = this[OTEL_OPEN_SPANS];
            // ...
            for (let i = 0; i < openSpans.length; i++) {
              // ...
              plugin._endSpanWithResponse(...)
          });

The issue is that if the execRes promise rejects, then there will be a thrown unhandledRejection. So the observed two errors are (a) the one from the returned execRes and (b) this one.

You are right that it can be suppressed, as you did in your patch by adding an empty .catch(() => {}). However, I think the right answer will be to close the open spans (marking them as failed). I'll open a PR to do that.

2. the no original function multi to wrap message

This is an unrelated issue.

This is a log message from the shimmer dep (https://github.com/othiym23/shimmer/blob/master/index.js#L32) used to wrap the RedisClient.prototype.multi method.
In https://github.com/redis/node-redis/pull/2324/files the RedisClient definition of MULTI and multi changed to this:

export default class RedisClient<...> {
    ...
    MULTI(): RedisClientMultiCommandType<M, F, S> {
        return new (this as any).Multi(
            this.multiExecutor.bind(this),
            this.#options?.legacyMode
        );
    }

    multi = this.MULTI;

the resultant built JS is:

class RedisClient extends events_1.EventEmitter {
    ...
    constructor(options) {
        super();
        ...
        Object.defineProperty(this, "multi", {
            enumerable: true,
            configurable: true,
            writable: true,
            value: this.MULTI
        });

So multi cannot be wrapped because it doesn't exist on the prototype, it is a property of the instance. I think wrapping can be guarded by a if (redisClientPrototype?.multi) { to avoid the log message. I'll open a separate PR for that.

trentm added a commit to trentm/opentelemetry-js-contrib that referenced this issue Oct 12, 2023
…) handling

The instrumentation was not handling a rejection of the promise from
client.multi(), resulting in unended spans and an unhandleRejection
event.

Fixes: open-telemetry#1708
@pichlermarc
Copy link
Member

I think I have a fix. Details in a bit.

@pichlermarc I'm a member of the OTel org now. Should that mean I should be able to assign myself to this issue?

I think only approvers and maintainers (users write-access to repo) can do that, I assigned you :)

@pichlermarc pichlermarc added the has:reproducer This bug/feature has a minimal reproduction provided label Oct 12, 2023
@FredrikAugust
Copy link
Author

@trentm Thanks a lot for the great explanations! I arrived at the same conclusions so happy to see that I wasn't completely lost:) Apologies for not updating the issue as I understood the problem(s) in greater detail.

As for the second issue I saw that it will change between "multi" and "MULTI" depending on what redis SDK version you're running. This can be observed by running the test suite for this repo. So I assume it's trying to patch both of the commands (even though they're the same?) and soft-fails when it can't find the "other"?

@trentm
Copy link
Contributor

trentm commented Oct 12, 2023

So I assume it's trying to patch both of the commands (even though they're the same?) and soft-fails when it can't find the "other"?

Yes, exactly. In #1729 I change it to only attempt to patch multi or MULTI if it actually exists on the prototype -- so that should work for all @redis/client versions.

@FredrikAugust
Copy link
Author

Thanks for the great work, @trentm!<3

@drob
Copy link

drob commented Oct 17, 2023

I only had the no original function multi to wrap log spam, not the crash. Confirming that it's fixed for me in @opentelemetry/instrumentation-redis-4 v0.35.3.

Thank you, @trentm!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working has:reproducer This bug/feature has a minimal reproduction provided pkg:instrumentation-redis-4 priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies
Projects
None yet
4 participants