perf(worker): optimize job retrieval for failed jobs in chunks #3127

Open
wants to merge 1 commit into
base: master
from

Conversation

YotamZiv298

Why

The current implementation of moveStalledJobsToWait processes failed jobs by making individual Job.fromId calls within a loop. This can lead to performance bottlenecks when dealing with many stalled jobs, as each job requires a separate Redis operation.

This change is necessary to:

  • Reduce the number of Redis operations
  • Improve memory usage by preventing large arrays of pending promises
  • Enhance performance when handling stalled jobs at scale

How

  • Implemented chunk processing for failed jobs using Promise.all
  • Reduced number of Redis operations by batching job retrievals
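
The chunked retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `JobLike`, `FetchJob`, `toChunks`, `fetchJobsInChunks`, and the default chunk size of 50 are all hypothetical names and values, and `fetchJob` merely stands in for a per-ID lookup such as `Job.fromId`.

```typescript
// Hypothetical stand-in for a job record; not BullMQ's actual Job class.
interface JobLike {
  id: string;
}

// Hypothetical fetcher type standing in for a per-ID lookup such as Job.fromId.
type FetchJob = (id: string) => Promise<JobLike | undefined>;

// Split an array into consecutive slices of at most `size` items.
function toChunks<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Fetch jobs chunk by chunk, so at most `chunkSize` lookups are pending at once.
// The default of 50 is an arbitrary assumption, not a value taken from the PR.
async function fetchJobsInChunks(
  ids: readonly string[],
  fetchJob: FetchJob,
  chunkSize = 50,
): Promise<JobLike[]> {
  const jobs: JobLike[] = [];
  for (const chunk of toChunks(ids, chunkSize)) {
    // Wait for the whole chunk to settle before starting the next one.
    const results = await Promise.all(chunk.map((id) => fetchJob(id)));
    for (const job of results) {
      // Skip IDs whose job no longer exists.
      if (job !== undefined) jobs.push(job);
    }
  }
  return jobs;
}
```

In this sketch each ID still triggers one lookup; what the chunking bounds is how many promises are pending at the same time, which is the memory concern listed under "Why".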

Contributor

@manast manast left a comment

LGTM

@YotamZiv298
Author

Weird, tests fail… I'll check it out later today.

@manast
Contributor

manast commented Mar 10, 2025

@roggervalf it seems this PR introduces new flakiness in some unrelated tests. We need to decide whether to merge it as is or try to find the root cause.

Collaborator

@roggervalf roggervalf left a comment

Lgtm

@roggervalf
Collaborator

> it seems this PR introduces new flakiness in some unrelated tests, we need to decide if merging it as is or trying to find the root cause.

It seems that the failures are unrelated to this change.

@roggervalf roggervalf force-pushed the move-stalled-jobs-to-wait-performance branch from 2508229 to de80f1e Compare March 11, 2025 04:54
@roggervalf roggervalf force-pushed the move-stalled-jobs-to-wait-performance branch from de80f1e to c073942 Compare March 11, 2025 14:29
@roggervalf roggervalf changed the title refactor(worker): optimize job retrieval for failed jobs in chunks perf(worker): optimize job retrieval for failed jobs in chunks Mar 12, 2025
3 participants