[14.x] Add worker keepalive support for lease-based queue drivers #60637

Open
brecht-vermeersch wants to merge 8 commits into laravel:master from brecht-vermeersch:queue-keepalive

brecht-vermeersch commented Jul 1, 2026


This got a bit longer than intended, but I wanted to capture the background and tradeoffs clearly :)

Summary

This adds opt-in keepalive support to the queue worker for drivers that use a lease, visibility timeout, or reservation window while a job is in progress.

I ran into this while working on an Azure Storage Queue driver. That driver has the same basic problem as other lease-based queues: if a job runs longer than the queue’s lease, the message can become visible again and get picked up a second time even though the first worker is still processing it.

Today, the usual way to deal with this in Laravel is to make sure the worker timeout and the queue lease are configured with enough headroom.

For most queue drivers, that means setting the connection's retry_after value high enough to cover the longest expected job, and keeping the worker --timeout several seconds lower so a stuck worker is killed before the job is made available again. SQS has no retry_after setting. For SQS, Laravel relies on the queue's own visibility timeout, which is configured on the SQS queue itself.

That works, but it is still a static limit. If a job occasionally runs longer than expected, the message can still become visible again before the worker finishes processing it.

The idea here is to give queue connections a small contract they can implement if they know how to renew that lease while the job is still running.
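To make the static-limit problem concrete, here is a rough Python model of a single message's lease. Python is used only because the PR itself contains no code, and this is not Laravel's PHP. The function name `first_redelivery` is hypothetical. The model also assumes that each renewal grants a fresh, full lease counted from the moment it is sent. That may not hold for every transport.

```python
from typing import Optional


def first_redelivery(job_seconds: float, lease_seconds: float,
                     keepalive_interval: Optional[float] = None) -> Optional[float]:
    """Return the time (seconds after receive) at which the message becomes
    visible again while the job is still running, or None if it stays hidden
    until the job finishes. Hypothetical model, not a real queue API."""
    lease_expires = lease_seconds  # the lease starts when the worker receives the message
    if keepalive_interval is not None:
        if keepalive_interval <= 0:
            raise ValueError("keepalive_interval must be positive")
        renewal_at = keepalive_interval
        while renewal_at < job_seconds:
            if renewal_at >= lease_expires:
                # The renewal came too late: the lease had already lapsed.
                return lease_expires
            # Assumption: a renewal grants a fresh full lease counted from now.
            lease_expires = renewal_at + lease_seconds
            renewal_at += keepalive_interval
    # A lease that ends before the job does means a second delivery.
    return lease_expires if lease_expires < job_seconds else None
```

With a 30-second lease, a 120-second job reappears at t=30 without keepalive. Calling `first_redelivery(120, 30, keepalive_interval=10)` returns `None`, because renewing every 10 seconds keeps the message hidden. An interval of 40 seconds is too slow, and the message still reappears at t=30.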

Why this belongs in the worker

This is mainly useful for queues where “in progress” is time-bound on the transport side. A few examples that could benefit from this:

  • Official drivers such as SQS and Beanstalkd
  • Drivers with reservation-style semantics such as Redis or Database
  • Community drivers built on services like Azure Storage Queue
  • Other custom drivers that expose a visibility timeout or lease concept

I used Symfony Messenger’s keepalive support as a reference point here. Symfony has a similar feature for transports that can mark a message as still being processed, and that was a useful starting point.

Design

The feature is opt-in.

A connection advertises support by implementing Illuminate\Contracts\Queue\KeepsJobsAlive, and the worker only attempts keepalive calls when the current connection implements that contract. The keepalive interval is configured at the worker level through WorkerOptions and the --keepalive option on queue:work / queue:listen.
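The opt-in check can be sketched in Python with a runtime-checkable Protocol standing in for the PHP interface. Everything here is hypothetical: the `keep_alive(job_id)` signature, the `maybe_keep_alive` helper and both connection classes. The PR text names the contract but does not show its methods.

```python
from typing import List, Optional, Protocol, runtime_checkable


@runtime_checkable
class KeepsJobsAlive(Protocol):
    """Hypothetical stand-in for Illuminate\\Contracts\\Queue\\KeepsJobsAlive.

    The keep_alive(job_id) signature is an assumption for this sketch.
    """

    def keep_alive(self, job_id: str) -> None: ...


class PlainConnection:
    """A connection with no lease to renew, so it does not opt in."""


class LeasedConnection:
    """A connection that opts in by providing keep_alive()."""

    def __init__(self) -> None:
        self.renewed: List[str] = []

    def keep_alive(self, job_id: str) -> None:
        self.renewed.append(job_id)  # a real driver would extend the lease here


def maybe_keep_alive(connection: object, job_id: str,
                     keepalive_interval: Optional[int]) -> bool:
    """Worker-side guard: renew only if the option is set AND the connection opts in."""
    if not keepalive_interval:
        return False
    if not isinstance(connection, KeepsJobsAlive):
        return False
    connection.keep_alive(job_id)
    return True
```

Because the check is done against the connection rather than the job, drivers that never implement the contract see no behavior change, even when --keepalive is passed.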

I intentionally kept this worker-level for now. I started out exploring per-job configuration, but after stepping back it felt like extra API surface without a strong use case.

Implementation notes

The main wrinkle is that the worker already uses SIGALRM for job timeouts.

Because of that, I could not just bolt on a second independent alarm loop: a process has only one pending alarm, and arming a new one replaces the previous one. Instead, the worker now tracks two deadlines for the current job:

  • the timeout deadline
  • the next keepalive deadline

It arms a single alarm for whichever one comes first. When the alarm fires, timeout still wins if both deadlines have been reached.
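The two-deadline scheduling described above can be modeled as a few pure functions. This is a Python sketch of the decision logic only, not the PR's PHP. The names `next_alarm`, `on_alarm` and `simulate` are hypothetical. The model rounds up to whole seconds because alarm()-style timers (Python's `signal.alarm`, PHP's `pcntl_alarm`) take integer seconds and treat 0 as "cancel".

```python
import math
from typing import List, Optional, Tuple


def next_alarm(now: float, timeout_deadline: Optional[float],
               keepalive_deadline: Optional[float]) -> Optional[int]:
    """Seconds to arm the single alarm for, or None when nothing is pending.

    Rounds up and never returns less than 1, since alarm(0) would cancel.
    """
    pending = [d for d in (timeout_deadline, keepalive_deadline) if d is not None]
    if not pending:
        return None
    return max(1, math.ceil(min(pending) - now))


def on_alarm(now: float, timeout_deadline: Optional[float],
             keepalive_deadline: Optional[float]) -> str:
    """Decide what a fired alarm means; timeout is checked first so it wins ties."""
    if timeout_deadline is not None and now >= timeout_deadline:
        return "timeout"
    if keepalive_deadline is not None and now >= keepalive_deadline:
        return "keepalive"
    return "rearm"  # fired before either deadline: just arm again


def simulate(timeout: int, interval: int, job_seconds: int) -> List[Tuple[str, int]]:
    """Walk one job's alarm schedule from t=0, assuming perfectly punctual alarms."""
    events: List[Tuple[str, int]] = []
    now = 0
    timeout_deadline, keepalive_deadline = timeout, interval
    while True:
        delay = next_alarm(now, timeout_deadline, keepalive_deadline)
        if now + delay >= job_seconds:
            # The job finishes first (ties count as finished), so no alarm is handled.
            events.append(("done", job_seconds))
            return events
        now += delay
        action = on_alarm(now, timeout_deadline, keepalive_deadline)
        events.append((action, now))
        if action == "timeout":
            return events
        if action == "keepalive":
            keepalive_deadline = now + interval  # schedule the next renewal
```

With a 60-second timeout and a 25-second interval, `simulate(60, 25, 100)` renews at 25 and 50 and then times out at 60. With a 30-second interval, both deadlines coincide at 60, and the timeout wins.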

Making that work requires an internal breaking change in Worker. One protected method was removed while simplifying the signal-handling path, which can break subclasses that call or override it, so this is aimed at the next major release. Even with that change, I tried to keep the overall diff as small as I could and avoid turning this into a larger scheduler abstraction.

Tradeoffs

I looked at two directions here:

  • keep the existing signal-based model and fold keepalive into it
  • move keepalive work into a child process or helper process

A child-process approach would avoid doing more work from the alarm path, but it adds a lot more coordination, process management, and failure handling. I went with signals because it fits the worker’s existing timeout model, keeps the implementation much smaller, and is also the direction Symfony took for Messenger’s keepalive support.

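That does come with an important caveat: a transport’s keepAlive() implementation should stay cheap. It runs from the alarm path inside the worker process, so the job it interrupts is paused until the call returns.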

For transports that renew a lease over HTTP, long blocking calls in the keepalive path are a real concern. Drivers should use aggressive request timeouts and avoid treating keepalive as a general-purpose API call. If a transport cannot renew its lease quickly and predictably, it may not be a good fit for this model.
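As an illustration of "cheap", here is a Python sketch of a renewal path with a tight time budget. The endpoint, the `renew_lease_http` / `keep_alive` names and the 2-second default are all hypothetical. Logging a failed renewal and letting the job continue, rather than aborting it, is an assumption made for this sketch; the PR does not say how failures should be handled. urllib's `timeout` bounds each blocking socket operation, not the whole request, and it does not cover DNS resolution. A production driver may therefore need a stricter total deadline.

```python
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger("keepalive")


def renew_lease_http(url: str, timeout: float) -> None:
    """Hypothetical renewal call: an HTTP PUT to a lease endpoint.

    `timeout` applies to each blocking socket operation (connect, read),
    not to the request as a whole; DNS lookups are not covered.
    """
    request = urllib.request.Request(url, data=b"", method="PUT")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def keep_alive(renew: Callable[[float], None], job_id: str, timeout: float = 2.0) -> bool:
    """Attempt one renewal from the alarm path; never let it raise into the job.

    Assumption: a failed renewal is logged and the job keeps running.
    """
    try:
        renew(timeout)
        return True
    except Exception as exc:  # e.g. URLError, TimeoutError, HTTPError
        logger.warning("keepalive for %s failed: %s", job_id, exc)
        return False
```

Passing the renewal as a callable keeps the timeout policy in one place and makes it easy to swap in a transport-specific client.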

Driver guidance

One thing worth calling out for driver authors: the transport lease should be longer than the keepalive interval.

Symfony’s Amazon SQS transport enforces the weaker rule that the queue visibility timeout must not be smaller than the keepalive interval. I think the practical guidance should be a bit stricter than that: leave some headroom. Alarms are not perfectly punctual, and network-backed renewals can be delayed. Running the lease and keepalive cadence edge-to-edge leaves very little margin for jitter.
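The headroom guidance can be stated precisely. Suppose a lease of L seconds is renewed nominally every I seconds and the previous renewal was on time. The next renewal can then arrive up to L − I seconds late before the lease lapses. The Python check below encodes both the weaker rule and the stricter, headroom-aware one. The function name and the 5-second default headroom are assumptions, not values from the PR.

```python
from typing import List


def keepalive_config_problems(lease_seconds: float, interval_seconds: float,
                              headroom_seconds: float = 5.0) -> List[str]:
    """Return problems with a lease/keepalive pairing (empty list means OK).

    headroom_seconds is an assumed safety margin for late alarms and slow
    renewals; choose one that matches the transport's observed latency.
    """
    problems: List[str] = []
    if interval_seconds <= 0:
        problems.append("keepalive interval must be positive")
    elif lease_seconds < interval_seconds:
        # Fails even the weaker rule (lease >= interval).
        problems.append("lease expires before the first keepalive is due")
    elif lease_seconds < interval_seconds + headroom_seconds:
        # Passes the weaker rule but runs edge-to-edge: jitter can let the lease lapse.
        problems.append(
            f"lease leaves less than {headroom_seconds:g}s of headroom over the keepalive interval"
        )
    return problems
```

For example, a 30-second lease with a 30-second interval satisfies the weaker rule but is flagged here. A 20-second interval with the same lease leaves 10 seconds of slack and passes.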

Scope

This PR adds the worker support and the opt-in contract, but does not update any specific transport yet. I think transport-specific implementations are easier to review as follow-up changes once the worker-level behavior is agreed on.

brecht-vermeersch marked this pull request as draft July 1, 2026 19:49
brecht-vermeersch (author) commented:

For now, this is only implemented for the daemon worker path. It does not currently apply to runNextJob / --once, which matches the existing timeout behavior as well.

brecht-vermeersch marked this pull request as ready for review July 1, 2026 20:27
brecht-vermeersch changed the base branch from 13.x to master July 2, 2026 05:38
brecht-vermeersch marked this pull request as draft July 2, 2026 05:39
brecht-vermeersch changed the title from "Add worker keepalive support for lease-based queue drivers" to "[14.x] Add worker keepalive support for lease-based queue drivers" July 2, 2026
brecht-vermeersch marked this pull request as ready for review July 2, 2026 05:45