refactor!: Align `RequestQueueClient` interface with its Python counterpart by janbuchar · Pull Request #3729 · apify/crawlee

janbuchar · 2026-06-10T10:04:21Z

request-queue sibling of refactor!: Align DatasetClient and KeyValueStoreClient interfaces with their Python counterparts #3627
Closes Rework the storage client system #3075
Coincidentally closes Remove unlocking requests from BasicCrawler #2840
the memory storage changes don't deserve too much attention — we're going to replace the implementation with crawlee-storage soon anyway
- this also covers MemoryStorageEmulator, which is slated for removal in Split off FileSystemStorage from MemoryStorage #3593. Its getRequestQueueItems helper now drains the queue via fetchNextRequest() and reclaims everything afterwards. Because reclaimRequest recomputes orderNo from scratch, this does not faithfully restore the original ordering (forefront requests lose their sign/priority, and same-millisecond reclaims can collapse relative order). That's fine here: all current callers use non-forefront requests with order-tolerant assertions, and the helper is going away with Split off FileSystemStorage from MemoryStorage #3593 — so it's not worth fixing.
related to Move Apify RequestQueue-specific logic to SDK #3328

Reduce `RequestQueueClient` from 12 methods to 9. Replace `listHead`/`addRequest`/`batchAddRequests`/`updateRequest`/ `listAndLockHead`/`prolongRequestLock`/`deleteRequestLock` with `addBatchOfRequests`/`fetchNextRequest`/`markRequestAsHandled`/ `reclaimRequest`/`isEmpty`, and drop the now-dead head/lock option types (`QueueHead`, `ListOptions`, `ListAndLockOptions`, `ListAndLockHeadResult`, `ProlongRequestLockOptions`, `ProlongRequestLockResult`, `DeleteRequestLockOptions`, `RequestQueueHeadItem`). Locking/coordination of multiple clients on the same queue is now an internal concern of the client implementation, not part of the interface. Part of #3075.

Reimplement the in-memory request queue client against the new interface. The client now owns the pending/in-progress/handled bookkeeping (an `inProgress` set on top of the existing `orderNo`-based ordering): - `addBatchOfRequests` replaces `addRequest`/`batchAddRequests` - `fetchNextRequest` pops the next pending request and marks it in progress - `markRequestAsHandled`/`reclaimRequest` replace `updateRequest` and operate on in-progress requests (returning `null` when not in progress) - `getRequest` is keyed by `uniqueKey` - `isEmpty` reports whether any pending request remains (in-progress requests are not counted) The lock-based `listHead`/`listAndLockHead`/`prolongRequestLock`/ `deleteRequestLock` methods are removed. Part of #3075.

…estQueue` class `RequestProvider`, `RequestQueueV1` and `RequestQueueV2` no longer differ in behaviour — request coordination (locking, queue-head management) is now an internal concern of the storage client — so they are merged into a single concrete `RequestQueue` class: - `RequestProvider` becomes `RequestQueue`; the `request_provider.ts` and `request_queue_v2.ts` modules are removed and the implementation lives in `request_queue.ts`. - `fetchNextRequest`/`markRequestHandled`/`reclaimRequest`/`isEmpty` delegate to the slim client; the queue-head, locking, consistency and `recentlyHandledRequests` bookkeeping is gone. - `isFinished` returns `false` while a background add batch is in flight, otherwise `client.isEmpty()`. - `isEmpty()` reflects only pending requests (the next `fetchNextRequest()` would return `null`), preserving the crawler's task-scheduling contract. - `getRequest` is keyed by `uniqueKey`. The `RequestProvider`/`RequestQueueV1`/`RequestQueueV2` exports are removed; use `RequestQueue` instead. `RequestProviderOptions` is renamed to `RequestQueueOptions`. Part of #3075.

Adapt `BasicCrawler` to the slim `RequestQueueClient` and the merged `RequestQueue` class: - Use `RequestQueue` in place of the removed `RequestProvider`/`RequestQueueV1` (instanceof checks, `open()`, parameter and field types). - The same-domain-delay path no longer pokes the queue's private `inProgress` set; it relies on `reclaimRequest` to return the request to the queue. - The error-handling safety net reclaims the request via `reclaimRequest` instead of calling the removed `deleteRequestLock`. Reclaiming a request that is no longer in progress is a harmless no-op on the client. Part of #3075.

Rewrite the request-queue test suites against the new API. Obsolete white-box tests of the removed locking/queue-head machinery are replaced with behavioral tests using a real MemoryStorage-backed client: - memory-storage forefront/handledRequestCount/ignore-non-json tests now drive `addBatchOfRequests`/`fetchNextRequest`/`markRequestAsHandled`/`reclaimRequest`/ `isEmpty`. - core `request-queue-v2` and `request_queue` tests cover the new lifecycle and `isEmpty`/`isFinished` semantics. - `MemoryStorageEmulator.getRequestQueueItems` drains pending requests via `fetchNextRequest` and reclaims them, since `listHead` no longer exists. Part of #3075.

Expand the `RequestQueueClient` migration table in the v4 upgrading guide with the full method mapping, the new fetch/handle/reclaim lifecycle, the `isEmpty` semantics, and the merge of `RequestProvider`/`RequestQueueV1`/`RequestQueueV2` into a single `RequestQueue` class. List the removed head/lock types. Drop the obsolete request-locking experiment guide (locking is no longer an opt-in experiment) and remove its now-empty "Experiments" sidebar category. Update the parallel scraping guide to use `RequestQueue` and drop the `requestLocking` experiment flag. Closes #3075.

janbuchar · 2026-06-10T16:00:14Z

@vdusek — a heads-up on a RequestQueueClient.is_empty semantic that surfaced while porting this RQ-client rework to crawlee-js. It's not a bug, but the contract is subtler than the method name and the base docstring suggest, and I think it's worth documenting on the Python side too.

TL;DR — In crawlee-python, RequestQueueClient.is_empty() does not mean "the next fetch_next_request() would return None". It means "there is no outstanding work left at all" — including requests currently in progress / locked (fetched but not yet handled or reclaimed). It's effectively a building block for is_finished(), not an "is there anything fetchable right now" check.

Where this comes from — all three clients agree, even though the base interface doesn't state it:

FS client: returns False as soon as len(state.in_progress_requests) > 0.
Apify single client: return not self._head_requests and not self._requests_in_progress.
Apify shared client: return len(head.items) == 0 and not self._queue_has_locked_requests.

The RequestQueue.is_empty() storage-wrapper docstring states it correctly ("either pending or being processed"), but the RequestQueueClient.is_empty() base interface docstring just says "True if the request queue is empty" — which reads like the weaker guarantee. The fetch_next_request docstrings ("a None return value does not mean processing finished … use is_finished") reinforce the wrong mental model.

Why it matters (multi-consumer) — is_empty() feeds is_finished(), and the autoscaled pool calls is_finished() on every orchestrator loop iteration, not only when idle. So a narrower is_empty() would let a consumer stop while it (or, in the shared case, another consumer) still holds the last requests under a lock. queue_has_locked_requests is the multi-consumer analogue of the single client's _requests_in_progress set.

is_empty() feeds is_finished()

... and also this is kinda weird in itself.

A request fetched via `fetchNextRequest` is locked (in progress) for 3 minutes by persisting a future `orderNo` to disk. If the process ends before the request is handled or reclaimed, that lock used to linger until it expired, blocking the request for the next consumer of the same on-disk queue. The in-memory client now tracks the requests it has locked and releases them in `MemoryStorage.teardown()`, resetting their `orderNo` (sign preserved, so forefront/normal ordering survives) so they become immediately fetchable again. Only this client needs the cleanup: the Apify platform releases a run's locks automatically on migrate/abort, and the file-system storage does not lock at all.

`RequestQueueClient.isEmpty()` previously reported `true` as soon as no request was immediately fetchable, ignoring requests that are merely locked (in progress) by another consumer. With multiple consumers sharing a queue, a consumer could therefore see the queue as empty — and let the crawler finish — while another consumer still held the last requests under a lock. `listPendingHead` now also reports whether it skipped any unhandled-but- locked request, and `isEmpty()` returns `true` only when nothing is pending AND nothing is locked. This mirrors the Apify platform shared client, whose `isEmpty` accounts for `queueHasLockedRequests`. Tests that encoded the old "in-progress means empty" semantics are updated accordingly (an in-progress request now keeps the queue non-empty and unfinished until it is handled).

janbuchar force-pushed the refactor/align-request-queue-client-interface branch from 6803f1f to acc2337 Compare June 10, 2026 10:13

janbuchar force-pushed the refactor/align-request-queue-client-interface branch 2 times, most recently from c16bb30 to 01d189a Compare June 10, 2026 13:16

janbuchar added 5 commits June 10, 2026 16:31

janbuchar force-pushed the refactor/align-request-queue-client-interface branch from 01d189a to 1d3ccf5 Compare June 10, 2026 14:35

janbuchar marked this pull request as draft June 10, 2026 14:40

janbuchar added 2 commits June 10, 2026 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor!: Align `RequestQueueClient` interface with its Python counterpart#3729

refactor!: Align `RequestQueueClient` interface with its Python counterpart#3729
janbuchar wants to merge 8 commits into
v4from
refactor/align-request-queue-client-interface

janbuchar commented Jun 10, 2026 •

edited

Loading

Uh oh!

janbuchar commented Jun 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

janbuchar commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

janbuchar commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

janbuchar commented Jun 10, 2026 •

edited

Loading

janbuchar commented Jun 10, 2026 •

edited

Loading