router: make full map_callrw with split args#644
Conversation
Serpentian
left a comment
Well done! Only one major comment (upgrade) that we must address; the other ones are nits and something to think about.
| end | ||
| -- Netbox async requests work only with active connections. | ||
| -- So, we need to wait for the master connection explicitly. | ||
| timeout = deadline - fiber_clock() |
Let's consider returning the remaining timeout from grouped_buckets, so that we don't have to do these crutchy things with deadline. Moreover, fiber_clock is not as cheap as it seems (it goes to C and back), and if you return the remaining timeout from grouped_buckets you'll need only one additional fiber_clock there instead of two of them inside router_ref_prepare.
Consider moving that to a separate commit, since the same applies even before the refactoring.
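A minimal sketch of that suggestion (hypothetical: the real buckets_group lives in vshard's router code and uses fiber.clock; here the grouping logic is mocked and os.clock stands in for fiber.clock):

```lua
-- Hypothetical sketch: buckets_group returns the remaining timeout as a
-- third value, so the caller does not need its own fiber_clock() call.
local clock = os.clock

local function buckets_group(router, bucket_ids, timeout)
    local deadline = clock() + timeout
    local grouped = {}
    for _, id in ipairs(bucket_ids) do
        -- Mocked routing: bucket id modulo 2 picks the replicaset.
        local rs_id = (id % 2 == 0) and 'rs_1' or 'rs_2'
        grouped[rs_id] = grouped[rs_id] or {}
        table.insert(grouped[rs_id], id)
    end
    -- Return the time budget left, so that router_ref_prepare can reuse
    -- it instead of recomputing deadline - fiber_clock() itself.
    return grouped, nil, deadline - clock()
end

local grouped, err, timeout_left = buckets_group(nil, {1, 2, 3}, 10)
```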
The grouped_buckets function is used in the router API. If our new version of grouped_buckets returns a timeout in addition to grouped_buckets and err, it can cause problems for clients who use this function in their services and don't expect 3 return values instead of 2.
Firstly, buckets_group is internal, since its name starts with _ in the public API, so we can change it however we like; user applications should not rely on that function. Secondly, if we really care, we can always make a wrapper around buckets_group, so that it returns only 2 values, as it did before.
But it's too minor to care about, so feel free to resolve. Up to you
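The wrapper idea could be sketched like this (hypothetical names; the internal three-value version is mocked):

```lua
-- Hypothetical sketch: if the internal _buckets_group starts returning
-- a third value (remaining timeout), a thin wrapper keeps the old
-- two-value contract for any external caller.
local function buckets_group_internal(router, bucket_ids, timeout)
    -- Mocked internal version returning three values.
    return {rs_1 = bucket_ids}, nil, timeout - 0.5
end

local function buckets_group(router, bucket_ids, timeout)
    local grouped, err = buckets_group_internal(router, bucket_ids,
                                                timeout)
    -- The remaining timeout is intentionally dropped here.
    return grouped, err
end

local grouped, err = buckets_group(nil, {1}, 5)
```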
| -- high-level ref functions (such as router_ref_storage_all and router_ref_ | ||
| -- storage_by_buckets). | ||
| -- | ||
| local function router_ref_send(router, timeout, args_builder, grouped_buckets) |
Hmm, did you consider making it even more general: router_map_callrw_send and router_map_callrw_collect, and also reusing them in replicasets_map_reduce? We have almost the same code there. Not a call to action, just something to think about.
Yes, router_map_callrw_prepare/send/collect sounds better.
But I don't think that we can easily reuse these functions in replicasets_map_reduce because:
1. We need to change the API of router_map_callrw_send by:
   - changing the router param into replicasets_all, as replicasets_map_reduce does not have a router variable in it;
   - adding an extra argument: return_raw.
2. The logic of sending the map stage in replicasets_map_reduce is more complex than the logic of router_map_callrw_send. In the latter we only do an RPC with no arguments or with grouped_buckets[rs_id], and that's all. But in replicasets_map_reduce we need to dynamically change the arguments passed to the RPC. Before the RPC we add bucket arguments to args and after the RPC we delete them. The main problem here is that in the args builder we can only easily add grouped buckets, but not delete them, because the deletion happens after the RPC. If we want to do it in our new router_map_callrw_send function, we need to complicate args_builder and use table.deepcopy (which can slow down perf).
3. We need to change the API of router_map_callrw_collect by passing an extra return_raw argument.
4. Also router_map_callrw_collect will be complicated, as we should add two different ways of extracting results.
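The deepcopy concern above can be illustrated with a standalone sketch (hypothetical names; a tiny shallow-copy helper stands in for Tarantool's table.deepcopy): if args_builder handed out a shared table, appending bucket arguments for one replicaset would leak into the args sent to another.

```lua
-- Hypothetical sketch of the args_builder copying problem.
local base_args = {'user_func', {payload = 'x'}}

local function shallow_copy(t)
    local copy = {}
    for k, v in pairs(t) do copy[k] = v end
    return copy
end

local function args_builder(rs_id, grouped_buckets)
    -- Copy per call so that appending bucket args for one replicaset
    -- does not mutate the args built for another one.
    local args = shallow_copy(base_args)
    if grouped_buckets[rs_id] then
        table.insert(args, grouped_buckets[rs_id])
    end
    return args
end

local grouped = {rs_1 = {1, 2}}
local a1 = args_builder('rs_1', grouped)
local a2 = args_builder('rs_2', grouped)
```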
changing router param into replicasets_all, as replicasets_map_reduce does not have router variable in it.
1.1. Yes, and it will become replicasets_map_send, which is way cleaner, since you will explicitly pass the replicasets and not use some non-obvious logic to create the list of replicasets based on the optional argument grouped_buckets (which won't be needed). You can return the replicasets from router_map_prepare.
adding extra argument - return_raw
1.2. Let's just pass the opts and add is_async explicitly inside the function. And call the function router_map_callrw_send_async. The name should always say what the function does.
The logic of sending map stage in replicasets_map_reduce is more complex than logic of router_map_callrw_send.
- Agree, I see no other way than deepcopy.
We need to change api of router_map_callrw_collect by passing extra return_raw argument.
Also router_map_callrw_collect will be complicated as we should add two different ways of extracting results.
- You can always just create a new function router_map_callrw_collect_raw and not pass raw to it.
Again, up to you, if you think, that the current variant is better, I'm ok
| return res | ||
| end | ||
|
|
||
| local function storage_ref_check_existent(rid, bucket_ids) |
Maybe storage_ref_check_with_present_buckets, so that the name is similar to storage_ref_make_with_buckets
storage_ref_check_with_present_buckets is a very long name. Maybe a little shorter?
But storage_ref_check_existent doesn't say what the function does. Maybe somebody will propose a better name; I don't really like the current one, but I don't insist.
| local allstatus = consts.BUCKET | ||
| for _, bucket_id in pairs(bucket_ids) do | ||
| local bucket = box.space._bucket:get{bucket_id} | ||
| if bucket and bucket.status ~= allstatus.GARBAGE and |
No need for allstatus, we have BGARBAGE. If you don't mind, we can also refactor that small thing in bucket_get_moved.
And in buckets_are_all_rw_not_cache. It will be ok in the second commit.
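For illustration, the refactored check using the constants directly might look like this (a sketch; the status strings follow vshard's consts.BUCKET naming but are mocked here, and the helper name is hypothetical):

```lua
-- Sketch: use the BGARBAGE/BSENT constants directly instead of going
-- through a local `allstatus` alias on each check.
local consts = {
    BUCKET = {ACTIVE = 'active', GARBAGE = 'garbage', SENT = 'sent'},
}
local BGARBAGE = consts.BUCKET.GARBAGE
local BSENT = consts.BUCKET.SENT

local function bucket_is_present(bucket)
    -- A bucket counts as present unless it is missing, garbage or sent.
    return bucket ~= nil and bucket.status ~= BGARBAGE and
           bucket.status ~= BSENT
end
```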
kamenkremen
left a comment
Great patch! Left some minor comments below
| local futures = {} | ||
| local opts_async = {is_async = true} | ||
| local replicasets_all = router.replicasets | ||
| local rs_ids = grouped_buckets and grouped_buckets or replicasets_all |
Isn't that equivalent to local rs_ids = grouped_buckets or replicasets_all?
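Indeed it is: in Lua, `a and a or b` differs from `a or b` only when `a` can be `false`, and grouped_buckets here is either a table or nil. A standalone check:

```lua
-- `grouped_buckets and grouped_buckets or replicasets_all` versus the
-- shorter `grouped_buckets or replicasets_all`: equivalent whenever the
-- first operand cannot be `false` (a table or nil qualifies).
local replicasets_all = {rs_1 = true, rs_2 = true}
local grouped_buckets = nil

local long_form = grouped_buckets and grouped_buckets or replicasets_all
local short_form = grouped_buckets or replicasets_all
```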
|
|
||
| -- | ||
| -- Perform Ref stage of the Ref-Map-Reduce process on a subset of all the | ||
| -- replicasets, which contains all the listed bucket IDs. |
| -- | ||
| -- Ref stage: collect. | ||
| -- | ||
| futures = futures or {} |
futures is already either {} (because of line 820) or some table, so we don't need this.
| -- Group the buckets by replicasets according to the router cache. | ||
| grouped_buckets, err = buckets_group(router, bucket_ids, timeout) | ||
| if err ~= nil then | ||
| return nil, err |
Nit: triple space in indentation
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | ||
| '\'partial\' mode and nil bucket_ids') |
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | |
| '\'partial\' mode and nil bucket_ids') | |
| return nil, lerror.make("Router can't execute map_callrw with " .. | |
| "'partial' mode and nil bucket_ids") |
I prefer to use single quotes for string literals in Lua. The same applies to the other comments below.
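For context, the two styles produce identical strings; the difference is only which quotes need escaping:

```lua
-- Same string, two quoting styles: single quotes require escaping the
-- apostrophes inside the message, double quotes do not.
local escaped = 'Router can\'t execute map_callrw with \'full\' mode'
local plain = "Router can't execute map_callrw with 'full' mode"
```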
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | ||
| '\'full\' mode and numeric bucket_ids') |
This comment was marked as duplicate.
| t.assert_equals(res.err.message, 'Router can\'t execute map_callrw ' .. | ||
| 'with \'full\' mode and numeric bucket_ids') |
This comment was marked as duplicate.
| t.assert_equals(res.err.message, 'Router can\'t execute map_callrw ' .. | ||
| 'with \'partial\' mode and nil bucket_ids') |
This comment was marked as duplicate.
Force-pushed from 1bbdade to 8962951.
Before this patch the main `map_callrw` ref functions, such as `router_ref_storage_all` and `router_ref_storage_by_buckets`, were enormous (71 and 108 lines of code). These functions also contain a number of similar functional code blocks, such as "sending refs", "collecting refs", etc. Since in the tarantool#559 patch we will extend the logic of full map_callrw to make it able to work with split args, `router_ref_storage_all` could double in size. That would degrade our codebase by making it less readable. To fix it, we first identify the general, repeated code blocks in the ref functions:

1) `ref-prepare`: groups buckets by replicasets using the router's cache, builds a table of "target" replicasets and waits for the necessary masters.
2) `ref-send`: sends refs to the remote storages asynchronously and builds a table of future objects for further processing.
3) `ref-collect`: waits until the future objects are ready in order to extract the payload from them (responses of the storages' functions).
4) `ref-process`: custom logic for the `full` or `partial` map_callrw modes which describes how we should process the results from the future objects.

After defining the main stages of the ref map_callrw functions, we unify them so that they can be used in both `router_ref_storage_all` and `router_ref_storage_by_buckets`.

Needed for tarantool#559

NO_TEST=refactoring
NO_DOC=refactoring
In this patch we change `allstatus.GARBAGE/SENT` to `BGARBAGE/BSENT` to not repeat the code.

Needed for tarantool#214

NO_DOC=refactoring
NO_TEST=refactoring
This patch moves the initialization of `rid` out into `router_map_callrw` and passes this variable to the ref functions. It is needed for the tidiness of future features, for example `make full map_callrw with split args`, in which the logic of `router_map_callrw` becomes more complex.

Needed for tarantool#559

NO_DOC=refactoring
NO_TEST=refactoring
Before this patch `router-luatest/reload_test` checked the router's services only with old routers. However, in a future patch (gh-214) we need to check map_callrw with old storages. To make that possible we:

1) change `vtest.cluster_new` so that we can pass a server_config with a certain ENV (LUA_PATH) variable into it. This helps us create a new cluster on an old version of vshard.
2) change `reload_router` to the more general `reload_server` in order to unify the upgrade process of servers (router / storage) in `router-luatest/reload_test`.
3) unify the process of cluster creation (`create_cluster_on_specific_version`) and the process of getting a server's config with the new ENV (LUA_PATH) variable (`get_config_for_specific_vshard_version`).

Needed for tarantool#214

NO_DOC=refactoring
NO_TEST=refactoring
This patch introduces a new way of `map_callrw` execution by which we can pass some arguments to all storages and split buckets' arguments across those storages that have at least one bucket of `bucket_ids`. To achieve this we introduce a new string option, `mode`, in the `map_callrw` API.

We also change the logic of the `router_ref_storage_all` ref function. First we ref all storages and get back the amount of "moved" buckets according to the previously built router's cache. Then, if there are no "moved" buckets, we accumulate and check the total amount of buckets on all storages and finish the map_callrw ref stage. Otherwise, if there are some "moved" buckets, we perform a second network hop, checking on which replicasets the remaining "moved" buckets reside.

Closes tarantool#559

@TarantoolBot document
Title: vshard: `mode` option for `router.map_callrw()`

This string option regulates on which storages the user function will be executed via `map_callrw`. Possible values:

1) mode = 'partial'. In this mode the user function will be executed on storages that have at least one bucket of 'bucket_ids'. The 'bucket_ids' option can be presented in two ways: as a numeric array of bucket ids or as a map of buckets' arguments. In the first case the user function will only receive args; in the second it will additionally receive the buckets' arguments.
2) mode = 'full'. In this mode the user function will be executed with args on all storages in the cluster. If we pass 'bucket_ids' as a map of buckets' arguments, the user function will additionally receive the buckets' arguments on those storages that have at least one bucket of 'bucket_ids'.

If the 'mode' option is not specified, it is set based on the 'bucket_ids' option: if 'bucket_ids' is present, the mode will be 'partial', otherwise 'full'. Also, `map_callrw` now ends with an error in the cases `<mode = 'full', bucket_ids = {1, 2, ...}>` and `<mode = 'partial', bucket_ids = nil>`.
Force-pushed from 8962951 to 8f5223c.
Serpentian
left a comment
I don't have any major comments. I think it's time to ask @Gerold103 for review, so that we can be sure there are no major flaws which I've missed.
| local server = g.cluster:build_server({ | ||
| alias = replica_name, | ||
| box_cfg = box_cfg, | ||
| env = server_config.env |
It's not a server config if you use just one variable from it. Let's instead set server_config.alias and server_config.box_cfg and pass it fully to build_server.
| local global_cfg | ||
|
|
||
| local function get_config_for_specific_vshard_version(hash) | ||
| git_util.exec('checkout', {args = hash .. ' -f', dir = g.vshard_copy_path}) |
This is definitely not about getting a config: it checks out the repo. From the function name it looks read-only, but it's not. Let's either mention that in the function name or move the checkout out of the function (I'd prefer the latter, it's not that widely used).
| -- The test works in the following directory | ||
| local vardir = vtest.vardir or fio.tempdir() | ||
| g.vshard_copy_path_load = vardir .. '/vshard_copy' | ||
| t.assert_equals(fio.mkdir(g.vshard_copy_path_load), true) |
The catch_flaky workflow fails on that line. It looks like the test doesn't support parallel runs, since the vardir points to /tmp/t/00X_router-luatest, which is shared among several workers. We should use something else.
|
|
||
| g.before_test('test_map_callrw', function(g) | ||
| g.cluster:drop() | ||
| -- Full mapp_callrw with split args was introduced just right after |
Nit: mapp
And let's be consistent and specify the name of the commit:
-- Latest meaningful commit:
-- "router: fix reload problem with global function refs".
| -- storage versions in order to check that there will be no crashes | ||
| -- of storages due to changes in storage_ref_* functions. | ||
| create_cluster_on_specific_version( | ||
| '1be7b8e1055ecd2f2033d6304408e246b0f2ba46') |
Not an option: the hashes of commits change on merge. Let's pick some commit from master and not from your branch.
|
|
||
| g.after_test('test_map_callrw', function(g) | ||
| g.cluster:drop() | ||
| create_cluster_on_specific_version(g.latest_hash) |
You don't need to recreate it, it's enough to drop your cluster, it'll be automatically created in before_each later
| local replicasets_all = router.replicasets | ||
| local deadline = fiber_clock() + timeout | ||
| if bucket_ids then | ||
| bucket_ids = bucket_ids or {} |
Should be in the first commit
| local bucket_ids = {} | ||
| for _, res in pairs(results) do | ||
| for _, bucket in pairs(res.moved) do | ||
| if type(res) ~= 'table' then |
Why do we start checking the return type here, and why don't we do that in router_map_callrw_process_existent?
| args, grouped_args, { | ||
| timeout = timeout, return_raw = do_return_raw | ||
| }) | ||
| opts = {timeout = timeout, return_raw = do_return_raw, mode = mode} |
You don't need mode, it's unused in the replicasets_map_reduce
Gerold103
left a comment
Thanks for working on this complex topic and being patient about comments 🙏.
| -- high-level ref functions (such as router_ref_storage_all and router_ref_ | ||
| -- storage_by_buckets). | ||
| -- | ||
| local function router_map_callrw_send(router, timeout, args_builder, |
The name send is very generic. When I saw this function, I was confused what it is sending - ref? function calls? Since this stage is about refs exclusively, I suggest to include "ref" into the name.
Same about router_map_callrw_collect() - collect what: results of refs? results of the user function calls? From its code I see that it is only applicable to refs. So let's make its name reflect that, and be in sync with the send function.
Something like '..._ref_send' + '..._ref_wait'?
fixed, router_map_callrw_ref_send and router_map_callrw_ref_wait sound better
| -- Waits until all future objects are ready and extracts results from it. | ||
| -- | ||
| local function router_map_callrw_collect(futures, timeout) | ||
| local results = {} |
There was at least one reason in the previous code to have the stages not extracted into separate functions: the ability to reuse Lua tables. Lua tables, as we use them here, are essentially just hash tables, with all the consequences: they are allocated on the heap, might need re-allocation, re-hashing, etc. And the goal was to minimize the number of such tables created and destroyed per request.
But I see also your point about code readability. I just hope it won't hurt perf. Probably we shouldn't notice much, since map-requests in general are quite expensive anyway due to other bigger reasons, than Lua tables.
I don't quite understand how the usage of hash tables in separate functions can significantly degrade the perf of map_callrw.
And the goal was to minimize the number of such tables created and destroyed per request.
In our new map_callrw version we additionally create only the results table after router_map_callrw_ref_send - this is the only place where some extra time can be spent on hash-table actions. futures, grouped_buckets, replicasets_to_check(wait) and bucket_ids were also created in the previous version of map_callrw (there is nothing new here).
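For reference, the table-reuse pattern being discussed can be sketched as follows (hypothetical helper names; table.clear is a LuaJIT extension, emulated here with a plain loop so the sketch runs in stock Lua):

```lua
-- Sketch of the table-reuse concern: one table allocated once and
-- reused across requests, instead of a fresh table per stage.
local function clear(t)
    for k in pairs(t) do t[k] = nil end
end

local futures = {}  -- allocated once, reused for every request

local function collect_into(futures_tbl, responses)
    clear(futures_tbl)  -- wipe previous request's entries in place
    for rs_id, resp in pairs(responses) do
        futures_tbl[rs_id] = resp
    end
    return futures_tbl
end

local r1 = collect_into(futures, {rs_1 = 'ok'})
local r2 = collect_into(futures, {rs_2 = 'ok'})
```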
| local replicasets_all = router.replicasets | ||
| local rs_ids = grouped_buckets or replicasets_all | ||
| for rs_id, _ in pairs(rs_ids) do | ||
| local args_ref = args_builder(rs_id) |
This looks expensive and complicated. It worries me that this requires a function call + each of them will return a new Lua table, while previously it was possible to reuse them.
I don't know, tbh. Is this code really much more readable? For example: local rs_ids = grouped_buckets or replicasets_all. That looks quite non-trivial to me. The caller must not only be aware of how to make this args_builder callback, but also know that grouped_buckets is optional and its default is "all replicasets".
| for _, bucket in pairs(res.moved) do | ||
| if type(res) ~= 'table' then | ||
| goto continue | ||
| end |
What is it then if not a table?
A number, for example. During the send stage of router_ref_storage_all we make RPCs to the following functions: storage_ref and storage_ref_make_with_buckets.
storage_ref always returns a number.
storage_ref_make_with_buckets always returns a table.
In router_map_callrw_process_moved we use type(res) ~= 'table' in order to explicitly distinguish which function the result came back from (we need only moved buckets, which can be received only from storage_ref_make_with_buckets).
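That dispatch can be sketched standalone (the result shapes are assumptions based on the description above: a plain number from storage_ref, a table with a moved list from storage_ref_make_with_buckets):

```lua
-- Results keyed by replicaset id: numbers come from storage_ref,
-- tables from storage_ref_make_with_buckets.
local results = {
    rs_1 = 12,  -- bucket count only, no moved buckets possible
    rs_2 = {bucket_count = 7, moved = {{id = 3}}},
}

local moved = {}
for _, res in pairs(results) do
    -- Only table results can carry a `moved` list, so the type check
    -- tells us which storage function produced the result.
    if type(res) == 'table' then
        for _, bucket in pairs(res.moved) do
            table.insert(moved, bucket.id)
        end
    end
end
```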
| return bucket_ids | ||
| end | ||
|
|
||
| local function router_map_callrw_process_existent(router, results) |
There was a problem hiding this comment.
I suspect you meant existing here, not existent. The meaning is slightly different. It also took me a while to understand what this function does. Given that it is used in a single place and looks very short, maybe let's inline it; perhaps the purpose of its code will be more obvious in the inlined place.
| args_builder = function() return {'storage_ref', rid, timeout} end | ||
| args_builder = function(rs_id) | ||
| local buckets = grouped_buckets[rs_id] or {} | ||
| if grouped_buckets[rs_id] then | ||
| return {'storage_ref_make_with_buckets', rid, timeout, buckets} | ||
| else | ||
| return {'storage_ref', rid, timeout} | ||
| end | ||
| end | ||
| timeout, err, err_id, futures = router_map_callrw_send(router, timeout, | ||
| args_builder) |
Out of curiosity: how much shorter is the code now, compared with inlining router_map_callrw_send right here?
| return ok, ret1, ret2, ret3 | ||
| end | ||
|
|
||
| local function bucket_get_existent(bucket_ids) |
Same about existent vs existing.
| local bucket = box.space._bucket:get{bucket_id} | ||
| if bucket and bucket.status ~= BGARBAGE and bucket.status ~= BSENT then |
I am not completely sure, which is why I am asking. Do you know if space:get() returns nil or box.NULL when nothing is found? If the latter, then this code will break when nothing is found, because box.NULL is implicitly cast to true. But perhaps I am wrong and :get() returns nil.
The get method of a space object always returns nil if there is no such tuple.

Reproducer:

```lua
box.cfg{}
box.schema.create_space('test')
box.space.test:format({{name = 'id', type = 'unsigned'}})
box.space.test:create_index('pk', {parts = {'id'}})
a = box.space.test:get(123)
type(a) -- nil
b = box.NULL
type(b) -- cdata
```

| local function bucket_get_existent(bucket_ids) | ||
| local res = {} | ||
| for _, bucket_id in pairs(bucket_ids) do | ||
| local bucket = box.space._bucket:get{bucket_id} |
This is expensive: a space get per bucket ID, given the potential number of those ids. I suggest using the bucket ref cache instead. High chance that most buckets, if not all, will already be there. Lua tables are orders of magnitude faster than spaces.
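The cache-first idea might look roughly like this (a sketch with a mocked ref cache and a mocked space lookup; the real vshard cache layout and function names differ):

```lua
-- Mocked bucket ref cache (a plain Lua table) and _bucket space lookup.
local bucket_refs = {[1] = {status = 'active'}, [2] = {status = 'active'}}
local space_get_calls = 0
local function space_bucket_get(bucket_id)
    space_get_calls = space_get_calls + 1
    return nil  -- pretend the bucket is not in _bucket either
end

local function bucket_is_present(bucket_id)
    -- Check the Lua-table cache first: much cheaper than a space:get()
    -- per bucket; fall back to the space only on a cache miss.
    if bucket_refs[bucket_id] ~= nil then
        return true
    end
    return space_bucket_get(bucket_id) ~= nil
end

local hits = 0
for _, id in ipairs({1, 2, 3}) do
    if bucket_is_present(id) then hits = hits + 1 end
end
```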