router: make full map_callrw with split args#644
Conversation
Serpentian
left a comment
Well done! Only one major comment (upgrade) that we must address; the other ones are nits and something to think about.
| end | ||
| -- Netbox async requests work only with active connections. | ||
| -- So, we need to wait for the master connection explicitly. | ||
| timeout = deadline - fiber_clock() |
Let's consider returning the remaining timeout from grouped_buckets, so that we don't have to do these crutchy things with deadline. Moreover, fiber_clock is not as cheap as it seems (it goes to C and back), and if you return the remaining timeout from grouped_buckets you'll need only one additional fiber_clock there instead of two of them inside router_ref_prepare.
Consider moving that to a separate commit, since the same applies even before the refactoring.
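A minimal sketch of that suggestion (hypothetical: the real buckets_group lives in vshard's router code and uses fiber.clock; here the grouping logic is mocked and os.clock stands in for fiber.clock):

```lua
-- Hypothetical sketch: buckets_group returns the remaining timeout as a
-- third value, so the caller does not need its own fiber_clock() call.
local clock = os.clock

local function buckets_group(router, bucket_ids, timeout)
    local deadline = clock() + timeout
    local grouped = {}
    for _, id in ipairs(bucket_ids) do
        -- Mocked routing: bucket id modulo 2 picks the replicaset.
        local rs_id = (id % 2 == 0) and 'rs_1' or 'rs_2'
        grouped[rs_id] = grouped[rs_id] or {}
        table.insert(grouped[rs_id], id)
    end
    -- Return the time budget left, so that router_ref_prepare can reuse
    -- it instead of recomputing deadline - fiber_clock() itself.
    return grouped, nil, deadline - clock()
end

local grouped, err, timeout_left = buckets_group(nil, {1, 2, 3}, 10)
```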
The grouped_buckets function is used in the router API. If our new version of grouped_buckets returns a timeout in addition to grouped_buckets and err, it can cause problems for clients who use this function in their services and don't expect 3 return values instead of 2.
Firstly, buckets_group is internal, since its name starts with _ in the public API, so we can change it however we like; user applications should not rely on that function. Secondly, if we really care, we can always make a wrapper around buckets_group, so that it returns only 2 values, as it did before.
But it's too minor to care about, so feel free to resolve. Up to you
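The wrapper idea could be sketched like this (hypothetical names; the internal three-value version is mocked):

```lua
-- Hypothetical sketch: if the internal _buckets_group starts returning
-- a third value (remaining timeout), a thin wrapper keeps the old
-- two-value contract for any external caller.
local function buckets_group_internal(router, bucket_ids, timeout)
    -- Mocked internal version returning three values.
    return {rs_1 = bucket_ids}, nil, timeout - 0.5
end

local function buckets_group(router, bucket_ids, timeout)
    local grouped, err = buckets_group_internal(router, bucket_ids,
                                                timeout)
    -- The remaining timeout is intentionally dropped here.
    return grouped, err
end

local grouped, err = buckets_group(nil, {1}, 5)
```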
| -- high-level ref functions (such as router_ref_storage_all and router_ref_ | ||
| -- storage_by_buckets). | ||
| -- | ||
| local function router_ref_send(router, timeout, args_builder, grouped_buckets) |
Hmm, did you consider making it even more general: router_map_callrw_send and router_map_callrw_collect, and also reusing them in replicasets_map_reduce? We have almost the same code there. Not a call to action, just something to think about.
Yes, router_map_callrw_prepare/send/collect sounds better.
But I don't think that we can easily reuse these functions in replicasets_map_reduce because:
1. We need to change the API of router_map_callrw_send by:
   - changing the router param into replicasets_all, as replicasets_map_reduce does not have a router variable in it;
   - adding an extra argument: return_raw.
2. The logic of sending the map stage in replicasets_map_reduce is more complex than the logic of router_map_callrw_send. In the latter we only do an RPC with no arguments or with grouped_buckets[rs_id], and that's all. But in replicasets_map_reduce we need to dynamically change the arguments passed to the RPC. Before the RPC we add bucket arguments to args and after the RPC we delete them. The main problem here is that in the args builder we can only easily add grouped buckets, but not delete them, because the deletion happens after the RPC. If we want to do it in our new router_map_callrw_send function, we need to complicate args_builder and use table.deepcopy (which can slow down perf).
3. We need to change the API of router_map_callrw_collect by passing an extra return_raw argument.
4. Also router_map_callrw_collect will be complicated, as we should add two different ways of extracting results.
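The deepcopy concern above can be illustrated with a standalone sketch (hypothetical names; a tiny shallow-copy helper stands in for Tarantool's table.deepcopy): if args_builder handed out a shared table, appending bucket arguments for one replicaset would leak into the args sent to another.

```lua
-- Hypothetical sketch of the args_builder copying problem.
local base_args = {'user_func', {payload = 'x'}}

local function shallow_copy(t)
    local copy = {}
    for k, v in pairs(t) do copy[k] = v end
    return copy
end

local function args_builder(rs_id, grouped_buckets)
    -- Copy per call so that appending bucket args for one replicaset
    -- does not mutate the args built for another one.
    local args = shallow_copy(base_args)
    if grouped_buckets[rs_id] then
        table.insert(args, grouped_buckets[rs_id])
    end
    return args
end

local grouped = {rs_1 = {1, 2}}
local a1 = args_builder('rs_1', grouped)
local a2 = args_builder('rs_2', grouped)
```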
changing router param into replicasets_all, as replicasets_map_reduce does not have router variable in it.
1.1. Yes, and it will become replicasets_map_send, which is way cleaner, since you will explicitly pass the replicasets and not use some non-obvious logic to create the list of replicasets based on the optional argument grouped_buckets (which won't be needed). You can return the replicasets from router_map_prepare.
adding extra argument - return_raw
1.2. Let's just pass the opts and add is_async explicitly inside the function. And call the function router_map_callrw_send_async. The name should always say what the function does.
The logic of sending map stage in replicasets_map_reduce is more complex than logic of router_map_callrw_send.
- Agree, I see no other way than deepcopy.
We need to change api of router_map_callrw_collect by passing extra return_raw argument.
Also router_map_callrw_collect will be complicated as we should add two different ways of extracting results.
- You can always just create a new function router_map_callrw_collect_raw and not pass raw to it.
Again, up to you, if you think, that the current variant is better, I'm ok
| return res | ||
| end | ||
|
|
||
| local function storage_ref_check_existent(rid, bucket_ids) |
Maybe storage_ref_check_with_present_buckets, so that the name is similar to storage_ref_make_with_buckets
storage_ref_check_with_present_buckets is a very long name. Maybe a little shorter?
But storage_ref_check_existent doesn't say what the function does. Maybe somebody will propose a better name; I don't really like the current one, but I don't insist.
| local allstatus = consts.BUCKET | ||
| for _, bucket_id in pairs(bucket_ids) do | ||
| local bucket = box.space._bucket:get{bucket_id} | ||
| if bucket and bucket.status ~= allstatus.GARBAGE and |
No need for allstatus, we have BGARBAGE. If you don't mind, we can also refactor that small thing in bucket_get_moved.
And in buckets_are_all_rw_not_cache. It will be ok in the second commit.
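For illustration, the refactored check using the constants directly might look like this (a sketch; the status strings follow vshard's consts.BUCKET naming but are mocked here, and the helper name is hypothetical):

```lua
-- Sketch: use the BGARBAGE/BSENT constants directly instead of going
-- through a local `allstatus` alias on each check.
local consts = {
    BUCKET = {ACTIVE = 'active', GARBAGE = 'garbage', SENT = 'sent'},
}
local BGARBAGE = consts.BUCKET.GARBAGE
local BSENT = consts.BUCKET.SENT

local function bucket_is_present(bucket)
    -- A bucket counts as present unless it is missing, garbage or sent.
    return bucket ~= nil and bucket.status ~= BGARBAGE and
           bucket.status ~= BSENT
end
```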
kamenkremen
left a comment
Great patch! Left some minor comments below
| local futures = {} | ||
| local opts_async = {is_async = true} | ||
| local replicasets_all = router.replicasets | ||
| local rs_ids = grouped_buckets and grouped_buckets or replicasets_all |
Isn't that equivalent to local rs_ids = grouped_buckets or replicasets_all?
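Indeed it is: in Lua, `a and a or b` differs from `a or b` only when `a` can be `false`, and grouped_buckets here is either a table or nil. A standalone check:

```lua
-- `grouped_buckets and grouped_buckets or replicasets_all` versus the
-- shorter `grouped_buckets or replicasets_all`: equivalent whenever the
-- first operand cannot be `false` (a table or nil qualifies).
local replicasets_all = {rs_1 = true, rs_2 = true}
local grouped_buckets = nil

local long_form = grouped_buckets and grouped_buckets or replicasets_all
local short_form = grouped_buckets or replicasets_all
```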
|
|
||
| -- | ||
| -- Perform Ref stage of the Ref-Map-Reduce process on a subset of all the | ||
| -- replicasets, which contains all the listed bucket IDs. |
| -- | ||
| -- Ref stage: collect. | ||
| -- | ||
| futures = futures or {} |
futures is already either {} (because of line 820) or some table, so we don't need this.
| -- Group the buckets by replicasets according to the router cache. | ||
| grouped_buckets, err = buckets_group(router, bucket_ids, timeout) | ||
| if err ~= nil then | ||
| return nil, err |
Nit: triple space in indentation
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | ||
| '\'partial\' mode and nil bucket_ids') |
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | |
| '\'partial\' mode and nil bucket_ids') | |
| return nil, lerror.make("Router can't execute map_callrw with " .. | |
| "'partial' mode and nil bucket_ids") |
I prefer to use single quotes for string literals in Lua. The same applies to the other comments below.
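For context, the two styles produce identical strings; the difference is only which quotes need escaping:

```lua
-- Same string, two quoting styles: single quotes require escaping the
-- apostrophes inside the message, double quotes do not.
local escaped = 'Router can\'t execute map_callrw with \'full\' mode'
local plain = "Router can't execute map_callrw with 'full' mode"
```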
| return nil, lerror.make('Router can\'t execute map_callrw with ' .. | ||
| '\'full\' mode and numeric bucket_ids') |
This comment was marked as duplicate.
| t.assert_equals(res.err.message, 'Router can\'t execute map_callrw ' .. | ||
| 'with \'full\' mode and numeric bucket_ids') |
This comment was marked as duplicate.
| t.assert_equals(res.err.message, 'Router can\'t execute map_callrw ' .. | ||
| 'with \'partial\' mode and nil bucket_ids') |
This comment was marked as duplicate.
Force-pushed from 1bbdade to 8962951.
Before this patch the main `map_callrw` ref functions, such as `router_ref_storage_all` and `router_ref_storage_by_buckets`, were enormous (71 and 108 lines of code). These functions also contain a number of similar functional code blocks, such as "sending refs", "collecting refs", etc. Since in the tarantool#559 patch we will extend the logic of full map_callrw to make it able to work with split args, `router_ref_storage_all` could double in size. That would degrade our codebase by making it less readable. To fix it, we first identify the general, repeated code blocks in the ref functions:

1) `ref-prepare`: groups buckets by replicasets using the router's cache, builds a table of "target" replicasets and waits for the necessary masters.
2) `ref-send`: sends refs to the remote storages asynchronously and builds a table of future objects for further processing.
3) `ref-collect`: waits until the future objects are ready in order to extract the payload from them (responses of the storages' functions).
4) `ref-process`: custom logic for the `full` or `partial` map_callrw modes which describes how we should process the results from the future objects.

After defining the main stages of the ref map_callrw functions, we unify them so that they can be used in both `router_ref_storage_all` and `router_ref_storage_by_buckets`.

Needed for tarantool#559

NO_TEST=refactoring
NO_DOC=refactoring
In this patch we change `allstatus.GARBAGE/SENT` to `BGARBAGE/BSENT` to not repeat the code.

Needed for tarantool#214

NO_DOC=refactoring
NO_TEST=refactoring
This patch moves the initialization of `rid` out into `router_map_callrw` and passes this variable to the ref functions. It is needed for the tidiness of future features, for example `make full map_callrw with split args`, in which the logic of `router_map_callrw` becomes more complex.

Needed for tarantool#559

NO_DOC=refactoring
NO_TEST=refactoring
Before this patch `router-luatest/reload_test` checked the router's services only with old routers. However, in a future patch (gh-214) we need to check map_callrw with old storages. To make that possible we:

1) change `vtest.cluster_new` so that we can pass a server_config with a certain ENV (LUA_PATH) variable into it. This helps us create a new cluster on an old version of vshard.
2) change `reload_router` to the more general `reload_server` in order to unify the upgrade process of servers (router / storage) in `router-luatest/reload_test`.
3) unify the process of cluster creation (`create_cluster_on_specific_version`) and the process of getting a server's config with the new ENV (LUA_PATH) variable (`get_config_for_specific_vshard_version`).

Needed for tarantool#214

NO_DOC=refactoring
NO_TEST=refactoring
This patch introduces a new way of `map_callrw` execution by which we can pass some arguments to all storages and split buckets' arguments across those storages that have at least one bucket of `bucket_ids`. To achieve this we introduce a new string option, `mode`, in the `map_callrw` API.

We also change the logic of the `router_ref_storage_all` ref function. First we ref all storages and get back the amount of "moved" buckets according to the previously built router's cache. Then, if there are no "moved" buckets, we accumulate and check the total amount of buckets on all storages and finish the map_callrw ref stage. Otherwise, if there are some "moved" buckets, we perform a second network hop, checking on which replicasets the remaining "moved" buckets reside.

Closes tarantool#559

@TarantoolBot document
Title: vshard: `mode` option for `router.map_callrw()`

This string option regulates on which storages the user function will be executed via `map_callrw`. Possible values:

1) mode = 'partial'. In this mode the user function will be executed on storages that have at least one bucket of 'bucket_ids'. The 'bucket_ids' option can be presented in two ways: as a numeric array of bucket ids or as a map of buckets' arguments. In the first case the user function will only receive args; in the second it will additionally receive the buckets' arguments.
2) mode = 'full'. In this mode the user function will be executed with args on all storages in the cluster. If we pass 'bucket_ids' as a map of buckets' arguments, the user function will additionally receive the buckets' arguments on those storages that have at least one bucket of 'bucket_ids'.

If the 'mode' option is not specified, it is set based on the 'bucket_ids' option: if 'bucket_ids' is present, the mode will be 'partial', otherwise 'full'. Also, `map_callrw` now ends with an error in the cases `<mode = 'full', bucket_ids = {1, 2, ...}>` and `<mode = 'partial', bucket_ids = nil>`.
Force-pushed from 8962951 to 8f5223c.
Serpentian
left a comment
I don't have any major comments. I think it's time to ask @Gerold103 for review, so that we can be sure there are no major flaws which I've missed.
| local server = g.cluster:build_server({ | ||
| alias = replica_name, | ||
| box_cfg = box_cfg, | ||
| env = server_config.env |
It's not a server config if you use just one variable from it. Let's instead set server_config.alias and server_config.box_cfg and pass it fully to build_server.
| local global_cfg | ||
|
|
||
| local function get_config_for_specific_vshard_version(hash) | ||
| git_util.exec('checkout', {args = hash .. ' -f', dir = g.vshard_copy_path}) |
This is definitely not about getting a config: it checks out the repo. From the function name it looks read-only, but it's not. Let's either mention that in the function name or move the checkout out of the function (I'd prefer the latter, it's not that widely used).
| -- The test works in the following directory | ||
| local vardir = vtest.vardir or fio.tempdir() | ||
| g.vshard_copy_path_load = vardir .. '/vshard_copy' | ||
| t.assert_equals(fio.mkdir(g.vshard_copy_path_load), true) |
The catch_flaky workflow fails on that line. It looks like the test doesn't support parallel runs, since the vardir points to /tmp/t/00X_router-luatest, which is shared among several workers. We should use something else.
|
|
||
| g.before_test('test_map_callrw', function(g) | ||
| g.cluster:drop() | ||
| -- Full mapp_callrw with split args was introduced just right after |
Nit: mapp
And let's be consistent and specify the name of the commit:
-- Latest meaningful commit:
-- "router: fix reload problem with global function refs".
| -- storage versions in order to check that there will be no crashes | ||
| -- of storages due to changes in storage_ref_* functions. | ||
| create_cluster_on_specific_version( | ||
| '1be7b8e1055ecd2f2033d6304408e246b0f2ba46') |
Not an option: the hashes of commits change on merge. Let's pick some commit from master and not from your branch.
|
|
||
| g.after_test('test_map_callrw', function(g) | ||
| g.cluster:drop() | ||
| create_cluster_on_specific_version(g.latest_hash) |
You don't need to recreate it, it's enough to drop your cluster, it'll be automatically created in before_each later
| local replicasets_all = router.replicasets | ||
| local deadline = fiber_clock() + timeout | ||
| if bucket_ids then | ||
| bucket_ids = bucket_ids or {} |
Should be in the first commit
| local bucket_ids = {} | ||
| for _, res in pairs(results) do | ||
| for _, bucket in pairs(res.moved) do | ||
| if type(res) ~= 'table' then |
Why do we start checking the return type here, and why don't we do that in router_map_callrw_process_existent?
| args, grouped_args, { | ||
| timeout = timeout, return_raw = do_return_raw | ||
| }) | ||
| opts = {timeout = timeout, return_raw = do_return_raw, mode = mode} |
You don't need mode, it's unused in the replicasets_map_reduce
Gerold103
left a comment
Thanks for working on this complex topic and being patient about comments 🙏.
| -- high-level ref functions (such as router_ref_storage_all and router_ref_ | ||
| -- storage_by_buckets). | ||
| -- | ||
| local function router_map_callrw_send(router, timeout, args_builder, |
The name send is very generic. When I saw this function, I was confused what it is sending - ref? function calls? Since this stage is about refs exclusively, I suggest to include "ref" into the name.
Same about router_map_callrw_collect() - collect what: results of refs? results of the user function calls? From its code I see that it is only applicable to refs. So let's make its name reflect that, and be in sync with the send function.
Something like '..._ref_send' + '..._ref_wait'?
fixed, router_map_callrw_ref_send and router_map_callrw_ref_wait sound better
| -- Waits until all future objects are ready and extracts results from it. | ||
| -- | ||
| local function router_map_callrw_collect(futures, timeout) | ||
| local results = {} |
There was at least one reason in the previous code to have the stages not extracted into separate functions: the ability to reuse Lua tables. Lua tables, as we use them here, are essentially just hash tables, with all the consequences: they are allocated on the heap, might need re-allocation, re-hashing, etc. And the goal was to minimize the number of such tables created and destroyed per request.
But I see also your point about code readability. I just hope it won't hurt perf. Probably we shouldn't notice much, since map-requests in general are quite expensive anyway due to other bigger reasons, than Lua tables.
I don't quite understand how the usage of hash tables in separate functions can significantly degrade the perf of map_callrw.
And the goal was to minimize the number of such tables created and destroyed per request.
In our new map_callrw version we additionally create only the results table after router_map_callrw_ref_send - this is the only place where some extra time can be spent on hash-table actions. futures, grouped_buckets, replicasets_to_check(wait) and bucket_ids were also created in the previous version of map_callrw (there is nothing new here).
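For reference, the table-reuse pattern being discussed can be sketched as follows (hypothetical helper names; table.clear is a LuaJIT extension, emulated here with a plain loop so the sketch runs in stock Lua):

```lua
-- Sketch of the table-reuse concern: one table allocated once and
-- reused across requests, instead of a fresh table per stage.
local function clear(t)
    for k in pairs(t) do t[k] = nil end
end

local futures = {}  -- allocated once, reused for every request

local function collect_into(futures_tbl, responses)
    clear(futures_tbl)  -- wipe previous request's entries in place
    for rs_id, resp in pairs(responses) do
        futures_tbl[rs_id] = resp
    end
    return futures_tbl
end

local r1 = collect_into(futures, {rs_1 = 'ok'})
local r2 = collect_into(futures, {rs_2 = 'ok'})
```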
| local replicasets_all = router.replicasets | ||
| local rs_ids = grouped_buckets or replicasets_all | ||
| for rs_id, _ in pairs(rs_ids) do | ||
| local args_ref = args_builder(rs_id) |
This looks expensive and complicated. It worries me that this requires a function call + each of them will return a new Lua table, while previously it was possible to reuse them.
I don't know, tbh. Is this code really much more readable? For example: local rs_ids = grouped_buckets or replicasets_all. That looks quite non-trivial to me. The caller must not only be aware of how to make this args_builder callback, but also know that grouped_buckets is optional and its default is "all replicasets".
| for _, bucket in pairs(res.moved) do | ||
| if type(res) ~= 'table' then | ||
| goto continue | ||
| end |
What is it then if not a table?
A number, for example. During the send stage of router_ref_storage_all we make RPCs to the following functions: storage_ref and storage_ref_make_with_buckets.
storage_ref always returns a number.
storage_ref_make_with_buckets always returns a table.
In router_map_callrw_process_moved we use type(res) ~= 'table' in order to explicitly distinguish which function the result came back from (we need only moved buckets, which can be received only from storage_ref_make_with_buckets).
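That dispatch can be sketched standalone (the result shapes are assumptions based on the description above: a plain number from storage_ref, a table with a moved list from storage_ref_make_with_buckets):

```lua
-- Results keyed by replicaset id: numbers come from storage_ref,
-- tables from storage_ref_make_with_buckets.
local results = {
    rs_1 = 12,  -- bucket count only, no moved buckets possible
    rs_2 = {bucket_count = 7, moved = {{id = 3}}},
}

local moved = {}
for _, res in pairs(results) do
    -- Only table results can carry a `moved` list, so the type check
    -- tells us which storage function produced the result.
    if type(res) == 'table' then
        for _, bucket in pairs(res.moved) do
            table.insert(moved, bucket.id)
        end
    end
end
```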
| return bucket_ids | ||
| end | ||
|
|
||
| local function router_map_callrw_process_existent(router, results) |
There was a problem hiding this comment.
I suspect you meant existing here, not existent. The meaning is slightly different. It also took me a while to understand what this function does. Given that it is used in a single place and looks very short, maybe let's inline it; perhaps the purpose of its code will be more obvious in the inlined place.
| args_builder = function() return {'storage_ref', rid, timeout} end | ||
| args_builder = function(rs_id) | ||
| local buckets = grouped_buckets[rs_id] or {} | ||
| if grouped_buckets[rs_id] then | ||
| return {'storage_ref_make_with_buckets', rid, timeout, buckets} | ||
| else | ||
| return {'storage_ref', rid, timeout} | ||
| end | ||
| end | ||
| timeout, err, err_id, futures = router_map_callrw_send(router, timeout, | ||
| args_builder) |
Out of curiosity: how much shorter is the code now, compared with inlining router_map_callrw_send right here?
| return ok, ret1, ret2, ret3 | ||
| end | ||
|
|
||
| local function bucket_get_existent(bucket_ids) |
Same about existent vs existing.
| local bucket = box.space._bucket:get{bucket_id} | ||
| if bucket and bucket.status ~= BGARBAGE and bucket.status ~= BSENT then |
I am not completely sure, which is why I am asking. Do you know if space:get() returns nil or box.NULL when nothing is found? If the latter, then this code will break when nothing is found, because box.NULL is implicitly cast to true. But perhaps I am wrong and :get() returns nil.
The get method of a space object always returns nil if there is no such tuple.

Reproducer:

```lua
box.cfg{}
box.schema.create_space('test')
box.space.test:format({{name = 'id', type = 'unsigned'}})
box.space.test:create_index('pk', {parts = {'id'}})
a = box.space.test:get(123)
type(a) -- nil
b = box.NULL
type(b) -- cdata
```

| local function bucket_get_existent(bucket_ids) | ||
| local res = {} | ||
| for _, bucket_id in pairs(bucket_ids) do | ||
| local bucket = box.space._bucket:get{bucket_id} |
This is expensive: a space get per bucket ID, given the potential number of those ids. I suggest using the bucket ref cache instead. High chance that most buckets, if not all, will already be there. Lua tables are orders of magnitude faster than spaces.
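The cache-first idea might look roughly like this (a sketch with a mocked ref cache and a mocked space lookup; the real vshard cache layout and function names differ):

```lua
-- Mocked bucket ref cache (a plain Lua table) and _bucket space lookup.
local bucket_refs = {[1] = {status = 'active'}, [2] = {status = 'active'}}
local space_get_calls = 0
local function space_bucket_get(bucket_id)
    space_get_calls = space_get_calls + 1
    return nil  -- pretend the bucket is not in _bucket either
end

local function bucket_is_present(bucket_id)
    -- Check the Lua-table cache first: much cheaper than a space:get()
    -- per bucket; fall back to the space only on a cache miss.
    if bucket_refs[bucket_id] ~= nil then
        return true
    end
    return space_bucket_get(bucket_id) ~= nil
end

local hits = 0
for _, id in ipairs({1, 2, 3}) do
    if bucket_is_present(id) then hits = hits + 1 end
end
```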