Description
Continuation of #85 and part of #70 efforts.
If we use ECC with K+M encoding (K data blocks and M parity blocks in each EC group), a dataset split into S EC groups contains S*K data blocks and S*M parity blocks. Here we discuss various ways to distribute these blocks over N nodes; currently we support (and discuss here) only the case N=K+M.
Note that the block order of the original data has special meaning: it's the typical retrieval order (in particular, currently we support only sequential download of the entire dataset). For simplicity, we pad the dataset with empty blocks up to the nearest multiple of K, i.e. to S*K blocks. OTOH, the order of parity blocks isn't important and we can choose it arbitrarily.
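The block counts above can be computed directly from the dataset size. A minimal sketch, assuming 0 < K and blocks counted as plain integers; `pad_dataset` is a hypothetical helper, not an existing API:

```python
def pad_dataset(num_data_blocks: int, k: int, m: int) -> dict:
    """Pad a dataset of num_data_blocks blocks up to S*K blocks (hypothetical helper).

    Returns the number of EC groups S, the number of empty padding blocks
    appended at the end, and the resulting data/parity/node counts.
    """
    if num_data_blocks <= 0 or k <= 0 or m < 0:
        raise ValueError("need num_data_blocks > 0, k > 0, m >= 0")
    s = (num_data_blocks + k - 1) // k       # S = ceil(num_data_blocks / K), integer-only
    padding = s * k - num_data_blocks        # empty blocks appended after the real data
    return {
        "groups": s,
        "padding": padding,
        "data_blocks": s * k,                # S*K
        "parity_blocks": s * m,              # S*M
        "nodes": k + m,                      # only N = K+M is considered here
    }

print(pad_dataset(10, k=4, m=2))
# {'groups': 3, 'padding': 2, 'data_blocks': 12, 'parity_blocks': 6, 'nodes': 6}
```

Integer ceiling division is used instead of `math.ceil(a / b)` so that very large block counts don't lose precision through floats.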
A Layout is defined by two functions providing the following mappings:
- node number to the indexes of the data and parity blocks it should store (used by the contract to fill slots)
- EC group number to the indexes of the data and parity blocks it includes (used by the uploader to compute ECC, and by the downloader to recover data)
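To make the two functions concrete, here is a sketch of one possible Layout, not necessarily the one the project uses. It assumes 0-based indexes, with data blocks and parity blocks numbered in separate index spaces (0..S*K-1 and 0..S*M-1). In this hypothetical "striped" layout, EC group g holds data blocks g*K..g*K+K-1 and parity blocks g*M..g*M+M-1, and node n stores the n-th block of every group (nodes 0..K-1 hold data, nodes K..K+M-1 hold parity). The names `group_blocks` and `node_blocks` are illustrative only.

```python
from typing import List, Tuple

Blocks = Tuple[List[int], List[int]]  # (data block indexes, parity block indexes), 0-based

def group_blocks(g: int, k: int, m: int) -> Blocks:
    """EC group g -> its K adjacent data blocks and its M parity blocks."""
    data = list(range(g * k, (g + 1) * k))
    parity = list(range(g * m, (g + 1) * m))
    return data, parity

def node_blocks(n: int, s: int, k: int, m: int) -> Blocks:
    """Node n (0 <= n < K+M) -> the n-th block of each of the S EC groups."""
    if not 0 <= n < k + m:
        raise ValueError("node index out of range")
    if n < k:
        # data node: data block n of every group, i.e. indexes n, n+K, n+2K, ...
        return [g * k + n for g in range(s)], []
    # parity node: parity block (n-K) of every group
    return [], [g * m + (n - k) for g in range(s)]

print(node_blocks(1, s=3, k=4, m=2))  # ([1, 5, 9], [])
print(node_blocks(5, s=3, k=4, m=2))  # ([], [1, 3, 5])
print(group_blocks(2, k=4, m=2))      # ([8, 9, 10, 11], [4, 5])
```

Both functions are closed-form, so neither the contract nor the uploader/downloader needs a lookup table. The trade-off is that each data node stores every K-th data block rather than a contiguous range.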
We have the following requirements for Layouts, in order of decreasing importance:
- each EC group is distributed over all nodes (to maximize reliability)
- (almost) every EC group contains adjacent data blocks, i.e. blocks i+1, ..., i+K (to maximize ECC recovery performance)
- the functions describing the Layout are simple to compute
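The first two requirements can be checked mechanically for any candidate Layout. The sketch below assumes 0-based indexes and compares two hypothetical layouts: the "striped" one (node n holds the n-th block of every group) and a "contiguous" one (data node n holds data blocks n*S..n*S+S-1, and group g takes the g-th block of each node). Both layouts and the `check` helper are illustrations, not part of any existing code.

```python
from typing import List, Tuple

Blocks = Tuple[List[int], List[int]]  # (data indexes, parity indexes), 0-based

# Hypothetical layout A ("striped"): node n holds the n-th block of every group.
def striped_node(n: int, s: int, k: int, m: int) -> Blocks:
    if n < k:
        return [g * k + n for g in range(s)], []
    return [], [g * m + (n - k) for g in range(s)]

def striped_group(g: int, s: int, k: int, m: int) -> Blocks:
    # s is unused here: group membership does not depend on dataset size
    return list(range(g * k, (g + 1) * k)), list(range(g * m, (g + 1) * m))

# Hypothetical layout B ("contiguous"): data node n holds data blocks n*S..n*S+S-1,
# parity node K+j holds parity blocks j*S..j*S+S-1; group g takes block g of each node.
def contiguous_node(n: int, s: int, k: int, m: int) -> Blocks:
    if n < k:
        return list(range(n * s, (n + 1) * s)), []
    return [], list(range((n - k) * s, (n - k + 1) * s))

def contiguous_group(g: int, s: int, k: int, m: int) -> Blocks:
    return [n * s + g for n in range(k)], [j * s + g for j in range(m)]

def check(node_fn, group_fn, s: int, k: int, m: int) -> Tuple[bool, bool]:
    """Return (each group spread over all nodes, each group has adjacent data)."""
    n_nodes = k + m
    # Invert node_fn: which node stores each data block and each parity block?
    data_owner, parity_owner = {}, {}
    for n in range(n_nodes):
        data, parity = node_fn(n, s, k, m)
        for i in data:
            data_owner[i] = n
        for j in parity:
            parity_owner[j] = n
    spread = adjacent = True
    for g in range(s):
        data, parity = group_fn(g, s, k, m)
        owners = [data_owner[i] for i in data] + [parity_owner[j] for j in parity]
        spread &= sorted(owners) == list(range(n_nodes))       # exactly one block per node
        adjacent &= data == list(range(data[0], data[0] + k))  # blocks i..i+K-1
    return spread, adjacent

print(check(striped_node, striped_group, s=3, k=4, m=2))        # (True, True)
print(check(contiguous_node, contiguous_group, s=3, k=4, m=2))  # (True, False)
```

Both layouts spread every group over all N=K+M nodes. Only the striped one keeps each group's data blocks adjacent; the contiguous one instead keeps each node's data adjacent.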
An extra requirement, which would allow us to develop extensible contracts:
- Layout functions should still work when extra data blocks (and the corresponding parity blocks) are added to the dataset
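One way to read this requirement (an interpretation, not something stated above) is that growing the dataset from S to S' > S EC groups should only append blocks to each node, leaving what a node already stores untouched. The sketch below tests that property for the same two hypothetical layouts used for illustration: "striped" (node n holds the n-th block of every group) and "contiguous" (data node n holds data blocks n*S..n*S+S-1). The helper `extends_in_place` is hypothetical.

```python
from typing import List, Tuple

Blocks = Tuple[List[int], List[int]]  # (data indexes, parity indexes), 0-based

# Hypothetical "striped" layout: node n holds the n-th block of every group.
def striped_node(n: int, s: int, k: int, m: int) -> Blocks:
    if n < k:
        return [g * k + n for g in range(s)], []
    return [], [g * m + (n - k) for g in range(s)]

# Hypothetical "contiguous" layout: data node n holds data blocks n*S..n*S+S-1.
def contiguous_node(n: int, s: int, k: int, m: int) -> Blocks:
    if n < k:
        return list(range(n * s, (n + 1) * s)), []
    return [], list(range((n - k) * s, (n - k + 1) * s))

def extends_in_place(node_fn, s_old: int, s_new: int, k: int, m: int) -> bool:
    """True if going from s_old to s_new EC groups only appends blocks to each
    node, i.e. every node's old block list is a prefix of its new one."""
    for n in range(k + m):
        old_d, old_p = node_fn(n, s_old, k, m)
        new_d, new_p = node_fn(n, s_new, k, m)
        if new_d[:len(old_d)] != old_d or new_p[:len(old_p)] != old_p:
            return False
    return True

print(extends_in_place(striped_node, 3, 5, k=4, m=2))     # True
print(extends_in_place(contiguous_node, 3, 5, k=4, m=2))  # False
```

In the striped layout the group-to-blocks mapping also doesn't depend on S, so existing EC groups keep their members when new groups are appended. In the contiguous layout, node boundaries shift with S (node 1 goes from blocks 3..5 to 5..9), so existing slots would have to be refilled.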