Get locality plots #257
I'm currently generating the csv for all the targets. Let me know if anyone needs the csv to plot it |
Initial plotting. I'll get a better computer to do the rest of the blocks but I think this shows the picture pretty well. Each dot represents a leaf access. Remap was done to row 27 in code to do the plotting
You can check this really quickly by looping through targets and printing if a target is less than 20,000 or something. Ex: block 149796 position accesses that are to the right of the forest: 3645,18770,39935,22413,5460,24055,15002,19616,43166,16032,18813,7304,41912,16383,25441,24913 |
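A quick sketch of that check, using the positions quoted above for block 149796 as sample input. The helper name `targets_below` and the 20,000 cutoff are only illustrative; in practice the list would come from the block's proof targets.

```python
# Positions quoted in this thread for block 149796.
block_149796_targets = [
    3645, 18770, 39935, 22413, 5460, 24055, 15002, 19616,
    43166, 16032, 18813, 7304, 41912, 16383, 25441, 24913,
]


def targets_below(targets, limit=20_000):
    """Return the targets whose position is strictly below `limit`."""
    return [t for t in targets if t < limit]


if __name__ == "__main__":
    low = targets_below(block_149796_targets)
    # Print every low position so outliers are easy to eyeball.
    for t in low:
        print(t)
    print(f"{len(low)} of {len(block_149796_targets)} targets below 20,000")
```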
Very cool. The things that jump out at me: Also then there's these weird empty chunks, where there are, I guess, 100,000 leaves that are a total dead-zone; the whole range gets created (and created very quickly, as you can tell by the fact that the curve "jumps" upwards in the region where the gap is created) and then never accessed. What are these UTXOs? Spam I guess... is this mainnet or testnet? Maybe we could look at better unspendable detection if this is just spam. Then again, clustered together spam is better than spread apart spam. Also interesting is that in the ~20K or so blocks after those regions are created, nothing diffuses into those empty regions, and the regions themselves don't move. With a high-up enough swap they might eventually. |
-- to be updated with blocks closer to the tip --
|
I think these really came out well. Think I can stop plotting now :) code:
|
Have you read or viewed the presentation of the paper "Merkle Trees Optimized for Stateless Clients in Bitcoin"? |
We’re aware. You may want to read our discussion on a different accumulator design |
Thanks, I added a comment there. |
Not quite sure what you mean by
Not sure what you mean by |
I meant the frequency of access of each Merkle tree leaf (how often its proof is requested). I was wondering whether it would be more efficient to make the Merkle tree weighted. |
In utreexo, the shortest trees and the newest leaves are on the rightmost side. In Bitcoin, there is a spending pattern in which the newest utxos are spent the quickest. Therefore, newest leaves are the shortest lived. With this in mind, we know that we already have the shortest path to prove a leaf. Since the rightmost trees are the shortest and the shortest lived leaves are at the rightmost trees, we already have this efficiency. There’s no need to add additional complexity which will hurt readability and speed of the code. With a swapless design, the locality (newest leaves being on the rightmost side) will improve even more. |
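As a rough illustration of why the newest leaves get the shortest proofs: a forest holding `num_leaves` leaves has one perfect tree per set bit of `num_leaves`, with the largest tree on the left. The sketch below uses plain leaf indices (0 is the oldest leaf) and a hypothetical helper `proof_length`; it is not the accumulator package's API and it ignores any reshuffling caused by deletions.

```python
def proof_length(leaf, num_leaves):
    """Height of the tree holding `leaf`, i.e. the number of sibling hashes
    needed to prove it on its own. Leaves are indexed 0..num_leaves-1,
    left to right, and trees are laid out largest first."""
    if not 0 <= leaf < num_leaves:
        raise ValueError("leaf index out of range")
    offset = 0  # index of the first leaf of the tree under consideration
    for height in reversed(range(num_leaves.bit_length())):
        size = 1 << height
        if num_leaves & size:          # a tree of this height exists
            if leaf < offset + size:   # the leaf lives in this tree
                return height
            offset += size
    raise AssertionError("unreachable: leaf was range-checked above")


if __name__ == "__main__":
    # 13 leaves = 0b1101: trees with 8, 4 and 1 leaves.
    for leaf in (0, 8, 12):
        print(leaf, proof_length(leaf, 13))
```

With 13 leaves, the oldest leaf sits in the 8-leaf tree and the newest leaf is a root on its own, so its individual proof is empty.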
-I thought the whole idea of these plots is that this statement is not quite accurate and needs more investigation? -So is it useful to distinguish between these kinds? |
@Shymaa-Arafat What we do know, in the case of IBD, is when each leaf will be accessed, since the whole set of transactions is known ahead of time. Clustering many leaves together so that they're all accessed at the same time would reduce proof sizes, and that's what these plots are trying to look at. So far we're not looking at "predicting" TXO lifespan, as for IBD we already know them with certainty. |
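To make the clustering point concrete, here is a simplified sketch that counts how many sibling hashes a batch proof needs in a single perfect Merkle tree. It uses per-row indices rather than utreexo's forest position numbering, and `batch_proof_size` is a hypothetical helper, not part of the accumulator package.

```python
def batch_proof_size(targets, height):
    """Count sibling hashes needed to prove all `targets` (leaf indices)
    in one perfect tree with 2**height leaves."""
    nodes = set(targets)  # nodes known on the current row
    needed = 0
    for _ in range(height):
        parents = set()
        for n in nodes:
            # The sibling differs only in the lowest bit. If it is not
            # already known, the proof has to supply its hash.
            if (n ^ 1) not in nodes:
                needed += 1
            parents.add(n >> 1)
        nodes = parents  # move up one row
    return needed


if __name__ == "__main__":
    print(batch_proof_size([0, 1, 2, 3], 4))    # four adjacent leaves
    print(batch_proof_size([0, 4, 8, 12], 4))   # four spread-out leaves
```

In a 16-leaf tree, the four adjacent leaves need 2 hashes while the four spread-out leaves need 8, which is the effect better locality would be exploiting.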
Maybe I do have a misunderstanding here. You mean that if a TXO has a value of, say, 2 BTC and 1 BTC is spent/transformed/exchanged/..., then this one is dead and you create another TXO to hold the remaining value?
It is just an idea of using these past values in the IBD to predict the future, but maybe it is not best suited here, as it addresses the structure of the original forest, not the caching mechanism |
Yes, that's how bitcoin and UTXOs work, entries are created, then read and deleted. Nothing gets modified or read without being deleted (except maybe for mempool transactions that don't make it into a block). There might be ways to predict TXO durations based on heuristics but we're not doing that yet. Duration of the input TXOs likely does correlate with duration of output TXOs so that could help give hints on what to cache, but so far we're focusing more on IBD where we have all the data for certain and we can likely get a bigger improvement. |
Although I'm still not sure this is the best suited issue for this discussion, a first idea would be to replace the old TXO that became an input with the new one; i.e. no swaps, just modify the corresponding hashes in the proof |
On the 2021-02-22 call, we discussed locality and realized... we don't actually know what kind of locality we're getting! Are all the leaves getting proofs all clustered at the right, with occasional leaves on the left? We hope so, but don't have much data there.
We have the data though; we just need to visualize / analyze it. A good place to get the data is in accumulator.BatchProof.Targets, either on the server end when it's being built, or the client end when it's being consumed.
Or actually, just reading it off of disk should also be easy. It's just a list of uint64s for every utxo consumed in a block. (Could be uint32 but that's another issue).
We should start with a histogram, or 3D histogram. For every block, just make a histogram of accumulator.BatchProof.Targets. Since the values range from 0 to tens of millions, and there are only a few thousand entries, visualizing it directly with 1 horizontal pixel per position, and 1 vertical pixel per number of targets at that position, would be too sparse. So we can group ranges together, such that 1 horizontal pixel represents 100 position values, and then pixels can pile up on top of each other in the bin. There's probably some matplotlib stuff for this, and also to make a 3D plot of these histograms over time (each block gets one row on the z-axis).
This would give a good starting point for understanding the locality we're getting. Are there big hills on the right, and a few little dots on the left? Is it scattered randomly and uniformly? How does it change over time? This could lead to better ideas about measuring the locality and proof sizes, and hopefully ideas about how to improve locality to get faster performance and smaller proofs.
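A minimal sketch of the binning described above, in plain Python so it runs without extra packages. It assumes the targets for each block have already been read into a list of integers; `bin_targets` and `histogram_rows` are hypothetical helper names, and the bin width of 100 is the one suggested in this issue.

```python
from collections import Counter


def bin_targets(targets, bin_width=100):
    """Map bin index -> number of targets in that bin for one block.
    Bin b covers positions [b * bin_width, (b + 1) * bin_width)."""
    return Counter(t // bin_width for t in targets)


def histogram_rows(blocks, bin_width=100):
    """Build one dense row of bin counts per block, all padded to the
    same width, so the rows can be stacked into a 2D or 3D plot."""
    counts = [bin_targets(targets, bin_width) for targets in blocks]
    # Highest occupied bin across all blocks decides the row width.
    num_bins = max((max(c) for c in counts if c), default=-1) + 1
    return [[c.get(b, 0) for b in range(num_bins)] for c in counts]


if __name__ == "__main__":
    # Tiny made-up input, only to show the shape of the result.
    sample_blocks = [[5, 50, 150], [250, 260, 299, 10]]
    for row in histogram_rows(sample_blocks):
        print(row)
```

Each row is one block's histogram, so the list of rows could be handed to matplotlib (for example `imshow`, or a 3D bar plot) with the block index on one axis. With positions in the tens of millions the dense rows get wide, so a larger bin width or a sparse representation may be more practical.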