Use hash writer for aristo computeKey #3572
base: master
Conversation
Commits: computeKey (…ent computing branchKey multiple times within a single call)
```nim
w.startList(2)
w.append(pfx.toHexPrefix(isLeaf = true).data())
w.wrapEncoding(1)
```
doesn't this call to wrapEncoding depend on leafData being of a particular type? if yes, why use auto?
I'm not sure about wrapEncoding, but the reason I changed this line is that we are passing an account or slot type and don't need to pass in a block of statements. Perhaps it would be better to make it even more strict and use Account | UInt256 as the type.
ok - it feels odd that this should be needed (cc @chirag-parmar) - ie append should ideally be able to figure this out - but it seems to work, so ..
Apologies for the late reply. The call to wrapEncoding doesn't depend on leafData being of a particular type. Only leaf nodes have nested serializations, i.e. serializing an object/list/tuple that already contains a serialized RLP value.
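For readers following the thread, here is a minimal sketch of the leaf layout being discussed: a leaf is a two-item RLP list whose second item is itself an RLP encoding, i.e. the nested serialization that wrapEncoding(1) announces to the hash writer. The sketch uses the plain allocating RlpWriter from nim-eth, and the names Leaf, encodeLeaf and the uint64 value are illustrative stand-ins, not the Aristo types from the diff.

```nim
import eth/rlp

type Leaf = object
  path: seq[byte]   # already hex-prefix encoded path nibbles
  value: uint64     # stand-in for the account or storage-slot payload

proc encodeLeaf(leaf: Leaf): seq[byte] =
  var w = initRlpWriter()
  w.startList(2)
  w.append(leaf.path)               # item 1: the hex-prefixed path
  w.append(rlp.encode(leaf.value))  # item 2: an already-encoded RLP blob, nested inside the list
  w.finish()
```

With the hash writer there is no intermediate buffer to append, which is presumably why the nesting has to be declared up front via wrapEncoding.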
> append should be able to figure this out ideally

I didn't understand what it must be able to figure out.
I think what I would have expected is maybe a call like appendEncoded(leafData), which does the extra round of encoding of the passed-in item - wrapEncoding looks like an internal low-level operation, especially with that number being passed in, because presumably it must be followed up with actual data - that gives the caller an unnecessary opportunity to use it the wrong way.
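To make that concrete, here is a purely hypothetical sketch of the suggested helper; appendEncoded does not exist in nim-eth (the idea was filed as the issue linked in the next comment), and the writer type is left generic because the hash writer's concrete type is not shown in this excerpt.

```nim
# Hypothetical helper, not an existing nim-eth proc: it bundles the low-level
# wrapEncoding call with the append of the wrapped item, so callers never get
# the chance to emit wrapEncoding without its follow-up data.
proc appendEncoded[W, T](w: var W, item: T) =
  w.wrapEncoding(1)   # assumption: "the next item is a nested RLP encoding"
  w.append(item)
```

Fixing the count at 1 inside the helper also keeps the magic number out of call sites, which is the gist of the suggestion.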
makes sense actually. status-im/nim-eth#813
so .. the final thing to do with this branch is to run a benchmark: using an unhashed snapshot like https://eth1-db.nimbus.team/mainnet-20250831-nohash.tar.gz, run the node pre/post to measure how long it takes to create the full hash tree (it's printed when starting the node) - afair, 25% of cpu time was spent allocating rlp encodings, so we should see a good boost here.
Co-authored-by: Jacek Sieka <[email protected]>
I ran a test using an unhashed snapshot I have from just after the merge, at around block 15000000. Comparing the baseline build from master against the hash writer build from this branch: total run time for the master build was 2h33m19s, while the total run time for the hash writer build was 3h1m22s. Unfortunately, with these changes the state root computation appears to be slower than on the master branch. @chirag-parmar, perhaps you could run a benchmark as well to confirm these results?
I'm not sure why we are measuring the compute time. The purpose of the hash writer was to save on memory, not time. It still improves on time a little, as seen here, but only for large data objects like entire blocks.
I'll try to run the benchmarks and measure the memory usage too, but I wouldn't expect results too far off from @bhartnett's benchmarks.
not really - it's both: memory allocations take time, and the idea is to avoid creating work for the garbage collector, which ultimately saves both time and memory when done right. 25% of time is spent allocating and collecting, and by creating the hashes directly we should be able to avoid that work entirely.
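As a rough illustration of that argument (not the actual Aristo or nim-eth code; the proc names and the uint64 fields are made up), the baseline pattern materialises an RLP blob per node and then hashes it, while the streaming idea feeds the keccak context directly:

```nim
import eth/rlp, stew/endians2, nimcrypto/[keccak, hash]

# Baseline: build the node's RLP encoding in a temporary seq, hash it, and
# leave the seq for the garbage collector - the allocation work in question.
proc hashAfterEncoding(a, b: uint64): MDigest[256] =
  let encoded = rlp.encode((a, b))
  keccak256.digest(encoded)

# Streaming: push bytes into the keccak context as they are produced, so no
# whole-node buffer is ever allocated. (The RLP list/length framing that a
# real hash writer must also emit is omitted; only the incremental-update
# pattern is shown, so the two procs do not yield the same digest.)
proc hashWhileEncoding(a, b: uint64): MDigest[256] =
  var ctx: keccak256
  ctx.init()
  ctx.update(a.toBytesBE())   # stack-allocated 8-byte array, no GC involvement
  ctx.update(b.toBytesBE())
  ctx.finish()
```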
This is true, but we use the saved time to do two passes over the data to be encoded, and we also allocate some memory for prefixes. Maybe I can try and use the …
I agree. I will look into how I can customize the optimization for Aristo DB so that we save on time as well.
closed duplicate: #3226
fixes: #3164
Uses the rlp hash writer for Aristo DB's computeKey implementation.