WIP: First part of the DHT blog post #390
Conversation
Partial review only, but I noticed more changes already so submitting now.
For that reason, mature systems such as the mainline DHT restrict value size to 1000 bytes, and we are going to limit value size to 1024 bytes, or 1 KiB.
You could write a DHT to store arbitrary values, but in almost all cases the value should have some relationship with the key. E.g. for mainline, the value in most cases is a set of socket addresses where you can download the data for the SHA1 hash of the key. So in principle you could validate the key by checking if you can actually download the data from the socket addresses contained in the value. In some mainline extensions, like bep_0044, the key is the SHA1 hash of an ED25519 public key, and the value contains the actual public key, a signature made with the corresponding private key, and some user data. Again, it is possible to validate the value based on the key: if the SHA1 hash of the public key contained in the value does not match the lookup key, the value is invalid for the key.
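For illustration, here is a minimal sketch of what such key-based validation could look like for a bep_0044-style value, assuming the `sha1` and `ed25519-dalek` crates. The type and field names are made up for this sketch, and real BEP 44 signs a richer bencoded payload including a sequence number:

```rust
use ed25519_dalek::{Signature, VerifyingKey};
use sha1::{Digest, Sha1};

/// Hypothetical, simplified bep_0044-style value.
struct Bep44Value {
    public_key: [u8; 32], // ED25519 public key
    signature: [u8; 64],  // signature over the payload, made with the matching private key
    payload: Vec<u8>,     // user data
}

/// Returns true iff the value is valid for the given lookup key.
fn validate(lookup_key: &[u8; 20], value: &Bep44Value) -> bool {
    // The lookup key must be the SHA1 hash of the contained public key.
    let hash: [u8; 20] = Sha1::digest(value.public_key).into();
    if &hash != lookup_key {
        return false;
    }
    // The signature must verify against that public key.
    let Ok(key) = VerifyingKey::from_bytes(&value.public_key) else {
        return false;
    };
    let sig = Signature::from_bytes(&value.signature);
    key.verify_strict(&value.payload, &sig).is_ok()
}
```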
You seem to suggest that only storing values that can be validated is a good idea, but you don't actually go as far as recommending it. What is the aim of this paragraph? It may need to be a bit more strongly worded in its suggestion/recommendation?
I think it is a good idea, and mainline kinda sorta does it. All the extensions have validatable values (bep_0044 and the one for immutable data).
For the main use case, you cannot store arbitrary socket addrs for a SHA1 hash, but only the socket addrs of your own node as seen from the callee. The only free parameter is the port number.
I think I want to do this as well, but for BLAKE3 providers you either let the DHT node do BLAKE3 probes (costly), or you store a signed record or something. Not quite worked out yet.
So, possibility 1: values for content discovery are just node ids. The tracker can lazily do BLAKE3 probes to make sure the data is there, and then purge values if not.
Possibility 2: values are a signed promise by the announcer that the data is there. The tracker can check the signature, but that does not really tell us whether the promise is upheld. If you want to know if the data is there, you have to check.
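For possibility 2, the signed record might look something like the sketch below, assuming ed25519 node keys via `ed25519-dalek`; all names here are hypothetical. Note the caveat above still applies: the signature only proves who made the promise, not that it is upheld.

```rust
use ed25519_dalek::{Signature, Signer, SigningKey};

/// Hypothetical signed provider announcement (possibility 2).
struct ProviderRecord {
    content: [u8; 32],    // BLAKE3 hash of the announced content
    node_id: [u8; 32],    // announcer's node id (its ed25519 public key)
    timestamp: u64,       // lets the tracker expire stale promises
    signature: Signature, // over (content, node_id, timestamp)
}

impl ProviderRecord {
    fn new(content: [u8; 32], key: &SigningKey, timestamp: u64) -> Self {
        let node_id = key.verifying_key().to_bytes();
        // Sign the concatenation of the announced fields.
        let mut msg = Vec::with_capacity(72);
        msg.extend_from_slice(&content);
        msg.extend_from_slice(&node_id);
        msg.extend_from_slice(&timestamp.to_be_bytes());
        let signature = key.sign(&msg);
        Self { content, node_id, timestamp, signature }
    }
}
```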
Co-authored-by: Floris Bruynooghe <[email protected]>
Did get a little further... but not yet to the end.
And that's it. That is the entire RPC protocol. Many DHT implementations also add a `Ping` call, but since querying the routing table is so cheap, if you want to know whether a node is alive you might as well ask it for the closest nodes to some random key and get some extra information for free.
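For illustration, a minimal protocol along those lines might look something like this sketch; the actual message names and shapes in the post's code may differ:

```rust
type Id = [u8; 32];

/// Sketch of a minimal DHT RPC protocol. No `Ping`: a `FindNodes`
/// response already proves the node is alive.
enum Request {
    /// Ask for the closest nodes to a key that the callee knows about.
    FindNodes { key: Id },
    /// Store a small value under a key.
    Put { key: Id, value: Vec<u8> },
    /// Retrieve the values stored under a key.
    Get { key: Id },
}
```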
## RPC client
You could skip this entire section for a blog post; it would also shorten it. This is more tutorial material. To me the protocol is the important bit, this is just boilerplate.
Maybe publish the crate (under a better name) and just point to the docs / impl?
Yeah, that's also a good option if you can point to the code in a repo somewhere. Though maybe @b5 is interested in a tutorial as well to turn into another video?
So let's define the routing table. First of all we need some simple integer arithmetic like xor and leading_zeros for 256-bit numbers. There are various crates that provide this, but since we don't need anything fancy like multiplication or division, we just implemented it inline.
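A minimal sketch of that inline arithmetic, assuming node ids are big-endian `[u8; 32]` arrays:

```rust
/// Bitwise xor of two 256-bit ids, the Kademlia distance metric.
fn xor(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Number of leading zero bits of a 256-bit number. Applied to an xor
/// distance, this determines which k-bucket a node falls into.
fn leading_zeros(x: &[u8; 32]) -> u32 {
    let mut n = 0;
    for byte in x {
        if *byte == 0 {
            n += 8;
        } else {
            n += byte.leading_zeros();
            break;
        }
    }
    n
}
```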
The routing table itself is just a 2D array of node ids. Each row (k-bucket) has a small fixed upper size, so we are going to use the [ArrayVec] crate to avoid allocations. For each node id we keep just a tiny bit of extra information: a timestamp of when we last saw any evidence that the node actually exists and responds, used to decide which nodes to check for liveness.
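The shape described above might look roughly like this, assuming the `arrayvec` crate and a bucket size of 20; the field names are illustrative:

```rust
use arrayvec::ArrayVec;
use std::time::Instant;

/// Maximum number of entries per k-bucket.
const K: usize = 20;

struct NodeEntry {
    id: [u8; 32],
    /// Last time we saw evidence that the node exists and responds.
    last_seen: Instant,
}

struct RoutingTable {
    /// Bucket i holds nodes whose xor distance to our own id has
    /// exactly i leading zero bits.
    buckets: [ArrayVec<NodeEntry, K>; 256],
}
```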
I haven't really been paying much attention to this till now, so there might be other occurrences. But perhaps we should stick to "node ID" in text when referring to the iroh NodeId. It's mostly a stylistic question though.
Okay, this is great. Note, I did not read for grammar or correctness, just read it while thinking about structure. My suggestion here would be to immediately lean into the fact that this is going to be a series and already split this into 3 separate posts.

The first post would be about what a DHT is and what properties we want in a DHT. We can link to code as a preview for what is to come, and add a conclusion that describes the next few blog posts. The second post would be the implementation post, ending with something to the effect of "so now we have an implementation, how do we know it works? Next blog post we will describe how to test our DHT implementation and tips for testing iroh networks on the order of 100k locally by using irpc in our protocols." The third post would illustrate the way you tested the DHT using irpc. Its conclusion can describe some topics for future posts on DHTs.

Each post should have a table of contents at the top with links to each blog post in the series as they get published, as well as a link to the next post in the series at the bottom of each page. @rklaehn what are your thoughts on splitting it this way? Your post is already basically structured in this fashion, it would just be splitting it explicitly.
Yeah, makes sense. I wonder if the first part is interesting enough, but we can follow it up quickly with at least the second part.
@rklaehn I do think it's interesting enough, it's just not interesting to you because you understand it all already. Most people in our space know what a DHT is, but they haven't spent the time actually thinking practically about what they should want in a DHT, which is what your blog post especially highlights.
Co-authored-by: Floris Bruynooghe <[email protected]>
@ramfox I split it now. Hope it is roughly where you intended the split. We can review the first part so that it is ready to go and then split off the rest into another PR. For at least the second part I would love to publish the code, so I can refer to the code on docs.rs, but that requires the conn pool to be published, which requires a blobs release, which requires an iroh release. But for the first part, which is really just an appetizer, we could do without I think.