Skip to content

Conversation

Cluster444
Copy link
Contributor

This is the a zig implementation of the rapidhash successor to wyhash.

It's an improvement on small key performance and quality. Especially now that wyhash is no longer being maintained.

@Cluster444 Cluster444 marked this pull request as draft November 27, 2024 19:41
@Cluster444 Cluster444 marked this pull request as ready for review November 27, 2024 19:44
@Rexicon226
Copy link
Contributor

Rexicon226 commented Nov 27, 2024

Mind posting the benchmarks? Both for small keys and larger ones, I'm curious how it compares.
Also, you should look into benchmarking this with SMHasher, this is how I did it for XXHash3 tiehuis/smhasher@499a3ca. I'm curious to see how this implementation compares against the original.

@Cluster444
Copy link
Contributor Author

Here's the results of the builtin benchmark.

My use case is 32-64 byte keys, so these are nice results.

I'll look into getting a comparison against the full C version. Thought that will be more of a test of the compiler difference wouldn't it? I imagine the C version compiled with zig cc or any llvm based compiler (I'm on Mac), would spit out largely similar results.

wyhash        8B  1939 MiB/s 254244862 Hashes/s
rapidhash     8B  3636 MiB/s 476664955 Hashes/s
cityhash-64   8B  3452 MiB/s 452557617 Hashes/s

wyhash       16B  3996 MiB/s 261919935 Hashes/s
rapidhash    16B  6827 MiB/s 447457051 Hashes/s
cityhash-64  16B  5973 MiB/s 391473478 Hashes/s

wyhash       32B  6772 MiB/s 221935524 Hashes/s
rapidhash    32B 12398 MiB/s 406277176 Hashes/s
cityhash-64  32B  9793 MiB/s 320917933 Hashes/s

wyhash       64B  7956 MiB/s 130366529 Hashes/s
rapidhash    64B 11091 MiB/s 181728281 Hashes/s
cityhash-64  64B 10020 MiB/s 164176690 Hashes/s

wyhash      128B 10192 MiB/s  83499477 Hashes/s
rapidhash   128B 12685 MiB/s 103918014 Hashes/s
cityhash-64 128B  7246 MiB/s  59359939 Hashes/s

wyhash      256B 11923 MiB/s  48840452 Hashes/s
rapidhash   256B 13968 MiB/s  57214819 Hashes/s
cityhash-64 256B  7722 MiB/s  31632683 Hashes/s

@Cluster444
Copy link
Contributor Author

@Rexicon226 Here's the output from rapidhash and zig-rapidhash

C rapidhash

./SMHasher --test=Sanity,Speed,Speed,Bulk,SpeedSmall rapidhash
--- Testing rapidhash "rapidhash v1" GOOD

[[[ Sanity Tests ]]]

Verification value 0xAF404C4B ....... PASS
Running sanity check 1     .......... PASS
Running AppendedZeroesTest .......... PASS

[[[ Speed Tests ]]]

WARNING: timer resolution is 9223372036854775807 (0x7fffffffffffffff) ticks (0x7ebe6b423d900 - 0x7ebe6b423d900). Broken VDSO?
Bulk speed test - 262144-byte keys
Alignment  7 - 10.536 bytes/cycle - 30143.77 MiB/sec @ 3 ghz
Alignment  6 - 10.533 bytes/cycle - 30135.01 MiB/sec @ 3 ghz
Alignment  5 - 10.539 bytes/cycle - 30152.90 MiB/sec @ 3 ghz
Alignment  4 - 10.538 bytes/cycle - 30148.37 MiB/sec @ 3 ghz
Alignment  3 - 10.534 bytes/cycle - 30137.96 MiB/sec @ 3 ghz
Alignment  2 - 10.535 bytes/cycle - 30141.22 MiB/sec @ 3 ghz
Alignment  1 - 10.534 bytes/cycle - 30137.74 MiB/sec @ 3 ghz
Alignment  0 - 10.532 bytes/cycle - 30132.20 MiB/sec @ 3 ghz
Average      - 10.535 bytes/cycle - 30141.15 MiB/sec @ 3 ghz

Small key speed test -    1-byte keys -    20.85 cycles/hash
Small key speed test -    2-byte keys -    20.84 cycles/hash
Small key speed test -    3-byte keys -    20.84 cycles/hash
Small key speed test -    4-byte keys -    21.98 cycles/hash
Small key speed test -    5-byte keys -    21.97 cycles/hash
Small key speed test -    6-byte keys -    21.97 cycles/hash
Small key speed test -    7-byte keys -    21.97 cycles/hash
Small key speed test -    8-byte keys -    21.98 cycles/hash
Small key speed test -    9-byte keys -    21.97 cycles/hash
Small key speed test -   10-byte keys -    21.98 cycles/hash
Small key speed test -   11-byte keys -    21.96 cycles/hash
Small key speed test -   12-byte keys -    21.97 cycles/hash
Small key speed test -   13-byte keys -    21.96 cycles/hash
Small key speed test -   14-byte keys -    21.98 cycles/hash
Small key speed test -   15-byte keys -    21.95 cycles/hash
Small key speed test -   16-byte keys -    21.97 cycles/hash
Small key speed test -   17-byte keys -    23.00 cycles/hash
Small key speed test -   18-byte keys -    23.00 cycles/hash
Small key speed test -   19-byte keys -    23.00 cycles/hash
Small key speed test -   20-byte keys -    23.00 cycles/hash
Small key speed test -   21-byte keys -    23.00 cycles/hash
Small key speed test -   22-byte keys -    23.00 cycles/hash
Small key speed test -   23-byte keys -    23.00 cycles/hash
Small key speed test -   24-byte keys -    23.00 cycles/hash
Small key speed test -   25-byte keys -    23.00 cycles/hash
Small key speed test -   26-byte keys -    23.00 cycles/hash
Small key speed test -   27-byte keys -    23.00 cycles/hash
Small key speed test -   28-byte keys -    23.00 cycles/hash
Small key speed test -   29-byte keys -    23.00 cycles/hash
Small key speed test -   30-byte keys -    23.00 cycles/hash
Small key speed test -   31-byte keys -    23.00 cycles/hash
Small key speed test -   32-byte keys -    23.00 cycles/hash
Average                                    22.379 cycles/hash
Average, weighted by key length freq.      22.420 cycles/hash (using 93.0% of top-7m Tranco DNS names dataset)
Average, weighted by key length freq.      21.983 cycles/hash (using 27.1% of startup-1M UMASH trace dataset)


Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 15.384311 seconds

Zig rapidhash

./SMHasher --test=Sanity,Speed,Speed,Bulk,SpeedSmall zig-rapidhash
--- Testing zig-rapidhash "zig rapidhash, 64-bit" GOOD

[[[ Sanity Tests ]]]

Verification value 0x97F29902 ....... FAIL! (Expected 0xAF404C4B)
Running sanity check 1     . 0: 0xC7 == 0xC7  FAIL  !!!!!
Running AppendedZeroesTest .......... PASS

[[[ Speed Tests ]]]

WARNING: timer resolution is 9223372036854775807 (0x7fffffffffffffff) ticks (0x7ec399e91db00 - 0x7ec399e91db00). Broken VDSO?
Bulk speed test - 262144-byte keys
Alignment  7 - 10.583 bytes/cycle - 30279.43 MiB/sec @ 3 ghz
Alignment  6 - 10.584 bytes/cycle - 30280.69 MiB/sec @ 3 ghz
Alignment  5 - 10.584 bytes/cycle - 30281.71 MiB/sec @ 3 ghz
Alignment  4 - 10.583 bytes/cycle - 30278.28 MiB/sec @ 3 ghz
Alignment  3 - 10.587 bytes/cycle - 30289.40 MiB/sec @ 3 ghz
Alignment  2 - 10.585 bytes/cycle - 30284.79 MiB/sec @ 3 ghz
Alignment  1 - 10.586 bytes/cycle - 30287.49 MiB/sec @ 3 ghz
Alignment  0 - 10.583 bytes/cycle - 30278.12 MiB/sec @ 3 ghz
Average      - 10.584 bytes/cycle - 30282.49 MiB/sec @ 3 ghz

Small key speed test -    1-byte keys -    20.94 cycles/hash
Small key speed test -    2-byte keys -    20.93 cycles/hash
Small key speed test -    3-byte keys -    20.94 cycles/hash
Small key speed test -    4-byte keys -    20.00 cycles/hash
Small key speed test -    5-byte keys -    20.33 cycles/hash
Small key speed test -    6-byte keys -    20.41 cycles/hash
Small key speed test -    7-byte keys -    20.32 cycles/hash
Small key speed test -    8-byte keys -    20.30 cycles/hash
Small key speed test -    9-byte keys -    20.15 cycles/hash
Small key speed test -   10-byte keys -    20.00 cycles/hash
Small key speed test -   11-byte keys -    20.33 cycles/hash
Small key speed test -   12-byte keys -    20.27 cycles/hash
Small key speed test -   13-byte keys -    20.18 cycles/hash
Small key speed test -   14-byte keys -    20.00 cycles/hash
Small key speed test -   15-byte keys -    20.00 cycles/hash
Small key speed test -   16-byte keys -    20.45 cycles/hash
Small key speed test -   17-byte keys -    23.00 cycles/hash
Small key speed test -   18-byte keys -    23.00 cycles/hash
Small key speed test -   19-byte keys -    23.00 cycles/hash
Small key speed test -   20-byte keys -    23.00 cycles/hash
Small key speed test -   21-byte keys -    23.00 cycles/hash
Small key speed test -   22-byte keys -    23.00 cycles/hash
Small key speed test -   23-byte keys -    23.00 cycles/hash
Small key speed test -   24-byte keys -    23.00 cycles/hash
Small key speed test -   25-byte keys -    23.00 cycles/hash
Small key speed test -   26-byte keys -    23.00 cycles/hash
Small key speed test -   27-byte keys -    23.00 cycles/hash
Small key speed test -   28-byte keys -    23.00 cycles/hash
Small key speed test -   29-byte keys -    23.00 cycles/hash
Small key speed test -   30-byte keys -    23.00 cycles/hash
Small key speed test -   31-byte keys -    23.00 cycles/hash
Small key speed test -   32-byte keys -    23.00 cycles/hash
Average                                    21.673 cycles/hash
Average, weighted by key length freq.      21.416 cycles/hash (using 93.0% of top-7m Tranco DNS names dataset)
Average, weighted by key length freq.      20.322 cycles/hash (using 27.1% of startup-1M UMASH trace dataset)


Input vcode 0x00000001, Output vcode 0x00000001, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 13.916414 seconds

@Cluster444
Copy link
Contributor Author

Two things I'm noticing, the verification fails at the top. I'm looking into that, not sure what that's being based on exactly. Maybe it's the seed?

Although it seems like the Zig version is winning in the 4-16 byte range, keeping on par with the 1-3 byte range, with the expected jump at 17+.

@Rexicon226
Copy link
Contributor

Interesting, thanks for running these.

The verification must succeed; otherwise, something is wrong with the implementation. The seed provided by smhasher is 0. You can find the logic behind the verification pass here:
https://github.com/rurban/smhasher/blob/b081a0ad4d4c51642a98a568dcee4b3922a94bad/KeysetTest.cpp#L16-L79
I'm sure a bit of printf debugging will yield answers.

Regarding the benchmarks themselves, I'm happy as long as they're roughly the same cycles per hash.

Ping me when you figure out the issue, and I'll happily give this a code review.

@Cluster444
Copy link
Contributor Author

@Rexicon226 Fixed it! My inner loops were reading wrong indexes.

Latest commit.

--- Testing zig-rapidhash "zig rapidhash, 64-bit" GOOD

[[[ Sanity Tests ]]]

Verification value 0xAF404C4B ....... PASS
Running sanity check 1     .......... PASS
Running AppendedZeroesTest .......... PASS

[[[ Speed Tests ]]]

WARNING: timer resolution is 9223372036854775807 (0x7fffffffffffffff) ticks (0x9f6c89c809c0 - 0x9f6c89c809c0). Broken VDSO?
Bulk speed test - 262144-byte keys
Alignment  7 - 10.634 bytes/cycle - 30424.91 MiB/sec @ 3 ghz
Alignment  6 - 10.635 bytes/cycle - 30426.98 MiB/sec @ 3 ghz
Alignment  5 - 10.636 bytes/cycle - 30430.89 MiB/sec @ 3 ghz
Alignment  4 - 10.630 bytes/cycle - 30411.87 MiB/sec @ 3 ghz
Alignment  3 - 10.635 bytes/cycle - 30427.39 MiB/sec @ 3 ghz
Alignment  2 - 10.639 bytes/cycle - 30438.34 MiB/sec @ 3 ghz
Alignment  1 - 10.638 bytes/cycle - 30435.97 MiB/sec @ 3 ghz
Alignment  0 - 10.638 bytes/cycle - 30436.33 MiB/sec @ 3 ghz
Average      - 10.636 bytes/cycle - 30429.08 MiB/sec @ 3 ghz

Small key speed test -    1-byte keys -    20.94 cycles/hash
Small key speed test -    2-byte keys -    20.94 cycles/hash
Small key speed test -    3-byte keys -    20.94 cycles/hash
Small key speed test -    4-byte keys -    20.14 cycles/hash
Small key speed test -    5-byte keys -    20.33 cycles/hash
Small key speed test -    6-byte keys -    20.00 cycles/hash
Small key speed test -    7-byte keys -    20.32 cycles/hash
Small key speed test -    8-byte keys -    20.31 cycles/hash
Small key speed test -    9-byte keys -    20.14 cycles/hash
Small key speed test -   10-byte keys -    20.00 cycles/hash
Small key speed test -   11-byte keys -    20.00 cycles/hash
Small key speed test -   12-byte keys -    20.27 cycles/hash
Small key speed test -   13-byte keys -    20.18 cycles/hash
Small key speed test -   14-byte keys -    20.13 cycles/hash
Small key speed test -   15-byte keys -    20.00 cycles/hash
Small key speed test -   16-byte keys -    20.43 cycles/hash
Small key speed test -   17-byte keys -    23.00 cycles/hash
Small key speed test -   18-byte keys -    23.00 cycles/hash
Small key speed test -   19-byte keys -    23.00 cycles/hash
Small key speed test -   20-byte keys -    23.00 cycles/hash
Small key speed test -   21-byte keys -    23.00 cycles/hash
Small key speed test -   22-byte keys -    23.00 cycles/hash
Small key speed test -   23-byte keys -    23.00 cycles/hash
Small key speed test -   24-byte keys -    23.00 cycles/hash
Small key speed test -   25-byte keys -    23.00 cycles/hash
Small key speed test -   26-byte keys -    23.00 cycles/hash
Small key speed test -   27-byte keys -    26.00 cycles/hash
Small key speed test -   28-byte keys -    24.68 cycles/hash
Small key speed test -   29-byte keys -    23.00 cycles/hash
Small key speed test -   30-byte keys -    23.00 cycles/hash
Small key speed test -   31-byte keys -    23.00 cycles/hash
Small key speed test -   32-byte keys -    23.00 cycles/hash
Average                                    21.804 cycles/hash
Average, weighted by key length freq.      21.468 cycles/hash (using 93.0% of top-7m Tranco DNS names dataset)
Average, weighted by key length freq.      20.289 cycles/hash (using 27.1% of startup-1M UMASH trace dataset)

@Cluster444
Copy link
Contributor Author

Here's the full output of running all the tests.

https://gist.github.com/Cluster444/57f403fb7121b601295d5aeb049f702f

@The-King-of-Toasters
Copy link
Contributor

Can you share how you ran this test using smhasher?

@Rexicon226
Copy link
Contributor

Rexicon226 commented Dec 1, 2024

Can you share how you ran this test using smhasher?

See my first comment where I show how I did it for XXHash3. I assume it was a similar process.

@Cluster444
Copy link
Contributor Author

Can you share how you ran this test using smhasher?

See my first comment where I show how I did it for XXHash3. I assume it was a similar process.

Yup, I the same zig build and link direct into smhasher.

@Cluster444
Copy link
Contributor Author

@Rexicon226 Inlined into file level struct. That was a little weird on Mac because apparently it is case insensitive. Kind of surprised, a git mv --force was needed to get it to actually rename.

Let me know if there's anything else!

Copy link
Contributor

@Rexicon226 Rexicon226 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, I've been quite busy.
I gave a couple of comments for you to work on :)

const e = len - 4;
a = (r32(k) << 32) | r32(k[e..]);
b = ((r32(k[d..]) << 32) | r32(k[(e - d)..]));
} else if (len > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put this if case into brackets, I think it'd be more readable that way.

return mix(a ^ sc[0] ^ len, b ^ sc[1]);
}

test "RapidHash.hash" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the test case below implementation-related functions.

for (sizes, outcomes) |s, e| {
const r = hash(RAPID_SEED, bytes[0..s]);

expectEqual(e, r) catch |err| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, you can use expectEqualSlice here instead.

Comment on lines +14 to +17
var a: u64 = 0;
var b: u64 = 0;
var k = input;
var is: [3]u64 = .{ seed, 0, 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to name these variables something more descriptive? The code is a bit hard to read right now since it's difficult to understand what each variable is representing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya for sure, that was mostly to ease writing it since it was easier for me to read, I'll do some rename swaps on those.


if (len <= 16) {
if (len >= 4) {
const d: u64 = ((len & 24) >> @intCast(len >> 3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra parenthesis here and in a couple of places below.

mitchellh added a commit to ghostty-org/ghostty that referenced this pull request Dec 23, 2024
Fixes #2497 

While investigating the issue I added an integrity check that found a
problem hiding in the insert logic that was unrelated- in fixing that I
greatly simplified the insert logic.

It turns out that #2497 is ultimately just a case of bad luck,
pathological inputs that result in very non-uniform hashes so the
clustering overwhelms things. The solution was just to add a check and
claim we're out of memory.

I tried adding an entropy folding function to fix the hash a little but
it had a measurable negative impact on performance and isn't necessary
so I've not included it here. Currently there's an open PR to Zig to
[add RapidHash](ziglang/zig#22085), which is the
successor to Wyhash and apparently has much better statistical
characteristics on top of being faster. I imagine it will land in time
for 0.14 so whenever we update to 0.14 we should probably switch our
standard hash function to RapidHash, which I imagine should yield
improvements across the board.

Using the AutoHasher may also be not the best idea, I may explore ways
to improve how we generate our style hashes in the future.
Copy link
Collaborator

@linusg linusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it outperforms wyhash this PR should probably also switch the default hasher used by auto-hashing containers.

@rofrol
Copy link
Contributor

rofrol commented Feb 11, 2025

Is Swiss Tables related? https://x.com/petermattis/status/1889080982273163466

@andrewrk andrewrk merged commit b7512c3 into ziglang:master Feb 22, 2025
10 checks passed
@andrewrk
Copy link
Member

Thanks! Follow-up PR welcome to switch std lib and/or compiler from Wyhash, provided that it comes with perf data points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants