Skip to content

Conversation

gaujay
Copy link
Contributor

@gaujay gaujay commented Aug 20, 2025

Lookup performance got slightly degraded when an is-empty check was added to HashMap get_inner. See #636.

This restore original perfs by removing the intermediate function (based on a good suggestion by @cuviper).
Benchmark results before:

test loadfactor_lookup_14500        ... bench:       3,435.00 ns/iter (+/- 273.03)
test loadfactor_lookup_16500        ... bench:       3,475.58 ns/iter (+/- 141.44)
test loadfactor_lookup_18500        ... bench:       3,467.79 ns/iter (+/- 125.96)
test loadfactor_lookup_20500        ... bench:       3,434.16 ns/iter (+/- 44.57)
test loadfactor_lookup_22500        ... bench:       3,451.52 ns/iter (+/- 228.06)
test loadfactor_lookup_24500        ... bench:       3,467.88 ns/iter (+/- 136.52)
test loadfactor_lookup_26500        ... bench:       3,418.82 ns/iter (+/- 152.68)
test loadfactor_lookup_28500        ... bench:       3,454.76 ns/iter (+/- 84.24)
test loadfactor_lookup_fail_14500   ... bench:       2,844.63 ns/iter (+/- 85.92)
test loadfactor_lookup_fail_16500   ... bench:       2,940.08 ns/iter (+/- 79.74)
test loadfactor_lookup_fail_18500   ... bench:       3,105.13 ns/iter (+/- 63.65)
test loadfactor_lookup_fail_20500   ... bench:       3,415.21 ns/iter (+/- 99.32)
test loadfactor_lookup_fail_22500   ... bench:       3,914.20 ns/iter (+/- 92.81)
test loadfactor_lookup_fail_24500   ... bench:       5,109.40 ns/iter (+/- 48.22)
test loadfactor_lookup_fail_26500   ... bench:       7,326.55 ns/iter (+/- 179.53)
test loadfactor_lookup_fail_28500   ... bench:      11,827.75 ns/iter (+/- 238.75)

test lookup_fail_foldhash_highbits  ... bench:       5,726.04 ns/iter (+/- 472.72)
test lookup_fail_foldhash_random    ... bench:       5,521.27 ns/iter (+/- 580.26)
test lookup_fail_foldhash_serial    ... bench:       5,549.89 ns/iter (+/- 87.08)
test lookup_fail_std_highbits       ... bench:      18,990.59 ns/iter (+/- 181.14)
test lookup_fail_std_random         ... bench:      19,021.96 ns/iter (+/- 360.65)
test lookup_fail_std_serial         ... bench:      18,748.49 ns/iter (+/- 4,630.62)
test lookup_foldhash_highbits       ... bench:       6,421.81 ns/iter (+/- 119.19)
test lookup_foldhash_random         ... bench:       6,294.16 ns/iter (+/- 56.87)
test lookup_foldhash_serial         ... bench:       5,969.87 ns/iter (+/- 70.54)
test lookup_std_highbits            ... bench:      19,669.20 ns/iter (+/- 200.34)
test lookup_std_random              ... bench:      19,929.59 ns/iter (+/- 662.02)
test lookup_std_serial              ... bench:      19,517.45 ns/iter (+/- 449.41)
test rehash_in_place                ... bench:     433,843.33 ns/iter (+/- 29,279.29)

After:

test loadfactor_lookup_14500        ... bench:       3,047.94 ns/iter (+/- 364.92)
test loadfactor_lookup_16500        ... bench:       3,095.86 ns/iter (+/- 545.26)
test loadfactor_lookup_18500        ... bench:       3,070.70 ns/iter (+/- 478.62)
test loadfactor_lookup_20500        ... bench:       3,048.14 ns/iter (+/- 419.18)
test loadfactor_lookup_22500        ... bench:       3,084.05 ns/iter (+/- 305.21)
test loadfactor_lookup_24500        ... bench:       3,086.63 ns/iter (+/- 405.92)
test loadfactor_lookup_26500        ... bench:       3,061.43 ns/iter (+/- 478.68)
test loadfactor_lookup_28500        ... bench:       3,046.33 ns/iter (+/- 383.78)
test loadfactor_lookup_fail_14500   ... bench:       2,622.96 ns/iter (+/- 86.19)
test loadfactor_lookup_fail_16500   ... bench:       2,741.12 ns/iter (+/- 79.61)
test loadfactor_lookup_fail_18500   ... bench:       2,912.38 ns/iter (+/- 213.65)
test loadfactor_lookup_fail_20500   ... bench:       3,276.13 ns/iter (+/- 241.55)
test loadfactor_lookup_fail_22500   ... bench:       3,908.32 ns/iter (+/- 350.57)
test loadfactor_lookup_fail_24500   ... bench:       4,606.70 ns/iter (+/- 83.94)
test loadfactor_lookup_fail_26500   ... bench:       7,264.71 ns/iter (+/- 204.14)
test loadfactor_lookup_fail_28500   ... bench:      12,356.40 ns/iter (+/- 183.73)

test lookup_fail_foldhash_highbits  ... bench:       5,433.38 ns/iter (+/- 92.46)
test lookup_fail_foldhash_random    ... bench:       5,223.50 ns/iter (+/- 487.38)
test lookup_fail_foldhash_serial    ... bench:       5,126.33 ns/iter (+/- 40.59)
test lookup_fail_std_highbits       ... bench:      18,683.85 ns/iter (+/- 344.21)
test lookup_fail_std_random         ... bench:      18,956.15 ns/iter (+/- 182.67)
test lookup_fail_std_serial         ... bench:      18,673.58 ns/iter (+/- 350.75)
test lookup_foldhash_highbits       ... bench:       5,660.51 ns/iter (+/- 609.27)
test lookup_foldhash_random         ... bench:       5,607.51 ns/iter (+/- 105.56)
test lookup_foldhash_serial         ... bench:       5,173.60 ns/iter (+/- 78.03)
test lookup_std_highbits            ... bench:      19,263.92 ns/iter (+/- 348.80)
test lookup_std_random              ... bench:      19,260.00 ns/iter (+/- 383.08)
test lookup_std_serial              ... bench:      18,953.56 ns/iter (+/- 225.67)
test rehash_in_place                ... bench:     359,362.92 ns/iter (+/- 29,944.62)

Note: removing the is-empty check altogether is still worth considering, but this is the next best thing.

@gaujay gaujay force-pushed the unbloat-get_inner-perf branch 2 times, most recently from 3cc53ed to 3c1f781 Compare August 20, 2025 20:42
@Amanieu
Copy link
Member

Amanieu commented Aug 20, 2025

I don't think we can reasonably use likely here since it can be quite common for some applications (such as rustc) to frequently perform lookups on empty hash tables. Everything else looks fine.

@gaujay gaujay force-pushed the unbloat-get_inner-perf branch from 3c1f781 to f1c039c Compare August 21, 2025 18:52
@gaujay
Copy link
Contributor Author

gaujay commented Aug 21, 2025

Removed likely as suggested.
Other maps (e.g. Abseil, Boost, Folly) don't seem to use this condition and actually implement (like Hashbrown elsewhere) a special "dummy" static control group to explicitly avoid it in their lookup hot path.

@Amanieu
Copy link
Member

Amanieu commented Aug 22, 2025

We do have the dummy static control group, but it doesn't help if you have to hash a whole string before reaching it. Thankfully this should be a fairly well-predicted branch in practice.

@Amanieu Amanieu added this pull request to the merge queue Aug 22, 2025
Merged via the queue into rust-lang:master with commit fe72a08 Aug 22, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants