`<bit>`: Use popcount for `has_single_bit()` if it's always available #5534

pps83 · 2025-05-23T09:29:05Z

provides 1.05 to 1.1 times speed up on x64 targets.

x64 results

Benchmark                               Time 
---------------------------------------------
bm_has_single_bit_if<uint8_t>        13.4 ns 
bm_has_single_bit_if_new<uint8_t>    12.7 ns 
bm_has_single_bit_if<uint16_t>       13.9 ns 
bm_has_single_bit_if_new<uint16_t>   10.8 ns 
bm_has_single_bit_if<uint32_t>       15.9 ns 
bm_has_single_bit_if_new<uint32_t>   12.6 ns 
bm_has_single_bit_if<uint64_t>       14.3 ns 
bm_has_single_bit_if_new<uint64_t>   11.0 ns 
bm_has_single_bit<uint8_t>           6.34 ns 
bm_has_single_bit_new<uint8_t>       5.88 ns 
bm_has_single_bit<uint16_t>          6.19 ns 
bm_has_single_bit_new<uint16_t>      4.52 ns 
bm_has_single_bit<uint32_t>          5.57 ns 
bm_has_single_bit_new<uint32_t>      5.99 ns 
bm_has_single_bit<uint64_t>          6.07 ns 
bm_has_single_bit_new<uint64_t>      5.41 ns

fixes #5359

pps83 · 2025-05-23T09:37:46Z

@AlexGuteniev, do I need to update benchmarks, or simply put benchmark numbers as part of the commit message?

AlexGuteniev · 2025-05-23T10:34:41Z

@AlexGuteniev, do I need to update benchmarks, or simply put benchmark numbers as part of the commit message?

I think the current benchmarks are fine, just post before/after number

AlexGuteniev · 2025-05-23T10:38:46Z

Ideally, we'd need ARM64 numbers as well. But I think we can do without them.

stl/inc/bit

AlexGuteniev · 2025-05-23T10:53:05Z

I think the current benchmarks are fine, just post before/after number

You also need to pass somehow /arch:AVX or greater to make it different, perhaps by temporarily alterint CmakeFIle.txt locally, or by any other working method.

pps83 · 2025-05-23T10:55:16Z

You also need to pass somehow /arch:AVX or greater to make it different, perhaps by temporarily alterint CmakeFIle.txt locally, or by any other working method.

shouldn't unit tests automatically run with/without these switches?

AlexGuteniev · 2025-05-23T10:57:19Z

shouldn't unit tests automatically run with/without these switches?

Unit tests should.
I mean benchmarks, which currently don't

AlexGuteniev · 2025-05-23T11:09:25Z

Also please edit the PR description to link it with the issue using one of Github keywords.

pps83 · 2025-05-23T11:27:16Z

I'm getting conflicting results for the test: popcount on my pc is somehow slower by around 30%
it's a tiger-lake-u core i7-11390H, shouldn't have slow popcount :)

pps83 · 2025-05-23T11:35:28Z

I'm getting conflicting results for the test: popcount on my pc is somehow slower by around 30% it's a tiger-lake-u core i7-11390H, shouldn't have slow popcount :)

no idea how is that it doesn't get inlined :)

pps83 · 2025-05-23T11:39:17Z

wow, accidentially, i had /Only __inline (/Ob1) option. Apparently, this destroys most of std code that doesn't have inlines. Imo, all these functions should be marked inline.

pps83 · 2025-05-23T11:42:38Z

popcount is on the right. So, it does improve, but will be 30% slower for projects using /Ob1

AlexGuteniev · 2025-05-23T11:42:43Z

Imo, all these functions should be marked inline.

No; I asked a while ago, see #619

AlexGuteniev · 2025-05-23T11:55:38Z

provides 5-10% speed up

✋ we don't do that here.

See https://github.com/microsoft/STL/wiki/Benchmarking-the-STL#calculating-speedups

So you probably mean 1.05 to 1.1 times speed up

pps83 · 2025-05-23T12:01:17Z

So you probably mean 1.05 to 1.1 times speed up

updated. I based it on my previous commit message: #5367

(actual git history doesn't have that anyways)

AlexGuteniev · 2025-05-23T12:31:22Z

Mixed results for _if version, but consistent improvement of the branchless one

i5-1235U (Alder Lake)

P Cores

Benchmark	Before	After	Speedup
bm_has_single_bit_if<uint8_t>	9.78 ns	9.90 ns	0.99
bm_has_single_bit_if<uint16_t>	12.1 ns	10.1 ns	1.20
bm_has_single_bit_if<uint32_t>	11.4 ns	12.3 ns	0.93
bm_has_single_bit_if<uint64_t>	9.66 ns	12.7 ns	0.76
bm_has_single_bit<uint8_t>	4.44 ns	5.56 ns	0.80
bm_has_single_bit<uint16_t>	4.66 ns	3.59 ns	1.30
bm_has_single_bit<uint32_t>	4.66 ns	3.59 ns	1.30
bm_has_single_bit<uint64_t>	4.90 ns	3.47 ns	1.41

E Cores

Benchmark	Before	After	Speedup
bm_has_single_bit_if<uint8_t>	14.7 ns	14.0 ns	1.05
bm_has_single_bit_if<uint16_t>	16.6 ns	14.7 ns	1.13
bm_has_single_bit_if<uint32_t>	14.7 ns	14.3 ns	1.03
bm_has_single_bit_if<uint64_t>	14.7 ns	14.1 ns	1.04
bm_has_single_bit<uint8_t>	9.09 ns	8.84 ns	1.03
bm_has_single_bit<uint16_t>	9.09 ns	7.24 ns	1.26
bm_has_single_bit<uint32_t>	9.07 ns	7.25 ns	1.25
bm_has_single_bit<uint64_t>	9.08 ns	7.27 ns	1.25

…NT_INTRINSICS` defined.

stl/inc/bit

StephanTLavavej · 2025-05-28T16:38:30Z

Thanks! 😻 Results on my 5950X, comparing main to this PR with set _CL_=/arch:AVX2 injected non-intrusively:

Benchmark	Before	After	Speedup
`bm_has_single_bit_if<uint8_t>`	12.9 ns	11.2 ns	1.15
`bm_has_single_bit_if<uint16_t>`	12.9 ns	11.6 ns	1.11
`bm_has_single_bit_if<uint32_t>`	13.1 ns	12.9 ns	1.02
`bm_has_single_bit_if<uint64_t>`	13.3 ns	13.1 ns	1.02
`bm_has_single_bit<uint8_t>`	5.10 ns	5.30 ns	0.96
`bm_has_single_bit<uint16_t>`	5.09 ns	3.62 ns	1.41
`bm_has_single_bit<uint32_t>`	5.10 ns	3.62 ns	1.41
`bm_has_single_bit<uint64_t>`	5.13 ns	3.61 ns	1.42

I'm not worried about the 0.96, that looks like noise (I reran and it was slightly better), and the 1.41 speedups are very healthy indeed.

StephanTLavavej · 2025-05-28T17:49:59Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

pps83 · 2025-05-28T19:45:33Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

no changes planned.

StephanTLavavej · 2025-05-28T22:41:26Z

0️⃣ 1️⃣ 0️⃣

<bit>: Use popcount for has_single_bit() if it's always available

53ce129

pps83 requested a review from a team as a code owner May 23, 2025 09:29

github-project-automation bot added this to STL Code Reviews May 23, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews May 23, 2025

pps83 mentioned this pull request May 23, 2025

<bit>: Could has_single_bit() be faster? #5359

Closed

AlexGuteniev reviewed May 23, 2025

View reviewed changes

stl/inc/bit Outdated Show resolved Hide resolved

comments

dda8819

AlexGuteniev approved these changes May 23, 2025

View reviewed changes

StephanTLavavej self-assigned this May 27, 2025

StephanTLavavej added the performance Must go faster label May 27, 2025

StephanTLavavej added 2 commits May 28, 2025 09:17

Always define _POPCNT_INTRINSICS_ALWAYS_AVAILABLE, leave `_HAS_POPC…

420c391

…NT_INTRINSICS` defined.

Avoid code duplication.

9f7ee26

StephanTLavavej reviewed May 28, 2025

View reviewed changes

stl/inc/bit Outdated Show resolved Hide resolved

stl/inc/bit Outdated Show resolved Hide resolved

StephanTLavavej approved these changes May 28, 2025

View reviewed changes

StephanTLavavej removed their assignment May 28, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 28, 2025

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 28, 2025

StephanTLavavej merged commit 9ef878d into microsoft:main May 28, 2025
48 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews May 28, 2025

pps83 deleted the bit-has_single_bit-faster2 branch May 31, 2025 13:05

<bit>: Use popcount for has_single_bit() if it's always available #5534

<bit>: Use popcount for has_single_bit() if it's always available #5534

Uh oh!

Conversation

pps83 commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pps83 commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025

Uh oh!

pps83 commented May 23, 2025

Uh oh!

AlexGuteniev commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

StephanTLavavej commented May 28, 2025

Uh oh!

StephanTLavavej commented May 28, 2025

Uh oh!

pps83 commented May 28, 2025

Uh oh!

Uh oh!

StephanTLavavej commented May 28, 2025

0️⃣ 1️⃣ 0️⃣

Uh oh!

Uh oh!

`<bit>`: Use popcount for `has_single_bit()` if it's always available #5534

`<bit>`: Use popcount for `has_single_bit()` if it's always available #5534

pps83 commented May 23, 2025 •

edited

Loading

pps83 commented May 23, 2025 •

edited

Loading

AlexGuteniev commented May 23, 2025 •

edited

Loading