Skip to content

Conversation

AlexGuteniev
Copy link
Contributor

πŸ“œ The optimization

There are two implementations of search_n β€” in std and in std::ranges. For bidirectional iterators, both implementations take advantage of the contiguous range to search for. They jump forward by the value of n and try to match from the end. This allows skipping some comparisons. When there are more mismatches than matches, it ends up in fast pass over the range and few comparisons.

This means than for large values of n and non-pathological input, the algorithm is not even likely to benefit from vectorization.

For small values of n, however, the algorithm performs worse.

The worst case is n=1, where the algortihm is just find with extra steps. The PR forwards this case directly to find, where it may pick the vectorization or memchr, and even if it doesn't, it would still stop looking into doing extra steps.

βš–οΈ Predicate check

Unlike many other algorithms, such as find, the search_n algorithm takes both value and predicate. We want to forward to predicate-less find, as we're trying to engage vectorization, so we can do this when seeing the default equal_to predicate. Binding the value and the predicate into a bigger predicate and passing that to find_if would work for more cases, but would not be (manually) vectorized.

Since the value type and iterator type are unrelated, the comparison is potentially heterogenous, so it is hard to verify if non-void specialization of std::equal_to<T> does the same as default comparison, or not. We'll skip that, and check just for std::equal_to<void> and ranges::equal_to.

βœ… Test coverage

There's no attempt of comprehensive coverage of std::search_n πŸ™€. Just some ad-hoc tests, mostly negative. Creating one seems out of scope for this PR. The n=1 case seems to be covered indirectly via P0024R2_parallel_algorithms_search_n test, along with many other cases.

For ranges::search_n there's a pre-existing test that does at least some minimum coverage, expanded that with n=1 case.

⏱️Benchmark results

Benchmark Before After
bm<uint8_t, AlgType::Std>/3000 525 ns 17.5 ns
bm<uint8_t, AlgType::Rng>/3000 995 ns 17.5 ns
bm<uint16_t, AlgType::Std>/3000 587 ns 40.0 ns
bm<uint16_t, AlgType::Rng>/3000 1506 ns 38.8 ns
bm<uint32_t, AlgType::Std>/3000 582 ns 67.8 ns
bm<uint32_t, AlgType::Rng>/3000 1500 ns 68.5 ns
bm<uint64_t, AlgType::Std>/3000 571 ns 146 ns
bm<uint64_t, AlgType::Rng>/3000 1466 ns 147 ns

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner March 18, 2025 19:18
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Mar 18, 2025
@StephanTLavavej StephanTLavavej added the performance Must go faster label Mar 18, 2025
@StephanTLavavej StephanTLavavej self-assigned this Mar 18, 2025
@StephanTLavavej
Copy link
Member

Thanks for the detailed PR description and significant optimization! πŸ’š I pushed very minor nitpicks to the benchmark.

@StephanTLavavej StephanTLavavej removed their assignment Mar 20, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Mar 20, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Mar 21, 2025
@StephanTLavavej StephanTLavavej self-assigned this Mar 21, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request Mar 21, 2025
@StephanTLavavej StephanTLavavej merged commit 0a0514c into microsoft:main Mar 24, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Mar 24, 2025
@StephanTLavavej
Copy link
Member

Thanks for finding this optimization opportunity! 😹 πŸ•΅οΈ πŸŽ‰

@AlexGuteniev AlexGuteniev deleted the n-equals-one branch March 25, 2025 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants