AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented Apr 26, 2025

Resolves #5035

Rather than benchmarking with both aligned and unaligned allocators, use only the unaligned one. This makes sure we're measuring the worst case, where vectorization is of less benefit.

Only change the container in benchmarks of potentially vectorized algorithms. For something like random, it does not make sense. Also, vector<bool> algorithms are vectorized, if we count GPR-based vectorization as vectorization, but they are not sensitive to alignment.
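For illustration, an allocator that deliberately avoids high alignment could look roughly like the sketch below. This is not the not_highly_aligned_allocator from the benchmark sources; the name `skewed_allocator_sketch`, the 64-byte alignment, the 8-byte skew, and the `is_aligned` helper are all assumptions chosen for the example. It assumes C++17 for aligned `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical sketch, not the allocator used by the STL benchmarks.
// It allocates a block aligned to Alignment bytes and hands out a pointer
// Skew bytes into it, so the returned storage is never Alignment-aligned.
template <class T, std::size_t Alignment = 64, std::size_t Skew = 8>
struct skewed_allocator_sketch {
    using value_type = T;

    static_assert(Skew > 0 && Skew < Alignment, "skew must break the high alignment");
    static_assert(Skew % alignof(T) == 0, "skew must preserve T's natural alignment");

    skewed_allocator_sketch() = default;

    template <class U>
    skewed_allocator_sketch(const skewed_allocator_sketch<U, Alignment, Skew>&) noexcept {}

    // Explicit rebind is required because of the non-type template parameters.
    template <class U>
    struct rebind {
        using other = skewed_allocator_sketch<U, Alignment, Skew>;
    };

    T* allocate(std::size_t n) {
        void* base = ::operator new(n * sizeof(T) + Skew, std::align_val_t{Alignment});
        return reinterpret_cast<T*>(static_cast<unsigned char*>(base) + Skew);
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Undo the skew to recover the pointer that operator new returned.
        void* base = reinterpret_cast<unsigned char*>(p) - Skew;
        ::operator delete(base, std::align_val_t{Alignment});
    }

    friend bool operator==(const skewed_allocator_sketch&, const skewed_allocator_sketch&) noexcept {
        return true;
    }
    friend bool operator!=(const skewed_allocator_sketch&, const skewed_allocator_sketch&) noexcept {
        return false;
    }
};

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A benchmark would then switch its container from `std::vector<T>` to something like `std::vector<T, skewed_allocator_sketch<T>>`, so that `data()` keeps the element type's natural alignment but never lands on a 64-byte boundary.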

Of the vectorized algorithms that are potentially sensitive to alignment, some really are sensitive, and others are not, or barely so. The sensitive ones are the simplest searches and data movement. adjacent_find is a good example of a sensitive one; here's how its results became worse:

| Benchmark | Before | After |
|---|---|---|
| bm<AlgType::Std, int8_t>/2525/1142 | 19.9 ns | 22.1 ns |
| bm<AlgType::Std, int16_t>/2525/1142 | 33.2 ns | 51.5 ns |
| bm<AlgType::Std, int32_t>/2525/1142 | 75.5 ns | 89.1 ns |
| bm<AlgType::Std, int64_t>/2525/1142 | 139 ns | 163 ns |
| bm<AlgType::Rng, int8_t>/2525/1142 | 16.7 ns | 20.1 ns |
| bm<AlgType::Rng, int16_t>/2525/1142 | 33.0 ns | 50.5 ns |
| bm<AlgType::Rng, int32_t>/2525/1142 | 76.9 ns | 89.0 ns |
| bm<AlgType::Rng, int64_t>/2525/1142 | 141 ns | 163 ns |

adjacent_find does two AVX loads of the same input data at a time. With the aligned allocator, just one of the loads is unaligned; with the unaligned allocator, both of them are. So it stresses the processor's ability to deal with unaligned data.
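The address arithmetic behind this can be sketched as follows. The sketch assumes 32-byte AVX vectors and assumes the second load starts exactly one element after the first, which is a guess at how neighbors are paired rather than something taken from the implementation. `classify_loads` and `load_alignment` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Which of the two loads in one iteration start on a 32-byte boundary.
struct load_alignment {
    bool first_aligned;
    bool second_aligned;
};

// Hypothetical model: at iteration `step`, the first load starts at
// base + step * 32, and the second load starts one element later
// (assumption), so the two loads overlap on the same input data.
inline load_alignment classify_loads(std::uintptr_t base, std::size_t elem_size, std::size_t step) {
    constexpr std::size_t vector_bytes = 32; // AVX register width in bytes
    const std::uintptr_t first  = base + step * vector_bytes;
    const std::uintptr_t second = first + elem_size;
    return {first % vector_bytes == 0, second % vector_bytes == 0};
}
```

Under these assumptions, a 32-byte-aligned base such as 0x1000 with 4-byte elements gives an aligned first load and an unaligned second load, while a base skewed by 8 bytes (0x1008) leaves both loads unaligned at every step.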

Also skipped the bitset benchmarks. They are harder to unalign, since they use some stack containers with deduced types. I'm confident that bitset from/to string conversions are examples of algorithms that are not very sensitive to alignment.

Some of the recently added benchmarks are not changed in this PR, because they already use a not highly aligned allocator.

Also, I expected replace_copy to be one of the sensitive ones, but auto-vectorization is completely broken in the latest Preview 🐛.
Created DevCom-10895463

Drive-by: use proper optimization barriers in the replace family benchmarks.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 26, 2025 17:54
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Apr 26, 2025
@AlexGuteniev AlexGuteniev changed the title Benchmark: use not_higghly_aligned allocator in more places Benchmark: use not_highly_aligned allocator in more places Apr 26, 2025
@AlexGuteniev AlexGuteniev changed the title Benchmark: use not_highly_aligned allocator in more places Benchmark: use not_highly_aligned_allocator in more places Apr 26, 2025
@StephanTLavavej StephanTLavavej added the test Related to test code label Apr 27, 2025
@StephanTLavavej StephanTLavavej self-assigned this Apr 27, 2025
@StephanTLavavej StephanTLavavej removed their assignment Apr 28, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Apr 28, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 9, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request May 9, 2025
@StephanTLavavej StephanTLavavej merged commit f11402a into microsoft:main May 10, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews May 10, 2025
@StephanTLavavej
Member

Thanks for improving the consistency of our benchmarks! 📈 📉 📊

@AlexGuteniev AlexGuteniev deleted the unalign branch May 10, 2025 10:35
Labels
test Related to test code
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Provide an intentional (mis)alignment that corresponds to typical usage in benchmarks for plain arrays
2 participants