Benchmark: use not_highly_aligned_allocator in more places #5443
Resolves #5035
To avoid trying both aligned and unaligned allocators, use just the unaligned one. This makes sure we're checking the worst case, where vectorization is of less benefit.
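For readers unfamiliar with the idea, here is a minimal sketch of an allocator that deliberately returns storage offset from a high alignment boundary. It is not the code from the benchmark headers: the names `skewed_allocator_sketch`, `not_highly_aligned_sketch`, `offset_from_boundary`, and the 4096-byte alignment and 8-byte skew values are assumptions chosen for illustration, and it relies on C++17 aligned `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Offset of a pointer from the previous `alignment`-byte boundary.
inline std::size_t offset_from_boundary(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment;
}

// Hypothetical sketch: hands out storage that starts `Skew` bytes past an
// `Alignment`-byte boundary, so the data is never highly aligned.
template <class T, std::size_t Alignment, std::size_t Skew>
struct skewed_allocator_sketch {
    using value_type = T;

    static_assert(Alignment % alignof(T) == 0, "Alignment must keep T objects aligned");
    static_assert(Skew % alignof(T) == 0, "Skew must keep T objects aligned");

    // Explicit rebind is needed because of the non-type template parameters.
    template <class U>
    struct rebind {
        using other = skewed_allocator_sketch<U, Alignment, Skew>;
    };

    skewed_allocator_sketch() = default;
    template <class U>
    skewed_allocator_sketch(const skewed_allocator_sketch<U, Alignment, Skew>&) noexcept {}

    T* allocate(const std::size_t n) {
        // Over-allocate by Skew bytes on an Alignment boundary, then shift the result.
        void* const raw = ::operator new(n * sizeof(T) + Skew, std::align_val_t{Alignment});
        assert(offset_from_boundary(raw, Alignment) == 0);
        return reinterpret_cast<T*>(static_cast<unsigned char*>(raw) + Skew);
    }

    void deallocate(T* const p, std::size_t) noexcept {
        // Undo the shift before handing the block back.
        ::operator delete(reinterpret_cast<unsigned char*>(p) - Skew, std::align_val_t{Alignment});
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U, std::size_t A, std::size_t S>
bool operator==(const skewed_allocator_sketch<T, A, S>&, const skewed_allocator_sketch<U, A, S>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A, std::size_t S>
bool operator!=(const skewed_allocator_sketch<T, A, S>&, const skewed_allocator_sketch<U, A, S>&) noexcept {
    return false;
}

// Assumed values for illustration: page-sized alignment, pointer-sized skew.
inline constexpr std::size_t sketch_page_size = 4096;
inline constexpr std::size_t sketch_skew      = 8;

template <class T>
using not_highly_aligned_sketch = skewed_allocator_sketch<T, sketch_page_size, sketch_skew>;
```

A benchmark would then declare its input as something like `std::vector<int, not_highly_aligned_sketch<int>>`, so that `data()` always sits at the same small offset from a page boundary and vector loads starting at `data()` are never 32-byte aligned.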
Only the container for potentially vectorized algorithm benchmarks is changed. For something like random, it does not make sense. Also, `vector<bool>` is vectorized, if we count GPR-based vectorization as vectorization, but it is not sensitive to alignment.
Of the vectorized algorithms that could be sensitive to alignment, some actually are, and some are not or almost not. The sensitive ones are the simplest searches and data movement. Consider `adjacent_find` as a good example of a sensitive one; its results became worse with the unaligned allocator. `adjacent_find` does two AVX loads of the same input data at a time. With the aligned allocator only one of the loads is unaligned; with the unaligned allocator both of them are. So it stresses the processor's ability to deal with unaligned data.
The `bitset` benchmarks are also skipped. They are harder to unalign, since they use stack containers with deduced types. I'm confident that `bitset` from/to string conversions are examples of algorithms that are not very sensitive to alignment.
Some recently added benchmarks are not changed in this PR, because they already use the not highly aligned allocator.
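To make the `adjacent_find` point concrete, the sketch below counts how many of the two loads in one step are unaligned for a given starting address. It assumes 32-byte AVX loads and assumes the two loads start at `p` and `p + 1` (the same data shifted by one element); the helper names `is_avx_aligned` and `unaligned_loads_per_step` are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed vector width: one AVX register holds 32 bytes.
constexpr std::size_t avx_bytes = 32;

// True when `p` sits on a 32-byte boundary.
inline bool is_avx_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % avx_bytes == 0;
}

// Counts how many of the two overlapping loads in one step are unaligned.
// Assumption: the step loads from `p` and from `p + 1`.
template <class T>
int unaligned_loads_per_step(const T* p) {
    assert(p != nullptr);
    return (is_avx_aligned(p) ? 0 : 1) + (is_avx_aligned(p + 1) ? 0 : 1);
}
```

With a 32-byte-aligned `int` buffer, the first load is aligned and the shifted one is not, so the count is 1. Starting 8 bytes further in, which is what a skewed allocator would produce, neither load is aligned and the count is 2.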
I also expected `replace_copy` to be one of the sensitive ones, but auto-vectorization is completely broken in the latest Preview 🐛. Created DevCom-10895463.
Drive-by: proper optimization barriers in replace family benchmarks.