Specialize some algorithms for vector<bool> #750

miscco · 2020-04-26T20:30:25Z

This addresses #625

I went ahead and added specializations for the following classes of algorithms:

copy/move
fill
equal

What is still open are find / count, which both need bitops support.
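To illustrate why find / count want bit operations, here is a minimal sketch, assuming the bits are packed into 32-bit words. The names `popcount32` and `count_set_bits` are hypothetical and not taken from the STL sources. Counting becomes one population count per word, with masks at both ends of the range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable population count (Kernighan's method); C++20 offers std::popcount instead.
inline unsigned popcount32(std::uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// Count set bits in bit positions [first, last) of a packed word array.
// Assumption of this sketch: bit i lives in words[i / 32] at position i % 32.
inline std::size_t count_set_bits(const std::uint32_t* words, std::size_t first, std::size_t last) {
    assert(first <= last);
    constexpr std::size_t bits = 32;
    std::size_t total = 0;
    while (first < last) {
        const std::size_t word = first / bits;
        const std::size_t lo = first % bits;
        const std::size_t room = bits - lo;
        // number of bits taken from the current word
        const std::size_t n = (last - first < room) ? (last - first) : room;
        // n ones, later shifted up to start at bit lo
        const std::uint32_t ones = (n == bits) ? ~std::uint32_t{0} : ((std::uint32_t{1} << n) - 1);
        total += popcount32(words[word] & (ones << lo));
        first += n;
    }
    return total;
}
```

A find could follow the same shape, replacing the population count with a search for the lowest set bit in each masked word.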

Also lexicographical_compare was not really something I would want to touch.

That said, I only added them when if constexpr is available. Life is too short for tag dispatch.
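For readers unfamiliar with the trade-off, a small sketch of compile-time selection with if constexpr follows. The trait `is_vbool_iterator_v`, the enum `Path`, and the function `copy_dispatch` are made up for illustration, and the trait only recognizes the default allocator; the real code has to handle every allocator.

```cpp
#include <algorithm>
#include <type_traits>
#include <vector>

// Hypothetical trait: true for vector<bool> iterators (default allocator only in this sketch).
template <class It>
inline constexpr bool is_vbool_iterator_v =
    std::is_same_v<It, std::vector<bool>::iterator> ||
    std::is_same_v<It, std::vector<bool>::const_iterator>;

enum class Path { generic, packed };

// Picks a branch at compile time and reports which one was taken.
template <class InIt, class OutIt>
Path copy_dispatch(InIt first, InIt last, OutIt dest) {
    if constexpr (is_vbool_iterator_v<InIt> && is_vbool_iterator_v<OutIt>) {
        // A word-wise implementation would go here; this sketch only marks the branch.
        std::copy(first, last, dest);
        return Path::packed;
    } else {
        std::copy(first, last, dest);
        return Path::generic;
    }
}
```

With tag dispatch the same selection would need two overloads plus a tag type computed at the call site, which is the boilerplate the comment above wants to avoid.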

stl/inc/vector

miscco · 2020-04-28T10:55:14Z

So I was wondering: this is a performance improvement. Is there any interest in collecting benchmarks in this repository, e.g. in a separate benchmarks folder?

It feels kind of strange that there are none currently.

BillyONeal · 2020-04-28T19:59:02Z

stl/inc/vector

+    // copy [_First, _Last) to [_Dest, ...)
+    _Adl_verify_range(_First, _Last);
+
+    // Slow path as _First and _Dest are not aligned


I'm not sure copy for 'only aligned vectors' is worth the metaprogramming cost here. The interesting cases to optimize are:
vector<bool> -> vector<bool>
vector<bool> -> any other bool range
any other bool range -> vector<bool>

I am not really sure whether it is worth it actually. This was more an exercise in bit fiddling for me as I never really did any bit manipulations so far.

I am not an expert, but can we do an unaligned memcpy? Is it even within the limits of the language to copy a non-byte-aligned range into another non-byte-aligned range?

How would that work with strict aliasing? Would we need to cast to void* rather than char to get an unaligned pointer?

I guess one could try some tricks to expand / collapse the individual bits, but that would require SSE instructions and each direction would need its own special case.

That said, in my experience vector<bool> is rather rarely used, and I would be curious whether there are actually use cases with different ranges.

Is it even within the limits of the language to copy a non-byte-aligned range into another non-byte-aligned range?

As in 'through memcpy', no. As in 'better than extracting bit by bit', yes.

How would that work with strict aliasing

Access as an unsigned int, no need for reinterpret here.

I guess one could try some tricks to expand / collapse the individual bits, but that would require SSE instructions

I mean something like this pseudocode:

const unsigned int* data();

int main() {
    const unsigned int* first = data();
    size_t _Offset = 1234;
    constexpr auto _Bits_per = (8 * sizeof(unsigned int)); // 8 == char_bit
    first += _Offset / _Bits_per;
    _Offset = _Offset % _Bits_per;
    unsigned int current = (first[0] >> _Offset) | first[1] << (_Bits_per - _Offset); // read 32 bools
}

If the target is another vector<bool> we can do a similar alignment fixup without unpacking current. If the target is a bool buffer, that expansion would indeed be faster with AVX (since 32 bools * 8 bits == 256 bits, the AVX register size), but the same shift and mask thing can be done on a per size_t basis or similar. It won't be as fast as AVX, but it will still be faster than unpacking and repacking on a bit by bit basis like before.
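A runnable take on the pseudocode idea might look like the sketch below. It assumes 32-bit storage words; `read_bits32` and `unpack32` are hypothetical names. The aligned case is handled separately because shifting a 32-bit value by 32 is undefined behavior, and the expansion is a plain loop over a register-held word rather than the AVX variant.

```cpp
#include <cstddef>
#include <cstdint>

// Read 32 consecutive bits starting at an arbitrary bit offset.
// Assumes the word after the starting word is readable whenever the offset is not word-aligned.
inline std::uint32_t read_bits32(const std::uint32_t* words, std::size_t bit_offset) {
    constexpr std::size_t bits = 32;
    const std::uint32_t* first = words + bit_offset / bits;
    const std::size_t shift = bit_offset % bits;
    if (shift == 0) {
        return first[0]; // aligned: a shift by 32 below would be undefined behavior
    }
    return (first[0] >> shift) | (first[1] << (bits - shift));
}

// Expand 32 packed bits into 32 bool values without SIMD, lowest bit first.
inline void unpack32(std::uint32_t packed, bool* out) {
    for (std::size_t i = 0; i < 32; ++i) {
        out[i] = ((packed >> i) & 1u) != 0;
    }
}
```

The value returned by `read_bits32` can be stored directly into a word-aligned packed destination, or handed to `unpack32` when the destination is a plain bool buffer.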

To be clear, I'm not saying you need to implement optimizations like that, but I am saying I believe the value of the optimization for its metaprogramming cost is questionable when the result is that copy(v.begin(), v.end(), v2.begin()) is 10x faster than copy(v.begin(), v.end(), v2.begin() + 1)

As in 'through memcpy', no. As in 'better than extracting bit by bit', yes.

Is it too early for some Warhammer references?

BillyONeal · 2020-04-28T19:59:24Z

stl/inc/vector

+_INLINE_VAR constexpr bool _Is_vb_iterator<_Vb_iterator<_Alloc>> = true;
+
+template <class _InIt, class _OutIt>
+_CONSTEXPR20 _OutIt _Copy_vbool(_InIt _First, _InIt _Last, _OutIt _Dest) {


Since _InIt and _OutIt are vector<bool> iterators, do we need this to be a template?

Yes as we always need to consider allocators :(

It could still be templated on the allocator instead, to make it clearer that the types accepted must be vector<bool> iterators, but I agree there's less value there :(
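A sketch of what "templated on the allocator" means in practice, using a toy iterator type. `BitIter` and `same_alignment` are hypothetical stand-ins, not the STL's `_Vb_iterator` or any of its helpers.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a vector<bool> iterator whose type depends on the allocator.
template <class Alloc>
struct BitIter {
    const unsigned int* word; // current storage word
    std::size_t offset;       // bit index inside *word
};

// Template argument deduction succeeds only for BitIter specializations,
// while source and destination may still use different allocators.
template <class AllocIn, class AllocOut>
bool same_alignment(BitIter<AllocIn> first, BitIter<AllocOut> dest) {
    return first.offset == dest.offset;
}
```

Compared with `template <class _InIt, class _OutIt>`, the signature itself documents which iterator types are accepted, at the cost of spelling out two allocator parameters.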

BillyONeal · 2020-04-28T19:59:51Z

stl/inc/vector

+}
+
+template <class _FwdIt, class _Ty>
+_CONSTEXPR20 void _Fill_vbool(_FwdIt _First, _FwdIt _Last, const _Ty& _Val) {


Ditto, does this need to be a template?

Ditto, because of template <class Alloc> vb_iterator

BillyONeal · 2020-04-28T20:00:10Z

stl/inc/vector

+    // compare [_First1, _Last1) to [_First2, ...)
+    _Adl_verify_range(_First1, _Last1);
+
+    // Slow path as _First1 and _First2 are not aligned


Ditto, not sure this is worth the cost if unaligned does not work.
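For context, the aligned fast path under discussion could be sketched as follows, assuming 32-bit words and both ranges starting at bit 0 of a word. `equal_bits_aligned` is a hypothetical name. Whole words are compared directly and only the trailing partial word needs a mask.

```cpp
#include <cstddef>
#include <cstdint>

// Compare the first n bits of two packed ranges that both start at bit 0 of a word.
inline bool equal_bits_aligned(const std::uint32_t* a, const std::uint32_t* b, std::size_t n) {
    constexpr std::size_t bits = 32;
    const std::size_t full = n / bits;
    for (std::size_t i = 0; i < full; ++i) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    const std::size_t rest = n % bits;
    if (rest == 0) {
        return true; // no partial word, and a[full] must not be read
    }
    const std::uint32_t mask = (std::uint32_t{1} << rest) - 1; // low `rest` bits
    return ((a[full] ^ b[full]) & mask) == 0;
}
```

The unaligned case is what makes the specialization expensive: one side would first have to be shifted into the other's alignment before the same comparison applies.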

miscco · 2020-04-28T21:00:46Z

stl/inc/vector

+template <class _InIt, class _OutIt>
+_CONSTEXPR20 _OutIt _Copy_vbool(_InIt _First, _InIt _Last, _OutIt _Dest) {
+    // copy [_First, _Last) to [_Dest, ...)
+    _Adl_verify_range(_First, _Last);


Note to myself: I took care to always put the specialization after the _Adl_verify_range in the caller so that I can repeat it here for fun.

miscco · 2020-04-29T12:02:06Z

Thinking about this a bit more I believe there is a path forward for the copy family:

If both _First and _Dest are char-aligned we can directly do a memmove and mask the remainder of whatever _Last._Myoff is.
Otherwise we can define a variable _Carry that holds the misaligned bits between _Source and _Dest and then iterate over each _Vbase block, shifting *_First appropriately and storing the carry.
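The misaligned case can be sketched with plain shifts and masks. This is a simplified variant, not the planned implementation: instead of keeping an explicit _Carry variable, each step re-reads the source bits that straddle a word boundary. It assumes 32-bit words and non-overlapping ranges; `low_mask`, `read_bits`, and `copy_bits` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Mask with the low n bits set, n in [0, 32].
inline std::uint32_t low_mask(std::size_t n) {
    return n >= 32 ? ~std::uint32_t{0} : ((std::uint32_t{1} << n) - 1);
}

// Read n bits (1 <= n <= 32) starting at absolute bit position pos.
inline std::uint32_t read_bits(const std::uint32_t* words, std::size_t pos, std::size_t n) {
    const std::size_t w = pos / 32;
    const std::size_t off = pos % 32;
    std::uint32_t v = words[w] >> off;
    if (off != 0 && off + n > 32) {
        v |= words[w + 1] << (32 - off); // bits carried over from the next word
    }
    return v & low_mask(n);
}

// Copy n bits from src (starting at bit src_pos) to dst (starting at bit dst_pos).
// Each iteration fills at most one destination word and preserves the bits outside the range.
inline void copy_bits(const std::uint32_t* src, std::size_t src_pos,
                      std::uint32_t* dst, std::size_t dst_pos, std::size_t n) {
    while (n > 0) {
        const std::size_t w = dst_pos / 32;
        const std::size_t off = dst_pos % 32;
        const std::size_t room = 32 - off;
        const std::size_t k = n < room ? n : room;
        const std::uint32_t mask = low_mask(k) << off;
        const std::uint32_t chunk = read_bits(src, src_pos, k) << off;
        dst[w] = (dst[w] & ~mask) | chunk;
        src_pos += k;
        dst_pos += k;
        n -= k;
    }
}
```

A carry-based version would read each source word exactly once, which matters for performance; the masking of the first and last destination word stays the same.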

I will have a look at that in the evening

miscco · 2020-05-06T13:45:49Z

Sorry for the noise, I am slowly working on this on a Linux machine currently.

I have to massively expand testing, so do not bother to review right now.

stl/inc/vector

miscco · 2020-07-23T05:02:58Z

Dropping because of #879 and the other stuff I have based on that

miscco requested a review from a team as a code owner April 26, 2020 20:30

miscco commented Apr 26, 2020

miscco force-pushed the vector_bool_algorithms branch 4 times, most recently from f3a1123 to 74b38e4 Compare April 27, 2020 19:53

StephanTLavavej added the performance Must go faster label Apr 27, 2020

BillyONeal reviewed Apr 28, 2020

miscco commented Apr 28, 2020

miscco force-pushed the vector_bool_algorithms branch 4 times, most recently from 78ebbe7 to 6d09e07 Compare May 4, 2020 13:20

miscco added 2 commits May 4, 2020 16:18

[vector] Add specialization of copy

3897a0c

[vector] Add specialization of copy_n

d182d77

miscco force-pushed the vector_bool_algorithms branch 2 times, most recently from 2b4b709 to 35dbfdc Compare May 6, 2020 13:44

miscco force-pushed the vector_bool_algorithms branch from 35dbfdc to d647b6c Compare May 6, 2020 14:36

cbezault marked this pull request as draft May 6, 2020 20:36

cpplearner reviewed May 7, 2020

miscco added 6 commits May 7, 2020 08:21

[vector] Add specialization of copy_backward

a71fe8c

[vector] Add specialization of move

c71184c

[vector] Add specialization of move_backward

00ed42f

[vector] Add specialization of fill

557d6b7

[vector] Add specialization of fill_n

a3cf927

[vector] Add specialization of equal

818afc8

miscco force-pushed the vector_bool_algorithms branch from d647b6c to cba418d Compare May 7, 2020 13:24

miscco force-pushed the vector_bool_algorithms branch 3 times, most recently from 9d051ae to 09b8fde Compare May 10, 2020 04:20

miscco added 3 commits May 10, 2020 13:39

[vector] Add specialization of find

e6c1b32

[vector] Add specialization of count

608b47a

[vector] Expand _Copy_vbool to also handle non aligned iterators

6c86f50

miscco force-pushed the vector_bool_algorithms branch from 09b8fde to 6c86f50 Compare May 10, 2020 12:16

miscco mentioned this pull request Jun 5, 2020

Specialize fill and fill_n for vector<bool> #879

Merged

miscco closed this Jul 23, 2020

miscco deleted the vector_bool_algorithms branch July 23, 2020 05:03
