Skip to content

Conversation

miscco
Copy link
Contributor

@miscco miscco commented Apr 26, 2020

This addresses #625

I went ahead and added specialization for the following classes of algorithms:

  • copy/move
  • fill
  • equal

What is still open are find / count, which all need bitops support

Also lexicographical_compare was not really something I would want to touch.

That said I only added them when if constexpr is available. Life is too short for tag dispatch

@miscco miscco requested a review from a team as a code owner April 26, 2020 20:30
@miscco miscco force-pushed the vector_bool_algorithms branch 4 times, most recently from f3a1123 to 74b38e4 Compare April 27, 2020 19:53
@StephanTLavavej StephanTLavavej added the performance Must go faster label Apr 27, 2020
@miscco
Copy link
Contributor Author

miscco commented Apr 28, 2020

So I was wondering. This is a performance improvement. Is there any interest in collecting benchmarks in this repository, e.g. a separate benchmarks folder.

It feels kind of strange that there are none currently

stl/inc/vector Outdated
// copy [_First, _Last) to [_Dest, ...)
_Adl_verify_range(_First, _Last);

// Slow path as _First and _Dest are not aligned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure copy for 'only aligned vectors' is worth the metaprogramming cost here. The interesting cases to optimize are:
vector -> vector
vector -> any other bool range
any other bool range -> vector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure whether it is worth it actually. This was more an exercise in bit fiddling for me as I never really did any bit manipulations so far.

I am not an expert but can we do unaligned memcopy? Is that even inside the limits of the language to copy an non byte-aligned range into another non-byte aligned range?

How would that work with strict aliasing. Would we need to cast to void* rather than char to get an unaligned pointer?

I guess one could try some tricks to expand / collaps the individual bits, but that would require SSE instructions and each direction would need its own special case.

That said, in my experience vector is rather less used and I would be curious wether there are actually use cases with different ranges

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that even inside the limits of the language to copy an non byte-aligned range into another non-byte aligned range?

As in 'through memcpy', no. As in 'better than extracting bit by bit', yes.

How would that work with strict aliasing

Access as an unsigned int, no need for reinterpret here.

I guess one could try some tricks to expand / collaps the individual bits, but that would require SSE instructions

I mean something like this pseudocode:

const unsigned int* data();
int main() {
const unsigned int* first = data();
size_t _Offset = 1234;
constexpr auto _Bits_per = (8*sizeof(unsigned int)); // 8 == char_bit
first += _Offset/_Bits_per;
_Offset = _Offset % _Bits_per;
unsigned int current = (first[0] >> _Offset) | first[1] << (_Bits_per - _Offset); // read 32 bools
}

If the target is another vector we can do a similar alignment fixup without unpacking current, if the target is a bool buffer that expansion would indeed be faster with AVX (since 32 bools * 8 bits == 256 bits, the AVX register size) but the same shift and mask thing can be done on a per size_t basis or similar. It won't be as fast as AVX but it will still be faster an unpacking and repacking on a bit by bit basis like before.

To be clear, I'm not saying you need to implement optimizations like that, but I am saying I believe the value of the optimization for its metaprogramming cost is questionable when the result is that copy(v.begin(), v.end(), v2.begin()) is 10x faster than copy(v.begin(), v.end(), v2.begin() + 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in 'through memcpy', no. As in 'better than extracting bit by bit', yes.

Is it too early for some warhammer references?

_INLINE_VAR constexpr bool _Is_vb_iterator<_Vb_iterator<_Alloc>> = true;

template <class _InIt, class _OutIt>
_CONSTEXPR20 _OutIt _Copy_vbool(_InIt _First, _InIt _Last, _OutIt _Dest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _InIt and _OutIt are vector iterator do we need this to be a template?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes as we always need to consider allocators :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could still be templated on allocator instead then to make it clearer that the types accepted must be vector iterators but I agree there's less value there :(

}

template <class _FwdIt, class _Ty>
_CONSTEXPR20 void _Fill_vbool(_FwdIt _First, _FwdIt _Last, const _Ty& _Val) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto does this need to be a template

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto because of template class Alloc> vb_iterator

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto :(

// compare [_First1, _Last1) to [_First2, ...)
_Adl_verify_range(_First1, _Last1);

// Slow path as _First1 and _First2 are not aligned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto not sure this is worth the cost if unaligned does not work.

stl/inc/vector Outdated
template <class _InIt, class _OutIt>
_CONSTEXPR20 _OutIt _Copy_vbool(_InIt _First, _InIt _Last, _OutIt _Dest) {
// copy [_First, _Last) to [_Dest, ...)
_Adl_verify_range(_First, _Last);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself I took care to always put the specialization after the _Adl_verify_range in the caller so that I can repeat it here for fun

@miscco
Copy link
Contributor Author

miscco commented Apr 29, 2020

Thinking about this a bit more I believe there is a path forward for the copy family:

  1. If both _First and _Dest are char-aligned we can directly do a memmov and mask the remainder of whatever _Last._Myoff is.

  2. Otherwise we can define a variable _Carry that holds the misaligned bits between _Source and _Dest and then iterate over each _Vbase block shifting *_First appropriately and string the carry.

I will have a look at that in the evening

@miscco miscco force-pushed the vector_bool_algorithms branch 4 times, most recently from 78ebbe7 to 6d09e07 Compare May 4, 2020 13:20
@miscco miscco force-pushed the vector_bool_algorithms branch 2 times, most recently from 2b4b709 to 35dbfdc Compare May 6, 2020 13:44
@miscco
Copy link
Contributor Author

miscco commented May 6, 2020

Sorry for the noise, I am slowly working on this on a linux machine currently.

I have to massively expand testing so do not bother to review right now

@miscco miscco force-pushed the vector_bool_algorithms branch from 35dbfdc to d647b6c Compare May 6, 2020 14:36
@cbezault cbezault marked this pull request as draft May 6, 2020 20:36
@miscco miscco force-pushed the vector_bool_algorithms branch from d647b6c to cba418d Compare May 7, 2020 13:24
@miscco miscco force-pushed the vector_bool_algorithms branch 3 times, most recently from 9d051ae to 09b8fde Compare May 10, 2020 04:20
@miscco
Copy link
Contributor Author

miscco commented Jul 23, 2020

Dropping because of #879 and the other stuff I have based on that

@miscco miscco closed this Jul 23, 2020
@miscco miscco deleted the vector_bool_algorithms branch July 23, 2020 05:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants