Use `string::resize_and_overwrite` in `bitset` to avoid buffer initialization #3904
Merged
No change requested: I observed that we could ignore `_Len` and take advantage of our knowledge that it will always be the compile-time constant `_Bits`, which is available without capturing. I checked the codegen, and this indeed allows the compiler to bake in the constant. However, while benchmarking showed some improvement for `BM_bitset_to_string<64, char>`, it showed significant pessimization for `BM_bitset_to_string<15, char>`.
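For readers following along, the two options being weighed can be sketched roughly as follows. This is a minimal illustration, not the STL's actual code: `write_bits`, `resize_overwrite_compat`, `to_string_len`, and `to_string_bits` are hypothetical names, the value is a plain `unsigned long long` rather than a `bitset`, and the fallback branch exists only so the sketch builds without C++23 library support.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: writes the low `n` bits of `value` as '0'/'1',
// most significant bit first. Requires n <= 64.
inline void write_bits(char* out, std::size_t n, unsigned long long value) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = ((value >> (n - 1 - i)) & 1ULL) != 0 ? '1' : '0';
    }
}

// Hypothetical shim: uses C++23 resize_and_overwrite when the library has it,
// otherwise falls back to a plain resize (which does initialize the buffer).
template <class Op>
void resize_overwrite_compat(std::string& s, std::size_t n, Op op) {
#ifdef __cpp_lib_string_resize_and_overwrite
    s.resize_and_overwrite(n, std::move(op));
#else
    s.resize(n);
    s.resize(op(s.data(), n));
#endif
}

// Option 1: use the length passed to the callback (a runtime parameter,
// analogous to `_Len` in the discussion).
template <std::size_t Bits>
std::string to_string_len(unsigned long long value) {
    static_assert(Bits >= 1 && Bits <= 64, "sketch supports 1..64 bits");
    std::string result;
    resize_overwrite_compat(result, Bits, [value](char* buf, std::size_t len) {
        write_bits(buf, len, value);
        return len;
    });
    return result;
}

// Option 2: ignore the callback's length and use the compile-time constant
// (analogous to `_Bits`), which needs no capture.
template <std::size_t Bits>
std::string to_string_bits(unsigned long long value) {
    static_assert(Bits >= 1 && Bits <= 64, "sketch supports 1..64 bits");
    std::string result;
    resize_overwrite_compat(result, Bits, [value](char* buf, std::size_t /*len*/) {
        write_bits(buf, Bits, value);
        return Bits;
    });
    return result;
}
```

Both variants produce the same string; the only difference is whether the loop bound is visible to the optimizer as a constant, which is what the benchmark differences above are about.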
I think that, despite some pessimization in that particular case, it is still the right thing to do.
Oh, actually the pessimization is too much. I think overly aggressive loop unrolling strikes here too.
Let's keep it as a parameter for now then
`_Bits` is the natural choice, but since we have to use `_Len` for performance reasons, I think we need to add a comment explaining that.
Hmm, is there a convenient way to compare benchmark results before and after a change in the same run? I'm doing this in a very awkward way.
Benchmark
(And I'm nervous that the effect is very complex and can even vary between different runs.)
Results
Another run
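One way to get both measurements out of a single run is to keep the old and new code paths side by side in one executable and interleave their timings. The sketch below is only an illustration with `std::chrono`, not the benchmark used in this PR: `before_change`, `after_change`, `ns_per_call`, and `compare_in_one_run` are hypothetical names, and `after_change` simply calls whatever `bitset::to_string` the standard library provides.

```cpp
#include <algorithm>
#include <bitset>
#include <chrono>
#include <cstddef>
#include <string>

// Stand-in for the "before" path: fill the buffer first, then overwrite it.
template <std::size_t Bits>
std::string before_change(const std::bitset<Bits>& b) {
    std::string s(Bits, '0');
    for (std::size_t i = 0; i < Bits; ++i) {
        if (b[Bits - 1 - i]) {
            s[i] = '1';
        }
    }
    return s;
}

// Stand-in for the "after" path: the library's own to_string.
template <std::size_t Bits>
std::string after_change(const std::bitset<Bits>& b) {
    return b.to_string();
}

// Average wall-clock nanoseconds per call of `fn` over `iterations` calls.
template <class Fn>
double ns_per_call(Fn fn, int iterations) {
    std::size_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink += fn().size();
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile std::size_t keep = sink; // discourage removing the loop entirely
    (void) keep;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

struct Comparison {
    double before_ns;
    double after_ns;
};

// Interleaves the two variants round by round and keeps the best time of
// each, so that drift during the run affects both sides similarly.
template <std::size_t Bits>
Comparison compare_in_one_run(int rounds = 5, int iterations = 100000) {
    const std::bitset<Bits> b(0x5555555555555555ULL);
    Comparison best{1e300, 1e300};
    for (int r = 0; r < rounds; ++r) {
        best.before_ns = std::min(best.before_ns, ns_per_call([&] { return before_change(b); }, iterations));
        best.after_ns = std::min(best.after_ns, ns_per_call([&] { return after_change(b); }, iterations));
    }
    return best;
}
```

Calling `compare_in_one_run<64>()` and `compare_in_one_run<15>()` and printing both fields gives a before/after pair per size from the same process. Taking the minimum over rounds is a design choice to damp noise; it does not remove the CPU-dependent effects discussed in this thread.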
Is there a problem with my comparison benchmark? I'm finding the `resize_and_overwrite` & `_Len` approach consistently slow for large `wchar_t` cases in my testing.
It may be CPU-dependent.
For `_Bits`, the compiler is more likely to take advantage of knowing `_Bits` and unroll loops or vectorize something.
For `_Len`, the compiler may be restricted from doing that.
Whether optimizations like loop unrolling are useful or harmful depends on the CPU type.
Also, the non-unrolled loop (`_Len`) is subject to JCC erratum pessimization on some CPUs; `/QIntel-jcc-erratum` might help in this case: https://learn.microsoft.com/ru-ru/cpp/build/reference/qintel-jcc-erratum?view=msvc-170
I'm neutral with regard to `_Bits` vs `_Len` if neither of them has a clear advantage.
I think we need to test `_Bits` vs `_Len` in more environments to make the decision. I tend to believe that by using `_Bits` we can get general improvements over the previous implementation, though not significant ones in some cases like small `bitset.to_string<char>`; by using `_Len` we risk introducing some regressions.
Here are some results after turning on that switch:
...I find my benchmark results highly unstable between separate runs...
Result
Another run
I suggest not thinking too much about `_Bits` vs `_Len`. The next step would be vectorization, which would obviously be faster than this and would obviously use `_Bits`.