@StephanTLavavej StephanTLavavej commented Mar 27, 2025

Fixes #4496.

  • Update google-benchmark to 1.9.2.
  • Remove HAVE_GNU_POSIX_REGEX and HAVE_POSIX_REGEX workarounds.
  • Move/rename inconsistent vector<bool> benchmarks.
  • Stop using random_device, part 1.
    • Benchmark results should be deterministic, so seeding from random_device is counterproductive. (This is different from randomized correctness testing, where we do want to cover the entire space, so random_device plus logging the seed data is desirable.)
  • Stop using random_device, part 2: Remove xoshiro.
    • It's unclear why we were using xoshiro in the first place. (I believe it was the original author's personal preference; there appears to be no discussion of why it was needed in the original PR.) This was our only usage, so we can drop the entry in NOTICE.txt.
  • Add missing files to benchmark_headers.
    • We forgot to keep this updated.
  • Remove STL_BENCHMARK_ITERATOR_DEBUG_LEVEL.
    • We've never been interested in benchmarking non-default IDL settings (or debug).
  • Remove unnecessary static.
    • Each benchmark is a separate self-contained source file, so this served no purpose, and our newer benchmarks didn't do this.
  • Remove unnecessary unnamed namespaces.
    • Ditto.
  • Build the STL and benchmarks with /nologo.
    • By default, this makes no difference. But when investigating what the build system is doing with -DCMAKE_VERBOSE_MAKEFILE=ON, this suppresses the verbose:
      Microsoft (R) C/C++ Optimizing Compiler Version 19.44.34918.1 for x64
      Copyright (C) Microsoft Corporation.  All rights reserved.
      
      and extremely verbose:
      /std:c++latest is provided as a preview of language features from the latest C++
      working draft, and we're eager to hear about bugs and suggestions for improvements.
      However, note that these features are provided as-is without support, and subject
      to changes or removal as the working draft evolves. See
      https://go.microsoft.com/fwlink/?linkid=2045807 for details.
      
  • Fix #4496 ("benchmark: it is built with /Ob1, so vector algorithm dispatcher is noticeable") and remove the product code workaround.
  • Link with /DEBUG to generate PDBs.
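
The "Stop using random_device" items above can be illustrated with a minimal sketch. It assumes a standard engine seeded with a fixed constant; the PR text does not say which engine or seed replaced xoshiro, so `make_input` and the seed value `1729` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not taken from the PR): fills a vector with pseudo-random
// bytes from an engine seeded with a fixed constant, so every benchmark run
// measures exactly the same input data.
std::vector<std::uint8_t> make_input(std::size_t n, std::uint64_t seed = 1729) {
    std::mt19937_64 gen{seed}; // fixed seed instead of std::random_device
    std::vector<std::uint8_t> v(n);
    for (auto& e : v) {
        // Take the low byte of the raw engine output. This avoids
        // uniform_int_distribution, whose results may differ between
        // standard library implementations.
        e = static_cast<std::uint8_t>(gen() & 0xFF);
    }
    return v;
}
```

With this shape, `make_input(64) == make_input(64)` holds on every run, which is the property a benchmark wants. A randomized correctness test would instead seed from `random_device` and log the seed.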

Compiler command line comparison:

RelWithDebInfo: C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1444~1.349\bin\Hostx64\x64\cl.exe -DBENCHMARK_STATIC_DEFINE -ID:\GitHub\STL\out\x64\out\inc -ID:\GitHub\STL\benchmarks\inc -ID:\GitHub\STL\benchmarks\google-benchmark\include /DWIN32 /D_WINDOWS /EHsc /O2 /Ob1 /DNDEBUG -std:c++latest -MT -Zi /nologo /diagnostics:caret /W4 /WX /w14265 /w15038 /w15262 /utf-8 /Zc:preprocessor D:\GitHub\STL\benchmarks\src\minmax_element.cpp -nologo -TP -showIncludes -scanDependencies CMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj.ddi -FoCMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj
Release:        C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1444~1.349\bin\Hostx64\x64\cl.exe -DBENCHMARK_STATIC_DEFINE -ID:\GitHub\STL\out\x64\out\inc -ID:\GitHub\STL\benchmarks\inc -ID:\GitHub\STL\benchmarks\google-benchmark\include /DWIN32 /D_WINDOWS /EHsc /O2 /Ob2 /DNDEBUG -std:c++latest -MT /nologo /diagnostics:caret /W4 /WX /w14265 /w15038 /w15262 /utf-8 /Zc:preprocessor D:\GitHub\STL\benchmarks\src\minmax_element.cpp -nologo -TP -showIncludes -scanDependencies CMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj.ddi -FoCMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj
Release /Zi:    C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1444~1.349\bin\Hostx64\x64\cl.exe -DBENCHMARK_STATIC_DEFINE -ID:\GitHub\STL\out\x64\out\inc -ID:\GitHub\STL\benchmarks\inc -ID:\GitHub\STL\benchmarks\google-benchmark\include /DWIN32 /D_WINDOWS /EHsc /O2 /Ob2 /DNDEBUG -std:c++latest -MT /Zi /nologo /diagnostics:caret /W4 /WX /w14265 /w15038 /w15262 /utf-8 /Zc:preprocessor D:\GitHub\STL\benchmarks\src\minmax_element.cpp -nologo -TP -showIncludes -scanDependencies CMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj.ddi -FoCMakeFiles\benchmark-minmax_element.dir\src\minmax_element.cpp.obj
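
The difference that matters for #4496 in the command lines above is /Ob1 versus /Ob2. As a rough sketch of why it shows up in the numbers: /Ob1 is documented as expanding only functions marked `inline`, `__inline`, or `__forceinline` (and member functions defined in a class declaration), while /Ob2 lets the compiler expand any suitable function. The names below are hypothetical and the kernel is a plain scalar loop standing in for a vectorized one; this is not the STL's actual dispatcher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: returns the index of the first differing element,
// or n if the two ranges are equal.
template <class T>
std::size_t mismatch_kernel(const T* a, const T* b, std::size_t n) {
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Hypothetical dispatcher: a thin wrapper that only forwards to a kernel.
// It is not declared `inline`, so (assuming the documented /Ob1 rules apply
// to function templates as well) /Ob1 leaves it as a real call, whose cost
// is visible for tiny inputs. Under /Ob2 the compiler may expand it.
template <class T>
std::size_t mismatch_dispatch(const T* a, const T* b, std::size_t n) {
    return mismatch_kernel(a, b, n);
}
```

For inputs of a few elements the work inside the kernel is only a handful of instructions, so an extra non-inlined call layer is a measurable fraction of the total time.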

Comparing RelWithDebInfo (before) to Release /Zi (after) for the mismatch() benchmark with the removed product code workaround:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `bm<uint8_t, op::mismatch>/8/3` | 3.42 ns | 3.05 ns | 1.12 |
| `bm<uint8_t, op::mismatch>/24/22` | 3.40 ns | 3.08 ns | 1.10 |
| `bm<uint8_t, op::mismatch>/105/-1` | 4.49 ns | 4.29 ns | 1.05 |
| `bm<uint8_t, op::mismatch>/4021/3056` | 67.5 ns | 58.2 ns | 1.16 |
| `bm<uint16_t, op::mismatch>/8/3` | 3.63 ns | 3.02 ns | 1.20 |
| `bm<uint16_t, op::mismatch>/24/22` | 3.83 ns | 3.20 ns | 1.20 |
| `bm<uint16_t, op::mismatch>/105/-1` | 5.34 ns | 4.95 ns | 1.08 |
| `bm<uint16_t, op::mismatch>/4021/3056` | 108 ns | 105 ns | 1.03 |
| `bm<uint32_t, op::mismatch>/8/3` | 3.40 ns | 3.18 ns | 1.07 |
| `bm<uint32_t, op::mismatch>/24/22` | 4.00 ns | 3.83 ns | 1.04 |
| `bm<uint32_t, op::mismatch>/105/-1` | 8.34 ns | 8.54 ns | 0.98 |
| `bm<uint32_t, op::mismatch>/4021/3056` | 202 ns | 205 ns | 0.99 |
| `bm<uint64_t, op::mismatch>/8/3` | 3.62 ns | 3.18 ns | 1.14 |
| `bm<uint64_t, op::mismatch>/24/22` | 4.92 ns | 4.68 ns | 1.05 |
| `bm<uint64_t, op::mismatch>/105/-1` | 14.7 ns | 14.6 ns | 1.01 |
| `bm<uint64_t, op::mismatch>/4021/3056` | 398 ns | 398 ns | 1.00 |
| `bm<uint8_t, op::lexi>/8/3` | 4.03 ns | 3.40 ns | 1.19 |
| `bm<uint8_t, op::lexi>/24/22` | 4.03 ns | 3.39 ns | 1.19 |
| `bm<uint8_t, op::lexi>/105/-1` | 4.91 ns | 4.46 ns | 1.10 |
| `bm<uint8_t, op::lexi>/4021/3056` | 67.8 ns | 59.1 ns | 1.15 |
| `bm<int8_t, op::lexi>/8/3` | 3.61 ns | 3.59 ns | 1.01 |
| `bm<int8_t, op::lexi>/24/22` | 3.61 ns | 3.47 ns | 1.04 |
| `bm<int8_t, op::lexi>/105/-1` | 4.68 ns | 4.38 ns | 1.07 |
| `bm<int8_t, op::lexi>/4021/3056` | 67.4 ns | 58.9 ns | 1.14 |
| `bm<uint16_t, op::lexi>/8/3` | 4.03 ns | 3.55 ns | 1.14 |
| `bm<uint16_t, op::lexi>/24/22` | 4.04 ns | 3.66 ns | 1.10 |
| `bm<uint16_t, op::lexi>/105/-1` | 5.13 ns | 4.91 ns | 1.04 |
| `bm<uint16_t, op::lexi>/4021/3056` | 106 ns | 107 ns | 0.99 |
| `bm<uint32_t, op::lexi>/8/3` | 3.81 ns | 3.67 ns | 1.04 |
| `bm<uint32_t, op::lexi>/24/22` | 4.30 ns | 4.52 ns | 0.95 |
| `bm<uint32_t, op::lexi>/105/-1` | 8.21 ns | 8.93 ns | 0.92 |
| `bm<uint32_t, op::lexi>/4021/3056` | 203 ns | 206 ns | 0.99 |
| `bm<uint64_t, op::lexi>/8/3` | 4.24 ns | 3.85 ns | 1.10 |
| `bm<uint64_t, op::lexi>/24/22` | 5.55 ns | 5.09 ns | 1.09 |
| `bm<uint64_t, op::lexi>/105/-1` | 14.6 ns | 14.6 ns | 1.00 |
| `bm<uint64_t, op::lexi>/4021/3056` | 400 ns | 400 ns | 1.00 |

As expected, we no longer need the inline workaround in product code, and /O2 /Ob2 improves performance, so the benchmarks will now give us more realistic results.
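
For reading the Speedup column in the comparison: the values are consistent with Before divided by After, rounded to two decimal places, so anything above 1 means the Release /Zi build was faster. A small sketch of that arithmetic follows; the helper names `speedup` and `round2` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Speedup as the table appears to compute it: time before divided by time after.
// Values above 1 mean the "after" build is faster.
double speedup(double before_ns, double after_ns) {
    assert(after_ns > 0.0);
    return before_ns / after_ns;
}

// Rounds to two decimal places, matching the precision shown in the table.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, the first row gives 3.42 / 3.05 ≈ 1.1213, shown as 1.12, while the `uint32_t` mismatch row at 105/-1 gives 8.34 / 8.54 ≈ 0.9766, shown as 0.98, which is a small slowdown.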

Linker command line comparison:

main:    C:\WINDOWS\system32\cmd.exe /C "cd . && "C:\Program Files\Microsoft Visual Studio\2022\Preview\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_exe --intdir=CMakeFiles\benchmark-mismatch.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1444~1.349\bin\Hostx64\x64\link.exe  CMakeFiles\benchmark-mismatch.dir\src\mismatch.cpp.obj  /out:benchmark-mismatch.exe /implib:benchmark-mismatch.lib /pdb:benchmark-mismatch.pdb /version:0.0 /machine:x64 /debug /INCREMENTAL /subsystem:console -LIBPATH:D:\GitHub\STL\out\x64\out\lib\amd64 google-benchmark\src\benchmark.lib  shlwapi.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
This PR: C:\WINDOWS\system32\cmd.exe /C "cd . && "C:\Program Files\Microsoft Visual Studio\2022\Preview\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_exe --intdir=CMakeFiles\benchmark-mismatch.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\MIB055~1\2022\Preview\VC\Tools\MSVC\1444~1.349\bin\Hostx64\x64\link.exe  CMakeFiles\benchmark-mismatch.dir\src\mismatch.cpp.obj  /out:benchmark-mismatch.exe /implib:benchmark-mismatch.lib /pdb:benchmark-mismatch.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console  /DEBUG -LIBPATH:D:\GitHub\STL\out\x64\out\lib\amd64 google-benchmark\src\benchmark.lib  shlwapi.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."

Output files:

main:
D:\GitHub\STL>dir out\bench\benchmark-mismatch.* | rg benchmark-mismatch
03/28/2025  12:47 AM         1,953,280 benchmark-mismatch.exe
03/28/2025  12:47 AM        13,378,702 benchmark-mismatch.ilk
03/28/2025  12:47 AM        19,755,008 benchmark-mismatch.pdb

This PR:
D:\GitHub\STL>dir out\bench\benchmark-mismatch.* | rg benchmark-mismatch
03/28/2025  12:50 AM         1,482,752 benchmark-mismatch.exe
03/28/2025  12:50 AM        17,035,264 benchmark-mismatch.pdb

This preserves the /DEBUG linker option, but changes /INCREMENTAL to /INCREMENTAL:NO, fixing what @barcharcraz observed in Discord:

> In particular the benchmarks shouldn't build with incremental linking

@StephanTLavavej StephanTLavavej added the test Related to test code label Mar 27, 2025
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner March 27, 2025 09:34
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Mar 27, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Final Review in STL Code Reviews Mar 27, 2025
@StephanTLavavej StephanTLavavej moved this from Final Review to Ready To Merge in STL Code Reviews Mar 31, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Apr 9, 2025
@StephanTLavavej StephanTLavavej self-assigned this Apr 9, 2025
@StephanTLavavej
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit ee74822 into microsoft:main Apr 10, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Apr 10, 2025
@StephanTLavavej StephanTLavavej deleted the benchmark-1.9.2 branch April 10, 2025 22:29