`<regex>`: Improve search performance for regexes with initial `+` quantifiers #5509

muellerj2 · 2025-05-15T20:31:28Z

Towards #5468. This is a small change that greatly speeds up searches for regexes like `a+`, which start with a letter, string, or character class followed by a `+` quantifier (or any other quantifier requiring at least one repetition). Because such a loop must be matched at least once, we can enter the repeated subpattern and look for the first position at which the letter, string, or character class at the start of the subpattern can match.
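A minimal sketch of the reasoning above, written against the public `std::regex` API rather than the STL's internal matcher. The helper `search_with_required_prefix` is hypothetical, and it assumes the caller already knows a literal that every match must begin with (for `(bibe)+`, the first mandatory repetition `bibe`). The PR applies the idea inside the engine and also covers single letters and character classes; this sketch only handles a literal string.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of the STL) sketching the idea behind this PR.
// Assumption: every match of `re` must begin with `required_prefix`. For
// (bibe)+ that holds because the + loop has to run at least once, so its first
// iteration, and therefore the literal "bibe", sits at the start of any match.
// No match can then start before the first occurrence of that literal, so we
// skip ahead with a plain substring search and start the regex engine there.
bool search_with_required_prefix(const std::string& text, const std::regex& re,
                                 const std::string& required_prefix, std::smatch& m) {
    const std::size_t pos = text.find(required_prefix);
    if (pos == std::string::npos) {
        return false; // the mandatory first repetition occurs nowhere
    }

    const auto start = text.begin() + static_cast<std::ptrdiff_t>(pos);

    // match_prev_avail states that the character before `start` is valid, so
    // assertions such as \b can inspect it instead of treating `start` as the
    // beginning of the input.
    const auto flags = pos == 0 ? std::regex_constants::match_default
                                : std::regex_constants::match_prev_avail;

    const bool found = std::regex_search(start, text.end(), m, re, flags);

    // Sanity check of the assumption above: a match must begin with the prefix.
    assert(!found || m.str(0).compare(0, required_prefix.size(), required_prefix) == 0);
    return found;
}
```

For example, searching `"lorem ipsum dolor bibebibe sit"` for `(bibe)+` with the required prefix `"bibe"` skips straight to offset 18 and yields `bibebibe`, the same leftmost match that a plain `std::regex_search` over the whole string reports.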

While working on this, I noticed that I hadn't thought the implementation of `text_regex::should_search_match_capture_groups()` through well enough: I designed it to use coordinates relative to the start of the match for the expected submatches, but that isn't helpful when one wants to verify that the whole match is at a particular position. So I changed the implementation to use absolute coordinates, measured from the start of the searched string. Luckily, no test seems to have relied on the previous behavior: all of them matched at the start of the input string anyway, where both coordinate systems agree.
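To make the difference between the two coordinate systems concrete, here is a minimal sketch. The helpers `relative_groups` and `absolute_groups` and the `Span` alias are hypothetical illustrations, not the actual test-support code, which is not shown in this PR description.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// [begin, end) offsets of one capture group; {-1, -1} marks an unmatched group.
using Span = std::pair<std::ptrdiff_t, std::ptrdiff_t>;

// Relative coordinates: offsets measured from the start of the whole match.
// Group 0 always comes out as {0, length}, so this cannot check where in the
// input the match was found.
std::vector<Span> relative_groups(const std::smatch& m) {
    std::vector<Span> spans;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i].matched) {
            spans.emplace_back(std::distance(m[0].first, m[i].first),
                               std::distance(m[0].first, m[i].second));
        } else {
            spans.emplace_back(-1, -1);
        }
    }
    return spans;
}

// Absolute coordinates: offsets measured from the start of the searched string,
// so group 0 also pins down the position of the whole match.
// Precondition (not checked): `m` was produced by searching `text` itself.
std::vector<Span> absolute_groups(const std::string& text, const std::smatch& m) {
    std::vector<Span> spans;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i].matched) {
            spans.emplace_back(std::distance(text.begin(), m[i].first),
                               std::distance(text.begin(), m[i].second));
        } else {
            spans.emplace_back(-1, -1);
        }
    }
    return spans;
}
```

Searching `"lorem bibebibe"` for `(bibe)+` gives the absolute spans {6, 14} and {10, 14} (group 1 holds the last repetition), whereas the relative spans {0, 8} and {4, 8} would be the same wherever in the string the match occurred.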

Benchmark

Running on my machine:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| bm_lorem_search/"bibe"/2 | 38504 ns | 39237 ns | 0.98 |
| bm_lorem_search/"bibe"/3 | 76730 ns | 76730 ns | 1.00 |
| bm_lorem_search/"bibe"/4 | 153460 ns | 153460 ns | 1.00 |
| bm_lorem_search/"(bibe)+"/2 | 4814680 ns | 92076 ns | 52.29 |
| bm_lorem_search/"(bibe)+"/3 | 9521484 ns | 204041 ns | 46.66 |
| bm_lorem_search/"(bibe)+"/4 | 18158784 ns | 401088 ns | 45.27 |
| bm_lorem_search/"(?:bibe)+"/2 | 4743304 ns | 97656 ns | 48.57 |
| bm_lorem_search/"(?:bibe)+"/3 | 9521484 ns | 192540 ns | 49.45 |
| bm_lorem_search/"(?:bibe)+"/4 | 19531250 ns | 384976 ns | 50.73 |


tests/std/tests/VSO_0000000_regex_use/test.cpp

StephanTLavavej · 2025-05-16T16:04:16Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2025-05-17T04:37:23Z

I resolved a trivial adjacent-add conflict with #5494 in VSO_0000000_regex_use.

StephanTLavavej · 2025-05-17T05:26:27Z

➕ 🚀 ⏱️

`<regex>`: Improve search performance for regexes with initial `+` quantifiers

a1f447e

muellerj2 requested a review from a team as a code owner May 15, 2025 20:31

github-project-automation bot added this to STL Code Reviews May 15, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews May 15, 2025

StephanTLavavej added the performance (Must go faster) and regex (meow is a substring of homeowner) labels May 15, 2025

StephanTLavavej self-assigned this May 15, 2025

Fix citation.

de03ed0

StephanTLavavej reviewed May 16, 2025


tests/std/tests/VSO_0000000_regex_use/test.cpp (outdated, resolved)

StephanTLavavej approved these changes May 16, 2025


StephanTLavavej removed their assignment May 16, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 16, 2025

StephanTLavavej mentioned this pull request May 16, 2025

Maintainer priorities #4700

Open

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 16, 2025

Merge branch 'main' into regex-improve-regex_search-performance

b615de4

StephanTLavavej approved these changes May 17, 2025


StephanTLavavej merged commit 2391e5e into microsoft:main May 17, 2025
40 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews May 17, 2025

muellerj2 deleted the regex-improve-regex_search-performance branch May 31, 2025 21:44

muellerj2 mentioned this pull request Jun 14, 2025

<regex>: Use std::search() in skip heuristic #5586

Merged
