`<regex>`: Improve search performance for regexes with initial `+` quantifiers #5509
Towards #5468. This is a small change that greatly speeds up searches for regexes like `a+` that start with some letter/string/character class followed by a `+` quantifier (or any other quantifier requiring at least one repetition). Because this loop must be matched at least once, we can enter the repeated subpattern and look for the first position where a letter/string/character class in the subpattern can match.

While working on this, I noticed that I hadn't thought the implementation of `text_regex::should_search_match_capture_groups()` through well enough: I designed it to use relative coordinates for expected submatches, but this isn't very helpful when one wants to ensure that the whole match is at a particular position. So I changed the implementation to use absolute coordinates from the start of the matched string. Luckily, no test seems to have relied on the previous behavior, meaning all of them just matched the start of the input string anyway.

Benchmark
Running on my machine: