Have an option for POSIX-compatible longest match of alternates #150

@mrabarnett

Original report by Anonymous.


Hello there,

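Currently both re and regex stop at the first alternative that lets the match succeed, rather than looking for the longest one. For example, (A|AA) matched against AA returns only the first A, even though the second alternative would match the whole string.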
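A minimal sketch of the ordered-alternation behaviour, using only the standard `re` module. The unanchored pattern shows the effect most clearly, because a trailing `$` forces the engine to backtrack into the later alternative. The variable names are illustrative only.

```python
import re

# Alternatives are tried left to right; the first one that lets the
# overall match succeed wins, even if a later one would match more text.
first = re.search(r"(A|AA)", "AA").group(0)

# Reordering the alternatives by hand yields the longer match.
reordered = re.search(r"(AA|A)", "AA").group(0)

# With a trailing anchor, "A" followed by "$" fails at position 1,
# so the engine backtracks and takes the "AA" alternative instead.
anchored = re.search(r"(A|AA)$", "AA").group(0)

print(first, reordered, anchored)
```

Under POSIX leftmost-longest rules, the first pattern would report `AA` regardless of the order in which the alternatives are written.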

On the other hand, POSIX regex (C, C++, Boost, Ruby) requires that the leftmost-longest match be returned, i.e. AA. Most modern engines seem to reject this rule on the grounds that it makes matching much slower, because the engine can no longer commit to the first alternative that succeeds.

However, leftmost-longest matching can be quite useful in some situations that otherwise need workarounds, and there seems to be no regex engine for Python at present that supports this behaviour.
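One such workaround can be sketched as follows, assuming the alternatives are available as separate pattern strings. The helper name `search_leftmost_longest` is hypothetical and not part of `re` or `regex`. It only makes the top-level alternation leftmost-longest; each alternative is still matched with the engine's usual rules, so this is not full POSIX semantics.

```python
import re

def search_leftmost_longest(alternatives, text):
    """Return (start, end) of the leftmost match, preferring the
    alternative that consumes the most text at that position.
    Returns None when no alternative matches anywhere."""
    compiled = [re.compile(alt) for alt in alternatives]
    # Try start positions from left to right so the result is leftmost.
    for start in range(len(text) + 1):
        best_end = None
        for pattern in compiled:
            # Pattern.match(text, pos) anchors the attempt at `start`.
            m = pattern.match(text, start)
            if m and (best_end is None or m.end() > best_end):
                best_end = m.end()
        if best_end is not None:
            return start, best_end
    return None

print(search_leftmost_longest(["A", "AA"], "AA"))
```

Every alternative is attempted at every start position, which illustrates the extra cost that leftmost-longest matching brings compared with committing to the first alternative that succeeds.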

It would be nice to have the POSIX leftmost-longest behaviour available as an option when compiling a regular expression.
