Non-standard glob pattern behavior with * #689

@racerpeter

Hi there! While using git-filter-repo recently (git version 2.49.0, git-filter-repo version a40bce548d2c), I learned "the hard way" that its glob syntax does not behave as you'd expect for a glob pattern containing a *. Specifically, the path segment boundary is not respected: * also matches /.

For example, given the following directory structure:

a/
  b/
    c/
      d.sql
      e.sql
    not_deleted.txt
    y.sql

The following glob pattern in a paths file (paths.txt):

glob:a/b/*.sql

And the following command:

git filter-repo --sensitive-data-removal --invert-paths --paths-from-file paths.txt

With standard glob semantics, one would expect only a/b/y.sql to be deleted and nothing else, because * does not match path separators. However, the entire a/b/c directory (d.sql and e.sql) is deleted as well.
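To make the expected behavior concrete, here is a minimal Python sketch of a segment-aware glob matcher, in which * never crosses a /. The helper `segment_glob_match` is hypothetical and written only for illustration; it is not part of git-filter-repo. It assumes paths are relative to the repository root and use / as the separator.

```python
from fnmatch import fnmatchcase


def segment_glob_match(path: str, pattern: str) -> bool:
    """Hypothetical segment-aware glob matcher: '*' never crosses '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # A standard glob needs exactly one pattern segment per path segment.
    if len(path_parts) != len(pattern_parts):
        return False
    # Each segment contains no '/', so fnmatchcase's '*' stays inside it.
    return all(fnmatchcase(p, pat) for p, pat in zip(path_parts, pattern_parts))


# The files from the example directory tree in this report.
FILES = [
    "a/b/c/d.sql",
    "a/b/c/e.sql",
    "a/b/not_deleted.txt",
    "a/b/y.sql",
]

if __name__ == "__main__":
    matched = [f for f in FILES if segment_glob_match(f, "a/b/*.sql")]
    print(matched)  # ['a/b/y.sql']
```

Under these semantics, a/b/c/d.sql and a/b/c/e.sql have one more segment than the pattern, so they are never selected.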

On further RTFM-ing of the man page, I found a note in the examples describing this as expected behavior.

It appears that git-filter-repo may actually be using Unix fnmatch syntax rather than glob syntax: in fnmatch, * matches any sequence of characters, including path separators.
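Python's standard `fnmatch` module reproduces exactly the deletion set reported here: its * is translated into the regex `.*`, which also matches /. Whether git-filter-repo calls `fnmatch` internally is an assumption in this sketch, not something this report confirms; the helper name `fnmatch_selected` is hypothetical.

```python
from fnmatch import fnmatchcase, translate

# Paths from the example directory tree (relative to the repository root).
FILES = [
    "a/b/c/d.sql",
    "a/b/c/e.sql",
    "a/b/not_deleted.txt",
    "a/b/y.sql",
]
PATTERN = "a/b/*.sql"


def fnmatch_selected(files, pattern):
    """Return the files selected by fnmatch-style matching.

    fnmatchcase is used instead of fnmatch so the result does not depend
    on the operating system's case-normalization rules.
    """
    return [f for f in files if fnmatchcase(f, pattern)]


if __name__ == "__main__":
    # The printed regex shows '*' becoming '.*', which matches '/' too.
    print(translate(PATTERN))
    print(fnmatch_selected(FILES, PATTERN))
    # ['a/b/c/d.sql', 'a/b/c/e.sql', 'a/b/y.sql']
```

Only a/b/not_deleted.txt survives, which matches what the command above actually removed.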

Short of changing git-filter-repo's API by renaming --path-glob to --path-fnmatch, I would suggest making this behavior far more prominent in the docs (for example, by mentioning it in the options section of the man page and in the corresponding --help output) to avoid confusing people who assume they are dealing with a standard glob matcher.

Finally, thanks so much for creating this tool -- it's the best at what it does!