Skip to content

fnmatch should use re.search() #32123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

fnmatch should use re.search() #32123

wants to merge 3 commits into from

Conversation

Roffild
Copy link

@Roffild Roffild commented Mar 26, 2022

re.match does not work with the end of a string (\Z or $).

import fnmatch, re

fnmatch.translate("busybox")
Out[2]: '(?s:busybox)\\Z'

re.compile(fnmatch.translate("busybox")).match("/bin/busybox")

re.compile(fnmatch.translate("busybox")).search("/bin/busybox")
Out[4]: <re.Match object; span=(5, 12), match='busybox'>

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@Roffild

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@ghost
Copy link

ghost commented Apr 18, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

@Roffild
Copy link
Author

Roffild commented Apr 18, 2022

Github is not set up. Only after "Update - Rebase" did the bot call the reviewer. For other projects, this button is not available to me.

@AlexWaygood
Copy link
Member

AlexWaygood commented Apr 18, 2022

Hi, I'm not a bot :) I'm a very human triager.

@Roffild
Copy link
Author

Roffild commented Apr 18, 2022

But @cpython-cla-bot - bot 😄

@AlexWaygood
Copy link
Member

But @cpython-cla-bot - bot 😄

That's just because they're the new bot on the block! They didn't exist when you first filed the PR (and nor did Bedevere's "please add a NEWS entry" functionality), but now they do, so adding new commits to your branch triggered them.

@ezio-melotti
Copy link
Member

The current behavior is correct:

>>> fnmatch.fnmatch('/bin/busybox', 'busybox')
False
>>> fnmatch.fnmatch('busybox', 'busybox')
True
>>> fnmatch.fnmatch('/bin/busybox', '*busybox')  # note the *
True

The problem is not with the end of the string, but with the beginning:

  • 'busybox' only matches busybox and not /bin/busybox or other strings containing busybox
  • '*busybox' matches strings that end with busybox, including /bin/busybox
  • 'busybox*' matches strings that begin with busybox

@serhiy-storchaka
Copy link
Member

Concur with @ezio-melotti.

@Roffild
Copy link
Author

Roffild commented Apr 18, 2022

#include <stdio.h>
#include <fnmatch.h>
int main() {
   printf("%d=False %d=True %d=True\n",
      fnmatch("/bin/busybox", "busybox", 0),
      fnmatch("busybox", "busybox", 0),
      fnmatch("/bin/busybox", "*busybox", 0)
      );
}
// 1=False 0=True 1=True

https://man7.org/linux/man-pages/man3/fnmatch.3.html and GCC

re.search:

fnmatch.fnmatch('/bin/busybox', 'busybox')
Out[2]: True
fnmatch.fnmatch('busybox', 'busybox')
Out[3]: True
fnmatch.fnmatch('/bin/busybox', '*busybox')
Out[4]: True

My version meets the standard for option 1.

@serhiy-storchaka
Copy link
Member

You are confused by the fact that the C's fnmatch and the Python's fnmatch have different order of parameters.

@ezio-melotti
Copy link
Member

#include <stdio.h>
#include <fnmatch.h>

int main()
{
    printf("1st=%d 2nd=%d 3rd=%d\n",
        fnmatch("busybox", "/bin/busybox", 0),
        fnmatch("busybox", "busybox", 0),
        fnmatch("*busybox", "/bin/busybox", 0)
    );
    return 0;
}

Outputs 1st=1 2nd=0 3rd=0. Note two things, from the fnmatch manpage:

  • The signature of the C function is int fnmatch(const char *pattern, const char *string, int flags); (the pattern first, then the string).
  • Zero if string matches pattern, FNM_NOMATCH if there is no match or another nonzero value if there is an error.

This means that the first example didn't match, whereas the other two did, so it's consistent with Python's behavior and the example I posted above.

@Roffild
Copy link
Author

Roffild commented Apr 18, 2022

But

fnmatch("busybox", "/bin/busybox", 0) != fnmatch.fnmatch('/bin/busybox', 'busybox')

It was this pattern that made me create the patch.
The \Z in '(?s:busybox)\\Z' clearly hints at this behavior. And about ignoring \Z and $ in re.match, there are many posts in SO.

@serhiy-storchaka
Copy link
Member

Please read more carefully what @ezio-melotti has written above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants