Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38807: Add os.PathLike to exception message raised by _check_arg_types #17160

Merged
merged 5 commits into from Nov 19, 2019

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Nov 15, 2019

os.PathLike objects can be passed to os.path.join, however this is not reflected in the TypeError raised by _check_arg_types when passing an invalid type:

TypeError: join() argument must be str or bytes, not 'NoneType'

https://bugs.python.org/issue38807

@the-knights-who-say-ni
Copy link

the-knights-who-say-ni commented Nov 15, 2019

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@tomasfarias

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@brandtbucher
Copy link
Member

brandtbucher commented Nov 15, 2019

Thanks for your time @tomasfarias, and welcome to CPython! 😎

I assume that you've seen the bot's message about the CLA?

Copy link
Member

@brandtbucher brandtbucher left a comment

I agree that this is a good change. Just one minor grammar fix:

@@ -149,7 +149,7 @@ def _check_arg_types(funcname, *args):
elif isinstance(s, bytes):
hasbytes = True
else:
raise TypeError('%s() argument must be str or bytes, not %r' %
raise TypeError('%s() argument must be str, bytes or os.PathLike object, not %r' %
Copy link
Member

@brandtbucher brandtbucher Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise TypeError('%s() argument must be str, bytes or os.PathLike object, not %r' %
raise TypeError('%s() argument must be str, bytes, or os.PathLike object, not %r' %

Copy link
Member

@aeros aeros Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we can use an f-string here instead of the older % formatting. We often will avoid refactoring old code exclusively for the purpose of using f-strings, but it's preferred for any new code to use f-strings when possible. They're significantly easier to read.

@brandtbucher brandtbucher added the docs Documentation in the Doc dir label Nov 15, 2019
@brandtbucher
Copy link
Member

brandtbucher commented Nov 15, 2019

This PR should also have a NEWS entry. Something like:

Update :exc:`TypeError` messages for :meth:`os.path.join` to include :class:`os.PathLike` objects as acceptable input types.

@aeros
Copy link
Member

aeros commented Nov 15, 2019

I'd say this falls under the category of enhancement rather than documentation, since it's providing an enhancement to an error message rather than making any documentation changes.

@aeros aeros added type-feature A feature request or enhancement and removed docs Documentation in the Doc dir labels Nov 15, 2019
Copy link
Member

@aeros aeros left a comment

Other than this detail, everything else looks good:

raise TypeError('%s() argument must be str or bytes, not %r' %
(funcname, s.__class__.__name__)) from None
raise TypeError(f'{funcname}() argument must be str, bytes, or '
f'os.PathLike object, not {s.__class__.__name__}') from None
Copy link
Member

@aeros aeros Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to use %r formatting in an f-string, the expression within the braces has to end with an !r:

Suggested change
f'os.PathLike object, not {s.__class__.__name__}') from None
f'os.PathLike object, not {s.__class__.__name__!r}') from None

Copy link
Contributor Author

@tomasfarias tomasfarias Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thank you!

aeros
aeros approved these changes Nov 15, 2019
Copy link
Member

@aeros aeros left a comment

Thanks for making the recommended changes @tomasfarias, LGTM.

Copy link
Member

@brandtbucher brandtbucher left a comment

Thanks!

@taleinat
Copy link
Contributor

taleinat commented Nov 15, 2019

The current uses of genericpath._check_arg_types() are by the .join(), .relpath() and .commonpath() methods of the ntpath and posixpath modules. However, unlike the other methods, both of the .join() methods call _check_arg_types() with the original paths, without first passing them through os.fspath().

That makes the error message of _check_arg_types() potentially inaccurate after this change, since it could potentially be passed os.PathLike objects and raise a very confusing exception.

I suggest adding a check for os.PathLike objects to _check_arg_types() to make it less susceptible to future bugs. With that, the new error message will more certainly be correct. (I wouldn't be worried about performance here, since this is only used in error cases to give more informative exceptions.)

@aeros
Copy link
Member

aeros commented Nov 15, 2019

@taleinat

I suggest adding a check for os.PathLike objects to _check_arg_types() to make it less susceptible to future bugs

Would something like this be adequate?

def _check_arg_types(funcname, *args):
    hasstr = hasbytes = haspathlike = False
    for s in args:
        if isinstance(s, str):
            hasstr = True
        elif isinstance(s, bytes):
            hasbytes = True
        elif isinstance(s, os.PathLike):
            haspathlike = True
        else:
            ...

I believe the conditional check below would also be needed, to ensure that os.PathLike instances aren't mixed with bytes:

    if haspathlike and hasbytes:
        raise TypeError("Can't mix os.PathLike and bytes in path components") from None

Currently, if you try to mix os.PathLike path and with a bytes literal, you get a potentially misleading error message:

>>> a = pathlib.PurePath('dir')
>>> b = b'test.py'
>>> os.path.join(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.7/genericpath.py", line 151, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

I believe this occurs because in os.path.join(), os.PathLike instances are converted to strings when they are passed to os.fspath().

A check shouldn't be needed for hasstr and haspathlike, because os.PathLike instances can be mixed with strings:

>>> a = pathlib.PurePath('dir')
>>> b = "test.py"
>>> os.path.join(a, b)
'dir/test.py'

Edit: the current error message is even more misleading the over way around:

>>> a = 'dir'
>>> b = pathlib.PurePath('test.py')
>>> os.path.join(a, b)
'dir/test.py'
>>> a = b'dir'
>>> os.path.join(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.7/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'PurePosixPath'

@taleinat
Copy link
Contributor

taleinat commented Nov 15, 2019

@aeros, you're right, adding that check like that to _check_arg_types() becomes awkward due to the additional check for mixed strings and bytes. So, to answer:

Would something like this be adequate?

IMO, no.

I suggest changing _check_arg_types() to use for s in map(os.fspath, args):, which should just work.

@aeros
Copy link
Member

aeros commented Nov 16, 2019

@taleinat

I suggest changing _check_arg_types() to use for s in map(os.fspath, args):, which should just work.

I agree, that would likely be a better solution. Especially in the long term if any other valid types are added in the future that convert to strings with os.fspath() (that aren't subclasses of os.PathLike).

@tomasfarias
Copy link
Contributor Author

tomasfarias commented Nov 19, 2019

I've been afk the last couple of days so I'm just now reading the discussion. Thank you all for your comments.

Is there a point to raising the first TypeError in _check_arg_types if we're using for s in map(os.fspath, args)? Since at least in the first example I brought up os.fspath will end up raising the TypeError before _check_arg_types has a chance to run the else block.

I agree with the comment, just wondering if also deleting the else block would be a positive change.

Edit: the exception message from os.fspath mentions os.PathLike objects but doesn't add the function name like _check_arg_types currently does: TypeError: expected str, bytes or os.PathLike object, not NoneType

@tomasfarias
Copy link
Contributor Author

tomasfarias commented Nov 19, 2019

Actually I think the for s in map(os.fspath, args) solution is not ideal: after testing I realize nothing is handling the exception raised by os.fspath inside _check_arg_types and this leads to printing two tracebacks, which is confusing.

A possible solution is to have join pass the arguments through os.fspath first, like the other functions do: paths = tuple(map(os.fspath, args))

Also thought about using a try/except to wrap the loop in _check_arg_types, but I do not like that idea as it would be duplicating the try/except blocks in join, relpath and commonpath.

@bedevere-bot
Copy link

bedevere-bot commented Nov 19, 2019

@rhettinger: Please replace # with GH- in the commit message next time. Thanks!

@miss-islington
Copy link
Contributor

miss-islington commented Nov 19, 2019

Thanks @tomasfarias for the PR, and @rhettinger for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒🤖

@bedevere-bot
Copy link

bedevere-bot commented Nov 19, 2019

GH-17249 is a backport of this pull request to the 3.8 branch.

@brandtbucher
Copy link
Member

brandtbucher commented Nov 19, 2019

Congrats on your first CPython contribution @tomasfarias! 🍾

Looking forward to seeing more from you in the future.

rhettinger pushed a commit that referenced this pull request Nov 19, 2019
…types (GH-17160) (GH-17249)

(cherry picked from commit fe75b62)

Co-authored-by: Tomás Farías <tomasfariassantana@gmail.com>
@taleinat
Copy link
Contributor

taleinat commented Nov 19, 2019

@rhettinger, did you follow the conversation here? The PR as merged doesn't appear to properly handle edge cases in posixpath.join() and ntpath.join(). If you see otherwise, could you explain?

jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants