Skip to content

gh-63293: Deprecate providing false values besides strings or bytes to urllib.parse functions #26687

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

jacobtylerwalls
Copy link
Contributor

@jacobtylerwalls jacobtylerwalls commented Jun 12, 2021

bpo-19094 and bpo-22234

Previously, although truthy values failed with AttributeError: ... has no attribute 'decode', falsy values such as [] were silently accepted.

Original patch by @vajrasky.


I noticed in 3.11 (d597fdc) some inappropriate types will fail first in urlsplit() (only) via the @functools.lru_cache decorator requiring hashable types. This is the chief reason for rotating unsupported types in the test cases.

>>> import urllib.parse
>>> urllib.parse.urlsplit([])
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> urllib.parse.urlsplit(())
SplitResultBytes(scheme=b'', netloc=b'', path=b'', query=b'', fragment=b'')

https://bugs.python.org/issue19094

Closes #63293

@jacobtylerwalls
Copy link
Contributor Author

jacobtylerwalls commented Jun 13, 2021

I just did a bpo search for duplicates and found bpo-22234. The discussion eventually came round to a similar solution after discussing the necessity of a deprecation period (believe the consensus = not needed).

But in looking at the discussion there I realized it's safe to extend this solution to the other functions in the module and just factor this TypeError out to _coerce_args(). DRY is good. Will push again and update title and body.

@jacobtylerwalls jacobtylerwalls changed the title bpo-19094: Raise TypeError in urljoin(), urlparse(), and urlsplit() for inappropriate types bpo-19094: Raise TypeError in urllib.parse for inappropriate types Jun 13, 2021
@jacobtylerwalls
Copy link
Contributor Author

jacobtylerwalls commented Jun 13, 2021

The failing test is caused by pip relying on the undocumented duck-typing in question:

https://github.com/pypa/pip/blob/main/src/pip/_internal/models/link.py#L150-L154

I'll raise a PR against pip. So this may need to wait, unless we add a friendly shim.

Notice that the other patch in bpo-22234 explicitly regarded this as an error condition:

+        with self.assertRaises(TypeError):
+            urlunsplit(("http", "www.python.org", "", "", None))

Update: pip merged my PR: pypa/pip#10064

@jacobtylerwalls

This comment was marked as outdated.

@jacobtylerwalls jacobtylerwalls changed the title bpo-19094: Raise TypeError in urllib.parse for inappropriate types bpo-19094: Raise TypeError in urllib.parse for objects besides strings or bytes Jun 14, 2021
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jul 15, 2021
…t()` for inappropriate types

Co-authored-by: Vajrasky Kok <sky.kok@speaklikeaking.com>
This extends the solution to urldefrag(), urlunparse(), urlunsplit(), and parse_qsl()
@jacobtylerwalls
Copy link
Contributor Author

jacobtylerwalls commented Feb 23, 2022

From the failing docs job, I see requests relies on this behavior. I'll change to a emit a deprecation warning. Serhiy has a draft patch on bpo-22234 doing something similar. That patch also allows duck-typing for types having an "encode" attribute.

@jacobtylerwalls jacobtylerwalls changed the title bpo-19094: Raise TypeError in urllib.parse for objects besides strings or bytes bpo-19094: Deprecate providing false values besides strings to urllib.parse functions Feb 23, 2022
@jacobtylerwalls jacobtylerwalls changed the title bpo-19094: Deprecate providing false values besides strings to urllib.parse functions bpo-19094: Deprecate providing false values besides strings or bytes to urllib.parse functions Feb 23, 2022
Copy link
Contributor

@MaxwellDupre MaxwellDupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok.

@jacobtylerwalls jacobtylerwalls changed the title bpo-19094: Deprecate providing false values besides strings or bytes to urllib.parse functions gh-63293: Deprecate providing false values besides strings or bytes to urllib.parse functions Jun 18, 2022
@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 9, 2022
@jacobtylerwalls
Copy link
Contributor Author

@orsenthil I'm wondering if you have a view as a maintainer of urllib?

Previously, values other than strings or bytes either failed cryptically with AttributeError if truthy or were accepted silently if falsy. In 3.11 one falsy value will now start failing cryptically with TypeError, described in OP.

Is this a useful cleanup? I included a deprecation path for the falsy values.

@arhadthedev arhadthedev added stdlib Python modules in the Lib dir python labels Nov 19, 2023
@arhadthedev arhadthedev added the 3.13 bugs and security fixes label Nov 19, 2023
@pablogsal pablogsal requested review from pablogsal and removed request for pablogsal November 19, 2023 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes awaiting review stdlib Python modules in the Lib dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

urljoin should raise a TypeError if URL is not a string
9 participants