urlparse incorrectly retrieves IPv4 and regular name hosts from inside of brackets #103848
Comments
…it are of IPv6 or IPvFuture format (#103849) * Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format --------- Co-authored-by: Gregory P. Smith <greg@krypto.org>
…urlsplit are of IPv6 or IPvFuture format (pythonGH-103849) * Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format --------- (cherry picked from commit 29f348e) Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
…und by urlsplit are of IPv6 or IPvFuture format (pythonGH-103849) * Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format --------- Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 29f348e) Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com>
… urlsplit are of IPv6 or IPvFuture format (GH-103849) (#104349) gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (GH-103849) * Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format --------- (cherry picked from commit 29f348e) Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
As implemented, the URL |
This change caused an issue on our system. We have brackets in a password. Since 3.11.4, urlparse tries to parse an IP address if it finds brackets. The brackets don't even have to be in the correct order. Running this code:
    from urllib.parse import urlparse
    urlparse("https://user:some]password[@host.com")
In 3.11.3 it works as expected:
In 3.11.4 it throws an exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/urllib/parse.py", line 395, in urlparse
splitresult = urlsplit(url, scheme, allow_fragments)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/parse.py", line 500, in urlsplit
_check_bracketed_host(bracketed_host)
File "/usr/local/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/ipaddress.py", line 54, in ip_address
raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '@host.com' does not appear to be an IPv4 or IPv6 address
That this worked before was a bug.
So this works:
The password is still URL encoded, use Further references:
|
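For anyone landing here with the same problem, a minimal sketch of the percent-encoding approach described above. The helper name `build_url` is hypothetical and only for illustration; `quote`, `unquote`, and `urlsplit` are the standard `urllib.parse` functions. It assumes you control how the URL is assembled.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(user, password, host):
    """Hypothetical helper: percent-encode the userinfo before building the URL."""
    # safe='' makes quote() encode every reserved character, including [ and ].
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"https://{userinfo}@{host}"


url = build_url("user", "some]password[", "host.com")
parts = urlsplit(url)

# urlsplit() does not decode the password, so unquote it explicitly.
decoded_password = unquote(parts.password)

print(url)
print(parts.hostname)
print(decoded_password)
```

With the brackets encoded as %5D and %5B, the netloc no longer contains raw brackets, so the bracketed-host check is never triggered.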
Thank you for the clarification! This makes a lot of sense :)
I'm also hitting "ValueError: 'xxxxx' does not appear to be an IPv4 or IPv6 address" on code that was working. In my case I have no control over the input data; it's background noise from the internet that I'm parsing and analyzing. Some of the crap in my dataset is very much not standards compliant. Nonetheless, as it was working and now isn't, I figured I should 👍 before I start vendoring old versions of the stdlib into my project 😆
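When the input can't be fixed upstream, one possible workaround (a sketch, not a recommendation from the maintainers) is to treat the ValueError as "this URL is malformed" and skip or log it instead of vendoring an old stdlib. The wrapper name `try_urlsplit` is made up for this example.

```python
from urllib.parse import urlsplit


def try_urlsplit(url):
    """Hypothetical wrapper: return the split result, or None for malformed URLs."""
    try:
        return urlsplit(url)
    except ValueError:
        # Raised for invalid bracketed hosts, among other malformed inputs.
        return None


samples = ["http://example.com/path", "http://[::1"]
for sample in samples:
    result = try_urlsplit(sample)
    print(sample, "->", "rejected" if result is None else result.hostname)
```

This keeps the stricter validation while letting a bulk-processing loop continue past bad records.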
Background
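RFC 3986 defines a host as follows:
    host = IP-literal / IPv4address / reg-name
Where
    IP-literal = "[" ( IPv6address / IPvFuture ) "]"
The WHATWG URL Standard says that "A valid host string must be a valid domain string, a valid IPv4-address string, or: U+005B ([), followed by a valid IPv6-address string, followed by U+005D (])."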
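To make the rule concrete, here is a rough sketch of a check for the text found between the brackets, following the RFC 3986 grammar: only an IPv6 address or an IPvFuture literal is acceptable. The function name `is_valid_bracketed_host` and the regular expression are my own illustration and not necessarily what the stdlib does.

```python
import ipaddress
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )  (RFC 3986)
_IPVFUTURE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def is_valid_bracketed_host(inner):
    """Hypothetical check: True if `inner` may legally appear inside [ ]."""
    if inner.startswith(("v", "V")):
        return _IPVFUTURE.fullmatch(inner) is not None
    try:
        address = ipaddress.ip_address(inner)
    except ValueError:
        return False  # not an IP address at all, e.g. a registered name
    # IPv4 addresses parse fine but are not allowed inside brackets.
    return isinstance(address, ipaddress.IPv6Address)


for candidate in ["::1", "0.0.0.0", "example.com", "v1.fe80"]:
    print(candidate, is_valid_bracketed_host(candidate))
```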
The Bug
This is code from Lib/urllib/parse.py:196-208, used for retrieving the hostname from the netloc:
It will incorrectly retrieve IPv4 addresses and regular name hosts from inside brackets. This is in violation of both specifications.
Minimally reproducible example:
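A small probe along these lines can show which behavior a given interpreter has. This is a sketch under my own assumptions: the helper name `probe` and the sample URLs are illustrative and are not taken from the original report. On versions without the fix the bracketed IPv4 address and name are expected to be returned as hostnames; with the fix, urlsplit raises ValueError for them.

```python
from urllib.parse import urlsplit


def probe(url):
    """Hypothetical helper: report how urlsplit treats the host in `url`."""
    try:
        return ("ok", urlsplit(url).hostname)
    except ValueError as exc:
        return ("error", str(exc))


for url in ["http://[::1]/", "http://[0.0.0.0]/", "http://[example.com]/"]:
    print(url, "->", probe(url))
```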
Your environment
23cf1e2
Linked PRs