Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-30458: Disallow control chars in http URLs. #12755
Conversation
gpshead
added
type-bugfix
needs backport to 3.6
needs backport to 2.7
DO-NOT-MERGE
needs backport to 3.7
labels
Apr 10, 2019
the-knights-who-say-ni
added
the
CLA signed
label
Apr 10, 2019
bedevere-bot
added
the
awaiting core review
label
Apr 10, 2019
This comment has been minimized.
This comment has been minimized.
diff --git a/Lib/test/test_urllib.py b/Lib/test/test_urllib.py
index 2ac73b58d8..99bd934765 100644
--- a/Lib/test/test_urllib.py
+++ b/Lib/test/test_urllib.py
@@ -329,6 +329,14 @@ class urlopen_HttpTests(unittest.TestCase, FakeHTTPMixin, FakeFTPMixin):
finally:
self.unfakehttp()
+ def test_url_newline(self):
+ self.fakehttp(b"HTTP/1.1 200 OK\r\n\r\nHello!")
+ try:
+ with self.assertRaises(ValueError):
+ resp = urlopen("http://127.0.0.1:1234/?q=HTTP/1.1\r\nHeader: Value")
+ finally:
+ self.unfakehttp()
+
def test_read_0_9(self):
# "0.9" response accepted (but not "simple responses" without
# a status line)
@@ -1512,6 +1520,11 @@ class RequestTests(unittest.TestCase):
request.method = 'HEAD'
self.assertEqual(request.get_method(), 'HEAD')
+ def test_url_newline(self):
+ Request = urllib.request.Request
+ with self.assertRaises(ValueError):
+ resp = Request("http://127.0.0.1:1234/?q=HTTP/1.1\r\nHeader: Value")
+
class URL2PathNameTests(unittest.TestCase):
diff --git a/Lib/urllib/parse.py b/Lib/urllib/parse.py
index 8b6c9b1060..20242eee2d 100644
--- a/Lib/urllib/parse.py
+++ b/Lib/urllib/parse.py
@@ -997,6 +997,11 @@ def _splittype(url):
if _typeprog is None:
_typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
+ unquoted_url = unquote(url)
+ if (unquoted_url.startswith('http') and
+ re.search("[\x00-\x19\x7f-\x9f]", unquoted_url)):
+ raise ValueError("URL can't contain control characters. %r" % (unquoted_url,))
+
match = _typeprog.match(url)
if match:
scheme, data = match.groups() |
tirkarthi
reviewed
Apr 10, 2019
Lib/urllib/request.py Outdated
tirkarthi
reviewed
Apr 10, 2019
Lib/urllib/parse.py Outdated
This comment has been minimized.
This comment has been minimized.
The initial reverted fix in golang was reverted as noted in golang/go#27302 (comment) because it took into account unicode characters in URL also as problematic CTL/invalid characters : golang/go@1040626#diff-6c2d018290e298803c0c9419d8739885R1136 and resorted to only using ASCII CTL characters. I assume this could be the internal test failures at Google where unicode values were present in URL. It's not a problem with Python because urllib already tries to decode using ASCII and therefore throws a
Line 971 in 750d74f |
gpshead
added some commits
Apr 10, 2019
This comment has been minimized.
This comment has been minimized.
Sorry, this has been sending me circles from morning. Checking out the latest golang source this was further relaxed golang/go@f1d662f#diff-130fa30904f41d0d113c732265e91542R63 to use check only for space and 0x7f instead of anything between First fix golang/go@829c5df#diff-130fa30904f41d0d113c732265e91542 Latest fix reverting the above golang/go@f1d662f#diff-130fa30904f41d0d113c732265e91542R63 func stringContainsCTLByte(s string) bool {
for i := 0; i < len(s); i++ {
b := s[i]
if b < ' ' || b == 0x7f {
return true
}
}
return false
} Thanks, this seems to have been changed with 448a541 in this PR |
This comment has been minimized.
This comment has been minimized.
See also an older thread from Victor on fixing similar issues : https://mail.python.org/pipermail/python-dev/2017-July/148699.html |
This comment has been minimized.
This comment has been minimized.
Yep. I'm trying to make this fix as surgical as possible to only prevent the actual problem of allowing invalid paths with whitespace to pass through to the protocol level without changing public APIs of other functions such as parse.quote or (deprecated) parse.splittype and such. |
gpshead
requested a review
from
vstinner
Apr 10, 2019
gpshead
removed
the
DO-NOT-MERGE
label
Apr 10, 2019
gpshead
changed the title
bpo-14826 bpo-36276: Disallow control chars in http URLs.
bpo-36276: Disallow control chars in http URLs.
Apr 10, 2019
gpshead
requested a review
from
orsenthil
Apr 10, 2019
gpshead
changed the title
bpo-36276: Disallow control chars in http URLs.
bpo-30458: Disallow control chars in http URLs.
Apr 10, 2019
gpshead
added
the
type-security
label
Apr 10, 2019
This comment has been minimized.
This comment has been minimized.
This seems to cause issue with
|
This comment has been minimized.
This comment has been minimized.
Actual test :
So
|
ryanpetrello
added a commit
to ryanpetrello/urllib3
that referenced
this pull request
Apr 30, 2019
ryanpetrello
added a commit
to ryanpetrello/urllib3
that referenced
this pull request
Apr 30, 2019
gpshead
self-assigned this
Apr 30, 2019
ryanpetrello
added a commit
to ryanpetrello/urllib3
that referenced
this pull request
Apr 30, 2019
This comment has been minimized.
This comment has been minimized.
I have made the requested changes; please review again |
bedevere-bot
added
awaiting change review
and removed
awaiting merge
labels
Apr 30, 2019
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
Apr 30, 2019
Thanks for making the requested changes! @orsenthil, @vstinner: please review the changes made to this pull request. |
tirkarthi
approved these changes
Apr 30, 2019
LGTM. Getting this to testing will help. Thanks @gpshead for the PR :) This PR is also adopted in urllib3 : urllib3/urllib3#1591 |
ryanpetrello
added a commit
to ryanpetrello/urllib3
that referenced
this pull request
Apr 30, 2019
gpshead
merged commit c4e671e
into
python:master
May 1, 2019
5 checks passed
This comment has been minimized.
This comment has been minimized.
miss-islington
commented
May 1, 2019
Thanks @gpshead for the PR |
bedevere-bot
removed
the
awaiting change review
label
May 1, 2019
This comment has been minimized.
This comment has been minimized.
miss-islington
commented
May 1, 2019
Sorry, @gpshead, I could not cleanly backport this to |
This comment has been minimized.
This comment has been minimized.
miss-islington
commented
May 1, 2019
Sorry, @gpshead, I could not cleanly backport this to |
This comment has been minimized.
This comment has been minimized.
miss-islington
commented
May 1, 2019
Sorry, @gpshead, I could not cleanly backport this to |
gpshead
deleted the
gpshead:url-no-control-chars
branch
May 1, 2019
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 1, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 1, 2019
|
This comment has been minimized.
This comment has been minimized.
I have made a backport for openSUSE and yes it wasn't trivial. |
This comment has been minimized.
This comment has been minimized.
This patch actually breaks urllib3, I had to patch the test suite not to fail. |
ryanpetrello
added a commit
to ryanpetrello/urllib3
that referenced
this pull request
May 1, 2019
This comment has been minimized.
This comment has been minimized.
thanks, based on that i changed the stdlib one to raise http.client.InvalidURL instead of ValueError. #13044 |
arnolddumas
added a commit
to arnolddumas/cpython
that referenced
this pull request
May 3, 2019
This comment has been minimized.
This comment has been minimized.
I would love to get some feedback on this. |
gpshead commentedApr 10, 2019
•
edited
bpo-36276 and bpo-30458:
disallow control characters such as whitespace in urls put into the raw http protocol within http.client.
https://bugs.python.org/issue36276
https://bugs.python.org/issue30458