Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-30458: Disallow control chars in http URLs. #12755
Conversation
Example possible fix for those issues.
This comment has been minimized.
This comment has been minimized.
diff --git a/Lib/test/test_urllib.py b/Lib/test/test_urllib.py
index 2ac73b58d8..99bd934765 100644
--- a/Lib/test/test_urllib.py
+++ b/Lib/test/test_urllib.py
@@ -329,6 +329,14 @@ class urlopen_HttpTests(unittest.TestCase, FakeHTTPMixin, FakeFTPMixin):
finally:
self.unfakehttp()
+ def test_url_newline(self):
+ self.fakehttp(b"HTTP/1.1 200 OK\r\n\r\nHello!")
+ try:
+ with self.assertRaises(ValueError):
+ resp = urlopen("http://127.0.0.1:1234/?q=HTTP/1.1\r\nHeader: Value")
+ finally:
+ self.unfakehttp()
+
def test_read_0_9(self):
# "0.9" response accepted (but not "simple responses" without
# a status line)
@@ -1512,6 +1520,11 @@ class RequestTests(unittest.TestCase):
request.method = 'HEAD'
self.assertEqual(request.get_method(), 'HEAD')
+ def test_url_newline(self):
+ Request = urllib.request.Request
+ with self.assertRaises(ValueError):
+ resp = Request("http://127.0.0.1:1234/?q=HTTP/1.1\r\nHeader: Value")
+
class URL2PathNameTests(unittest.TestCase):
diff --git a/Lib/urllib/parse.py b/Lib/urllib/parse.py
index 8b6c9b1060..20242eee2d 100644
--- a/Lib/urllib/parse.py
+++ b/Lib/urllib/parse.py
@@ -997,6 +997,11 @@ def _splittype(url):
if _typeprog is None:
_typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
+ unquoted_url = unquote(url)
+ if (unquoted_url.startswith('http') and
+ re.search("[\x00-\x19\x7f-\x9f]", unquoted_url)):
+ raise ValueError("URL can't contain control characters. %r" % (unquoted_url,))
+
match = _typeprog.match(url)
if match:
scheme, data = match.groups() |
This comment has been minimized.
This comment has been minimized.
The initial reverted fix in golang was reverted as noted in golang/go#27302 (comment) because it took into account unicode characters in URL also as problematic CTL/invalid characters : golang/go@1040626#diff-6c2d018290e298803c0c9419d8739885R1136 and resorted to only using ASCII CTL characters. I assume this could be the internal test failures at Google where unicode values were present in URL. It's not a problem with Python because urllib already tries to decode using ASCII and therefore throws a
Line 971 in 750d74f |
This comment has been minimized.
This comment has been minimized.
Sorry, this has been sending me circles from morning. Checking out the latest golang source this was further relaxed golang/go@f1d662f#diff-130fa30904f41d0d113c732265e91542R63 to use check only for space and 0x7f instead of anything between First fix golang/go@829c5df#diff-130fa30904f41d0d113c732265e91542 Latest fix reverting the above golang/go@f1d662f#diff-130fa30904f41d0d113c732265e91542R63 func stringContainsCTLByte(s string) bool {
for i := 0; i < len(s); i++ {
b := s[i]
if b < ' ' || b == 0x7f {
return true
}
}
return false
} Thanks, this seems to have been changed with 448a541 in this PR |
This comment has been minimized.
This comment has been minimized.
See also an older thread from Victor on fixing similar issues : https://mail.python.org/pipermail/python-dev/2017-July/148699.html |
This comment has been minimized.
This comment has been minimized.
Yep. I'm trying to make this fix as surgical as possible to only prevent the actual problem of allowing invalid paths with whitespace to pass through to the protocol level without changing public APIs of other functions such as parse.quote or (deprecated) parse.splittype and such. |
This comment has been minimized.
This comment has been minimized.
This seems to cause issue with
|
This comment has been minimized.
This comment has been minimized.
Actual test :
So
|
This comment has been minimized.
This comment has been minimized.
thanks, based on that i changed the stdlib one to raise http.client.InvalidURL instead of ValueError. #13044 |
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
This comment has been minimized.
This comment has been minimized.
I would love to get some feedback on this. |
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 7, 2019
GH-13154 is a backport of this pull request to the 3.7 branch. |
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 7, 2019
GH-13155 is a backport of this pull request to the 3.6 branch. |
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (GH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (GH-13044) Backport Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (GH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (GH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
This comment has been minimized.
This comment has been minimized.
bedevere-bot
commented
May 8, 2019
|
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
…honGH-13154) Disallow control chars in http URLs in urllib2.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use httplib.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Backport Co-Authored-By: Miro Hrončok <miro@hroncok.cz> (cherry picked from commit 7e200e0) Notes on backport to Python 2.7: * test_urllib tests urllib.urlopen() which quotes the URL and so is not vulerable to HTTP Header Injection. * Add tests to test_urllib2 on urllib2.urlopen(). * Reject non-ASCII characters: range 0x80-0xff.
…H-13315) Disallow control chars in http URLs in urllib2.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (GH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use httplib.InvalidURL instead of ValueError as the new error case's exception. (GH-13044) Backport Co-Authored-By: Miro Hrončok <miro@hroncok.cz> (cherry picked from commit 7e200e0) Notes on backport to Python 2.7: * test_urllib tests urllib.urlopen() which quotes the URL and so is not vulerable to HTTP Header Injection. * Add tests to test_urllib2 on urllib2.urlopen(). * Reject non-ASCII characters: range 0x80-0xff.
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz> Signed-off-by: Tapas Kundu <tkundu@vmware.com>
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (GH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (GH-13044) Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
…honGH-13154) Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected. Disable https related urllib tests on a build without ssl (pythonGH-13032) These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures. Use http.client.InvalidURL instead of ValueError as the new error case's exception. (pythonGH-13044) Backport Co-Authored-By: Miro Hrončok <miro@hroncok.cz>
gpshead commentedApr 10, 2019
•
edited
bpo-36276 and bpo-30458:
disallow control characters such as whitespace in urls put into the raw http protocol within http.client.
https://bugs.python.org/issue36276
https://bugs.python.org/issue30458