-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-33973: Only split request lines on b'\x20' #7932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Lib/http/server.py
Outdated
@@ -280,9 +280,9 @@ def parse_request(self): | |||
self.request_version = version = self.default_request_version | |||
self.close_connection = True | |||
requestline = str(self.raw_requestline, 'iso-8859-1') | |||
requestline = requestline.rstrip('\r\n') | |||
requestline = requestline.rstrip(' \r\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you strip trailing spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving it as rstrip('\r\n')
would cause test_version_none_get to fail:
FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>
Maybe the 400 would be better, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. But you should not modify requestline.
Lib/http/server.py
Outdated
@@ -280,9 +280,9 @@ def parse_request(self): | |||
self.request_version = version = self.default_request_version | |||
self.close_connection = True | |||
requestline = str(self.raw_requestline, 'iso-8859-1') | |||
requestline = requestline.rstrip('\r\n') | |||
requestline = requestline.rstrip(' \r\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. But you should not modify requestline.
self.requestline = requestline | ||
words = requestline.split() | ||
words = requestline.split(' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different fix (leaving requestline unchanged) would be: words = requestline.strip(' ').split(' ')
But I didn't the HTTP RFC, so I don't know the exact grammar here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, the change lacks an unit test for non regression.
2d775f0
to
bbd9ae7
Compare
Done -- good idea.
I'm not sure I understand. Backing out the change to http.server with
Is there another test you'd like to see? |
Sorry, I didn't read carefully your PR... I just missed the new test. |
Anything else I can do to help get this moving? I work on a py2 project with a very... robust... test suite, and this keeps me from being able to run its functional tests on py3... |
Well, the good news is I've got a workaround. But, quoting my reviewer,
|
Lib/http/server.py
Outdated
@@ -282,7 +282,7 @@ def parse_request(self): | |||
requestline = str(self.raw_requestline, 'iso-8859-1') | |||
requestline = requestline.rstrip('\r\n') | |||
self.requestline = requestline | |||
words = requestline.split() | |||
words = requestline.rstrip(' ').split(' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for the rstrip
here? (I understand split()
does the equivalent of this but it also does lstrip()
). Sure, if there's trailing spaces, we can just consider them part of the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the .rstrip(' ')
would cause test_version_none_get
to fail:
FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>
(...since we get a blank version which trips the 400 down at line 306.)
Sure, if there's trailing spaces, we can just consider them part of the path?
In general, they'd wind up tacked onto the version rather than the path, tripping the same 400. We already (and appropriately, in my mind) respond 400 to requests like
b'GET /some/path HTTP/1.0 '
It'd only end up on the path if the client was speaking HTTP/0.9 like
b'GET /some/path '
I could be talked into doing a .strip(' ')
here instead; this would allow us to continue processing requests like
b' GET /some/path HTTP/1.0 '
like we currently do (whereas my patch will cause an early exit with Bad request syntax
).
However, considering that the bug was the result of lax parsing, I was inclined to do the minimal amount of requestline massaging to keep existing tests passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all in favor of stricter parsing, which is why I don't understand why trailing whitespace should be more privileged than leading whitespace. In my view, http.client
is buggily constructing a request line when _http_vsn_str
is empty.
bbd9ae7
to
f89a800
Compare
f89a800
to
6efdd40
Compare
Failure of @asvetlov This PR is maturing for two-and-a-half years already; can be it taken into consideration? |
Otherwise, upgrading a Python 2 server to Python 3 would break previously working (if misbehaving) clients that send unquoted UTF-8 request lines. While a client would be out of spec for sending a request-line that includes bytes outside of the ASCII range, this was previously allowed and worked as expected under Python 2.7. Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
674c7c1
to
a85f83d
Compare
This PR is stale because it has been open for 30 days with no activity. |
Otherwise, upgrading a Python 2 server to Python 3 would break
previously working (if misbehaving) clients that send unquoted
UTF-8 request lines. While a client would be out of spec for
sending a request-line that includes bytes outside of the ASCII
range, this was previously allowed and worked as expected under
Python 2.7.
https://bugs.python.org/issue33973