Skip to content

bpo-33973: Only split request lines on b'\x20' #7932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tipabu
Copy link
Contributor

@tipabu tipabu commented Jun 26, 2018

Otherwise, upgrading a Python 2 server to Python 3 would break
previously working (if misbehaving) clients that send unquoted
UTF-8 request lines. While a client would be out of spec for
sending a request-line that includes bytes outside of the ASCII
range, this was previously allowed and worked as expected under
Python 2.7.

https://bugs.python.org/issue33973

@@ -280,9 +280,9 @@ def parse_request(self):
self.request_version = version = self.default_request_version
self.close_connection = True
requestline = str(self.raw_requestline, 'iso-8859-1')
requestline = requestline.rstrip('\r\n')
requestline = requestline.rstrip(' \r\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you strip trailing spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it as rstrip('\r\n') would cause test_version_none_get to fail:

FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
    self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>

Maybe the 400 would be better, though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But you should not modify requestline.

@@ -280,9 +280,9 @@ def parse_request(self):
self.request_version = version = self.default_request_version
self.close_connection = True
requestline = str(self.raw_requestline, 'iso-8859-1')
requestline = requestline.rstrip('\r\n')
requestline = requestline.rstrip(' \r\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But you should not modify requestline.

self.requestline = requestline
words = requestline.split()
words = requestline.split(' ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different fix (leaving requestline unchanged) would be: words = requestline.strip(' ').split(' ')

But I didn't the HTTP RFC, so I don't know the exact grammar here.

@vstinner vstinner requested review from 1st1 and asvetlov June 27, 2018 00:40
@vstinner
Copy link
Member

@1st1, @asvetlov: Would you mind to review this change? It seems like http.server parser allows U+00A0 (b'\xA0') whereas it shouldn't.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, the change lacks an unit test for non regression.

@tipabu tipabu force-pushed the spaces-in-http-request-line branch from 2d775f0 to bbd9ae7 Compare June 27, 2018 18:41
@tipabu
Copy link
Contributor Author

tipabu commented Jun 27, 2018

A different fix (leaving requestline unchanged) would be: words = requestline.strip(' ').split(' ')

Done -- good idea.

the change lacks an unit test for non regression

I'm not sure I understand. Backing out the change to http.server with git checkout @~ -- Lib/http/server.py, the added test fails as I was expecting:

FAIL: test_unicode_space (Lib.test.test_httpservers.BaseHTTPRequestHandlerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../Lib/test/test_httpservers.py", line 912, in test_unicode_space
    self.assertEqual(expected, result[0][:len(expected)])
AssertionError: b'HTTP/1.1 200 ' != b'HTTP/1.1 400 '

Is there another test you'd like to see?

@vstinner
Copy link
Member

Sorry, I didn't read carefully your PR... I just missed the new test.

@tipabu
Copy link
Contributor Author

tipabu commented Feb 5, 2019

Anything else I can do to help get this moving? I work on a py2 project with a very... robust... test suite, and this keeps me from being able to run its functional tests on py3...

@tipabu
Copy link
Contributor Author

tipabu commented May 16, 2019

Well, the good news is I've got a workaround. But, quoting my reviewer,

this is horrifying

@@ -282,7 +282,7 @@ def parse_request(self):
requestline = str(self.raw_requestline, 'iso-8859-1')
requestline = requestline.rstrip('\r\n')
self.requestline = requestline
words = requestline.split()
words = requestline.rstrip(' ').split(' ')
Copy link
Contributor

@benjaminp benjaminp Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the rstrip here? (I understand split() does the equivalent of this but it also does lstrip()). Sure, if there's trailing spaces, we can just consider them part of the path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the .rstrip(' ') would cause test_version_none_get to fail:

FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
    self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>

(...since we get a blank version which trips the 400 down at line 306.)

Sure, if there's trailing spaces, we can just consider them part of the path?

In general, they'd wind up tacked onto the version rather than the path, tripping the same 400. We already (and appropriately, in my mind) respond 400 to requests like

b'GET /some/path  HTTP/1.0   '

It'd only end up on the path if the client was speaking HTTP/0.9 like

b'GET /some/path   '

I could be talked into doing a .strip(' ') here instead; this would allow us to continue processing requests like

b'  GET /some/path HTTP/1.0   '

like we currently do (whereas my patch will cause an early exit with Bad request syntax).

However, considering that the bug was the result of lax parsing, I was inclined to do the minimal amount of requestline massaging to keep existing tests passing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all in favor of stricter parsing, which is why I don't understand why trailing whitespace should be more privileged than leading whitespace. In my view, http.client is buggily constructing a request line when _http_vsn_str is empty.

@csabella csabella requested review from 1st1, asvetlov and vstinner and removed request for 1st1 and asvetlov January 25, 2020 13:17
@tipabu tipabu force-pushed the spaces-in-http-request-line branch from bbd9ae7 to f89a800 Compare July 17, 2020 18:30
@tipabu tipabu force-pushed the spaces-in-http-request-line branch from f89a800 to 6efdd40 Compare February 5, 2022 01:42
@arhadthedev
Copy link
Member

Failure of Lib/test/test_urllib2.py:test_issue16464() on macOS looks unrelated.

@asvetlov This PR is maturing for two-and-a-half years already; can be it taken into consideration?

@asvetlov asvetlov self-assigned this Feb 21, 2022
Otherwise, upgrading a Python 2 server to Python 3 would break
previously working (if misbehaving) clients that send unquoted
UTF-8 request lines. While a client would be out of spec for
sending a request-line that includes bytes outside of the ASCII
range, this was previously allowed and worked as expected under
Python 2.7.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants