bpo-33973: Only split request lines on b'\x20' #7932

tipabu · 2018-06-26T18:42:05Z

Otherwise, upgrading a Python 2 server to Python 3 would break
previously working (if misbehaving) clients that send unquoted
UTF-8 request lines. While a client would be out of spec for
sending a request-line that includes bytes outside of the ASCII
range, this was previously allowed and worked as expected under
Python 2.7.

https://bugs.python.org/issue33973
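Here is a minimal sketch of the failure. It assumes a client sends the made-up path /à/b as raw UTF-8. "à" encodes to b'\xc3\xa0', and http.server decodes the request line as ISO-8859-1, which turns the b'\xa0' byte into U+00A0 NO-BREAK SPACE. str.split() treats that character as whitespace.

```python
# Hypothetical request from a misbehaving client: the made-up path "/à/b"
# is sent as raw UTF-8 without percent-quoting ("à" is b"\xc3\xa0").
raw = "GET /à/b HTTP/1.1\r\n".encode("utf-8")

# http.server decodes the request line as ISO-8859-1 (one code point per byte).
requestline = str(raw, "iso-8859-1").rstrip("\r\n")

# Splitting on any whitespace breaks the path apart at U+00A0 ...
any_ws = requestline.split()
# ... while splitting only on b"\x20" keeps method, path and version together.
space_only = requestline.split(" ")

print(len(any_ws), any_ws)
print(len(space_only), space_only)
```

The whitespace split yields four words instead of three. In the parse_request logic of that era, a fourth word is enough to reject the request as malformed with a 400.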

vstinner · 2018-06-26T22:33:49Z

Lib/http/server.py

@@ -280,9 +280,9 @@ def parse_request(self):
        self.request_version = version = self.default_request_version
        self.close_connection = True
        requestline = str(self.raw_requestline, 'iso-8859-1')
-        requestline = requestline.rstrip('\r\n')
+        requestline = requestline.rstrip(' \r\n')


Why do you strip trailing spaces?

tipabu · 2018-06-26

Leaving it as rstrip('\r\n') would cause test_version_none_get to fail:

FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
    self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>

Maybe the 400 would be better, though?


vstinner · 2018-06-27T00:38:22Z

Lib/http/server.py

@@ -280,9 +280,9 @@ def parse_request(self):
        self.request_version = version = self.default_request_version
        self.close_connection = True
        requestline = str(self.raw_requestline, 'iso-8859-1')
-        requestline = requestline.rstrip('\r\n')
+        requestline = requestline.rstrip(' \r\n')


I see. But you should not modify requestline.

vstinner · 2018-06-27T00:38:56Z

Lib/http/server.py

        self.requestline = requestline
-        words = requestline.split()
+        words = requestline.split(' ')


A different fix (leaving requestline unchanged) would be: words = requestline.strip(' ').split(' ')

But I didn't read the HTTP RFC, so I don't know the exact grammar here.
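On the grammar question, RFC 7230 §3.1.1 defines `request-line = method SP request-target SP HTTP-version CRLF`, which means exactly one SP (b'\x20') between elements. RFC 7230 §3.5 also lets a recipient be more lenient and treat SP, HTAB, VT, FF or a bare CR as the separator. The sketch below works on bytes and shows both readings. The helper names are hypothetical and are not part of http.server.

```python
from __future__ import annotations

import re

# Assumption: RFC 7230 §3.5 lets a lenient recipient treat SP, HTAB,
# VT (%x0B), FF (%x0C) and bare CR as the request-line separator.
LENIENT_WS = b" \t\x0b\x0c\r"
LENIENT_SPLIT = re.compile(rb"[ \t\x0b\x0c\r]+")


def split_strict(line: bytes) -> list[bytes]:
    """Hypothetical helper: split on single SP octets only, following
    request-line = method SP request-target SP HTTP-version."""
    return line.split(b" ")


def split_lenient(line: bytes) -> list[bytes]:
    """Hypothetical helper: split on runs of RFC-permitted whitespace,
    ignoring it at both ends. Bytes such as 0xA0 are never separators."""
    stripped = line.strip(LENIENT_WS)
    return LENIENT_SPLIT.split(stripped) if stripped else []


# Double space plus a UTF-8 "à" (b"\xc3\xa0") in a made-up path.
line = b"GET  /\xc3\xa0/b HTTP/1.1"
print(split_strict(line))   # four items, including an empty one from the double space
print(split_lenient(line))  # three items; b"\xa0" stays inside the path
```

The suggested `strip(' ').split(' ')` falls between these two readings. It strictly requires single spaces between elements but tolerates extra SP at either end.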

vstinner · 2018-06-27T00:41:06Z

@1st1, @asvetlov: Would you mind reviewing this change? It seems like the http.server parser treats U+00A0 (b'\xA0') as a separator, whereas it shouldn't.
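The affected bytes can be found by probing str.split() with every possible byte value after an ISO-8859-1 decode. This check is not part of the PR. Besides SP, it should report TAB, LF, VT, FF, CR, the four ASCII information separators 0x1C to 0x1F, NEL (0x85) and NO-BREAK SPACE (0xA0).

```python
# Each byte value decodes to exactly one code point under ISO-8859-1, so
# probing str.split() with every byte shows which ones it treats as a
# separator once http.server has decoded the request line.
separators = [
    b for b in range(256)
    if len(("x" + bytes([b]).decode("iso-8859-1") + "y").split()) == 2
]
print(" ".join(f"0x{b:02X}" for b in separators))
```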

vstinner

At the very least, the change lacks a unit test for non-regression.

tipabu · 2018-06-27T18:42:18Z

A different fix (leaving requestline unchanged) would be: words = requestline.strip(' ').split(' ')

Done -- good idea.

the change lacks an unit test for non regression

I'm not sure I understand. When I back out the change to http.server with git checkout @~ -- Lib/http/server.py, the added test fails as I expected:

FAIL: test_unicode_space (Lib.test.test_httpservers.BaseHTTPRequestHandlerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../Lib/test/test_httpservers.py", line 912, in test_unicode_space
    self.assertEqual(expected, result[0][:len(expected)])
AssertionError: b'HTTP/1.1 200 ' != b'HTTP/1.1 400 '

Is there another test you'd like to see?

vstinner · 2018-06-27T20:43:20Z

Sorry, I didn't read your PR carefully... I just missed the new test.

tipabu · 2019-02-05T01:33:18Z

Anything else I can do to help get this moving? I work on a py2 project with a very... robust... test suite, and this keeps me from being able to run its functional tests on py3...

tipabu · 2019-05-16T04:44:49Z

Well, the good news is I've got a workaround. But, quoting my reviewer,

this is horrifying

benjaminp · 2019-09-10T14:04:03Z

Lib/http/server.py

@@ -282,7 +282,7 @@ def parse_request(self):
        requestline = str(self.raw_requestline, 'iso-8859-1')
        requestline = requestline.rstrip('\r\n')
        self.requestline = requestline
-        words = requestline.split()
+        words = requestline.rstrip(' ').split(' ')


What's the reason for the rstrip here? (I understand split() does the equivalent of this, but it also does lstrip().) Surely, if there are trailing spaces, we can just consider them part of the path?

tipabu · 2019-09-11

Removing the .rstrip(' ') would cause test_version_none_get to fail:

FAIL: test_version_none_get (Lib.test.test_httpservers.BaseHTTPServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../cpython/Lib/test/test_httpservers.py", line 167, in test_version_none_get
    self.assertEqual(res.status, HTTPStatus.NOT_IMPLEMENTED)
AssertionError: 400 != <HTTPStatus.NOT_IMPLEMENTED: 501>

(...since we get a blank version which trips the 400 down at line 306.)

Surely, if there are trailing spaces, we can just consider them part of the path?

In general, they'd wind up tacked onto the version rather than the path, tripping the same 400. We already (and appropriately, in my mind) respond 400 to requests like

b'GET /some/path HTTP/1.0 '

It'd only end up on the path if the client was speaking HTTP/0.9 like

b'GET /some/path '

I could be talked into doing a .strip(' ') here instead; this would allow us to continue processing requests like

b' GET /some/path HTTP/1.0 '

like we currently do (whereas my patch will cause an early exit with "Bad request syntax").

However, considering that the bug was the result of lax parsing, I was inclined to do the minimal amount of requestline massaging to keep existing tests passing.
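The sketch below runs the candidate splits over the request lines quoted above to show the trade-off between them. It also adds `GET / `, which is assumed here to be what http.client sends when its version string is empty. The sketch shows only the resulting word lists, not http.server's responses.

```python
# Request lines from the discussion, already decoded and stripped of CRLF.
samples = [
    "GET /some/path HTTP/1.0 ",
    "GET /some/path ",
    " GET /some/path HTTP/1.0 ",
    "GET / ",  # assumption: http.client output with an empty version string
]

# The candidate ways of splitting the request line discussed in the review.
strategies = {
    "split(' ')": lambda line: line.split(" "),
    "rstrip(' ').split(' ')": lambda line: line.rstrip(" ").split(" "),
    "strip(' ').split(' ')": lambda line: line.strip(" ").split(" "),
}

for line in samples:
    print(repr(line))
    for name, split in strategies.items():
        print(f"    {name:<24} {split(line)}")
```

With no stripping, the trailing space becomes an empty final word, which is the blank version behind the 400 above. rstrip(' ') removes that empty word but leaves an empty first word for the leading-space line, and that causes the early "Bad request syntax" exit. strip(' ') tolerates both cases.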

benjaminp · 2019-09-12

I'm all in favor of stricter parsing, which is why I don't understand why trailing whitespace should be more privileged than leading whitespace. In my view, http.client has a bug: it constructs a malformed request line when _http_vsn_str is empty.
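The http.client side of this point can also be sketched. Assume, as the comment implies, that the client formats the request line as method, URL and version joined by single spaces (a "%s %s %s" template). An empty version then leaves a trailing SP. build_request_line is a hypothetical helper that shows one way a client could avoid this. It is not http.client's code.

```python
def build_request_line(method: str, url: str, version: str) -> str:
    """Hypothetical helper: join the request-line parts with single spaces,
    omitting the version (and its separator) when it is empty, as in an
    HTTP/0.9-style simple request."""
    parts = [method, url]
    if version:
        parts.append(version)
    return " ".join(parts)


# A "%s %s %s" template leaves a trailing space when the version is empty.
naive = "%s %s %s" % ("GET", "/", "")
fixed = build_request_line("GET", "/", "")
print(repr(naive))  # 'GET / '
print(repr(fixed))  # 'GET /'
```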

Lib/http/client.py

arhadthedev · 2022-02-05T09:45:39Z

Failure of Lib/test/test_urllib2.py:test_issue16464() on macOS looks unrelated.

@asvetlov This PR has been maturing for two and a half years already; could it be taken into consideration?

Otherwise, upgrading a Python 2 server to Python 3 would break previously working (if misbehaving) clients that send unquoted UTF-8 request lines. While a client would be out of spec for sending a request-line that includes bytes outside of the ASCII range, this was previously allowed and worked as expected under Python 2.7.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>

github-actions · 2024-09-21T00:09:36Z

This PR is stale because it has been open for 30 days with no activity.

the-knights-who-say-ni added the CLA signed label Jun 26, 2018

bedevere-bot added the awaiting review label Jun 26, 2018

vstinner reviewed Jun 26, 2018

vstinner reviewed Jun 27, 2018

vstinner requested review from 1st1 and asvetlov June 27, 2018 00:40

vstinner reviewed Jun 27, 2018

tipabu force-pushed the spaces-in-http-request-line branch from 2d775f0 to bbd9ae7 June 27, 2018 18:41

benjaminp reviewed Sep 10, 2019

csabella requested review from 1st1, asvetlov and vstinner and removed request for 1st1 and asvetlov January 25, 2020 13:17

tipabu force-pushed the spaces-in-http-request-line branch from bbd9ae7 to f89a800 July 17, 2020 18:30

tipabu force-pushed the spaces-in-http-request-line branch from f89a800 to 6efdd40 February 5, 2022 01:42

arhadthedev requested changes Feb 5, 2022

Lib/http/client.py (outdated, resolved)

bedevere-bot added awaiting core review and removed awaiting review labels Feb 5, 2022

asvetlov self-assigned this Feb 21, 2022

ezio-melotti removed the CLA signed label Jul 13, 2022

tipabu force-pushed the spaces-in-http-request-line branch from 674c7c1 to a85f83d January 11, 2024 19:55

tipabu mannequin mentioned this pull request Apr 10, 2022

HTTP request-line parsing splits on Unicode whitespace #78154 (closed)

github-actions bot added the stale label (Stale PR or inactive for long period of time.) Sep 21, 2024