bpo-37093: Allow http.client to parse non-ASCII header names #13788

tipabu · 2019-06-03T22:20:58Z

Previously, when http.client tried to parse a response from an out-of-spec server that sent a header with a non-ASCII name, email.feedparser would assume that the non-compliant header must be part of a message body and stop parsing headers. However, http.client has already determined the boundary between headers and body and passes only the headers to the parser. As a result, any headers after the first non-compliant one would be silently (!) ignored. This could include headers important for message framing, like Content-Length and Transfer-Encoding.
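A rough way to observe this is to feed a header block straight to `http.client.parse_headers`. The sketch below assumes a made-up response from a hypothetical server; the header names and values are illustrative only, and the result for `Content-Length` depends on whether the interpreter in use carries a fix.

```python
import io
import http.client

# Header block from a hypothetical out-of-spec server. The second header
# name contains the non-ASCII byte 0xE4 (illustrative data, not from the PR).
raw = (
    b"Server: example\r\n"
    b"X-N\xe4me: value\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
)

msg = http.client.parse_headers(io.BytesIO(raw))

# Headers before the non-compliant line are parsed normally.
print(msg.get("Server"))

# On an interpreter without a fix, this is expected to print None because
# everything from the non-compliant line onward is treated as body text.
print(msg.get("Content-Length"))

# Any defects the email parser recorded along the way.
print(msg.defects)
```

If `Content-Length` comes back as `None`, the framing header has been lost even though the server sent it.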

In the long-long ago, this parsing was handled by the rfc822 module, which didn't care about which bytes were in the header as long as there was a colon in the line.

Now, add an optional argument to the email parsers to decide whether to require strict RFC-compliant header names. Default this to True to minimize the possibility of breaking other callers. In http.client, which already knows where the headers end and body begins, use False.

Note that the non-ASCII names will be decoded as ISO-8859-1 in keeping with how header values are decoded.
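To illustrate the lenient, colon-based behavior and the ISO-8859-1 decoding described above, here is a minimal sketch. `parse_lenient_headers` is a hypothetical helper written for this example. It is not the code in this PR and not the old rfc822 implementation, and it ignores folded continuation lines for brevity.

```python
def parse_lenient_headers(raw: bytes) -> list[tuple[str, str]]:
    """Hypothetical helper: accept any line that contains a colon."""
    headers = []
    # Split on bytes so that only CR, LF, and CRLF count as line breaks.
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header block
        # Decode as ISO-8859-1, matching how header values are decoded.
        text = line.decode("iso-8859-1")
        name, sep, value = text.partition(":")
        if not sep:
            continue  # no colon on this line, so it is not a header
        headers.append((name, value.strip(" \t")))
    return headers


# Illustrative input: the second header name contains the byte 0xE4.
raw = b"Server: example\r\nX-N\xe4me: value\r\nContent-Length: 5\r\n\r\n"
headers = parse_lenient_headers(raw)
print(headers)
```

Because ISO-8859-1 maps every byte to a code point, decoding never fails, and the name can be re-encoded to the original bytes if needed.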

https://bugs.python.org/issue37093

ZackerySpytz

Please update the documentation.

ZackerySpytz · 2020-08-22T05:07:45Z

Lib/email/parser.py

        """Create a message structure from the data in a file.

        Reads all the data from the file and returns the root of the message
        structure.  Optional headersonly is a flag specifying whether to stop
        parsing after reading the headers or not.  The default is False,
        meaning it parses the entire contents of the file.
        """
-        feedparser = FeedParser(self._class, policy=self.policy)
+        feedparser = FeedParser(self._class, policy=self.policy, strictheaders=strictheaders)


Please limit lines to 79 characters (PEP 8).

tipabu · 2022-02-05

There's still a litany of line-length violations in Lib/http/client.py and Lib/test/test_httplib.py, but I think now at least I'm not making things any worse.

ZackerySpytz · 2020-08-22T05:08:12Z

Misc/NEWS.d/next/Library/2019-06-17-08-42-34.bpo-37093.T2sOF8.rst

@@ -0,0 +1 @@
+http.client now parses non-ASCII header names.


:mod:`http.client`


tipabu requested a review from a team as a code owner June 3, 2019 22:20

the-knights-who-say-ni added the CLA signed label Jun 3, 2019

bedevere-bot added the awaiting review label Jun 3, 2019

tipabu force-pushed the bpo-37093 branch from 42012d4 to d94d375 Compare June 17, 2019 16:05

tipabu mentioned this pull request Jul 14, 2019: wsgi: Work around CPython bug when parsing non-ASCII headers eventlet/eventlet#574 (Open)

csabella requested a review from maxking January 16, 2020 12:14

tipabu force-pushed the bpo-37093 branch from d94d375 to 25546ff Compare July 17, 2020 18:52

ZackerySpytz reviewed Aug 22, 2020


tipabu force-pushed the bpo-37093 branch from 25546ff to 442dd62 Compare February 5, 2022 01:36

ezio-melotti removed the CLA signed label Jul 13, 2022

tipabu force-pushed the bpo-37093 branch from 442dd62 to 568f6db Compare November 3, 2022 19:16

tipabu mentioned this pull request Apr 26, 2024: HTTP request-line parsing splits on Unicode whitespace #78154 (Closed)
