bpo-37093: Allow http.client to parse non-ASCII header names #13788
base: main
Please update the documentation.
Lib/email/parser.py
Outdated
         """Create a message structure from the data in a file.

         Reads all the data from the file and returns the root of the message
         structure.  Optional headersonly is a flag specifying whether to stop
         parsing after reading the headers or not.  The default is False,
         meaning it parses the entire contents of the file.
         """
-        feedparser = FeedParser(self._class, policy=self.policy)
+        feedparser = FeedParser(self._class, policy=self.policy, strictheaders=strictheaders)
Please limit lines to 79 characters (PEP 8).
There's still a litany of line-length violations in Lib/http/client.py and Lib/test/test_httplib.py but I think now at least I'm not making things any worse.
@@ -0,0 +1 @@
+http.client now parses non-ASCII header names.
:mod:`http.client`
Done.
Previously, when `http.client` tried to parse a response from an out-of-spec server that sent a header with a non-ASCII name, `email.feedparser` would assume that the non-compliant header must be part of a message body and abort parsing. However, `http.client` already determined the boundary between headers and body and only passed the headers to the parser. As a result, any headers after the first non-compliant one would be silently (!) ignored. This could include headers important for message framing like `Content-Length` and `Transfer-Encoding`.

In the long-long ago, this parsing was handled by the `rfc822` module, which didn't care about which bytes were in the header as long as there was a colon in the line.

Now, add an optional argument to the email parsers to decide whether to require strict RFC-compliant header names. Default this to `True` to minimize the possibility of breaking other callers. In `http.client`, which already knows where the headers end and body begins, use `False`.

Note that the non-ASCII names will be decoded as ISO-8859-1 in keeping with how header values are decoded.
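The failure mode described above can be reproduced with the stock email parser alone. The following is a minimal sketch, not the code from this change: the helper name `parse_like_http_client` and the header `X-Ünicode` are made up for illustration, and the helper only approximates what `http.client` does (decode the header block as ISO-8859-1, then hand it to `email.parser.Parser`).

```python
from email import errors
from email.parser import Parser

# Hypothetical header block from an out-of-spec server. The second header
# name contains the byte 0xDC ("Ü" in ISO-8859-1), which is not allowed
# in a header field name.
RAW_HEADERS = (
    b"Server: example\r\n"
    b"X-\xdcnicode: 1\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
)


def parse_like_http_client(raw):
    """Illustrative helper (an assumption, not the stdlib implementation):
    decode the header bytes as ISO-8859-1 and parse them with the default
    email parser."""
    return Parser().parsestr(raw.decode("iso-8859-1"))


msg = parse_like_http_client(RAW_HEADERS)

# The header before the non-compliant one is parsed normally.
print(msg["Server"])

# Headers after the non-compliant one are not seen as headers at all.
print(msg["Content-Length"])

# They end up in the "body" instead, and the parser records a defect.
print("Content-Length" in msg.get_payload())
print(any(isinstance(d, errors.MissingHeaderBodySeparatorDefect)
          for d in msg.defects))
```

With the default (strict) parser behaviour, `Content-Length` is only recoverable from the payload, which is why a caller that already knows where the headers end benefits from a lenient mode.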
https://bugs.python.org/issue37093