bpo-37093: Allow http.client to parse non-ASCII header names #13788

Open
wants to merge 1 commit into
base: main

Contributor

@tipabu tipabu commented Jun 3, 2019

Previously, when http.client tried to parse a response from an out-of-spec server that sent a header with a non-ASCII name, email.feedparser would assume that the non-compliant header must be part of a message body and abort parsing. However, http.client already determined the boundary between headers and body and only passed the headers to the parser. As a result, any headers after the first non-compliant one would be silently (!) ignored. This could include headers important for message framing like Content-Length and Transfer-Encoding.
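A minimal reproduction sketch of the truncation described above, assuming an interpreter that does not include this change. It feeds a decoded header block straight to `email.parser.Parser`, which is what `http.client.parse_headers` delegates to; the header names and values are made up for illustration.

```python
from email.parser import Parser

# Illustrative header block: the second header name contains a
# non-ASCII character, so it is not a valid RFC 5322 field name.
raw = (
    "Server: demo\r\n"
    "X-Fo\u00f6: bar\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
)

msg = Parser().parsestr(raw)

# Headers before the offending line are parsed normally.
print(msg["Server"])
# Headers after it are not visible as headers at all.
print(msg["Content-Length"])
# The parser records a defect, but the default policy does not raise.
print([type(d).__name__ for d in msg.defects])
```

The only hint that anything went wrong is the entry in `msg.defects`, which callers rarely inspect.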

In the long-long ago, this parsing was handled by the rfc822 module, which didn't care about which bytes were in the header as long as there was a colon in the line.

Now, add an optional argument (`strictheaders`) to the email parsers to decide whether to require strict RFC-compliant header names. Default this to True to minimize the possibility of breaking other callers. In http.client, which already knows where the headers end and the body begins, use False.
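The difference between the two modes can be sketched outside the email package. The helper below is hypothetical, not the code in this patch: `parse_header_lines`, its `strict` keyword, and both regular expressions are assumptions made for illustration, and continuation lines are ignored for brevity.

```python
import re

# Strict: field names limited to printable ASCII except the colon,
# the character range RFC 5322 allows.
STRICT_NAME = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')
# Lenient: any non-empty run of characters up to the first colon.
LENIENT_NAME = re.compile(r'^[^:\s][^:]*:')


def parse_header_lines(lines, strict=True):
    """Hypothetical helper: split header lines into (name, value) pairs.

    Parsing stops at the first line that does not look like a header,
    mimicking a parser that treats the remainder as the message body.
    """
    pattern = STRICT_NAME if strict else LENIENT_NAME
    headers = []
    for line in lines:
        if not pattern.match(line):
            break
        name, _, value = line.partition(':')
        headers.append((name, value.strip()))
    return headers


lines = ["Server: demo", "X-Fo\u00f6: bar", "Content-Length: 5"]
strict_result = parse_header_lines(lines, strict=True)
lenient_result = parse_header_lines(lines, strict=False)
print(strict_result)
print(lenient_result)
```

In strict mode everything from the non-ASCII name onward is lost; in lenient mode all three headers survive, which is the behaviour a caller that already knows where the headers end would want.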

Note that the non-ASCII names will be decoded as ISO-8859-1 in keeping with how header values are decoded.
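A short sketch of what ISO-8859-1 decoding means for such names, assuming the server actually sent UTF-8 bytes (the header line is invented for illustration). Every byte maps to exactly one code point, so the decoded name may look garbled but the original bytes can always be recovered.

```python
# Bytes as they might arrive on the wire (UTF-8 encoded, illustrative).
raw_line = "X-Fo\u00f6: caf\u00e9".encode("utf-8")

# ISO-8859-1 never fails to decode: each byte becomes one character.
decoded = raw_line.decode("iso-8859-1")
name, _, value = decoded.partition(":")
value = value.strip()

# The round trip is lossless, so a caller who knows the real encoding
# can re-encode and decode again.
recovered_name = name.encode("iso-8859-1").decode("utf-8")
print(repr(name), repr(recovered_name))
```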

https://bugs.python.org/issue37093

Contributor

@ZackerySpytz ZackerySpytz left a comment

Please update the documentation.

"""Create a message structure from the data in a file.

Reads all the data from the file and returns the root of the message
structure. Optional headersonly is a flag specifying whether to stop
parsing after reading the headers or not. The default is False,
meaning it parses the entire contents of the file.
"""
-feedparser = FeedParser(self._class, policy=self.policy)
+feedparser = FeedParser(self._class, policy=self.policy, strictheaders=strictheaders)
Contributor

Please limit lines to 79 characters (PEP 8).

Contributor Author

There's still a litany of line-length violations in Lib/http/client.py and Lib/test/test_httplib.py but I think now at least I'm not making things any worse.

@@ -0,0 +1 @@
http.client now parses non-ASCII header names.
Contributor

:mod:`http.client`

Contributor Author

Done.
