-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-36226: Fix multipart false positive header defects #12214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
@grische, thank you for your interest in contributing to CPython. In order to have someone review your change, you'll need to sign the Contributor agreement. Thanks! |
Thanks for the PR, but closing as the CLA has not been signed within the last month. If you do decide to sign the CLA we can re-open this PR. |
@brettcannon the CLA has been signed a long time ago, but the system doesn't show the results. Your legal team has been informed and should be aware of the issue. |
@grische not sure who you emailed but you should email contributors@python.org to find out what's going on with your CLA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes overall looks good to me. I have some small comments inline.
Also, please add a news entry using the blurb tool.
Lib/test/test_email/test_parser.py
Outdated
@@ -105,6 +105,12 @@ def message_from_binary_file(s, *args, **kw): | |||
class TestBytesParser(TestParserBase, TestEmailBase): | |||
parsers = (message_from_bytes, message_from_binary_file) | |||
|
|||
class TestFeedParser(TestCustomMessage): | |||
def test_multipart_message_with_headers_only(self): | |||
import email.parser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to the top of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to the top
Lib/test/test_email/test_parser.py
Outdated
@@ -105,6 +105,12 @@ def message_from_binary_file(s, *args, **kw): | |||
class TestBytesParser(TestParserBase, TestEmailBase): | |||
parsers = (message_from_bytes, message_from_binary_file) | |||
|
|||
class TestFeedParser(TestCustomMessage): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a need for this to inherit from TestCustomMessage
, let's just do TestEmailBase
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to TestEmailBase
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
root cannot be a list if only the headers have been passed on and are being parsed. This fixes false MultipartInvariantViolationDefects
Currently parse_headers only gives headers to the parser. For the parser to correctly handle this case, it also needs to set headersonly=True This fixes false StartBoundaryNotFoundDefects
8928543
to
bb9dd1b
Compare
I added the news entries for both commits, as they are fixing different parts. I have made the requested changes; please review again |
Thanks for making the requested changes! @maxking: please review the changes made to this pull request. |
Catching up on 4 years of conversations on b.p.o, this issue is way more complicated than it looks :) Most of the context (and one another associated patch) is present in https://bugs.python.org/issue24363, https://bugs.python.org/issue29353. Martin Panter thinks using headersonly can cause problems when trying to convert it back to string
See bpo-29353 has context on this side effect of using headersonly. |
@maxking Thanks for digging into that. I am aware that there are more issues in the parser. However, I am confident that my patches are on the right track to improve the situation without breaking backwards compatibility, as I didn't observe any failing tests. It seems that you are taking care of the outstanding (and related) issues on bpo-24363? It would be good to have these sorted as well. Regarding the |
@maxking are there any further changes you are requesting? |
@maxking ping |
This PR is stale because it has been open for 30 days with no activity. |
@maxking ping |
The current implementation of
multipart/related
in urllib triggers header defects even though the headers are valid:[StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()]
The example header is valid according to RFC 2387 (https://tools.ietf.org/html/rfc2387):
Both defects are triggered by the fact that httplib only passes on headers to the underlying email parser, while the email parser assumes to receive a full message. The simple fix is to tell the underlying email parser that we are only passing the header: 8928543
The second issue is related, but independent: The underlying email parser checks if the parsed message is of type multipart by checking of the object "root" is of type list. As we only passed the header (and set
headersonly=True
), the check does makes no sense anymore at this point, creating a false positive: a82e662This issue has been a while with python (backport to 3.7 easily possible) and can also be found in urllib3 (hence also requests, etc):
https://bugs.python.org/issue36226