Skip to content

bpo-36226: Fix multipart false positive header defects #12214

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

grische
Copy link

@grische grische commented Mar 7, 2019

The current implementation of multipart/related in urllib triggers header defects even though the headers are valid:
[StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()]

The example header is valid according to RFC 2387 (https://tools.ietf.org/html/rfc2387):

Content-Type: multipart/related; boundary="===" 

Both defects are triggered by the fact that httplib only passes on headers to the underlying email parser, while the email parser assumes to receive a full message. The simple fix is to tell the underlying email parser that we are only passing the header: 8928543

The second issue is related, but independent: The underlying email parser checks if the parsed message is of type multipart by checking of the object "root" is of type list. As we only passed the header (and set headersonly=True), the check does makes no sense anymore at this point, creating a false positive: a82e662

This issue has been a while with python (backport to 3.7 easily possible) and can also be found in urllib3 (hence also requests, etc):

https://bugs.python.org/issue36226

@grische grische requested a review from a team as a code owner March 7, 2019 16:09
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@csabella
Copy link
Contributor

csabella commented Jun 5, 2019

@grische, thank you for your interest in contributing to CPython. In order to have someone review your change, you'll need to sign the Contributor agreement. Thanks!

@brettcannon
Copy link
Member

Thanks for the PR, but closing as the CLA has not been signed within the last month. If you do decide to sign the CLA we can re-open this PR.

@brettcannon brettcannon closed this Aug 8, 2019
@grische
Copy link
Author

grische commented Aug 8, 2019

@brettcannon the CLA has been signed a long time ago, but the system doesn't show the results. Your legal team has been informed and should be aware of the issue.

@brettcannon
Copy link
Member

@grische not sure who you emailed but you should email contributors@python.org to find out what's going on with your CLA.

Copy link
Contributor

@maxking maxking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes overall looks good to me. I have some small comments inline.

Also, please add a news entry using the blurb tool.

@@ -105,6 +105,12 @@ def message_from_binary_file(s, *args, **kw):
class TestBytesParser(TestParserBase, TestEmailBase):
parsers = (message_from_bytes, message_from_binary_file)

class TestFeedParser(TestCustomMessage):
def test_multipart_message_with_headers_only(self):
import email.parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the top of the file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to the top

@@ -105,6 +105,12 @@ def message_from_binary_file(s, *args, **kw):
class TestBytesParser(TestParserBase, TestEmailBase):
parsers = (message_from_bytes, message_from_binary_file)

class TestFeedParser(TestCustomMessage):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a need for this to inherit from TestCustomMessage, let's just do TestEmailBase instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to TestEmailBase

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

root cannot be a list if only the headers have been passed on and are
being parsed.

This fixes false MultipartInvariantViolationDefects
Currently parse_headers only gives headers to the parser. For the parser
to correctly handle this case, it also needs to set headersonly=True

This fixes false StartBoundaryNotFoundDefects
@grische
Copy link
Author

grische commented Sep 5, 2019

I added the news entries for both commits, as they are fixing different parts.

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@maxking: please review the changes made to this pull request.

@brettcannon brettcannon requested a review from maxking September 9, 2019 15:49
@maxking
Copy link
Contributor

maxking commented Sep 14, 2019

Catching up on 4 years of conversations on b.p.o, this issue is way more complicated than it looks :)

Most of the context (and one another associated patch) is present in https://bugs.python.org/issue24363, https://bugs.python.org/issue29353. Martin Panter thinks using headersonly can cause problems when trying to convert it back to string

Parser().parsestr("Content-Type: message/delivery-status\r\nInvalid line\r\n\r\n", headersonly=True).as_string()

See bpo-29353 has context on this side effect of using headersonly.

@grische
Copy link
Author

grische commented Sep 16, 2019

@maxking Thanks for digging into that.

I am aware that there are more issues in the parser. However, I am confident that my patches are on the right track to improve the situation without breaking backwards compatibility, as I didn't observe any failing tests.

It seems that you are taking care of the outstanding (and related) issues on bpo-24363? It would be good to have these sorted as well.

Regarding the as_string conversion, I agree to David Murray, that the lack of tests for this conversion seems like it is not an expected use-case and hence does not seem like a hard blocker to get this PR merged?

@grische
Copy link
Author

grische commented Sep 25, 2019

@maxking are there any further changes you are requesting?

@grische
Copy link
Author

grische commented Jan 20, 2020

@maxking ping

Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 15, 2025
@grische
Copy link
Author

grische commented Jun 15, 2025

@maxking ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting change review stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants