Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) #115623

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

hartwork
Copy link
Contributor

@hartwork hartwork commented Feb 18, 2024

Mission

Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

  • pyexpat.xmlparser.GetReparseDeferralEnabled
  • pyexpat.xmlparser.SetReparseDeferralEnabled
  • xml.etree.ElementTree.XMLParser.flush
  • xml.etree.ElementTree.XMLPullParser.flush
  • xml.sax.expatreader.ExpatParser.flush

Based on the "flush" idea from #115138 (comment) .

Notes

  • Please treat as a security fix related to CVE-2023-52425.
  • Please note that this is my first contact with the CPython Argument Clinic.
  • I'm happy to add more end user documentation once it is clear how you feel about the technical aspects of this pull request.
  • Looking forward to your review 🍻

CC @serhiy-storchaka

Include/pyexpat.h Outdated Show resolved Hide resolved
Lib/xml/etree/ElementTree.py Show resolved Hide resolved
Lib/xml/sax/expatreader.py Show resolved Hide resolved
Modules/_elementtree.c Outdated Show resolved Hide resolved
Modules/_elementtree.c Outdated Show resolved Hide resolved
Modules/pyexpat.c Outdated Show resolved Hide resolved
Modules/pyexpat.c Outdated Show resolved Hide resolved
Modules/pyexpat.c Outdated Show resolved Hide resolved
Modules/_elementtree.c Outdated Show resolved Hide resolved
@hartwork hartwork force-pushed the expat-2-6-0-reparse-deferral-api branch 2 times, most recently from d80dcc4 to 5f23237 Compare February 18, 2024 16:31
@serhiy-storchaka
Copy link
Member

Please do not use rebase and force push. It makes reviewing more difficult -- the reviewer needs to re-read all changes instead of only the difference since the last review.

Use merge if you need, but this is only needed if there are conflicts with the main branch. At the end all commits will be squashed and merged as a single commits.

@hartwork hartwork force-pushed the expat-2-6-0-reparse-deferral-api branch from 5f23237 to a62e863 Compare February 18, 2024 16:46
@hartwork
Copy link
Contributor Author

Please do not use rebase and force push. It makes reviewing more difficult -- the reviewer needs to re-read all changes instead of only the difference since the last review.

@serhiy-storchaka that's what GitHub has "compare" buttons for, for every push done to the pull request branch:

compare_button

The e-mail that GitHub sends also contain these links.

Use merge if you need, but this is only needed if there are conflicts with the main branch.
At the end all commits will be squashed and merged as a single commits.

That's sad. Merge commits should not be in pull requests and commit cuttings do have semantic value. I'll "go cry somewhere now".

@hartwork hartwork force-pushed the expat-2-6-0-reparse-deferral-api branch 3 times, most recently from 8c26bbe to 439d970 Compare February 18, 2024 20:27
@hartwork hartwork force-pushed the expat-2-6-0-reparse-deferral-api branch from 439d970 to 850e46d Compare February 18, 2024 20:34
@serhiy-storchaka
Copy link
Member

The "Compare" link shows a lot of unrelated changes. This is why it is better to use merge instead of rebase.

@hartwork
Copy link
Contributor Author

The "Compare" link shows a lot of unrelated changes. This is why it is better to use merge instead of rebase.

@serhiy-storchaka only because I made a mistake in one of the rebases. It works perfectly if the merge base is the same and the person pushing does the right thing.

@serhiy-storchaka
Copy link
Member

For future, please don't use rebase.


elementtreestate *st = self->state;

if (EXPAT(st, SetReparseDeferralEnabled) == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should check the result of GetReparseDeferralEnabled() as well.

Copy link

@Snild-Sony Snild-Sony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an optional data arg to flush() could be useful for performance: it would allow merging feed(data); flush() into just flush(data), avoiding unnecessary reparsing (specifically, in the case where the original feed also tried to parse).

Another option would be to add a default-false flush flag to feed(). This would perhaps be a better API, as it will be very clear that the flush operation can cause the same effects and errors as a normal feed.

In both cases, the flush operation should be documented along the lines of "Try to parse all available data, possibly at a very high performance cost".

@@ -1726,6 +1731,17 @@ def close(self):
del self.parser, self._parser
del self.target, self._target

def flush(self):
if not self.parser.GetReparseDeferralEnabled():
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's technically a possibility that deferral was disabled after the last feed(), which would make this early return skip a flush that could actually have had an effect.

AFAICT, this would require reaching into this object and using its parser attribute to call SetReparseDeferralEnabled() -- do we care about such usecases, or is the programmer on their own in that case?

@serhiy-storchaka
Copy link
Member

Another option would be to add a default-false flush flag to feed(). This would perhaps be a better API, as it will be very clear that the flush operation can cause the same effects and errors as a normal feed.

This is an interesting idea. The drawback is that it is more difficult to detect whether the new feature is supported or not: you cannot simply use hasattr(XMLParser, flush). But otherwise it looks attractive. Most users do not need to use flush, most code is not so fragile, this feature is needed to handle the rare cases for which the only alternative -- not to use Expat 2.6.0 at all.

@Snild-Sony
Copy link

The drawback is that it is more difficult to detect whether the new feature is supported or not: you cannot simply use hasattr(XMLParser, flush).

Is there an somewhat standard alternative, e.g. feature flags or something? If not, maybe we can just set self.flush = None (or some other more descriptive name) and use the existence of that attribute as an indicator of having the optional arg? If so, is there a preference between a real attribute and a @property?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants