findtext returns empty string on integer zero value #91447
I anticipate this would happen for any "falsey" text attribute value. The For instance:
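To make the "falsey" point concrete, here is a small sketch that does not touch ElementTree at all. It assumes the pre-fix `findtext` effectively fell back with an `x or ""` style expression, which is my reading of the behavior described in this issue rather than a quote of the library source. `collapse_falsy` is a hypothetical name used only for illustration.

```python
def collapse_falsy(value):
    """Mimic an `x or ""` fallback: every falsy value becomes ""."""
    return value or ""


# Keys are repr() strings so that 0, 0.0 and False do not collide as dict keys.
samples = [0, 0.0, False, None, "", 1, "0", True]
results = {repr(v): collapse_falsy(v) for v in samples}

for key, value in results.items():
    print(f"{key:>6} -> {value!r}")
```

Under that assumption, only `None` and `""` are legitimately "no text content"; `0`, `0.0` and `False` are real values that get swallowed by the same fallback.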
You can stringify the value for the text attribute as a workaround.
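A minimal sketch of that workaround, using the `diff` sub-element name from the original report (the `root` element name is made up here):

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
child = ET.SubElement(root, "diff")

# Workaround: store the integer as a string so the text is never falsy.
child.text = str(0)

text = root.findtext("diff")  # a non-empty string
value = int(text)             # convert back wherever a number is needed
```

The cost is that every consumer has to convert the text back to its intended type.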
But I'm not familiar with why the current behavior is the way it is. From the API docs: "Note that if the matching element has no text content an empty string is returned." What is considered "no text content"? Any reason not to do something like the following?
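A sketch of what such a change could look like, written as a stand-alone helper rather than a patch to the standard library. `findtext_keep_falsy` is a hypothetical name, not part of ElementTree; it treats only a text of `None` as "no text content".

```python
import xml.etree.ElementTree as ET


def findtext_keep_falsy(elem, path, default=None, namespaces=None):
    """Like Element.findtext, but only maps a text of None to ""."""
    match = elem.find(path, namespaces)
    if match is None:
        return default      # no matching element at all
    if match.text is None:
        return ""           # element exists but has no text content
    return match.text       # keep 0, False, etc. unchanged


root = ET.Element("root")
ET.SubElement(root, "zero").text = 0
ET.SubElement(root, "flag").text = False
ET.SubElement(root, "empty")

zero = findtext_keep_falsy(root, "zero")
flag = findtext_keep_falsy(root, "flag")
empty = findtext_keep_falsy(root, "empty")
missing = findtext_keep_falsy(root, "missing", default="n/a")
```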
@eugenetriguba Good call, hadn't tried the Boolean False myself! I ran into this problem while writing some unit tests that take a large element tree with low-level properties of an ISO image file as input. Since these values are used internally by the software for various calculations, stringifying them isn't really an option, as that would make things unnecessarily complex. My own interpretation of "no text content" would also be a "None" match. After posting this, I realized I'd come across this very same issue some ten years ago in another project (for reasons I can't remember I never got round to reporting it back then!). At the time we created a replacement for the findText function (link here), which is along similar lines as your suggestion.
Is it even meaningful for elements to have an int or boolean as their |
@JelleZijlstra I thought about that as well but felt like this was the less controversial change. It fixes what certainly seems like unintended and incorrect behavior. I assumed we could revisit the text attribute returning different types in a different issue since that is what the behavior seems to have been anyway. What are your thoughts there?
@JelleZijlstra @eugenetriguba although I completely agree it's somewhat counter-intuitive that the First, the current behavior has been around for a very long time. Because of this, I suspect restricting the type to string-only in some future Python release would break a lot of existing code. Second, I think it's good to keep in mind that ElementTree and the Element object were originally designed to store/represent arbitrary hierarchical data structures, not just XML (even though this is an important use case). See also this 2014 archived snapshot from the (now defunct) effbot site:
I was surprised to see that this scope appears to have narrowed somewhat in the current documentation, which just describes ElementTree as an XML library.

In my own case I'm using ElementTree extensively in the jpylyzer software, of which I'm the lead developer. This is a validator tool for the JP2 (JPEG 2000 Part 1) still image format. Since the JP2 format follows a hierarchical tree structure, I'm using the Element object to create an internal representation of a JP2 file. Within this Element object, all extracted header fields from a JP2 are stored as Any future change to ElementTree that restricts the

My suggestion would be to address this (as a separate issue) primarily as a documentation issue, i.e. clearly explain the current behavior in the ElementTree documentation. I think it would also be helpful to be more explicit about the scope of ElementTree and its Element object (generic structure for hierarchical data structures vs structure for just XML), since there seems to be a discrepancy here between the current docs and the original Effbot documentation.
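As an illustration of the non-XML use described above, here is a small sketch of an Element tree holding typed values. The element names and values are invented for this example and are not taken from jpylyzer.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fields; names and values are made up for illustration.
header = ET.Element("imageHeader")
ET.SubElement(header, "height").text = 480
ET.SubElement(header, "width").text = 640
ET.SubElement(header, "isLossless").text = False

# Reading .text directly returns the stored objects with their original
# types, so they can be used in calculations without conversion.
pixels = header.find("height").text * header.find("width").text
lossless = header.find("isLossless").text
```

Note that a tree like this is an in-memory structure only; serializing it as XML would need the values converted to strings first.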
Oh, almost forgot, @eugenetriguba, many thanks (belatedly on my part) for fixing the original issue with #91486, seems I somehow overlooked the notification for this fix!
Thanks @bitsgalore, that's very helpful! I didn't know ElementTree is useful for use cases other than XML (it's in the xml package, after all). Given your use case, we should continue to support non-string text fields. I agree this could be documented better. I'm still hesitant about changing the behavior as proposed in #91486 though; it could break other users who put non-string content in their text fields.
@JelleZijlstra Could you explain that? I'm not quite understanding your concern. Currently falsey values (like the integer 0) return "" rather than the actual |
It changes user-visible behavior that users could be relying on. Sure, it's not very intuitive behavior and we wouldn't design it like this from scratch, but it's been like this for a long time, and there may well be code out there relying on this behavior.
@JelleZijlstra Edited my comment and then saw yours come through. Here was my expanded portion I assume you're saying that if someone has the The docs say it should return an empty string for the following: "Note that if the matching element has no text content an empty string is returned." It depends on how you want to interpret "no text content." I interpret it as being |
@JelleZijlstra But yes, I agree. It does still have that potential. But I would think it is more closely aligned with how it is currently documented.
@JelleZijlstra Is there someone or somewhere I could reach out to who could make a decision on the best way to move forward here? I tried sending an email to the core mentorship mailing list a bit over 2 weeks ago, but I got an automated email back that it is pending approval to be sent.
Maybe start a discussion at discuss.python.org?
The original test case here seems pretty awkward. Is this a proper demonstration of the issue?
IIUC you'd like the final output to be the integer |
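A minimal reproduction along these lines (a sketch, not necessarily the snippet from the comment above), again using the `diff` element name from the original report:

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "diff").text = 0  # the integer 0, not the string "0"

result = root.findtext("diff")
print(repr(result))
```

Going by this thread, interpreters without the fix from GH-91486 print `''`, while versions that include it print `0`.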
@gvanrossum Yes, your example demonstrates the issue perfectly. Looking back at my original example, I agree it's a bit overly convoluted. Also, as @eugenetriguba pointed out above, the same thing happens for other "falsey" text attributes, e.g.:
Honestly I think I should just merge your PR GH-91486 and be done with it. It will not land in 3.11 though; I don't feel this is a release blocker since the buggy behavior has been around for such a long time. We might fix it in 3.11.1 but that's up to the release manager.
The API documentation for [findtext](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findtext) states that this function gives back an empty string on "no text content." With the previous implementation, this would give back an empty string even on text content values such as 0 or False. This patch attempts to resolve that by only giving back an empty string if the text attribute is set to `None`. Resolves #91447. Automerge-Triggered-By: GH:gvanrossum
…thonGH-91486) The API documentation for [findtext](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findtext) states that this function gives back an empty string on "no text content." With the previous implementation, this would give back an empty string even on text content values such as 0 or False. This patch attempts to resolve that by only giving back an empty string if the text attribute is set to `None`. Resolves pythonGH-91447. Automerge-Triggered-By: GH:gvanrossum (cherry picked from commit a95e60d) Co-authored-by: Eugene Triguba <eugenetriguba@gmail.com>
The API documentation for [findtext](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findtext) states that this function gives back an empty string on "no text content." With the previous implementation, this would give back an empty string even on text content values such as 0 or False. This patch attempts to resolve that by only giving back an empty string if the text attribute is set to `None`. Resolves GH-91447. Automerge-Triggered-By: GH:gvanrossum (cherry picked from commit a95e60d) Co-authored-by: Eugene Triguba <eugenetriguba@gmail.com>
bitsgalore commented Apr 11, 2022
ElementTree's "findtext" function returns an empty string if the element's text field contains an integer with value 0. The example below illustrates the issue:
This gives me the following output:
Note how the data type of the "diff" sub-element is "string", even though the source data is an integer.
I'm using Python 3.8.10 on Linux Mint 20.1 Ulyssa (based on Ubuntu Focal Fossa 20.04).