ElementTree should use UTF-8 for xml declaration. #91810
Comments
Look at |
Or maybe change |
|
@scoder could you give us some advice? (You are listed as the etree expert in the expert index.) There is no single correct behavior, because the output is Unicode and etree doesn't know what the real output encoding is. There are some cases where the current behavior is better (e.g. using default encoding (e.g. I have two ideas: a. Make UTF-8 the default. This is the simplest. |
Adding Maybe add a simple mapping from Python encodings to XML encodings (for example we need to write "ascii" as "us-ascii")? Later we can discuss adding a public API for this. |
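The mapping idea in the comment above could be sketched roughly as follows. This is only an illustration: `xml_encoding_name` and `_XML_ENCODING_NAMES` are hypothetical names, not part of ElementTree, and the table is deliberately incomplete. It relies on `codecs.lookup()` to normalize Python aliases before looking up the name to write in the declaration.

```python
import codecs

# Hypothetical, incomplete table: normalized Python codec name -> name to
# write in the XML declaration. Not part of ElementTree.
_XML_ENCODING_NAMES = {
    "ascii": "us-ascii",
    "utf-8": "UTF-8",
    "shift_jis": "Shift_JIS",
    "euc_jp": "EUC-JP",
}


def xml_encoding_name(python_encoding):
    """Return an XML-style encoding name for a Python encoding name.

    Unknown encodings raise LookupError (from codecs.lookup); known codecs
    without a table entry are returned unchanged.
    """
    canonical = codecs.lookup(python_encoding).name
    return _XML_ENCODING_NAMES.get(canonical, python_encoding)


print(xml_encoding_name("ascii"))
print(xml_encoding_name("eucJP"))
```

Codecs that have no entry in the table, such as `cp932`, pass through unchanged here; which XML name to emit for them is a design decision this sketch leaves open.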
I proposed to get the default encoding from the file object if available. #91812 (comment) |
Of course, we should recommend using UTF-8.
We may not know Python encoding because output is Unicode (e.g. Unicode string or StringIO). If we want to support arbitrary encoding, we should add another option like |
…ML declaration ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified.
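The rule described in this commit message (use the text file's encoding, falling back to "UTF-8" when none is available) can be illustrated with a small sketch. The helper name `declared_encoding` is hypothetical and this is not the actual ElementTree implementation; it only shows how the fallback might behave for different kinds of text streams.

```python
import io


def declared_encoding(stream, default="utf-8"):
    """Pick the encoding name for the XML declaration (illustrative only).

    Uses the stream's own `encoding` attribute when it is set, otherwise
    falls back to `default`.
    """
    return getattr(stream, "encoding", None) or default


# An in-memory text buffer has no real encoding (its attribute is None).
in_memory = io.StringIO()

# A text wrapper around a binary stream knows the encoding it was given.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="shift_jis")

print(declared_encoding(in_memory))
print(declared_encoding(wrapped))
```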
Yes, it is a bug, and #91903 fixes it. |
pythonGH-91989) (cherry picked from commit f60b4c3) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…II data (pythonGH-91989). (cherry picked from commit f60b4c3) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
The difference of #91903 from #91812:
What is common in #91812 and #91903 and different from the current code:
|
What do you think about this @methane? |
…ML declaration (pythonGH-91903) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified. (cherry picked from commit 707839b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…laration (GH-91903) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified.
…XML declaration (GH-91903) (GH-92663) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified. (cherry picked from commit 707839b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Automerge-Triggered-By: GH:serhiy-storchaka
…XML declaration (GH-91903) (GH-92664) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified. (cherry picked from commit 707839b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Automerge-Triggered-By: GH:serhiy-storchaka
…ML declaration (GH-91903) (GH-92665) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified. (cherry picked from commit 707839b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Automerge-Triggered-By: GH:serhiy-storchaka
…ncoding='unicode' Suppress writing an XML declaration in open files in ElementTree.write() with encoding='unicode' and xml_declaration=None.
…II data (pythonGH-91989). (pythonGH-91994) (cherry picked from commit f60b4c3)
…t in XML declaration (pythonGH-91903) (pythonGH-92665) ElementTree method write() and function tostring() now use the text file's encoding ("UTF-8" if not available) instead of locale encoding in XML declaration when encoding="unicode" is specified. (cherry picked from commit 707839b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Automerge-Triggered-By: GH:serhiy-storchaka
…g='unicode' (GH-93426) Suppress writing an XML declaration in open files in ElementTree.write() with encoding='unicode' and xml_declaration=None. If file path is passed to ElementTree.write() with encoding='unicode', always open a new file in UTF-8.
…ncoding='unicode' (pythonGH-93426) Suppress writing an XML declaration in open files in ElementTree.write() with encoding='unicode' and xml_declaration=None. If file path is passed to ElementTree.write() with encoding='unicode', always open a new file in UTF-8. (cherry picked from commit d7db9dc3cc5b44d0b4ce000571fecf58089a01ec) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ncoding='unicode' (pythonGH-93426) Suppress writing an XML declaration in open files in ElementTree.write() with encoding='unicode' and xml_declaration=None. If file path is passed to ElementTree.write() with encoding='unicode', always open a new file in UTF-8. (cherry picked from commit d7db9dc) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Feature or enhancement
Currently, ElementTree.tostring(root, encoding="unicode", xml_declaration=True) uses the locale encoding. I think ElementTree should use UTF-8 instead of the locale encoding.
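A minimal sketch of the call in question, for illustration only. The element name and text are made up. Which encoding name appears in the declaration for `encoding="unicode"` depends on the Python version (the locale encoding before the change discussed in this issue), so the sketch just prints it; passing an explicit `encoding="utf-8"` gives a declaration that matches the returned bytes.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.text = "\u3042"  # a non-ASCII character (HIRAGANA LETTER A)

# str output: the encoding named in the declaration depends on the
# Python version and, on older versions, on the locale.
text = ET.tostring(root, encoding="unicode", xml_declaration=True)
declaration = text.split("\n", 1)[0]
print(declaration)

# bytes output with an explicit encoding: the declaration names UTF-8
# and the payload really is UTF-8 encoded.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.split(b"\n", 1)[0])
```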
Example:
Code:
cpython/Lib/xml/etree/ElementTree.py
Lines 732 to 742 in bcf14ae
Pitch
… cp932 or eucJP) would be different from the XML encoding name recommended by the W3C (e.g. Shift_JIS or EUC-JP).