Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minidom Level 1 DOM compliance #44871

Open
jorend mannequin opened this issue Apr 20, 2007 · 12 comments
Open

minidom Level 1 DOM compliance #44871

jorend mannequin opened this issue Apr 20, 2007 · 12 comments
Labels
stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error

Comments

@jorend
Copy link
Mannequin

jorend mannequin commented Apr 20, 2007

BPO 1704134
Nosy @loewis, @devdanzin
Files
  • minidom-level-1-compliance.patch: patch against trunk rev 54884
  • minidom-level-1-compliance.3.patch: updated patch against trunk rev 54954
  • minidom-level-1-compliance.4.patch: updated patch against trunk rev 55014
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2007-04-20.07:39:47.000>
    labels = ['expert-XML', 'type-bug']
    title = 'minidom Level 1 DOM compliance'
    updated_at = <Date 2014-02-03.19:20:23.877>
    user = 'https://bugs.python.org/jorend'

    bugs.python.org fields:

    activity = <Date 2014-02-03.19:20:23.877>
    actor = 'BreamoreBoy'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2007-04-20.07:39:47.000>
    creator = 'jorend'
    dependencies = []
    files = ['7952', '7953', '7954']
    hgrepos = []
    issue_num = 1704134
    keywords = ['patch']
    message_count = 12.0
    messages = ['52470', '52471', '52472', '52473', '52474', '52475', '52476', '52477', '52478', '52479', '52480', '110582']
    nosy_count = 4.0
    nosy_names = ['loewis', 'nnorwitz', 'jorend', 'ajaksu2']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1704134'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 20, 2007

    Tested on: Mac OS X 10.4.9

    This patch fixes numerous bugs in xml.dom.minidom and expatbuilder.
    It fixes all the small-to-middling bugs in minidom's DOM Level 1
    compliance that I'm aware of; only large bugs remain (see below).

    Changes: These are mainly fixes for bugs found by the W3C DOM Test
    Suite for DOM Level 1. Python 2.5 fails over 120 of these tests; I
    got the number down to 48.

    • Exposed expat's XML_GetSpecifiedAttributeCount() as a method of
      pyexpat parser objects. (This is needed to set Attr.specified
      correctly.) Documented the new method in Doc/lib/libpyexpat.tex.

    • Attributes that got default values from the DTD didn't show up in
      the DOM. (This was a violation of the XML 1.0 spec! See
      http://www.w3.org/TR/REC-xml/#proc-types section 5.1, "Validating
      and Non-Validating Processors". Even non-validating processors
      must "supply default attribute values" based on a certain subset
      of the DTD!)

    • Attr.specified is now set correctly. Before, it was always False
      (should have been True).

    • Inserting a node into one of its descendants caused an infinite
      loop! :-) Now it throws HierarchyMalarkey, per the spec.

    • Many error conditions specified in the DOM were not detected. In
      particular, InvalidCharacterErr was never raised. The new version
      does a lot more checking.

    • Assigning to nodeValue is now a no-op for node types where it's
      defined to be null.

    • Document.createEntityReference() is implemented. It returns an
      EntityReference node, but the node is not populated from the DTD.
      (That is, the new EntityReference implementation is compliant as
      far as it goes, but incomplete.)

    • Element.removeAttributeNode(attr) now raises NotFoundErr if
      attr belongs to some other Element and merely has the same name as
      an attribute of this Element.

    • Element.setAttributeNode() would sometimes return None erroneously.

    • Element.removeAttributeNode() now returns the removed node.

    • Several CharacterData methods would incorrectly throw if you
      passed node.length as the index.

    • Added Document.xmlVersion (from DOM Level 3). This affects
      INVALID_CHARACTER_ERR checking as specified.

    • Added tests for all of the above.

    • Removed trailing whitespace from lines in Lib/test/test_minidom.py.

    • Deleted obsolete gc testing from test_minidom.

    • In one or two places, broke very large asserts into many small
      asserts. (I was debugging something. This change is inessential,
      but it's a good change, so I kept it.)

    DOM Level 1 bugs remaining:

    • A lot of the readonly properties are not implemented as readonly.
      This would be easy to fix with new-style classes, but these are
      old-style classes that are using property() for a few things--I
      haven't tried to understand it yet. I'm putting this off until
      the present patch lands.

    • All NodeLists should be live views, even the one returned by
      getElementsByName(). It will be hard to fix this while retaining
      pickle backward compatibility, and still harder to do it without
      hurting performance.

    • Attribute nodes' nodeValue and childNodes are supposed to stay in
      sync. This has all the same problems.

    • EntityReference nodes should be populated with child nodes. The
      descendants of EntityReference nodes should be readonly. This is
      slightly less of a headache.

    I haven't even tried to run DOMTS level2 tests yet. I'm sure it'll be
    pretty gruesome.

    @jorend jorend mannequin added topic-XML labels Apr 20, 2007
    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 23, 2007

    Justification for the most significant changes:

    1. I added a __setattr__() method to class xml.dom.minidom.Node. This means existing code that subclasses any of the Node classes *and* overrides __setattr__() *and* isn't calling the base class __setattr__... will not get the new __setattr__ functionality, which basically implements a DOM-compatibility quirk. I think it's OK. :)

    2. There's a flag in pyexpat that lets you turn off default attribute values. The flag is marked "use with caution", because turning this off is not XML-compliant. expatbuilder was using the flag. I can't tell why. There was no comment in the code. I think it was just a mistake; I changed it, and minidom passes more DOMTS tests as a result.

    3. To implement Attr.specified, I had to expose another Expat API via pyexpat. This was actually a pretty minor change, but I mention it because I'm sure anyone opening this patch will be shocked that it touches any C code. :)

    4. Added some big regexes to check for non-XML-compliant names and raise InvalidCharacterErr. I didn't see a better way to do this. If anyone is worried about the performance hit from doing these checks, I'll look at it.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Apr 24, 2007

    Wow this patch is big!

    I notice a lot of whitespace changes, can you drop those from the patch, it will make it easier to review. I don't see any extra whitespace in the original, so maybe if you ran Tools/scripts/reindent.py over the code that would clean it up. I think that's what Tim uses to normalize whitespace.

    The new tests are good to see. I see you added doc for a new method in xmlparser, but the new class EntityReference from minidom.py is not documented.

    In xml/dom/minidom.py, it seems odd to see if cond: pass else: # do something. What about inverting the condition so pass/else is not necessary. The comment could be improved here as to why this is a no-op.

    The patch itself looked fine to me. I didn't review in complete context (ie, I didn't apply it and review all the code), just the patch itself. I don't know enough about minidom to know if this change it good or not. Other than what I noted above, the patch looks decent on the surface.

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 24, 2007

    Neal, thanks for looking at this.

    Size: Yeah. :-\ I don't know what to do about this. There were a lot of bugs, so there are a lot of diffs. :(

    Whitespace: I checked, and the extra whitespace is definitely in trunk's Lib/test/test_minidom.py; a recent change added it. This patch removes it. (I could revert the whitespace anyway, just to reduce the size of the patch. Let me know if I should do this.)

    Doc: Oops! I will fix this today and upload a new patch.

    pass: Thanks, I'll change as suggested. (Just to explain myself-- I used "if name == 'nodeValue': pass; else: ..." because 'nodeValue' is the special case, and the special behavior is "do nothing". But I'm not attached to that wording. I'll change it.)

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 25, 2007

    OK, the updated patch has the suggested changes (Doc for EntityReference; eliminated "pass" and added comment in Node.__setattr__).

    File Added: minidom-level-1-compliance.2.patch

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Apr 25, 2007

    Jason, don't worry about the size, I was mostly complaining about the whitespace diffs. I know Tim hadn't done a normalization step for a while. I just did it and am running the tests now. Damn there are a lot! :-(

    The good news is that your patch will go down. The bad news for you is that you are going to have to regenerate it and there will probably be conflicts. I'm hoping Martin will review it as he's quite knowledgeable about XML. I doubt there is any more that I can contribute to this patch other than to say it needs some expert eyes on it, but generally looks good.

    The tests should be done within 15 minutes or so. Then I'll check in whitespace normalization.

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 25, 2007

    File Added: minidom-level-1-compliance.3.patch

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 26, 2007

    I guess we'll need a number of iterations. Some observations:

    • I read the nodeValue specification so that it raises DOMException if you try to set it, and it is "readonly", where I interpret "readonly" as "always null" (because it appears to be the only case where the value is never changing). So __setattr__ should raise that exception, IMO.
    • I think that the checks for correct names should be moved out of the minidom module, into a more global place (because it might be useful independent of minidom), e.g. xml.names. See also xml.utils.characters in PyXML.
    • Setting the xmlVersion to 1.1 should raise NOT_SUPPORTED_ERR IMO: we cannot guarantee that we will write out correct XML 1.1 (i.e. a document that is in Unicode normal form NFC).

    More later.

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 26, 2007

    Martin, thank you very, very much for looking at this. Happy to
    iterate. I'll turn it around as fast as I can, probably tomorrow.

    • Node.nodeValue isn't readonly; setting it raises an exception only
      if the *node* is readonly. DOM Level 2 clarifies: "When it is defined
      to be null, setting it has no effect."

    http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-1950641247
    http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-F68D080

    • I agree about moving the name-checking code out of minidom.
      "xml.names" sounds fine. Should it be official and documented?

      xml.names.is_name(name, [xmlVersion="1.0"]) -> bool
      Returns True if name matches the Name construction in the
      specified version of XML. Raises ValueError if xmlVersion is not
      a known version of XML (currently either "1.0" or "1.1").

    • Regarding xmlVersion = "1.1": I don't think writing "fully
      normalized" XML is required. It's only a "SHOULD" in XML 1.1:

    "All XML parsed entities (including document entities) SHOULD be fully
    normalized [...] However, a document is still well-formed even if it
    is not fully normalized. XML processors SHOULD provide a user option
    to verify that the document being processed is in fully normalized
    form, and report to the application whether it is or not."

    http://www.w3.org/TR/xml11/#sec-normalization-checking

    DOM Level 3 provides a "normalize-characters" option. Implementations
    are not required to support it:

    http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html#parameter-normalize-characters

    I want to add normalization to writexml() and toxml(), but I hope it
    can wait until after this patch lands.

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented Apr 28, 2007

    Martin,

    This version of the patch moves name-checking into a separate module, "xml.names". But for now it is undocumented (intentionally).

    Also, DOMImplementation.hasFeature("XMLVersion", v) now returns True for v in ("1.0", "1.1"), and setting doc.xmlVersion to any other value raises NotSupportedErr. (Both these things are specified in DOM Level 3 Core under the Document.xmlVersion attribute.) Added a test for this, too.

    I didn't make any other changes. Let me know if I'm wrong about those other two things.

    File Added: minidom-level-1-compliance.4.patch

    @jorend
    Copy link
    Mannequin Author

    jorend mannequin commented May 1, 2007

    Martin, do I need to change anything else?

    @devdanzin devdanzin mannequin added type-feature A feature request or enhancement labels Mar 30, 2009
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 17, 2010

    Changed type to behaviour as I see many references to bugs, hence the changes to versions. This is a massive patch and will be a substancial amount of work if anyone wishes to take it on.

    @BreamoreBoy BreamoreBoy mannequin added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Jul 17, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 22, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant