
uuid.uuid3 and uuid.uuid5 cannot be used with non-UTF-8 names #97856

Closed as not planned
@thehatmakesbling

Description


Bug report

Consider a name space, fc48656f-2196-4866-ad70-0cf68bf80146, whose names are defined as the concatenation of the byte representations of two or more UUIDs.

$ python3 -c 'import uuid; print(repr(uuid.uuid5(namespace = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146"), name = uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes)))'

This raises "TypeError: encoding without a string argument" because of the call to bytes(name, "utf-8"). It is a regression from Python 2, which handles this case correctly:
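The root cause can be reproduced without calling uuid3 or uuid5 at all. The minimal sketch below builds the same name as the command above. That name is valid bytes but not valid UTF-8: its fourth byte, 0x88, is a continuation byte with no lead byte. So it can neither be passed through bytes(name, "utf-8") nor be decoded to a str first.

```python
import uuid

# The name from the reproduction above: two UUIDs' 16-byte forms concatenated.
name = (uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes
        + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes)

# The conversion this report identifies: bytes(x, encoding) only accepts a str.
try:
    bytes(name, "utf-8")
    type_error = None
except TypeError as exc:
    type_error = exc
print("bytes(name, 'utf-8') ->", repr(type_error))

# Decoding to a str first is not an escape hatch: the octets are not UTF-8.
try:
    name.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
print("name.decode('utf-8') ->",
      decode_error.reason if decode_error else "ok",
      "at byte", decode_error.start if decode_error else None)
```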

$ python2 -c 'import uuid; print repr(uuid.uuid5(namespace = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146"), name = uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes))'
UUID('ab74c285-20ad-583e-978a-e26b99ef7c9b')

The current implementation makes it impossible to use these functions with a name that is not valid UTF-8. Section 4.3 of RFC 4122 makes clear that this restriction should not be imposed:

"The concept of name and name space should be broadly construed, and not limited to textual names."

It goes on to state that the name space may define how a name is converted to bytes. If the name space was defined by someone else and its conversion does not produce valid UTF-8, the developer is left completely out of luck:

"Convert the name to a canonical sequence of octets (as defined by the standards or conventions of its name space)"

This is reinforced by the reference implementation, which takes a void * and a length rather than any string type. It is also reinforced by the definition of an X.500 DN name space that allows DER-encoded names, which cannot be guaranteed to be valid UTF-8. The uuid module documentation also mentions this X.500 DN name space and its DER-encoded names.
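One possible workaround is to follow the section 4.3 steps directly on raw octets. The sketch below uses a hypothetical helper, name_based_uuid, which is not part of the uuid module. Like the reference implementation, it hashes the name space's 16 octets in network byte order followed by the name's octets. It then relies on uuid.UUID(bytes=..., version=...) to set the version and RFC 4122 variant bits. It assumes MD5 for version 3 and SHA-1 for version 5, as RFC 4122 specifies.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: bytes, version: int = 5) -> uuid.UUID:
    """Hypothetical helper: RFC 4122 section 4.3 name-based UUID from raw octets.

    `name` must be a bytes-like object and is hashed as-is, so it does not
    need to be valid UTF-8 (e.g. concatenated UUID bytes or a DER-encoded DN).
    """
    if version == 3:
        hasher = hashlib.md5()
    elif version == 5:
        hasher = hashlib.sha1()
    else:
        raise ValueError("name-based UUIDs are version 3 (MD5) or 5 (SHA-1)")
    # UUID.bytes is the 16-octet big-endian (network byte order) form.
    hasher.update(namespace.bytes)
    # hashlib rejects str here with a TypeError, so text must be encoded by the caller.
    hasher.update(name)
    # Keep the first 16 digest octets; `version=` overwrites the version
    # nibble and sets the variant bits to RFC 4122.
    return uuid.UUID(bytes=hasher.digest()[:16], version=version)


if __name__ == "__main__":
    ns = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146")
    # Canonical name in this name space: the UUIDs' 16-byte forms concatenated.
    name = (uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes
            + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes)
    print(repr(name_based_uuid(ns, name)))
```

For text names, encoding them as UTF-8 first gives the same result as uuid.uuid3 or uuid.uuid5. Because the helper hashes the same octets Python 2 did, it should print the same value as the Python 2 run above. The same approach could presumably be used with uuid.NAMESPACE_X500 and a DER-encoded DN.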

Your environment

I have encountered this bug in the following Python versions:

Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Python 3.5.3 (default, Apr  5 2021, 09:00:41) [GCC 6.3.0 20170516] on linux

It also appears to be present in the python/cpython GitHub repository as of 2022-10-04.

Metadata

Assignees: No one assigned
Labels: type-bug (An unexpected behavior, bug, or error)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
