unicodedata: is_normalized claims nothing is normalized in any form when using the 3.2.0 database #101372
corona10 added a commit to corona10/cpython that referenced this issue on Jan 28, 2023
corona10 added a commit that referenced this issue on Feb 6, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue on Feb 6, 2023: … UCD 3… (pythongh-101388) (cherry picked from commit 9ef7e75) Co-authored-by: Dong-hee Na <donghee.na@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue on Feb 6, 2023: … UCD 3… (pythongh-101388) (cherry picked from commit 9ef7e75) Co-authored-by: Dong-hee Na <donghee.na@python.org>
Thanks for the report and the fix! Serhiy mentioned he wanted to write tests here (#101388 (comment)), so leaving this issue open.
Bug report
3.8 adds the is_normalized function to the unicodedata module, which is also available as a method on the legacy unicodedata.ucd_3_2_0 database. It is supposed to check whether a string is equal to its normalization in a given form, but without having to normalize and compare.

However, the legacy version does not maintain the expected invariant. In fact, it reports that every single-character string is not normalized, regardless of the normalization form chosen. Presumably, the result is the same for every non-empty string. (It appears that the empty string works because it is special-cased at lines 871-874.)
Example:
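A minimal sketch of a check for the invariant described above (not the reporter's original snippet). The helper name invariant_violations and the FORMS tuple are hypothetical, introduced only for illustration; the sketch assumes Python 3.8 or later, where is_normalized exists.

```python
import unicodedata

# The four normalization forms accepted by normalize() and is_normalized().
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def invariant_violations(db, text):
    """Return the forms for which db.is_normalized(form, text) disagrees
    with the slow path, db.normalize(form, text) == text.

    `db` may be the unicodedata module itself or unicodedata.ucd_3_2_0;
    both expose normalize() and is_normalized().
    """
    return [
        form
        for form in FORMS
        if db.is_normalized(form, text) != (db.normalize(form, text) == text)
    ]


if __name__ == "__main__":
    databases = (
        ("unicodedata", unicodedata),
        ("unicodedata.ucd_3_2_0", unicodedata.ucd_3_2_0),
    )
    for name, db in databases:
        # An empty list means the invariant holds for this string.
        print(name, invariant_violations(db, "a"))
```

If the behaviour is as reported, an affected interpreter would be expected to list all four forms for the legacy database, while a build containing the fix should print an empty list for both.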
The bug appears to be at lines 801-804 of unicodedata.c:
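The C source is not reproduced here. As a rough Python model of the control flow being discussed (not the actual implementation), the sketch below assumes a three-valued quick check whose definite answers are trusted and whose undecided answer falls back to normalize-and-compare. Every name in it (QuickCheck, is_normalized_model, and the two legacy_quick_check_* functions) is hypothetical, and it uses the current database for the fallback purely for simplicity.

```python
import unicodedata
from enum import Enum


class QuickCheck(Enum):
    YES = "yes"      # definitely normalized
    MAYBE = "maybe"  # undecided: the caller must normalize and compare
    NO = "no"        # definitely not normalized


def is_normalized_model(form, text, quick_check):
    """Hypothetical model: trust a definite quick-check answer,
    otherwise fall back to the full comparison."""
    if not text:
        return True  # the empty string is special-cased, per the report
    result = quick_check(form, text)
    if result is QuickCheck.YES:
        return True
    if result is QuickCheck.NO:
        return False
    # MAYBE: slow path (assumption: current database used for illustration)
    return unicodedata.normalize(form, text) == text


def legacy_quick_check_as_reported(form, text):
    # Models the reported behaviour: quick checks are meant to be
    # disabled, but the result says "definitely not normalized".
    return QuickCheck.NO


def legacy_quick_check_as_proposed(form, text):
    # Models the suggested fix: report "undecided" so the slow path runs.
    return QuickCheck.MAYBE


if __name__ == "__main__":
    print(is_normalized_model("NFC", "a", legacy_quick_check_as_reported))
    print(is_normalized_model("NFC", "a", legacy_quick_check_as_proposed))
```

Under this model, a disabled quick check that answers NO makes every non-empty string look unnormalized, which matches the symptom in the report; answering MAYBE defers to the comparison instead.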
I believe the NO should say MAYBE instead. The NO value appears to indicate that the quickcheck has determined that the string is not normalized, contrary to both the comment and expected behaviour.

Your environment
Linked PRs