bpo-33881: Use NFKC to find duplicate members in make_dataclass #7916

Closed
wants to merge 4 commits into from

ValeriyaSinevich
Contributor

@ValeriyaSinevich ValeriyaSinevich commented Jun 25, 2018

I measured how it affected the performance by creating a dataclass with 10000 members.
python.bat -m timeit -s "from dataclasses import make_dataclass" -s "arg_list = [chr(k) * i for k in range(97, 123) for i in range(1, 500)]" "make_dataclass('a', arg_list)"
The performance didn't change: both before and after the change it takes around 14.5s.
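
For anyone re-running the measurement without `python.bat`, the same benchmark can be sketched with the `timeit` module directly. The helper names `build_arg_list` and `time_make_dataclass` are made up for illustration and are not part of the PR. Note that the list comprehension in the command yields 26 × 499 = 12974 field names.

```python
import timeit
from dataclasses import make_dataclass


def build_arg_list(max_repeat=500):
    # Hypothetical helper: same construction as in the command above,
    # producing 'a', 'aa', ..., then 'b', 'bb', ... up to 'z' repeated.
    return [chr(k) * i for k in range(97, 123) for i in range(1, max_repeat)]


def time_make_dataclass(arg_list, number=1):
    # Hypothetical helper: average seconds per make_dataclass call.
    total = timeit.timeit(lambda: make_dataclass('a', arg_list), number=number)
    return total / number


# Small run so the sketch finishes quickly; pass build_arg_list() with the
# default max_repeat=500 to reproduce the full-size measurement.
small = build_arg_list(max_repeat=5)
print(len(small), time_make_dataclass(small, number=3))
```

Timings will differ by machine and Python version, so only a before/after comparison on the same setup is meaningful.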

https://bugs.python.org/issue33881

Member

@ericvsmith ericvsmith left a comment

This will require some tests. Can you add them?

I haven't had time to give a thorough review, but I will once there are some basic tests.

Thanks!

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@remilapeyre
Contributor

Hi @ericvsmith, I wrote some tests for the PR and made a small change: normalize before checking for keywords, as there was a bug there too.

commit 27258e9723293178143922d6a503ff6885c3b6a1
Author: Rémi Lapeyre <remi.lapeyre@henki.fr>
Date:   Thu Jan 3 01:17:48 2019 +0100

    Add tests for dataclasses members normalization

diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index cdb9bf23ed..64ee5610d7 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -1120,9 +1120,9 @@ def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
 
         if not isinstance(name, str) or not name.isidentifier():
             raise TypeError(f'Field names must be valid identifers: {name!r}')
-        if keyword.iskeyword(name):
-            raise TypeError(f'Field names must not be keywords: {name!r}')
         normalized_name = unicodedata.normalize('NFKC', name)
+        if keyword.iskeyword(normalized_name):
+            raise TypeError(f'Field names must not be keywords: {normalized_name!r}')
         if normalized_name in seen:
             raise TypeError(f'Field name duplicated: {normalized_name!r}')
 
diff --git a/Lib/test/test_dataclasses.py b/Lib/test/test_dataclasses.py
index d9556c7ff9..cb81de80da 100755
--- a/Lib/test/test_dataclasses.py
+++ b/Lib/test/test_dataclasses.py
@@ -2918,6 +2918,15 @@ class TestMakeDataclass(unittest.TestCase):
                 C = make_dataclass(classname, ['a', 'b'])
                 self.assertEqual(C.__name__, classname)
 
+    def test_normalize_members(self):
+        with self.assertRaisesRegexp(TypeError, "Field name duplicated: 'μ'"):
+            make_dataclass('a', ['\u00b5', '\u03bc'])
+
+    def test_normalized_keywords(self):
+        with self.assertRaisesRegexp(TypeError,
+                "Field names must not be keywords: 'assert'"):
+            make_dataclass('a', ['𝚊ssert'])
+
 class TestReplace(unittest.TestCase):
     def test(self):
         @dataclass(frozen=True)
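
To see why the order of the checks matters, here is a minimal standalone sketch of the validation loop from the diff above. `check_field_names` is a hypothetical helper written for illustration; it is not a function in `dataclasses`, and it only mirrors the three checks shown in the patch.

```python
import keyword
import unicodedata


def check_field_names(names):
    # Hypothetical helper mirroring the checks in the diff above.
    seen = set()
    for name in names:
        if not isinstance(name, str) or not name.isidentifier():
            raise TypeError(f'Field names must be valid identifiers: {name!r}')
        # Normalize first, so that the keyword and duplicate checks see the
        # same spelling the compiler uses for identifiers.
        normalized = unicodedata.normalize('NFKC', name)
        if keyword.iskeyword(normalized):
            raise TypeError(f'Field names must not be keywords: {normalized!r}')
        if normalized in seen:
            raise TypeError(f'Field name duplicated: {normalized!r}')
        seen.add(normalized)
    return seen


# MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) collapse under NFKC.
for names in (['\u00b5', '\u03bc'], ['\U0001d68a' 'ssert']):
    try:
        check_field_names(names)
    except TypeError as exc:
        print(exc)
```

Both inputs are rejected: the first as a duplicate, the second because MATHEMATICAL MONOSPACE SMALL A (U+1D68A) followed by `ssert` normalizes to the keyword `assert`, which `keyword.iskeyword` would miss on the raw string.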

@csabella
Contributor

@ValeriyaSinevich, please make the changes requested in the code review and please fix the merge conflict. Thanks!

@ericvsmith
Member

@ValeriyaSinevich: Any chance you can add the tests to the PR?

@csabella csabella added needs backport to 3.8 needs backport to 3.9 only security fixes stale Stale PR or inactive for long period of time. labels Jun 12, 2020
@csabella
Contributor

I'm going to close this pull request as inactive. It can be reopened if the original author comes back to working on it or someone else can create a new PR. If a new PR is created and the original change is used, please credit the original author as co-author.

@csabella csabella closed this Jun 12, 2020
Labels
awaiting changes, needs backport to 3.9 (only security fixes), stale (Stale PR or inactive for long period of time)

6 participants