Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-97008: Add a Python implementation of AttributeError and NameError suggestions #97022

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ambv
Copy link
Contributor

@ambv ambv commented Sep 22, 2022

Relevant tests moved from test_exceptions to test_traceback to be able to compare both implementations.

Co-authored-by: Carl Friedrich Bolz-Tereick cfbolz@gmx.de

…eError suggestions

Relevant tests moved from test_exceptions to test_traceback to be able to
compare both implementations.

Co-authored-by: Carl Friedrich Bolz-Tereick <cfbolz@gmx.de>

# Both strings are the same (by identity)
if a == b:
return 0
Copy link
Contributor

@cfbolz cfbolz Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not by identity in Python ;-). I think we can just fix the comment instead of switching to is

b = b[1:]
while a and b and a[-1] == b[-1]:
a = a[:-1]
b = b[:-1]
Copy link
Contributor

@cfbolz cfbolz Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the slicing this is quadratic in the common affixes. we should compute the size of the pre/suffix and then do a single slice

class CPythonTracebackErrorCaretTests(
CAPIExceptionFormattingMixin,
TracebackErrorLocationCaretTests,
):
Copy link
Member

@iritkatriel iritkatriel Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be a bit tidier if it would follow the pattern of PyExcReportingTests / CExcReportingTests where the c mixin doesn't subclass the python mixin.

# Instead of producing the whole traditional len(a)-by-len(b)
# matrix, we can update just one row in place.
# Initialize the buffer row
row = list(range(1, (_MOVE_COST * len(a)) + 1, _MOVE_COST))
Copy link
Member

@sweeneyde sweeneyde Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be filling this up with even numbers, not odd numbers?

list(range(_MOVE_COST, _MOVE_COST * (len(a) + 1), _MOVE_COST))

Copy link
Member

@sweeneyde sweeneyde Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a failing test case:

>>> _MOVE_COST=2
>>> assert traceback._levenshtein_distance("ABA", "AAB", 1000) == 2*_MOVE_COST
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

# to also exercise the Python implementation

def CHECK(a, b, expected):
actual = traceback._levenshtein_distance(a, b, 4044)
Copy link
Member

@sweeneyde sweeneyde Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't currently exercise the max_distance short-circuit paths very much, whereas _testinternalcapi.test_edit_cost does.

Could we add that and then make this same test code test both implementations?

Copy link
Member

@sweeneyde sweeneyde Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rough draft of a possible randomized test case:

    def test_levenshtein_distance_random(self):
        from functools import cache
        from traceback import _substitution_cost, _MOVE_COST

        @cache
        def levenshtein(a, b):
            if not a or not b:
                return (len(a) + len(b)) * _MOVE_COST
            option1 = levenshtein(a[:-1], b[:-1]) + _substitution_cost(a[-1], b[-1])
            option2 = levenshtein(a[:-1], b) + _MOVE_COST
            option3 = levenshtein(a, b[:-1]) + _MOVE_COST
            return min(option1, option2, option3)

        from random import choices, randrange

        for _ in range(1000):
            a = ''.join(choices("abcABC", k=randrange(10)))
            b = ''.join(choices("abcABC", k=randrange(10)))
            expected = levenshtein(a, b)
            res1 = traceback._levenshtein_distance(a, b, 1000)
            self.assertEqual(res1, expected, msg=(a, b))

            for threshold in [expected, expected + 1, expected + 2]:
                # big enough thresholds shouldn't change the result
                res2 = traceback._levenshtein_distance(a, b, threshold)
                self.assertEqual(res2, expected, msg=(a, b, threshold))

            for threshold in range(expected):
                # for small thresholds, the only piece of information
                # we receive is "strings not close enough".
                res3 = traceback._levenshtein_distance(a, b, threshold)
                self.assertGreater(res3, threshold, msg=(a, b, threshold))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants