New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-97008: Add a Python implementation of AttributeError and NameError suggestions #97022
base: main
Are you sure you want to change the base?
Conversation
…eError suggestions Relevant tests moved from test_exceptions to test_traceback to be able to compare both implementations. Co-authored-by: Carl Friedrich Bolz-Tereick <cfbolz@gmx.de>
|
||
# Both strings are the same (by identity) | ||
if a == b: | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not by identity in Python ;-). I think we can just fix the comment instead of switching to is
b = b[1:] | ||
while a and b and a[-1] == b[-1]: | ||
a = a[:-1] | ||
b = b[:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the slicing this is quadratic in the common affixes. we should compute the size of the pre/suffix and then do a single slice
class CPythonTracebackErrorCaretTests( | ||
CAPIExceptionFormattingMixin, | ||
TracebackErrorLocationCaretTests, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be a bit tidier if it would follow the pattern of PyExcReportingTests / CExcReportingTests where the c mixin doesn't subclass the python mixin.
# Instead of producing the whole traditional len(a)-by-len(b) | ||
# matrix, we can update just one row in place. | ||
# Initialize the buffer row | ||
row = list(range(1, (_MOVE_COST * len(a)) + 1, _MOVE_COST)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be filling this up with even numbers, not odd numbers?
list(range(_MOVE_COST, _MOVE_COST * (len(a) + 1), _MOVE_COST))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a failing test case:
>>> _MOVE_COST=2
>>> assert traceback._levenshtein_distance("ABA", "AAB", 1000) == 2*_MOVE_COST
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
# to also exercise the Python implementation | ||
|
||
def CHECK(a, b, expected): | ||
actual = traceback._levenshtein_distance(a, b, 4044) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't currently exercise the max_distance short-circuit paths very much, whereas _testinternalcapi.test_edit_cost does.
Could we add that and then make this same test code test both implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rough draft of a possible randomized test case:
def test_levenshtein_distance_random(self):
from functools import cache
from traceback import _substitution_cost, _MOVE_COST
@cache
def levenshtein(a, b):
if not a or not b:
return (len(a) + len(b)) * _MOVE_COST
option1 = levenshtein(a[:-1], b[:-1]) + _substitution_cost(a[-1], b[-1])
option2 = levenshtein(a[:-1], b) + _MOVE_COST
option3 = levenshtein(a, b[:-1]) + _MOVE_COST
return min(option1, option2, option3)
from random import choices, randrange
for _ in range(1000):
a = ''.join(choices("abcABC", k=randrange(10)))
b = ''.join(choices("abcABC", k=randrange(10)))
expected = levenshtein(a, b)
res1 = traceback._levenshtein_distance(a, b, 1000)
self.assertEqual(res1, expected, msg=(a, b))
for threshold in [expected, expected + 1, expected + 2]:
# big enough thresholds shouldn't change the result
res2 = traceback._levenshtein_distance(a, b, threshold)
self.assertEqual(res2, expected, msg=(a, b, threshold))
for threshold in range(expected):
# for small thresholds, the only piece of information
# we receive is "strings not close enough".
res3 = traceback._levenshtein_distance(a, b, threshold)
self.assertGreater(res3, threshold, msg=(a, b, threshold))
Relevant tests moved from test_exceptions to test_traceback to be able to compare both implementations.
Co-authored-by: Carl Friedrich Bolz-Tereick cfbolz@gmx.de