Skip to content

bpo-44149: Add key argument to difflib.get_close_matches() #26170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mustafaquraish
Copy link

@mustafaquraish mustafaquraish commented May 16, 2021

https://bugs.python.org/issue44149

This features allows you to specify a key function to extract the correct value from an element to be able to find close matches. Currently the only way to do it without re-implementing the function is to extract all the strings into a list, find the close matches and then once again go through the objects to find the corresponding ones.

A simple test case has also been added, and documentation within the file updated as well.

https://bugs.python.org/issue44149

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@mustafaquraish

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Lib/difflib.py Outdated
@@ -710,11 +710,18 @@ def get_close_matches(word, possibilities, n=3, cutoff=0.6):
Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
that don't score at least that similar to word are ignored.

Optional arg key specifies a function of one argument that is used to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the impression the trailing whitespace at this and the following line cause travis to complain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for pointing it out. I'll update this along with the other changes.

@DimitrisJim
Copy link
Contributor

The docs should definitely be updated for this. I also see a test for this specific function does not exist yet, this seems like a good chance to add one?

@mustafaquraish
Copy link
Author

mustafaquraish commented May 17, 2021

The docs should definitely be updated for this. I also see a test for this specific function does not exist yet, this seems like a good chance to add one?

@DimitrisJim , thanks for the feedback. I'm still new to the project, and would like to help out with this, so can you point me as to how to go about this? From what I see the documentation for this function looks like it has been auto-generated from the docstrings, and the tests (which I can run with ./python.exe -m test test_difflib) are just using the tests from the docstrings for each function. I have updated those to reflect the new argument as well as added a test for the key feature.

Is there a more official place to put these? I'm not 100% confident on the internal algorithms for difflib, so I'm not sure I would be the best one to be writing a full test suite for it (I wouldn't know how to properly check the edge cases with the cutoff parameter for instance). My development process for this PR was just finding every occurrence of the function inside the source with the updated version.

@DimitrisJim
Copy link
Contributor

Of course! Docs are relatively straight-forward to add, from what I remember, they aren't built automatically.

The Devguide section on this explains most of what you'll need to know. It basically boils down to opening the corresponding file for difflib in the Docs/lib folder (difflib.rst) and adding the docs you've added in the function docstring (looking at difflib.rst it does contain exactly the same text as in the function docstring, so I see how it looks as if they've been auto-generated).

For tests, usually you'd add a unit test to Lib/test/test_difflib.py but I'm thinking a reason might exist that this function hasn't been explicitly tested yet (maybe doctests suffice). The reviewer, who maintains the module, will probably have more to say on this.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 19, 2021
@mustafaquraish
Copy link
Author

@tim-one any chance you could review this anytime soon? It's a pretty minor change.

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Jun 29, 2021
@MaxwellDupre
Copy link
Contributor

Please add to Docs as get_close_matches(word, possibilities, n=3, cutoff=0.6) needs changing. Also, please add version changed: 3.12 (is earliest).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants