Add symbol b'\xe2\x80\x93' to punctuation symbols #18244

Open: sergei3000 wants to merge 1 commit into base: master

sergei3000 commented Jan 29, 2020

This symbol looks very similar to b'-', but it isn't matched when string.punctuation is used as the reference.
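
To make the report concrete, here is a minimal sketch of the behaviour being described. It assumes the bytes in the PR title are UTF-8, in which case they decode to U+2013 EN DASH; the variable name `en_dash` is only illustrative.

```python
import string

# Assumption: the bytes from the PR title are UTF-8 encoded.
# Decoded, they are a single character, U+2013 EN DASH.
en_dash = b"\xe2\x80\x93".decode("utf-8")

# The ASCII hyphen-minus is part of string.punctuation...
hyphen_found = "-" in string.punctuation

# ...but the en dash is not, because string.punctuation only
# lists ASCII punctuation characters.
en_dash_found = en_dash in string.punctuation

print(hex(ord(en_dash)), hyphen_found, en_dash_found)
```

So a filter built on `string.punctuation` removes `-` but leaves the visually similar en dash in place.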

the-knights-who-say-ni commented Jan 29, 2020

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@sergei3000

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Contributor

csabella commented Jan 29, 2020

Please open a ticket on bugs.python.org for this issue and add the bpo number to the pull request title. Thank you!

RPigott commented Jan 30, 2020

string.punctuation excludes many characters in the Unicode punctuation categories. I don't think it's meant to be comprehensive. For that reason, adding just U+2013 'EN DASH' makes little sense.

If you need to robustly match all punctuation characters, check the Unicode general category returned by unicodedata.category, or use an alternative regex module that supports matching by Unicode category.
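
As a rough illustration of the `unicodedata.category` suggestion, the sketch below treats a character as punctuation when its general category starts with `P` (Pc, Pd, Ps, Pe, Pi, Pf, Po). The helper names `is_punctuation` and `strip_punctuation` are hypothetical and not part of the standard library.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """Return True if ch belongs to a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category is punctuation."""
    return "".join(ch for ch in text if not is_punctuation(ch))


# Both the ASCII hyphen-minus and U+2013 EN DASH are in category Pd.
print(unicodedata.category("-"), unicodedata.category("\u2013"))

# The en dash, comma, and exclamation mark are removed; the space stays.
print(strip_punctuation("2019\u20132020, done!"))
```

Note that this is not a strict superset of `string.punctuation`: characters such as `$` and `+` fall under the Unicode symbol categories (`S*`), so they would need to be handled separately if they matter for your use case.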
