[2.7] bpo-30363: Backport warnings in the re module. #1577

Merged
merged 3 commits into from
May 18, 2017

serhiy-storchaka
Member

Running Python with the -3 option now warns about regular expression
syntax that is invalid or has different semantics in Python 3,
or whose behavior will change in future Python versions.

@serhiy-storchaka added the type-feature label May 14, 2017
Lib/sre_parse.py Outdated
if sys.py3kwarning and c in ASCIILETTERS:
    import warnings
    if c in 'Uu' and state.flags & SRE_FLAG_UNICODE:
        warnings.warn('unicode escape %s' % escape,

This warning is going to be shown mostly to people who did want a Unicode escape.
See e.g.: sphinx-doc/sphinx#2544 translate/translate#3449 amperser/proselint#672 python-babel/babel#472
Unfortunately, the warning message doesn't give much clue about what's wrong in this case.
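
For illustration only (this is not part of the PR): a minimal sketch of one way to write such a pattern so that it does not depend on `re` understanding `\uXXXX`, which it does only since Python 3.3. The pattern and the `straighten_quotes` helper are hypothetical. The idea is to use a non-raw `u''` literal so the Python parser resolves the escape before `re` ever sees it.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical example pattern: typographic single quotes.
# Because this is a non-raw u'' literal, the Python parser turns \u2018 and
# \u2019 into the actual characters; `re` never sees a backslash-u escape.
CURLY_QUOTES = re.compile(u'[\u2018\u2019]', re.UNICODE)


def straighten_quotes(text):
    """Replace curly single quotes with ASCII apostrophes."""
    return CURLY_QUOTES.sub(u"'", text)
```

If the literal character happens to be a regex metacharacter, it would additionally need `re.escape()`; that case is not covered by this sketch.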

Member Author

Thank you @jwilk for finding all these bugs. Could you propose a better error message?

Maybe:

bad escape %s; Unicode escapes are supported only since Python 3.3

self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'), '\t\n\v\r\f\a')
self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'),
(chr(9)+chr(10)+chr(11)+chr(13)+chr(12)+chr(7)))
self.assertEqual(re.sub('a',r'\t\n\v\r\f\a\b','a'), '\t\n\v\r\f\a\b')
Member

missing space before r'\t\n\v\r\f\a\b' :-)

@@ -42,6 +42,10 @@ Extension Modules
Library
-------

- bpo-30363: Running Python with the -3 option now warns about regular
expression syntax that is invalid or has different semantic in Python 3
or will change the behavior in future Python versions.
Member

You might document the change in https://docs.python.org/2/whatsnew/2.7.html#porting-to-python-2-7

Regarding "or will change the behavior in future Python versions": is it possible to write code that works on both Python 2 and 3 and doesn't emit a warning?

Member Author

@serhiy-storchaka serhiy-storchaka May 18, 2017

This doesn't change the behavior. It only raises warnings for suspicious regexes in py3k compatibility mode.

It is easy to write code that works on both Python 2 and 3 without emitting a warning. For a bad escape, just remove the redundant backslash if the code is otherwise correct. But it is likely that the warning points to a bug (@jwilk has found a number of such bugs in third-party projects). If you use re.split() with a pattern that always matches an empty string (e.g. r'\b'), it never worked; that is a bug. If you use re.split() with a pattern that may match an empty string (e.g. r'\s*'), change it to a pattern that cannot match an empty string (r'\s+') to avoid the warning.
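
As a rough illustration of the re.split() advice (a sketch, not code from the PR): the helper names below are hypothetical, and `finds_empty_match` is only a heuristic, since it reports whether a pattern produces a zero-length match on one given sample string rather than proving anything about the pattern in general.

```python
import re


def finds_empty_match(pattern, sample):
    """Heuristic: True if `pattern` yields a zero-length match in `sample`."""
    return any(m.start() == m.end() for m in re.finditer(pattern, sample))


def split_words(text):
    # r'\s+' requires at least one whitespace character, so it cannot
    # match an empty string and is safe to pass to re.split().
    return re.split(r'\s+', text)
```

A check like `finds_empty_match(r'\s*', 'a b')` could be used in a test suite to flag split patterns worth reviewing before they are passed to re.split().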

@serhiy-storchaka merged commit 955b676 into python:2.7 May 18, 2017
@serhiy-storchaka deleted the re-py3k-warnings branch May 18, 2017 09:34
@serhiy-storchaka
Member Author

Thank you @jwilk and @Haypo for your reviews.

@vstinner
Member

(@jwilk has found a number of such bugs in third-party projects)

Hehe, interesting, @ambv just told me yesterday that the most "annoying" change in Python 3.6 was the new warning on invalid escapes. In fact, the warning helped to find bugs in tests which just passed because a regex was matching anything!

Thanks for the backport, @serhiy-storchaka.

@vstinner
Member

Oh, by the way, the change LGTM now that you have replied to my questions ;-)

@serhiy-storchaka
Member Author

I tracked issues with updating third-party Python projects on GitHub to Python 3.6 and can confirm that the most frequent issue (and the easiest to fix) was warnings on invalid escapes.

But these are a different kind of warning. Warnings about invalid escapes in regex patterns were added in Python 3.5 (I didn't track updates to that version); they are now errors. Warnings about invalid escapes in string literals were added in Python 3.6.
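
To make the distinction concrete, here is a small sketch (not from the PR; both helper names are hypothetical) that probes the two cases separately on Python 3.6 or later. The first asks the `re` module whether it rejects a pattern. The second compiles a piece of source text and records which warning categories, if any, the Python compiler emits for its string literal.

```python
import re
import warnings


def regex_rejects(pattern):
    """True if re.compile() refuses `pattern` (e.g. a bad escape such as r'\\e')."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


def literal_warning_categories(source):
    """Names of warning categories emitted while compiling `source` as an expression."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compile(source, '<example>', 'eval')
    return [w.category.__name__ for w in caught]


# The regex case: the pattern text itself contains backslash + 'e'.
bad_regex = regex_rejects(r'\e')
# The string-literal case: the *source code* contains a non-raw '\d'.
literal_categories = literal_warning_categories("'\\d'")
```

Which category the second helper reports depends on the interpreter version, so the sketch records it rather than assuming a particular one.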

4 participants