[2.7] bpo-30363: Backport warnings in the re module. #1577
Running Python with the -3 option now warns about regular expression syntax that is invalid or has different semantics in Python 3 or will change the behavior in future Python versions.
Lib/sre_parse.py
Outdated
if sys.py3kwarning and c in ASCIILETTERS:
    import warnings
    if c in 'Uu' and state.flags & SRE_FLAG_UNICODE:
        warnings.warn('unicode escape %s' % escape,
This warning is going to be shown mostly to people who did want a Unicode escape.
See e.g.: sphinx-doc/sphinx#2544 translate/translate#3449 amperser/proselint#672 python-babel/babel#472
Unfortunately, the warning message doesn't give much of a clue about what's wrong in this case.
Thank you @jwilk for finding all these bugs. Could you propose a better error message?
Maybe:
bad escape %s; Unicode escapes are supported only since Python 3.3
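To illustrate why the original message could confuse people who intended a Unicode escape: on Python 3.3 and later, the re module itself interprets \uXXXX in a str pattern. A minimal sketch, meant to be run under Python 3 (the sample strings are made up for illustration):

```python
import re

# On Python 3.3+, the regex engine itself understands \uXXXX in a str
# pattern, so this raw pattern matches the euro sign (U+20AC).
pattern = re.compile(r'\u20ac')

# The pattern matches the actual character...
matches_euro = pattern.match('\u20ac') is not None

# ...and not the literal text "u20ac".
matches_literal_text = pattern.match('u20ac') is not None

print(matches_euro, matches_literal_text)
```

As far as I understand, the re module in Python 2.7 does not treat \u as an escape, so the same raw pattern would match the literal text u20ac there, which is presumably what the linked bug reports ran into.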
Lib/test/test_re.py
Outdated
self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'), '\t\n\v\r\f\a')
self.assertEqual(re.sub('a', '\t\n\v\r\f\a', 'a'),
(chr(9)+chr(10)+chr(11)+chr(13)+chr(12)+chr(7)))
self.assertEqual(re.sub('a',r'\t\n\v\r\f\a\b','a'), '\t\n\v\r\f\a\b')
missing space before r'\t\n\v\r\f\a\b' :-)
@@ -42,6 +42,10 @@ Extension Modules
Library
-------

- bpo-30363: Running Python with the -3 option now warns about regular
  expression syntax that is invalid or has different semantic in Python 3
  or will change the behavior in future Python versions.
You might document the change in https://docs.python.org/2/whatsnew/2.7.html#porting-to-python-2-7
Regarding "or will change the behavior in future Python versions": is it possible to write code that works on both Python 2 and 3 and doesn't emit a warning?
This doesn't change the behavior. Warnings are only raised for suspicious regexes in py3k compatibility mode.
It is easy to write code that works on Python 2 and 3 without emitting a warning. In the case of a bad escape, just remove the redundant backslash if the code is correct. But it is likely that the warning points to a bug (@jwilk has found a number of such bugs in third-party projects). If you use re.split() with a pattern that always matches an empty string (e.g. r'\b'), it never worked; this is a bug. If you use re.split() with a pattern that may match an empty string (e.g. r'\s*'), you should change it to a pattern that doesn't match an empty string (r'\s+') to avoid the warning.
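The re.split() advice above can be sketched as follows. The helper can_match_empty and the sample text are hypothetical, introduced only for illustration; the helper is a rough check that tests the empty string alone, so it will not flag context-dependent patterns such as r'\b'. It uses re.fullmatch, so it needs Python 3.4 or later.

```python
import re


def can_match_empty(pattern):
    """Rough check: does the pattern match the empty string itself?

    Hypothetical helper, not part of the re module.
    """
    return re.fullmatch(pattern, '') is not None


text = 'spam  eggs\tham'

# r'\s*' may match an empty string, so it is the kind of pattern the
# warning is about; r'\s+' needs at least one whitespace character.
risky = can_match_empty(r'\s*')
safe = can_match_empty(r'\s+')

# Splitting on r'\s+' gives the same result on Python 2 and Python 3.
fields = re.split(r'\s+', text)
print(risky, safe, fields)
```

Switching to r'\s+' sidesteps the question of how empty matches are handled, which is why it behaves the same across versions.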
Hehe, interesting, @ambv just told me yesterday that the most "annoying" change in Python 3.6 was the new warning on invalid escapes. In fact, the warning helped to find bugs in tests which only passed because a regex was matching anything! Thanks for the backport, @serhiy-storchaka.
Oh, by the way, the change LGTM once you've replied to my questions ;-)
I tracked issues with updating third-party Python projects on GitHub to Python 3.6 and can confirm that the most frequent issue (and the easiest to fix) was warnings on invalid escapes. But those are a different kind of warning. Warnings about invalid escapes in regex patterns were added in Python 3.5 (I didn't track updates to that version); now they are errors. Warnings about invalid escapes in string literals were added in Python 3.6.
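A small sketch of the regex-pattern case mentioned above, assuming Python 3.6 or later, where a backslash followed by an ASCII letter with no defined meaning is rejected. The helper name compiles and the sample patterns are made up for illustration.

```python
import re


def compiles(pattern):
    """Return True if re.compile() accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# r'\q' is a backslash plus an ASCII letter with no special meaning.
bad_escape_ok = compiles(r'\q')

# Dropping the redundant backslash gives a pattern that is accepted.
plain_ok = compiles(r'q')

print(bad_escape_ok, plain_ok)
```

This mirrors the fix suggested earlier in the thread: if the pattern was otherwise correct, removing the redundant backslash is enough.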