-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-69117: Stop checking for valid month and day in strptime regex #26215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Checking for legitimate month and day numbers in strptime's regex is unneeded, since invalid values will raise an error anyway during datetime conversion anyway, and that error will be more informative than the somewhat cryptic error the regex is currently throwing. See https://bugs.python.org/issue24929 Original patch by Steve Yeung
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original bpo-24929, it seems like the feedback there was that this change doesn't actually make sense, because time.strptime
doesn't do any further validation (and thus this would improperly allow things like time.strptime("0", "%d")
. From manually testing it, though, it seems that time.strptime
now has a stage that passes through a datetime.date
or datetime.datetime
constructor for some reason, so this is technically safe to do.
That said, strptime
is shockingly slow, and I would not be surprised if we wanted to refactor to avoid constructing at datetime.date
as part of time.strptime
, which would cause problems for us.
I don't see any tests that obviously cover the case of time.strptime("2021-04-33", "%Y-%m-%d")
or time.strptime("2021-13-01", "%Y-%m%d")
. Regardless of whether or not we do the validation in the regex, we should add a few test cases that ensure that we don't loosen the validation here (just assertRaises
, not assertRaisesRegex
).
Unfortunately the logic is a bit complicated, and moving the validation step from very early on (i.e. in the regex) to almost the end seems like it significantly increases the potential for missing some pathway that evades the validation we're doing, so I don't know if I'm ever going to feel particularly easy about relying on the date
/datetime
constructor for validation...
@@ -183,13 +183,13 @@ def __init__(self, locale_time=None): | |||
base = super() | |||
base.__init__({ | |||
# The " [1-9]" part of the regex is to make %c from ANSI C work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems to indicate that %c
might break if we don't allow for a space before a single digit, so I think we actually want (?P<d>\d{2}| {0,1}\d)"
or (?P<d>\d{1,2}| \d)
.
That said, datetime.strptime(datetime(2021, 1, 1).strftime("%c"), "%c")
seems to work OK with this change, so 🤷, maybe we just need to remove this comment? Chesterton's Fence indicates that we should probably try and figure out whether something has changed between when this comment was written and now.
|
||
strptime = self.theclass.strptime | ||
|
||
self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general rule I don't like assertRaisesRegex
because I don't think the specific error message raised should be part of the public interface. That said, I don't see a better way to test this.
Can we add a comment saying something like, "These tests are regression tests to ensure we have a good error message. We do not intend to guarantee any one specific error message."
Edit: ...On second thought, this might be more of a problem than I originally thought, because this is actually simply propagating the error from the datetime.datetime
constructor. This feels quite fragile in a way that I would usually not be super happy with.
I'd love any suggestions on how we can test this without relying on that particular implementation detail (I am very much open to the possibility of just not putting a test for this at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind the original ticket was basically, "the regex is giving a really opaque error, and if we simply rely on the datetime constructor instead, we will get a much more informative error"
but if the problem is "we're throwing an opaque error", maybe I can instead make a more useful error at the regex level.
strptime = self.theclass.strptime | ||
|
||
self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m') | ||
strptime('10', '%m') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this is already covered by another test somewhere and is not necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not required - I meant for clarity line them up in a (too low, error) - (just right, no error) - (too high, error) triplet - can remove
@@ -2611,6 +2611,19 @@ def test_strptime(self): | |||
with self.assertRaises(ValueError): strptime("-000", "%z") | |||
with self.assertRaises(ValueError): strptime("z", "%z") | |||
|
|||
def test_strptime_detailed_error(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things about these tests:
- I realize that the rest of this module uses this same style of "a bunch of asserts in a single file", but I think most of them were added before
subTest
was added to the standard library. I think it would be preferable to use subtests here, since failure in any one of the tests doesn't mean a prerequisite for the other tests would also fail. - I really prefer the context manager style of
assertRaises
, since I think it's much easier to read.
If it were me, I'd probably put these tests in a loop, like so:
bad_month_re = "month must be in 1..12"
bad_day_re = "day is out of range for month"
tests = [
("0", "%m", bad_month_re),
("13", "%m", bad_month_re),
("0", "%d", bad_day_re),
("32", "%d", bad_day_re),
]
for dt_str, fmt_str, error_re in tests:
with self.subTest(dt_str, fmt_str):
with self.assertRaisesRegex(error_re):
self.theclass.strptime(dt_str, fmt_str)
The use of a for loop for parameterization is something of a personal choice, though, feel free to skip that part. I find it a bit nicer because you can easily and cleanly add cases, but every control structure in a test is a possible source of error, so it's a bit of a toss-up.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
This PR is stale because it has been open for 30 days with no activity. |
Checking for legitimate month and day numbers in strptime's regex is unneeded,
since invalid values will raise an error anyway during datetime conversion
anyway, and that error will be more informative than the somewhat cryptic
error the regex is currently throwing.
See https://bugs.python.org/issue24929
Original patch by Steve Yeung
https://bugs.python.org/issue24929