gh-69117: Stop checking for valid month and day in strptime regex #26215

catherinedevlin · 2021-05-18T19:08:10Z

Checking for legitimate month and day numbers in strptime's regex is unneeded,
since invalid values will raise an error anyway during datetime conversion
anyway, and that error will be more informative than the somewhat cryptic
error the regex is currently throwing.

See https://bugs.python.org/issue24929
Original patch by Steve Yeung

https://bugs.python.org/issue24929

Issue: _strptime.TimeRE should not enforce range in regex #69117

Checking for legitimate month and day numbers in strptime's regex is unneeded, since invalid values will raise an error anyway during datetime conversion anyway, and that error will be more informative than the somewhat cryptic error the regex is currently throwing. See https://bugs.python.org/issue24929 Original patch by Steve Yeung

pganssle

In the original bpo-24929, it seems like the feedback there was that this change doesn't actually make sense, because time.strptime doesn't do any further validation (and thus this would improperly allow things like time.strptime("0", "%d"). From manually testing it, though, it seems that time.strptime now has a stage that passes through a datetime.date or datetime.datetime constructor for some reason, so this is technically safe to do.

That said, strptime is shockingly slow, and I would not be surprised if we wanted to refactor to avoid constructing at datetime.date as part of time.strptime, which would cause problems for us.

I don't see any tests that obviously cover the case of time.strptime("2021-04-33", "%Y-%m-%d") or time.strptime("2021-13-01", "%Y-%m%d"). Regardless of whether or not we do the validation in the regex, we should add a few test cases that ensure that we don't loosen the validation here (just assertRaises, not assertRaisesRegex).

Unfortunately the logic is a bit complicated, and moving the validation step from very early on (i.e. in the regex) to almost the end seems like it significantly increases the potential for missing some pathway that evades the validation we're doing, so I don't know if I'm ever going to feel particularly easy about relying on the date/datetime constructor for validation...

pganssle · 2021-05-18T19:29:55Z

Lib/_strptime.py

@@ -183,13 +183,13 @@ def __init__(self, locale_time=None):
        base = super()
        base.__init__({
            # The " [1-9]" part of the regex is to make %c from ANSI C work


This comment seems to indicate that %c might break if we don't allow for a space before a single digit, so I think we actually want (?P<d>\d{2}| {0,1}\d)" or (?P<d>\d{1,2}| \d).

That said, datetime.strptime(datetime(2021, 1, 1).strftime("%c"), "%c") seems to work OK with this change, so 🤷, maybe we just need to remove this comment? Chesterton's Fence indicates that we should probably try and figure out whether something has changed between when this comment was written and now.

pganssle · 2021-05-18T19:34:21Z

Lib/test/datetimetester.py

+
+        strptime = self.theclass.strptime
+
+        self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')


As a general rule I don't like assertRaisesRegex because I don't think the specific error message raised should be part of the public interface. That said, I don't see a better way to test this.

Can we add a comment saying something like, "These tests are regression tests to ensure we have a good error message. We do not intend to guarantee any one specific error message."

Edit: ...On second thought, this might be more of a problem than I originally thought, because this is actually simply propagating the error from the datetime.datetime constructor. This feels quite fragile in a way that I would usually not be super happy with.

I'd love any suggestions on how we can test this without relying on that particular implementation detail (I am very much open to the possibility of just not putting a test for this at all).

The idea behind the original ticket was basically, "the regex is giving a really opaque error, and if we simply rely on the datetime constructor instead, we will get a much more informative error"

but if the problem is "we're throwing an opaque error", maybe I can instead make a more useful error at the regex level.

pganssle · 2021-05-18T19:34:54Z

Lib/test/datetimetester.py

+        strptime = self.theclass.strptime
+
+        self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')
+        strptime('10', '%m')


Presumably this is already covered by another test somewhere and is not necessary here?

Definitely not required - I meant for clarity line them up in a (too low, error) - (just right, no error) - (too high, error) triplet - can remove

pganssle · 2021-05-18T19:44:30Z

Lib/test/datetimetester.py

@@ -2611,6 +2611,19 @@ def test_strptime(self):
        with self.assertRaises(ValueError): strptime("-000", "%z")
        with self.assertRaises(ValueError): strptime("z", "%z")

+    def test_strptime_detailed_error(self):


A few things about these tests:

I realize that the rest of this module uses this same style of "a bunch of asserts in a single file", but I think most of them were added before subTest was added to the standard library. I think it would be preferable to use subtests here, since failure in any one of the tests doesn't mean a prerequisite for the other tests would also fail.

I really prefer the context manager style of assertRaises, since I think it's much easier to read.

If it were me, I'd probably put these tests in a loop, like so:

bad_month_re = "month must be in 1..12" bad_day_re = "day is out of range for month" tests = [ ("0", "%m", bad_month_re), ("13", "%m", bad_month_re), ("0", "%d", bad_day_re), ("32", "%d", bad_day_re), ] for dt_str, fmt_str, error_re in tests: with self.subTest(dt_str, fmt_str): with self.assertRaisesRegex(error_re): self.theclass.strptime(dt_str, fmt_str)

The use of a for loop for parameterization is something of a personal choice, though, feel free to skip that part. I find it a bit nicer because you can easily and cleanly add cases, but every control structure in a test is a possible source of error, so it's a bit of a toss-up.

bedevere-bot · 2021-05-18T20:02:46Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

github-actions · 2021-06-18T00:04:08Z

This PR is stale because it has been open for 30 days with no activity.

catherinedevlin requested review from abalkin and pganssle as code owners May 18, 2021 19:08

the-knights-who-say-ni added the CLA signed label May 18, 2021

bedevere-bot added the awaiting review label May 18, 2021

📜🤖 Added by blurb_it.

7724244

pganssle requested changes May 18, 2021

View reviewed changes

bedevere-bot removed the awaiting review label May 18, 2021

bedevere-bot added the awaiting changes label May 18, 2021

github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 18, 2021

SteveYeung mannequin mentioned this pull request Apr 10, 2022

_strptime.TimeRE should not enforce range in regex #69117

Open

ezio-melotti removed the CLA signed label Jul 13, 2022

github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 8, 2022

erlend-aasland changed the title ~~bpo-24929: Stop checking for valid month and day in strptime regex~~ gh-69117: Stop checking for valid month and day in strptime regex Jan 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-69117: Stop checking for valid month and day in strptime regex #26215

gh-69117: Stop checking for valid month and day in strptime regex #26215

Uh oh!

catherinedevlin commented May 18, 2021 •

edited by bedevere-app bot

Loading

Uh oh!

pganssle left a comment

Uh oh!

pganssle May 18, 2021

Uh oh!

pganssle May 18, 2021

Uh oh!

catherinedevlin May 18, 2021

Uh oh!

pganssle May 18, 2021

Uh oh!

catherinedevlin May 18, 2021

Uh oh!

pganssle May 18, 2021

Uh oh!

bedevere-bot commented May 18, 2021

Uh oh!

github-actions bot commented Jun 18, 2021

Uh oh!

Uh oh!


		strptime = self.theclass.strptime

		self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')

Uh oh!

gh-69117: Stop checking for valid month and day in strptime regex #26215

Are you sure you want to change the base?

gh-69117: Stop checking for valid month and day in strptime regex #26215

Uh oh!

Conversation

catherinedevlin commented May 18, 2021 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pganssle left a comment

Choose a reason for hiding this comment

Uh oh!

pganssle May 18, 2021

Choose a reason for hiding this comment

Uh oh!

pganssle May 18, 2021

Choose a reason for hiding this comment

Uh oh!

catherinedevlin May 18, 2021

Choose a reason for hiding this comment

Uh oh!

pganssle May 18, 2021

Choose a reason for hiding this comment

Uh oh!

catherinedevlin May 18, 2021

Choose a reason for hiding this comment

Uh oh!

pganssle May 18, 2021

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented May 18, 2021

Uh oh!

github-actions bot commented Jun 18, 2021

Uh oh!

Uh oh!

catherinedevlin commented May 18, 2021 •

edited by bedevere-app bot

Loading