Skip to content

gh-69117: Stop checking for valid month and day in strptime regex #26215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

catherinedevlin
Copy link
Contributor

@catherinedevlin catherinedevlin commented May 18, 2021

Checking for legitimate month and day numbers in strptime's regex is unneeded,
since invalid values will raise an error anyway during datetime conversion
anyway, and that error will be more informative than the somewhat cryptic
error the regex is currently throwing.

See https://bugs.python.org/issue24929
Original patch by Steve Yeung

https://bugs.python.org/issue24929

Checking for legitimate month and day numbers in strptime's regex is unneeded,
since invalid values will raise an error anyway during datetime conversion
anyway, and that error will be more informative than the somewhat cryptic
error the regex is currently throwing.

See https://bugs.python.org/issue24929
Original patch by Steve Yeung
Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original bpo-24929, it seems like the feedback there was that this change doesn't actually make sense, because time.strptime doesn't do any further validation (and thus this would improperly allow things like time.strptime("0", "%d"). From manually testing it, though, it seems that time.strptime now has a stage that passes through a datetime.date or datetime.datetime constructor for some reason, so this is technically safe to do.

That said, strptime is shockingly slow, and I would not be surprised if we wanted to refactor to avoid constructing at datetime.date as part of time.strptime, which would cause problems for us.

I don't see any tests that obviously cover the case of time.strptime("2021-04-33", "%Y-%m-%d") or time.strptime("2021-13-01", "%Y-%m%d"). Regardless of whether or not we do the validation in the regex, we should add a few test cases that ensure that we don't loosen the validation here (just assertRaises, not assertRaisesRegex).

Unfortunately the logic is a bit complicated, and moving the validation step from very early on (i.e. in the regex) to almost the end seems like it significantly increases the potential for missing some pathway that evades the validation we're doing, so I don't know if I'm ever going to feel particularly easy about relying on the date/datetime constructor for validation...

@@ -183,13 +183,13 @@ def __init__(self, locale_time=None):
base = super()
base.__init__({
# The " [1-9]" part of the regex is to make %c from ANSI C work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to indicate that %c might break if we don't allow for a space before a single digit, so I think we actually want (?P<d>\d{2}| {0,1}\d)" or (?P<d>\d{1,2}| \d).

That said, datetime.strptime(datetime(2021, 1, 1).strftime("%c"), "%c") seems to work OK with this change, so 🤷, maybe we just need to remove this comment? Chesterton's Fence indicates that we should probably try and figure out whether something has changed between when this comment was written and now.


strptime = self.theclass.strptime

self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule I don't like assertRaisesRegex because I don't think the specific error message raised should be part of the public interface. That said, I don't see a better way to test this.

Can we add a comment saying something like, "These tests are regression tests to ensure we have a good error message. We do not intend to guarantee any one specific error message."

Edit: ...On second thought, this might be more of a problem than I originally thought, because this is actually simply propagating the error from the datetime.datetime constructor. This feels quite fragile in a way that I would usually not be super happy with.

I'd love any suggestions on how we can test this without relying on that particular implementation detail (I am very much open to the possibility of just not putting a test for this at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind the original ticket was basically, "the regex is giving a really opaque error, and if we simply rely on the datetime constructor instead, we will get a much more informative error"

but if the problem is "we're throwing an opaque error", maybe I can instead make a more useful error at the regex level.

strptime = self.theclass.strptime

self.assertRaisesRegex(ValueError, "month must be in 1..12", strptime, '0', '%m')
strptime('10', '%m')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is already covered by another test somewhere and is not necessary here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not required - I meant for clarity line them up in a (too low, error) - (just right, no error) - (too high, error) triplet - can remove

@@ -2611,6 +2611,19 @@ def test_strptime(self):
with self.assertRaises(ValueError): strptime("-000", "%z")
with self.assertRaises(ValueError): strptime("z", "%z")

def test_strptime_detailed_error(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things about these tests:

  1. I realize that the rest of this module uses this same style of "a bunch of asserts in a single file", but I think most of them were added before subTest was added to the standard library. I think it would be preferable to use subtests here, since failure in any one of the tests doesn't mean a prerequisite for the other tests would also fail.
  2. I really prefer the context manager style of assertRaises, since I think it's much easier to read.

If it were me, I'd probably put these tests in a loop, like so:

bad_month_re = "month must be in 1..12"
bad_day_re = "day is out of range for month"

tests = [
    ("0", "%m", bad_month_re),
    ("13", "%m", bad_month_re),
    ("0", "%d", bad_day_re),
    ("32", "%d", bad_day_re),
]

for dt_str, fmt_str, error_re in tests:
    with self.subTest(dt_str, fmt_str):
        with self.assertRaisesRegex(error_re):
            self.theclass.strptime(dt_str, fmt_str)

The use of a for loop for parameterization is something of a personal choice, though, feel free to skip that part. I find it a bit nicer because you can easily and cleanly add cases, but every control structure in a test is a possible source of error, so it's a bit of a toss-up.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 18, 2021
@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 8, 2022
@erlend-aasland erlend-aasland changed the title bpo-24929: Stop checking for valid month and day in strptime regex gh-69117: Stop checking for valid month and day in strptime regex Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants