WIP: bpo-1100942: Add datetime.time.strptime and datetime.date.strptime #5578
Conversation
This PR is for a very long issue, since 2005. We have a PR in 2018 |
I restarted the travis job. It still did not do the full CPython test suite. So please rebase :) |
Thanks, I didn't see your message, works on this issue today. |
@Mariatta rebased and the tests pass on the CIs |
the number of microseconds based on the input string and the | ||
format string.""" | ||
format string, and the GMT offset.""" |
pganssle
May 15, 2018
•
Member
We should probably consistently use "UTC offset", though I suppose it doesn't matter much since it's not a public-facing docstring.
matrixise
Oct 25, 2018
Author
Member
@pganssle I am not really confident with the offsets and the datetime. Do you think we could keep it like that and propose an other bpo ?
vstinner
Mar 4, 2019
Member
Yeah, just replace "GMT" with "UTC". datetime.tzinfo has a utcoffset() method, so "UTC" is preferred in datetime.
(And there are some subtle differences between GMT and UTC that I forgot.)
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"): | |||
hour, minute, second, | |||
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction | |||
|
|||
date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U', |
pganssle
May 15, 2018
Member
I think this is missing %G
, %u
and %V
, the ISO 8601 week calendar directives.
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"): | |||
hour, minute, second, | |||
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction | |||
|
|||
date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U', | |||
'%w', '%W', '%x', '%y', '%Y',) | |||
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',) |
pganssle
May 15, 2018
Member
I don't see any references to %T
, %R
, %i
or %s
in the docs or elsewhere in the code. What do these represent?
_time = _strptime_datetime(datetime_datetime, data_string, format) | ||
return _time.time() | ||
|
||
def _check_invalid_datetime_specs(fmt, specs, msg): |
pganssle
May 15, 2018
Member
If I'm understanding this correctly, this function seems to have the inverted sense of what I would expect. It seems that it checks if fmt
is a valid spec.
From the names of the variables and functions, I was thinking that this would be a whitelist not a blacklist. Does it make sense to switch to a whitelist approach? If not, can we maybe change specs
to be blacklist_specs
or something?
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"): | |||
hour, minute, second, | |||
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction | |||
|
|||
date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U', | |||
'%w', '%W', '%x', '%y', '%Y',) | |||
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',) |
pganssle
May 15, 2018
Member
I don't have strong opinions about this, but my intuition is that these should be set
or frozenset
rather than tuple
. Are these tuples for performance reasons (I am not sure I know when exactly it's faster to use a set for "lookup membership" rather than a tuple or list).
def _strptime_datetime_date(data_string, format): | ||
"""Return a date based on the input string and the format string.""" | ||
if not format: | ||
raise ValueError("Date format is not valid.") |
pganssle
May 15, 2018
Member
This line and its equivalent in _time
are not being hit. If I understand correctly this branch is only hit if format
is empty?
tests = [('2004-12-01 13:02:47.197', | ||
'%Y-%m-%d %H:%M:%S.%f'), | ||
('2004-12-01', '%Y-%m-%d'),] | ||
for date_string, date_format in tests: |
pganssle
May 15, 2018
Member
I think it would be good to use with self.subTest
for these parametrized tests.
Also, per the other comment I guess you need to add something like (
'12:30:15', '')` to get full coverage.
pganssle
Nov 4, 2018
Member
I think you need two more test cases:
('1900-01-01 12:30', '%Y-%m-%d %H:%M'),
('12:30:15', ''),
date.strptime
has similarly missing tests.
matrixise
Nov 11, 2018
Author
Member
the test with ('1900-01-01 12:30', '%Y-%m-%d %H:%M')
does not raise an exception but returns datetime.time(12, 30)
For the other test, yep, there is an issue.
pganssle
Nov 11, 2018
Member
The fact that it doesn't raise an exception is an issue. I'm a bit surprised that it doesn't raise an exception on pure Python, that's a bug, because I'm pretty sure that:
datetime.time.strptime("1901-01-01 12:30", "%Y-%m-%d %H:%M")
does raise an exception.
*/ | ||
if (emptyDatetime == NULL) { | ||
PyObject *emptyStringPair = Py_BuildValue("ss", "", ""); | ||
if (emptyStringPair == NULL) |
pganssle
May 15, 2018
Member
If I'm interpreting PEP 7 correctly, these if
statements need curly braces. The relevant section is Code layout.
Hi @pganssle thank you for your review, I am going to fix it asap but I am not the author of the code, just the author of the PR. so, maybe I would need your help. Thanks |
@pganssle I just rebased my branch with master. I am going to work on this PR. Do you want to help me because you are mister dateutil ;-) |
@matrixise Sorry this is on my list but probably can't get to it until the end of the month. |
@pganssle ok, in this case, I will try to fix all the issues alone ;-) but I am not worried ;-) |
Hi, I just updated this PR with master. |
@pganssle when you have time, could you review this PR, we started together, just comment when you find a mistake, thanks |
ping @pganssle ;-) |
I haven't had a chance to look at the PR thoroughly, but I have a first-pass review of things that should be changed. I also haven't checked exactly how you are doing it, but I believe we may want to / be able to refactor the tests a bit to take advantage of the existing test suite for Additionally, I should note that it's unfortunate that if this is merged, we'll have |
Return a :class:`date` corresponding to *date_string*, parsed according to | ||
*format*. :exc:`ValueError` is raised if the date string and format can't be | ||
parsed by `time.strptime`, or if it returns a value where the time part is | ||
nonzero. |
pganssle
Nov 4, 2018
•
Member
Here and below, you have nonzero
instead of non-zero
. If we keep this wording, the hyphen needs to be added.
However, this PR does not parse to a timetuple
or something and check if certain components were zero, it checks to see if the format string contains time
components, and fails in that case (Edit: I was looking at just the pure python implementation - I now realize that this is precisely what the C implementation is doing, but I think it's the wrong thing to do anyway). The way the docs are currently worded, you would expect this to work:
from datetime import date
date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M")
But it will fail (rightly so, I think).
I believe you can change the last part of the last sentence as such:
- parsed by `time.strptime`, or if it returns a value where the time part is
- nonzero.
+ parsed by `time.strptime`, or if time components are present in the format string.
Also, I think it needs to be
:meth:`time.strptime`
pganssle
Nov 4, 2018
Member
One other note on the documentation, the datetime.strptime
documentation links to
:ref:`strftime-strptime-behavior`.
I think these should too.
{ | ||
return new_date(GET_YEAR(self), | ||
GET_MONTH(self), | ||
GET_DAY(self)); | ||
} | ||
|
||
static PyObject * | ||
datetime_gettime(PyDateTime_DateTime *self, PyObject *Py_UNUSED(ignored)) |
pganssle
Nov 4, 2018
Member
I don't know why this PyObject *Py_UNUSED(ignored)
is here, so per Chesterton's Fence, I'm not comfortable removing it. Any insight as to why it is here and what the consequences will be in removing it?
Possibly it will be a breaking change in the C ABI?
pganssle
Nov 4, 2018
•
Member
According to git blame
, this is to silence a warning in gcc8
. Is this still relevant @siddhesh @serhiy-storchaka?
Edit: Looking closer, the merged PR is from April, so I'm guessing it is, but now I'm wondering if this will break the C ABI the other way.
serhiy-storchaka
Nov 5, 2018
Member
Yes, datetime_gettime
should have two arguments, the second is ignored.
return NULL; | ||
} | ||
|
||
if (DATE_GET_HOUR(datetime) || |
pganssle
Nov 4, 2018
Member
Hm, if I understand correctly, this is inconsistent with the pure Python version. The pure Python version checks if there are any time components in the format string, whereas the C version parses to a datetime and then checks to see if there are any time components.
I think checking the format string is the better way to do this, as I mention in another comment, this approach seems to indicate that date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M")
would succeed, which is not the right thing to do. I will comment on the test suite to add a test for this.
matrixise
Nov 11, 2018
Author
Member
ok, I will check the Python implementation, but in this case, we could migrate the Python implementation to the C layer?
matrixise
Nov 11, 2018
Author
Member
lol ;-) I am not an expert with the C-API, but I could try, it's a good exercise for my comprehension of the C-API of Python
vadmium
Mar 23, 2019
Member
Why does it have to be implemented in C? There are already calls out to the _strptime Python-language module here.
matrixise
Mar 24, 2019
Author
Member
not sure, but because there is an other test in C for the date, and maybe we could put the verification process in only one function. @vstinner do you confirm?
tests = [('2004-12-01 13:02:47.197', | ||
'%Y-%m-%d %H:%M:%S.%f'), | ||
('2004-12-01', '%Y-%m-%d'),] | ||
for date_string, date_format in tests: |
pganssle
Nov 4, 2018
Member
I think you need two more test cases:
('1900-01-01 12:30', '%Y-%m-%d %H:%M'),
('12:30:15', ''),
date.strptime
has similarly missing tests.
tests = [ | ||
('2004-12-01 13:02:47.197', '%Y-%m-%d %H:%M:%S.%f'), | ||
('01', '%M'), | ||
('02', '%H'), |
pganssle
Nov 4, 2018
Member
This needs at least two more test cases:
('2018-01-01 00:00', '%Y-%m-%d %H:%M'),
('2018-01-01', ''),
Both should fail.
@pganssle thanks for your review, I am going to update this PR asap. |
ok, rebased with the last master. |
ok, I will check and try to fix that, but because I don't know this part
and the main patch was not mine, that will be difficult.
but okay for the fix. thanks for the review.
|
@pganssle Are you ready for a new review of this PR? I will continue my PR ;-) |
@matrixise Yes sorry about that, I will put it on my list for the next few days, possibly this weekend (though I have an event I'm mentoring at that may take a lot of my time). |
The C implementation is wrong. date.strptime() cannot simply check if time is 00:00, since ("%H:%M", "00:00") would defeat the test. You have to have a similar implementation in C and Python: reject invalid formats. |
*format*. :exc:`ValueError` is raised if the date string and format can't be | ||
parsed by :meth:`time.strptime`, or if time components are present in the | ||
format string. | ||
|
vstinner
Mar 4, 2019
Member
Can you please add something like "For a complete list of formatting directives, see strftime() and strptime() Behavior." with a link? (copy the sentence from datetime.datetime.strptime) Just from this doc, I have no idea of what is the expected format.
Return a :class:`time` corresponding to *date_string, parsed according to | ||
*format*. :exc:`ValueError` is raised if the date string and format can't be | ||
parsed by :meth:`time.strptime`, if it returns a value which isn't a time tuple, | ||
or if the date part is nonzero. |
vstinner
Mar 4, 2019
Member
Ditto: can you please add "For a complete list of formatting directives, see strftime() and strptime() Behavior."?
the number of microseconds based on the input string and the | ||
format string.""" | ||
format string, and the GMT offset.""" |
vstinner
Mar 4, 2019
Member
Yeah, just replace "GMT" with "UTC". datetime.tzinfo has a utcoffset() method, so "UTC" is preferred in datetime.
(And there are some subtle differences between GMT and UTC that I forgot.)
if spec in fmt: | ||
found_invalid_specs.append(spec) | ||
if found_invalid_specs: | ||
suffix = "are" if len(found_invalid_specs) > 1 else "is" |
vstinner
Mar 4, 2019
Member
nitpick: English is not my first language, so I'm not sure, but the plural rule should be len() != 1, no? ... it's just a theorical question, in practical len cannot be zero :-D Sorry.
pganssle
Mar 5, 2019
Member
@vstinner As a native speaker, I think you are right:
"There are zero apples"
"There is one apple"
"There are two apples"
Though in this case we can just get rid of the "suffix" concept entirely, because the message is:
msg = "'{!s}' {} not valid in date format specification.`
It's fine to just say "x, y and z not found in date format specification" in an error message.
@@ -0,0 +1 @@ | |||
Add datetime.date.strptime and datetime.time.strptime class methods. |
vstinner
Mar 4, 2019
Member
To get links in the HTML rendered Changelog:
Add datetime.date.strptime and datetime.time.strptime class methods. | |
Add :func:`datetime.date.strptime` and :func:`datetime.time.strptime` class methods. |
vstinner
Mar 4, 2019
Member
Can you please add an entry in Doc/whatsnew/3.8.rst as well? IHMO it deserves to be documented there.
@@ -2896,6 +2900,36 @@ date_fromordinal(PyObject *cls, PyObject *args) | |||
return result; | |||
} | |||
|
|||
|
|||
/* Return new date from time.strptime(). */ |
PyObject *datetime; | ||
|
||
datetime = datetime_strptime((PyObject *)&PyDateTime_DateTimeType, args); | ||
|
return NULL; | ||
} | ||
|
||
if (DATE_GET_HOUR(datetime) || |
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@vstinner I started to rewrite the |
Add datetime.date.strptime and datetime.time.strptime. Fix the documentation of _strptime._strptime, the documentation was wrong, return a 3-tuple and not a 2-tuple Co-authored-by: Alexander Belopolsky <alexander.belopolsky@gmail.com> Co-authored-by: Amaury Forgeot d'Arc <amauryfa@gmail.com> Co-authored-by: Berker Peksag <berker.peksag@gmail.com> Co-authored-by: Josh-sf <josh-sf@users.sourceforge.net> Co-authored-by: Juarez Bochi <jbochi@gmail.com> Co-authored-by: Maciej Szulik <soltysh@gmail.com> Co-authored-by: Stéphane Wirtel <stephane@wirtel.be> Co-authored-by: Matheus Vieira Portela <matheus.v.portela@gmail.com>
I suggest you check the discussion about the previous patches. I noticed I repeated some old comments and ideas. |
Return a :class:`date` corresponding to *date_string*, parsed according to | ||
*format*. :exc:`ValueError` is raised if the date string and format can't be | ||
parsed by :meth:`time.strptime`, or if time components are present in the | ||
format string. For a complete list of formatting directives, see |
vadmium
Mar 23, 2019
Member
Perhaps clarify “time.strptime” does not refer to the strptime function in the time module. The reader may not see the meth RST code, HTML links, etc.
Return a :class:`time` corresponding to *date_string, parsed according to | ||
*format*. :exc:`ValueError` is raised if the date string and format can't be | ||
parsed by :meth:`time.strptime`, if it returns a value which isn't a time | ||
tuple, or if the date part is nonzero. For a complete list of formatting |
vadmium
Mar 23, 2019
Member
“By time.strptime” is redundant, unless you mean to refer to the strptime function in the time module.
What does “the date part is nonzero” mean? I would expect the time class to work without specifying a date, zero or otherwise.
is substituted for the year, and ``1`` for the month and day. | ||
The :meth:`date.strptime` class method creates a :class:`date` object from a | ||
string representing a date and a corresponding format string. :exc:`ValueError` | ||
raised if the format codes for hours, minutes, seconds, and microseconds are used. |
vadmium
Mar 23, 2019
Member
ValueError is raised . . .
And microseconds should be or microseconds, unless you must combine all four codes to get the error.
But wouldn’t these details be better placed directly under the date and time classes, not in this common section about format codes in general?
matrixise
Mar 24, 2019
Author
Member
I have fixed the grammar issues, but for the details I don't know.
@@ -2023,13 +2046,13 @@ equivalent to ``datetime(*(time.strptime(date_string, format)[0:6]))``, except | |||
when the format includes sub-second components or timezone offset information, | |||
which are supported in ``datetime.strptime`` but are discarded by ``time.strptime``. |
vadmium
Mar 23, 2019
Member
Need to clarify time.strptime refers to the time module, not your new strptime method.
|
||
.. classmethod:: time.strptime(date_string, format) | ||
|
||
Return a :class:`time` corresponding to *date_string, parsed according to |
datetime | ||
-------- | ||
|
||
Added :func:`~datetime.date.strptime` and :func:`~datetime.time.strptime`. |
vadmium
Mar 23, 2019
Member
I expect you lose the reference to date and time, so all you are saying is you added two functions with the same name repeated.
matrixise
Mar 24, 2019
Author
Member
@vadmium yep, here, we just add the class methods datetime.date.strptime
and datetime.time.strptime
return NULL; | ||
} | ||
|
||
assert(PyTuple_CheckExact(specs) || PyList_CheckExact(specs)); |
matrixise
Mar 24, 2019
Author
Member
the first version was only for the tuples. After, I wanted to accept the list for my tests. but I am not against to only accept the tuples.
return NULL; | ||
} | ||
|
||
if (DATE_GET_HOUR(datetime) || |
vadmium
Mar 23, 2019
Member
Why does it have to be implemented in C? There are already calls out to the _strptime Python-language module here.
"""Return a date based on the input string and the format string.""" | ||
msg = "'{!s}' {} not valid in date format specification." | ||
from _datetime import _check_invalid_datetime_specs | ||
if _check_invalid_datetime_specs(format, time_specs, msg): |
vadmium
Mar 23, 2019
Member
Why not move the msg string into the check function, and just pass the bit that varies ('date' or 'time') as an argument? It would make the code easier to read.
Looks like the check function either raises an exception or returns True. It would be clearer to not use an if statement here.
matrixise
Mar 24, 2019
Author
Member
at the beginning, this PR was a submitted patch by other contributors, I just wanted to convert it to a PR. and now, I try to fix all the issues.
@matrixise, @pganssle There's been a lot of work done on this one. Is it something we should try to move forward on? Thanks! |
Add datetime.date.strptime and datetime.time.strptime.
Fix the documentation of _strptime._strptime, the documentation was
wrong, return a 3-tuple and not a 2-tuple
Co-authored-by: Alexander Belopolsky alexander.belopolsky@gmail.com
Co-authored-by: Amaury Forgeot d'Arc amauryfa@gmail.com
Co-authored-by: Berker Peksag berker.peksag@gmail.com
Co-authored-by: Josh-sf josh-sf@users.sourceforge.net
Co-authored-by: Juarez Bochi jbochi@gmail.com
Co-authored-by: Maciej Szulik soltysh@gmail.com
Co-authored-by: Stéphane Wirtel stephane@wirtel.be
Co-authored-by: Matheus Vieira Portela matheus.v.portela@gmail.com
https://bugs.python.org/issue1100942