Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: bpo-34990: year 2038 problem in compileall.py #9892

Open
wants to merge 12 commits into
base: master
from

Conversation

@matrixise
Copy link
Member

@matrixise matrixise commented Oct 15, 2018

Use 'L' as an unsigned long for the timestamp

https://bugs.python.org/issue34990

Use 'L' as an unsigned long for the timestamp
@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Oct 15, 2018

Lib/test/test_compileall.py also contains 2 instances of <4sll that probably should be updated accordingly.
and you could add the test code provided by xtreak in the bug.

@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 15, 2018

@bmwiedemann yep, I was in the plane from PySS and I just commited my code, and now, I just saw the test of @tirkarthi . I am going to include it.

matrixise added 2 commits Oct 15, 2018
…-01-20

Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com>
@tirkarthi
Copy link
Member

@tirkarthi tirkarthi commented Oct 15, 2018

@matrixise Thanks for the PR. I think it's a NEWS entry worthy fix to add a short note that compileall now supports files with modification time greater than or equal to 19-01-2038 (I guess)

@tirkarthi
Copy link
Member

@tirkarthi tirkarthi commented Oct 15, 2018

@matrixise Sorry, I just noticed you have added one :)

@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 15, 2018

@tirkarthi please, could you check my PR with the last commits.

@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Oct 15, 2018

btw: the first bad mtime is 0x80000000 or 2147483648 or 2038-01-19T03:14:08+00:00

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 15, 2018
Contributor

not sure, but maybe this is not a good place in this file to add the test in, because SOURCE_DATE_EPOCH changes the default .pyc type to be hash-based, which would not hit the error condition. See #9607

This comment has been minimized.

@matrixise

matrixise Oct 15, 2018
Author Member

I am not sure, just copy the code of @tirkarthi . I will wait for a review from @vstinner or an other core-dev.

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

It should be moved to CompileallTestsBase.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

done

@tirkarthi
Copy link
Member

@tirkarthi tirkarthi commented Oct 15, 2018

@matrixise Thanks, Looks good to me from the context of the issue reported but I have limited understanding about the code so I hope someone else might have a better look. Seems there was some discussion about handling 2038 problem as part of the original PR where this code was added : #4575 (comment)

@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Oct 15, 2018

I think the discussion in PR 4575 was about future-proofing the new .pyc format, but it failed to notice that by 2038, only 31 bits are exhausted and bit 32 (which was erroneously interpreted as the sign-bit here), allows to keep the format alive until 2106 - which we can safely assume out-of-scope for today.

@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 15, 2018

This is the reason why I move to 'L' instead of 'l'

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Check also other parts of the stdlib that work with mtime in pyc files. For example zipimport and checkpyc.py need updates. Maybe there are other parts.

Lib/compileall.py Show resolved Hide resolved
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Oct 16, 2018

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@@ -139,7 +139,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
expect = struct.pack('<4slL', importlib.util.MAGIC_NUMBER,

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 16, 2018
Member

Also, it would be better to handle the second field as an unsigned integer too.

This comment has been minimized.

@vstinner

vstinner Oct 16, 2018
Member

importlib uses int(mtime) & 0xFFFFFFFF, I suggest... int(mtime) & 0xFFFF_FFFF for readability :-D

@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 17, 2018

@serhiy-storchaka last update for Tools/scripts/cherrypyc.py by in June 2013, we only read the magic number and timestamp, just that: https://github.com/python/cpython/blob/master/Tools/scripts/checkpyc.py#L45-L46

In https://github.com/python/cpython/blob/master/Lib/zipimport.py#L602 you convert the dword as a uint32 and LGTM.

@matrixise matrixise requested a review from Oct 17, 2018
@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 17, 2018

@vstinner and @serhiy-storchaka I have fixed mtime with 0xFFFF_FFFF and use an unsigned long, where I found <4slL but for the other files, I am not an expert of the API and I don't know where I could search, zipimport and checkpy.py seem to be good.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 17, 2018
Contributor

This is actually a NOP, because mtime_str is just 4 bytes, that cannot encode anything larger than 0xFFFF_FFFF

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

thanks, I am not familiar with the operations on the bits.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

zipimport needs truncating the timestamp to 32 bit for comparison.

Also update _code_to_timestamp_pyc() in _bootstrap_external.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018
Member

mtime can't be -1 after adding & 0xFFFF_FFFF.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

maybe add the 0xFFFF_FFFF on the line 57?

This comment has been minimized.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018
Member

I think the above few lines can be removed.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

the lines from 38->41?

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime), size & 0xFFFFFFFF) + data)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018
Member

Signed format units are used.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018
Member

Sees the definition of get_long(). It returns an unsigned 32-bit integer, and returns -1 only on error. You need to convert not it, but the file timestamp.

@matrixise matrixise closed this Oct 17, 2018
@matrixise matrixise reopened this Oct 17, 2018
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 17, 2018
Contributor

why this revert?

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

just because @serhiy-storchaka indicated that the signed was used here.

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

I meant that you need to update the format string.

@@ -586,7 +586,7 @@ def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
"Produce the data for a timestamp-based pyc."
data = bytearray(MAGIC_NUMBER)
data.extend(_pack_uint32(0))
data.extend(_pack_uint32(mtime))
data.extend(_pack_uint32(mtime) & 0xFFFF_FFFF)

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

This change is useless, _pack_uint32() already does "& 0xFFFFFFFF". importlib can be left unchanged.

@@ -53,8 +53,8 @@ def add_bad_source_file(self):
def timestamp_metadata(self):
with open(self.bc_path, 'rb') as file:
data = file.read(12)
mtime = int(os.stat(self.source_path).st_mtime)
compare = struct.pack('<4sll', importlib.util.MAGIC_NUMBER, 0, mtime)
mtime = int(os.stat(self.source_path).st_mtime) & 0xFFFF_FFF

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

That's why I like the underscore: it's easier to spot bugs. You missed something :-)

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

It should be moved to CompileallTestsBase.

@@ -0,0 +1 @@
Use an unsigned long for the timestamp for the .pyc file in compileall.py

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

You describe the fix, but you should describe the fixed bug instead: compileall now supports timestamps newer than year 2038.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

+1

@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018
Member

I don't know if this code should be modified, but it is modified: should we also cast to int()?

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018
Author Member

like that?

elif int(mtime) != int(st[ST_MTIME] & 0xFFFF_FFFF)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

mtime is already int. But st[ST_MTIME] is float.

matrixise added 3 commits Oct 17, 2018
@matrixise
Copy link
Member Author

@matrixise matrixise commented Oct 17, 2018

@serhiy-storchaka can you continue the review with me, because I am a little bit like a chicken without head ;-) I try to understand with 0xFFFF_FFFF ;-)

matrixise added 2 commits Oct 17, 2018
_pack_uint32() already does "& 0xFFFFFFFF"
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

I meant that you need to update the format string.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:

This comment has been minimized.

@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

mtime is already int. But st[ST_MTIME] is float.

@@ -0,0 +1 @@
compileall now supports timestamps newer than year 2038

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

Please use punctuation.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.


class CompileallTestsWithSourceEpoch(CompileallTestsBase,
unittest.TestCase,
metaclass=SourceDateEpochTestMeta,
source_date_epoch=True):
pass


This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018
Member

Unrelated change.

@matrixise matrixise changed the title bpo-34990: year 2038 problem in compileall.py WIP: bpo-34990: year 2038 problem in compileall.py Oct 25, 2018
@brettcannon
Copy link
Member

@brettcannon brettcannon commented Nov 2, 2018

Due to the WIP in the title I added the DO-NOT-MERGE label.

@matrixise
Copy link
Member Author

@matrixise matrixise commented Nov 3, 2018

@brettcannon Thank you, I will continue to work on this issue on the next week, this week I am at the hospital with my daughter.

@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Jan 21, 2019

Only 19 years to go... Can we get this moving again?

@vstinner
Copy link
Member

@vstinner vstinner commented Jan 22, 2019

Ping @matrixise: they are still some comments of @serhiy-storchaka that you didn't address.

@matrixise
Copy link
Member Author

@matrixise matrixise commented Feb 4, 2019

@vstinner pong, ok thanks for the reminder.

@matrixise
Copy link
Member Author

@matrixise matrixise commented Mar 25, 2019

So, I am not serious with this bug, I am going to continue on this bug, I would like to close all my PRs.

@vstinner
Copy link
Member

@vstinner vstinner commented Mar 26, 2019

@vstinner pong, ok thanks for the reminder.

FYI I ignore this issue since it's still tagged as "WIP". I prefer to wait until it's no longer a WIP to review it.

@brettcannon brettcannon removed the request for review from Apr 17, 2019
@brettcannon
Copy link
Member

@brettcannon brettcannon commented Apr 17, 2019

Removing the import team from reviewing as this is still a WIP. We can be added back when it's ready to go.

@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Aug 22, 2019

@matrixise any chance to get this finished this year?

@vstinner
Copy link
Member

@vstinner vstinner commented Aug 22, 2019

@matrixise any chance to get this finished this year?

It's ok as soon as it's fixed before 2038 :-D

@bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Aug 22, 2019

It's ok as soon as it's fixed before 2038 :-D

I am afraid there are way too many places where software is used unchanged for a long time. Even 18 years. And of course this is not the only timebomb waiting to trigger.

@vstinner
Copy link
Member

@vstinner vstinner commented Aug 22, 2019

It was a joke. Hint: see the emoticon.

@ammaraskar
Copy link
Member

@ammaraskar ammaraskar commented Apr 22, 2020

Since Stéphane said:

I would like to close all my PRs.

I opened up #19651 with the more permanent solution of using a 64-bit timestamp value as Benjamin originally suggested in their PEP 552 PR.

@ammaraskar
Copy link
Member

@ammaraskar ammaraskar commented Apr 25, 2020

I pinged @matrixise about this and they said I can feel free to take this on since they're too busy to move forward with it. #19708 is a continuation of this PR to make compileall use an unsigned int.

It should be possible to compare the code for both the solutions now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants