New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-88339: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile #27737
Conversation
Does the test_zipfile.py
unittest explicitly cover this case of seeking within a ZIP_STORED entry? If so, mention that and drop a link to where in a comment here.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
bc54dce
to
3c76400
Compare
How about adding an explicit seek
+ read
test case to Lib/test/test_zipfile.py
's StoredTestsWithSourceFile
class that checks that the data read matches what is expected at the seek'ed to offset?
I don't get why this test fails. Do you maybe have any idea? |
Will try to do, thanks! |
There is a unittest explicitly covering this case. The one which fails. |
3c76400
to
518d886
Compare
You are right! It's that unit test. I added one assert to it. There were problems with read buffer which I fixed now. I rebased and added Please run CI and tell me if I'm still missing something. ;) BTW. Thanks for your work guys! |
Previous tests passed (some asyncio on windows x64 failed, but it's probably not my fault). |
750d8ec
to
6e4a3c7
Compare
Can I ask for review please? EDIT: Just rebased my branch. |
6e4a3c7
to
0c4a434
Compare
0c4a434
to
293792e
Compare
I have made the requested changes; please review again (test is already here, I added some tell asserts) |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
c9417aa
to
e891ae4
Compare
Oh and I again forgot to tag you guys: @gpshead, @serhiy-storchaka |
I have made the requested changes; please review again |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
8f28620
to
cf56601
Compare
Guys, what's wrong. It's over a year here now, I fixed everything you mentioned. Can I do something more to make this faster? |
I feel like this could use more explicit testing around all possible seek edge cases, but that is true in general of the zipfile module and not specific to this PR. It appears to do the right thing and the existing tests that hopefully represent the more likely usages continue to pass.
If I haven't merged this within a few days, please ping me again. I'm giving Serhiy some time to respond if desired. |
@gpshead ;) Also I've a question - merging it now to main branch means this changes will be available from python 3.12, not 3.11? |
@gpshead ;) |
This reverts commit 13438487cb14a9947f3fb79faefd61c6ea0123ca.
cf56601
to
420ae62
Compare
ping @gpshead |
ping @gpshead |
bump @gpshead |
yep, this is in 3.12. the feature freeze window for a release closes upon the first beta. |
thanks for the improvement! |
At the moment stored ZipExtFile is being read to the place of seek like all other compressed variants.
It's not needed as it's possible to freely seek uncompressed file inside zip without this penalty.
Lots of apps depend on ZipExtFile seeking ability and this patch would increase performance and lower IO penalty significantly.
It disables CRC checking after first seek as it's impossible to check CRC if we are not reading whole file.
I've been using patched zipfile for almost a year now and I can say it works good ;)
https://bugs.python.org/issue44173 / #88339
redo of #26227 after a little fail