Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile.testzip() using progressive file reads #45108

Closed
gradha mannequin opened this issue Jun 19, 2007 · 14 comments
Closed

zipfile.testzip() using progressive file reads #45108

gradha mannequin opened this issue Jun 19, 2007 · 14 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir tests Tests in the Lib/test dir

Comments

@gradha
Copy link
Mannequin

gradha mannequin commented Jun 19, 2007

BPO 1739648
Nosy @pitrou, @serhiy-storchaka
Files
  • testzip_in_chunks.diff
  • zipfile_testzip_amcintyre.diff: updated patch
  • testzip-patch2.diff: Addition of test for ZipFile.testzip
  • testzip-patch3.diff: patch updated against python3 trunk
  • testzip-patch4.patch: Updated and reloaded for review
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2007-06-19.10:56:48.000>
    labels = ['tests', 'library', 'performance']
    title = 'zipfile.testzip() using progressive file reads'
    updated_at = <Date 2014-02-03.19:53:56.455>
    user = 'https://bugs.python.org/gradha'

    bugs.python.org fields:

    activity = <Date 2014-02-03.19:53:56.455>
    actor = 'BreamoreBoy'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Tests']
    creation = <Date 2007-06-19.10:56:48.000>
    creator = 'gradha'
    dependencies = []
    files = ['8057', '9057', '14847', '18533', '25402']
    hgrepos = []
    issue_num = 1739648
    keywords = ['patch']
    message_count = 12.0
    messages = ['52784', '55306', '59226', '87724', '92325', '111534', '111541', '113930', '143972', '159551', '159583', '176742']
    nosy_count = 4.0
    nosy_names = ['gradha', 'alanmcintyre', 'pitrou', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'languishing'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue1739648'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @gradha
    Copy link
    Mannequin Author

    gradha mannequin commented Jun 19, 2007

    testzip() still fails for big files. This patch implements reading small buffers. Also corrected main() where the return value of testzip() is ignored, hiding errors in zipfiles.

    @gradha gradha mannequin added stdlib Python modules in the Lib dir labels Jun 19, 2007
    @gradha
    Copy link
    Mannequin Author

    gradha mannequin commented Aug 26, 2007

    Any progress report on this issue, please?

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Jan 4, 2008

    This sort of change definitely needs to be made to avoid reading huge
    files into memory for testzip(). I've attached a modified patch as
    zipfile_testzip_amcintyre.diff; it didn't seem appropriate to have a
    module-level variable for the size of the test read, and I made a few
    other minor tweaks.

    @devdanzin devdanzin mannequin added performance Performance or resource usage labels May 13, 2009
    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented May 13, 2009

    It would appear that the change to testzip has already been made:

    http://mail.python.org/pipermail/python-checkins/2008-August/072892.html
    http://mail.python.org/pipermail/python-3000-checkins/2008-August/004310.html

    I can add some tests, though. I'll submit another patch shortly.

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Sep 6, 2009

    Attached is a patch that makes the zipfile module check the result of
    testzip when run as __main__ as well as in test_zipfile.py, and adds
    some tests in test_zipfile64.py. The changes to test_zipfile64.py
    increased its runtime from ~671 sec to ~1060 sec on my machine; I'm not
    sure if that's too much.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 25, 2010

    Tried the patch against 2.7 and got "RuntimeError: Attempt to read ZIP archive that was already closed" for TestsWithSourceFile test_deflated and test_stored. The patch needs updating for py3k.

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Jul 25, 2010

    Ok, I'll see if I can update that in the next week or so.

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Aug 14, 2010

    Here's an updated patch against the py3k trunk (testzip-patch3.diff) that passes all tests (including test_zipfile64) on Linux on a 64-bit machine.

    I can backport this to 2.x, but I'll wait until somebody indicates a need for it before I spend any time on it.

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Sep 13, 2011

    I re-checked testzip-patch3.diff since some time has passed since I last commented on it, and it still seems to work ok (the small test_zipfile.py block failed to apply, but that's easy enough to do manually). Passes full test run, test_zipfile64.py, etc., on Linux x86_64.

    If there is something about the patch that still needs to be addressed before it can be committed, let me know and I'll see what I can do.

    @alanmcintyre
    Copy link
    Mannequin

    alanmcintyre mannequin commented Apr 29, 2012

    I'd be glad to do some code reviews or something in exchange for the time of somebody with commit rights. :-) If anybody is interested in getting this change committed, please let me know and I'll check that the patch is still valid. Thanks!

    @alanmcintyre alanmcintyre mannequin added stale Stale PR or inactive for long period of time. labels Apr 29, 2012
    @serhiy-storchaka
    Copy link
    Member

    I'm not sure about new tests, but the changes in the docstring and old tests look necessary.

    @serhiy-storchaka
    Copy link
    Member

    I added comments in Rietveld.

    @serhiy-storchaka serhiy-storchaka added tests Tests in the Lib/test dir labels Dec 1, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Copy link
    Member

    I think this was done in

    r65761 | antoine.pitrou | 2008-08-17 08:06:29 -0500 (Sun, 17 Aug 2008) | 3 lines

      fix ZipFile.testzip() to work with very large embedded files
    

    (commit 4cd6a95).

    @iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Aug 16, 2022
    @iritkatriel iritkatriel closed this as not planned Won't fix, can't repro, duplicate, stale Aug 23, 2022
    @iritkatriel iritkatriel moved this to Done in Zipfile issues Aug 23, 2022
    @serhiy-storchaka
    Copy link
    Member

    Yes, the change to testzip has already been made. as was said in #45108 (comment). But the issue was kept open for other patch, which added some tests and improved the docstring.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir tests Tests in the Lib/test dir
    Projects
    Status: Done
    Development

    No branches or pull requests

    2 participants