GzipFile should rewind fileobj to starting position #105426

Open
@mdruiter

I have files with a small header (8 bytes, say zrxxxxxx), followed by a gzipped stream of data. Reading such files works fine most of the time. However, in very specific cases, seeking backwards fails. This is a simple way to reproduce:

from gzip import GzipFile

f = open('test.bin', 'rb')
f.read(8)  # Read zrxxxxxx

h = GzipFile(fileobj=f, mode='rb')
h.seek(8192)
h.seek(8191)  # gzip.BadGzipFile: Not a gzipped file (b'zr')

Unfortunately, I cannot share my files, but it looks like any similar file will do.
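For anyone who wants to try this without the original files, here is a minimal sketch that builds a comparable file and runs the same two seeks. The helper names `make_test_file` and `try_backward_seek`, the 16 KiB payload, and the file layout are assumptions made for illustration; they are not taken from the real files. Whether the exception appears may depend on the Python version and its buffer sizes, so the function returns the error instead of assuming one.

```python
import gzip

HEADER = b"zrxxxxxx"  # 8-byte placeholder header, as in the description


def make_test_file(path, payload=bytes(range(256)) * 64):
    """Write the header followed by a gzip stream (hypothetical layout)."""
    with open(path, "wb") as f:
        f.write(HEADER)
        f.write(gzip.compress(payload))


def try_backward_seek(path):
    """Run the two seeks from the report; return the exception, or None."""
    with open(path, "rb") as f:
        f.read(len(HEADER))  # skip the header, leaving f at the gzip stream
        h = gzip.GzipFile(fileobj=f, mode="rb")
        try:
            h.seek(8192)
            h.seek(8191)  # backward seek: may rewind fileobj to offset 0
            h.read(1)
        except gzip.BadGzipFile as exc:
            return exc
        finally:
            h.close()
    return None


if __name__ == "__main__":
    make_test_file("test.bin")
    print(repr(try_backward_seek("test.bin")))
```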

Debugging the situation, I noticed that DecompressReader.seek (in Lib/_compression.py) sometimes rewinds the original file, which I suspect causes the issue:

#...
# Rewind the file to the beginning of the data stream.
def _rewind(self):
    self._fp.seek(0)
    #...

def seek(self, offset, whence=io.SEEK_SET):
    #...
    # Make it so that offset is the number of bytes to skip forward.
    if offset < self._pos:
        self._rewind()
    else:
        offset -= self._pos
    #...

Apparently GzipFile does want to support seeking in regular files, even though it isn't fast. I think it can, except that it should rewind the file to its starting position, not to 0!
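Until that changes, one possible workaround is to hand GzipFile a thin wrapper that makes the gzip stream look as if it starts at offset 0. The sketch below is only an illustration of that idea: `OffsetFile` and `read_byte_after_rewind` are hypothetical names, not part of the standard library, and the wrapper implements just the methods this example relies on (`read`, `seek`, `tell`, `seekable`).

```python
import gzip
import io


class OffsetFile:
    """Hypothetical wrapper: treat fileobj's current position as offset 0."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._start = fileobj.tell()  # remember where the gzip stream begins

    def read(self, size=-1):
        return self._f.read(size)

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            # Absolute seeks are shifted by the starting position.
            pos = self._f.seek(self._start + offset)
        else:
            pos = self._f.seek(offset, whence)
        return pos - self._start

    def tell(self):
        return self._f.tell() - self._start

    def seekable(self):
        return self._f.seekable()


def read_byte_after_rewind(fileobj):
    """Seek forward, then backward, and read one decompressed byte."""
    h = gzip.GzipFile(fileobj=OffsetFile(fileobj), mode="rb")
    try:
        h.seek(8192)
        h.seek(8191)  # a rewind to 0 now lands on the gzip header
        return h.read(1)
    finally:
        h.close()
```

With this wrapper, a `seek(0)` issued internally ends up at the position the file had when the wrapper was created, so the header bytes are never re-read as gzip data.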

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)