When trying to write a large amount of data (2.5 GB uncompressed, 250 MB compressed) with the gzip library, the CRC that gets written seems to be wrong. Smaller amounts of data work fine. Reading the file with gzip.open raises gzip.BadGzipFile, and every other program I tried reports the file as corrupt. When the CRC check is bypassed, the file decompresses fine.
import ejson
import gzip

# users.json is about 2.5 GB
with open('users.json', 'r', encoding='utf-8') as file:
    contents = ejson.loads(file.read())

# the resulting file is about 250 MB big which seems right and decompresses fine when suppressing the CRC check
with gzip.open('users_compressed.json.gz', 'w') as file:
    file.write(ejson.dumps(contents).encode('utf-8'))
Opening the newly written file '' gives the following: gzip.BadGzipFile: CRC check failed.
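One way to confirm that the stored CRC, not the compressed data, is at fault is to read the CRC-32 and size that gzip wrote at the end of the file and compare them with values computed separately over the same bytes. This is a minimal sketch assuming a single-member file, as produced by gzip.open; `read_gzip_trailer`, `expected_trailer` and the sample file name are hypothetical. Note that GzipFile.write() also computes its CRC with zlib.crc32, so agreement here only shows that the trailer matches what zlib.crc32 returns for the whole buffer.

```python
import gzip
import os
import struct
import tempfile
import zlib


def read_gzip_trailer(path):
    """Return (crc32, isize) from the last 8 bytes of a single-member gzip file.

    Per RFC 1952 the trailer holds the CRC-32 of the uncompressed data and its
    length modulo 2**32, both as little-endian unsigned 32-bit integers.
    """
    with open(path, 'rb') as f:
        f.seek(-8, os.SEEK_END)
        return struct.unpack('<II', f.read(8))


def expected_trailer(data):
    """CRC-32 and length modulo 2**32 that a correct trailer should contain."""
    return zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF


# Usage sketch with a small stand-in payload; in the report, data would be
# ejson.dumps(contents).encode('utf-8') and the path users_compressed.json.gz.
payload = b'{"user": 1}' * 1000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.json.gz')
    with gzip.open(path, 'wb') as f:
        f.write(payload)
    print(read_gzip_trailer(path) == expected_trailer(payload))  # prints True
```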
Putting in only half of the data at a time, as below, does seem to work:
import ejson
import gzip

with open('users.json', 'r', encoding='utf-8') as file:
    contents = ejson.loads(file.read())

# Produces a file with about 250 MB that has a bad CRC
with gzip.open('users_compressed.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items()))).encode('utf-8'))

# Produces a file with about 125 MB that opens fine
with gzip.open('users_compressed_1.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items())[len(contents)//2:])).encode('utf-8'))

# Produces a file with about 125 MB that opens fine
with gzip.open('users_compressed_2.json.gz', 'w') as file:
    file.write(ejson.dumps(dict(list(contents.items())[:len(contents)//2])).encode('utf-8'))
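Since each half produced a valid file on its own, a possible workaround, assuming the problem is tied to the size of a single write() call, is to feed the encoded bytes to gzip in smaller slices. This is a minimal sketch, not something tested against the original data; `write_gzip_in_chunks` and the 64 MiB default chunk size are hypothetical choices.

```python
import gzip
import os
import tempfile

# Hypothetical default: 64 MiB per write() call. The report only shows that
# writing each half of the data worked while one full-size write did not.
CHUNK_SIZE = 64 * 1024 * 1024


def write_gzip_in_chunks(path, data, chunk_size=CHUNK_SIZE):
    """Write bytes to a gzip file through several smaller write() calls.

    Each GzipFile.write() call updates the running CRC-32 with only the slice
    it receives, so no single CRC update covers the whole large buffer.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the buffer
    with gzip.open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])


# Usage sketch with a small stand-in payload; in the report, data would be
# ejson.dumps(contents).encode('utf-8').
payload = b'{"user": 1}' * 100_000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'users_compressed.json.gz')
    write_gzip_in_chunks(path, payload, chunk_size=4096)
    with gzip.open(path, 'rb') as f:
        # Reading to the end also makes gzip verify the stored CRC-32.
        assert f.read() == payload
```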
Environment
Python 3.10.8 on an M1 Mac (macOS 12.6)
Not sure how to debug it further at this point. Has anyone had similar problems?
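A further debugging step: in Python 3.10, GzipFile.write() passes the whole buffer to a single zlib.crc32 call, while reading the file back updates the CRC over the decompressed data piece by piece. Comparing a one-shot CRC with an incrementally accumulated one over the same large buffer could show whether zlib.crc32 itself misbehaves for large inputs on this machine. This is a hypothesis to test, not a confirmed cause; the helper names below are hypothetical.

```python
import zlib


def crc32_one_shot(data):
    """CRC-32 of the whole buffer in one zlib.crc32 call, as GzipFile.write()
    does for a single write in Python 3.10."""
    return zlib.crc32(data)


def crc32_incremental(data, chunk_size=1 << 20):
    """CRC-32 accumulated over slices of chunk_size bytes (1 MiB by default)."""
    crc = 0
    view = memoryview(data)  # slices share the original buffer, no copies
    for start in range(0, len(view), chunk_size):
        crc = zlib.crc32(view[start:start + chunk_size], crc)
    return crc


# Usage sketch with a small stand-in buffer; on the affected machine, data
# would be the ~2.5 GB ejson.dumps(contents).encode('utf-8') result.
data = b'{"user": 1}' * 100_000
print(crc32_one_shot(data) == crc32_incremental(data))  # prints True
```

Because CRC-32 can be computed incrementally, the two values must be identical for any input; a mismatch on the real buffer would point at the CRC routine rather than at the gzip module's file handling.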
thomasf1 commented Dec 15, 2022
Bug report