gzip module writes file with bad CRC when saving large files #100260

Open
thomasf1 opened this issue Dec 15, 2022 · 0 comments
Labels
type-bug An unexpected behavior, bug, or error

thomasf1 commented Dec 15, 2022

Bug report

When writing a large amount of data (about 2.5 GB uncompressed, 250 MB compressed) with the gzip module, the CRC written to the file appears to be wrong. Smaller amounts of data work fine. Reading the file back with gzip.open raises gzip.BadGzipFile, and other programs report the file as corrupt. When the CRC check is bypassed, the file decompresses fine.

import ejson
import gzip

# users.json is about 2.5 GB
with open('users.json', 'r', encoding='utf-8') as file:
	contents = ejson.loads(file.read())

# the resulting file is about 250 MB big which seems right and decompresses fine when suppressing the CRC check
with gzip.open('users_compressed.json.gz', 'w') as file:
	file.write(ejson.dumps(contents).encode('utf-8'))

Opening the newly written file, I get the following: gzip.BadGzipFile: CRC check failed.
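To narrow down whether the stored CRC or the stored length is the part that is off, one option is to inflate the raw deflate stream by hand (so no CRC check is performed) and compare the recomputed values with the 8-byte trailer. The following is a minimal sketch, not something taken from the gzip module itself: the helper names `skip_gzip_header` and `compare_trailer` are made up for illustration, and it assumes a single-member gzip file laid out as described in RFC 1952.

```python
import struct
import zlib

# Header flag bits defined by the gzip format (RFC 1952).
FHCRC, FEXTRA, FNAME, FCOMMENT = 2, 4, 8, 16


def skip_gzip_header(f):
    """Advance f past the gzip member header, leaving it at the deflate data."""
    magic, method, flags = struct.unpack("<2sBB", f.read(4))
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a deflate-compressed gzip file")
    f.read(6)  # MTIME (4 bytes), XFL (1 byte), OS (1 byte)
    if flags & FEXTRA:
        (extra_len,) = struct.unpack("<H", f.read(2))
        f.read(extra_len)
    for flag in (FNAME, FCOMMENT):
        if flags & flag:
            # Zero-terminated string: read until the NUL byte (or EOF).
            while f.read(1) not in (b"\x00", b""):
                pass
    if flags & FHCRC:
        f.read(2)


def compare_trailer(path, chunk_size=1 << 20):
    """Return (stored_crc, actual_crc, stored_isize, actual_size).

    Hypothetical helper; assumes the file contains exactly one gzip member.
    stored_isize is the uncompressed length modulo 2**32, as the format defines it.
    """
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no CRC check
    crc, size = 0, 0
    with open(path, "rb") as f:
        skip_gzip_header(f)
        while not inflater.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                raise EOFError("deflate stream ended unexpectedly")
            data = inflater.decompress(chunk)
            crc = zlib.crc32(data, crc)  # running CRC over the inflated bytes
            size += len(data)
        # Bytes left over after the deflate stream are the trailer.
        trailer = inflater.unused_data + f.read()
    stored_crc, stored_isize = struct.unpack("<II", trailer[:8])
    return stored_crc, crc, stored_isize, size
```

For a healthy file the stored and recomputed CRC agree, and `stored_isize` equals `actual_size % 2**32`. Running this on users_compressed.json.gz would show which of the two trailer fields disagrees.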

Writing only half of the data at a time, as in the following, seems to work:

import ejson
import gzip

with open('users.json', 'r', encoding='utf-8') as file:
	contents = ejson.loads(file.read())

# Produces a file with about 250 MB that has a bad CRC
with gzip.open('users_compressed.json.gz', 'w') as file:
	file.write(ejson.dumps(dict(list(contents.items()))).encode('utf-8'))

# Produces a file with about 125 MB that opens fine
with gzip.open('users_compressed_1.json.gz', 'w') as file:
	file.write(ejson.dumps(dict(list(contents.items())[len(contents)//2:])).encode('utf-8'))

# Produces a file with about 125 MB that opens fine
with gzip.open('users_compressed_2.json.gz', 'w') as file:
	file.write(ejson.dumps(dict(list(contents.items())[:len(contents)//2])).encode('utf-8'))

Environment

Python 3.10.8 on an M1 Mac (macOS 12.6)

Not sure how to debug it further at this point. Has anyone had similar problems?

@thomasf1 thomasf1 added the type-bug An unexpected behavior, bug, or error label Dec 15, 2022